Object recognition algorithms sold by tech companies, including Google, Microsoft, and Amazon, perform worse when asked to identify items from lower-income countries.
These are the findings of a new study conducted by Facebook’s AI lab, which shows that AI bias can not only reproduce inequalities within countries, but also between them.
In the study (which we spotted via Jack Clark’s Import AI newsletter), researchers tested five popular off-the-shelf object recognition algorithms — Microsoft Azure, Clarifai, Google Cloud Vision, Amazon Rekognition, and IBM Watson — to see how well each program identified household items collected from a global dataset.
The dataset included 117 categories (everything from shoes to soap to sofas) and a diverse array of household incomes and geographic locations (from a family in Burundi making $27 a month to a family in Ukraine with a monthly income of $10,090).
The researchers found that the object recognition algorithms made around 10 percent more errors when asked to identify items from a household with a $50 monthly income compared to those from a household making more than $3,500. The absolute difference in accuracy was even greater: the algorithms were 15 to 20 percent better at identifying items from the US compared to items from Somalia and Burkina Faso.
These findings were “consistent across a range of commercial cloud services for image recognition,” write the authors.
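For readers who want to see how a gap like this could be measured, here is a minimal sketch in Python. It is not the researchers' code: the function name `accuracy_by_income`, the record layout, the bucket boundaries, and the toy records are all assumptions made for illustration, and an image counts as correct here if the true label appears anywhere in the labels a service returns.

```python
from collections import defaultdict


def accuracy_by_income(records, bucket_edges):
    """Return recognition accuracy per income bucket.

    records: iterable of (monthly_income_usd, true_label, predicted_labels)
    bucket_edges: ascending upper bounds in USD; an income above the last
                  edge falls into a final, open-ended bucket.
    (Hypothetical helper for illustration, not taken from the study.)
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for income, true_label, predicted_labels in records:
        # Bucket index = number of edges the income exceeds.
        bucket = sum(income > edge for edge in bucket_edges)
        totals[bucket] += 1
        # Assumed scoring rule: correct if the true label is among the predictions.
        if true_label in predicted_labels:
            hits[bucket] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}


# Made-up toy records, NOT data from the study.
records = [
    (27, "soap", ["food", "cheese"]),
    (45, "shoes", ["shoes"]),
    (4000, "soap", ["soap", "bottle"]),
    (10090, "sofa", ["sofa", "couch"]),
]

# Bucket 0: up to $50 a month; bucket 1: $50 to $3,500; bucket 2: above $3,500.
acc = accuracy_by_income(records, bucket_edges=[50, 3500])
print(acc)
```

Comparing the value for the lowest bucket with the value for the highest gives the kind of accuracy gap the study reports, though the real evaluation covered far more images and five separate services.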
This sort of bias is a well-known problem in AI and has a number of root causes. One of the most common is that the training data used to create algorithms often reflects the life and background of the engineers responsible. As these individuals are often white men from high-income countries, so too is the world they teach their programs to identify.
One of the best-known examples of AI bias is facial recognition algorithms, which regularly perform worse when identifying female faces, particularly those of women of color. This sort of bias can worm its way into all sorts of systems, from algorithms designed to calculate parole to those assessing your CV ahead of an upcoming job interview.
In the case of object recognition algorithms, the authors of this study say that there are a few likely causes for the errors: first, the training data used to create the systems is geographically constrained, and second, the algorithms themselves fail to recognize cultural differences.
Training data for vision algorithms, write the authors, is taken largely from Europe and North America and “severely undersample[s] visual scenes in a range of geographical regions with large populations, in particular, in Africa, India, China, and South-East Asia.”
Similarly, most image datasets use English nouns as their starting point and collect data accordingly. This might mean entire categories of items are missing or that the same items simply look different in different countries. The authors give the example of dish soap, which is a bar of soap in some countries and a container of liquid in others, and of weddings, which look very different in the US and India.
Why is this important? Well, for a start, it means that any system created using these algorithms is going to perform worse for people from lower-income and non-Western countries. Because US tech companies are world leaders in AI, that could affect everything from photo storage services and image search functionality to more important systems like automated security cameras and self-driving cars.
But this is probably only the tip of the iceberg. Vision algorithms are relatively easy to evaluate for these sorts of biases, but the pipeline that creates these programs is also feeding an entire industry full of algorithms that will never receive the same scrutiny.
Silicon Valley often promotes its products — and, particularly in recent years, its AI products — as egalitarian and accessible to all. Studies like this show that tech companies continue to evaluate, define, and shape the world in their own image.