Whether you know it or not, artificial intelligence has become a pervasive force in our everyday lives. When your favorite online retailer shows you “products related to this item,” or the GPS system in your car guides you along the fastest route to your destination, you’re seeing the results of artificial intelligence.
But this is just the visible surface of AI. What you don’t see is the vast, planetary supply chain of AI that extracts and consumes natural resources, energy, data, and human labor.
Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence by Kate Crawford is a harshly critical book that maps out the world of AI, piercing its myths and exposing some ugly realities about the political, social, economic, and environmental costs of AI.
Kate Crawford studies these costs as a Research Professor at USC Annenberg and a Senior Principal Researcher at Microsoft Research in New York. She has advised policy makers at the White House, the FTC, the UN, and the European Parliament.
Before we go any further, a disclosure: I’m personally involved with the subject matter of this book. In my career working at some leading technology companies, I’ve managed the development of artificial intelligence systems, specifically machine learning systems, for nearly twenty years. The systems I’ve worked on have mainly been aimed at detecting spam, fraud, and other malicious behavior on the internet. I strongly believe there are huge benefits to AI, so I’ll admit that large parts of this book made me feel quite uncomfortable. Nonetheless, Crawford has done valuable work by making us all more aware of some highly problematic aspects of AI that need to be addressed.
OK, let’s look at Atlas of AI in more detail.
Atlas of AI
By Kate Crawford
Yale University Press, New Haven, 2021
The main argument in Atlas of AI is that far from being an abstract, mathematical endeavor that produces a bounty of societal benefits, AI is just as messy as many other industries. A full understanding of AI must include its political, social, and environmental costs.
AI and the Environment
The book opens with a visit to a lithium mine in Nevada. Lithium, of course, is a critical element in the batteries that power our cell phones, laptops and, increasingly, cars, all of which are heavy users of AI. Mining has huge impacts on the environment and on the health of miners, but we rarely consider these factors or their costs when we think about the phones in our pockets.
Likewise for energy. These days, most AI systems run in the “cloud.” It sounds light and fluffy, but the cloud is actually composed of giant interconnected data centers all over the world each housing thousands upon thousands of energy-hungry computers. When you type a query into Google, say, your query is transmitted to one of those data centers where dozens, possibly hundreds, of computers might be involved, perhaps for just fractions of a second, in finding and returning the answer. Answering your query consumes energy and produces a carbon footprint that we consumers are mostly unaware of.
But internet companies are acutely aware of these costs. Crawford criticizes Google for extracting tax concessions and other benefits from the town of The Dalles, Oregon several years ago, in return for locating one of its data centers there.
Now you could argue that plenty of industries rely on raw materials and energy, not just AI. Similarly, technology companies are not the only ones to get tax breaks from local and state governments. I think Crawford would say that this behavior runs counter to the myths of AI as an abstract, high-paying, and environmentally friendly activity. In fact, AI is part of and supports powerful corporate and government interests.
“The corporate imaginaries of AI fail to depict the lasting costs and long histories of the materials needed to build computational infrastructures or the energy required to power them. The rapid growth of cloud-based computation, portrayed as environmentally friendly, has paradoxically driven an expansion of the frontiers of resource extraction. It is only by factoring in these hidden costs, these wider collections of actors and systems, that we can understand what the shift toward increasing automation will mean.” [p. 47-48]
Labels and Classifiers
For me, the most interesting parts of the book are the middle chapters that focus on how data is collected and used to build models known as classifiers, and how those models are then used by governments and corporations to make decisions that can profoundly affect our lives.
To explain this, I need to get a little technical for a few minutes – bear with me. A classifier is a mathematical algorithm that is used to put things – email messages, photographs, credit card transactions – into specific categories or classes.
The sorting hat from Harry Potter is a great example of a classifier though it uses magic (I assume) rather than mathematics to assign students to their houses.
An email classifier might decide whether a particular email message is spam or legitimate. An image classifier might decide whether a photograph contains a cat or not.
There are dozens of algorithms that perform this sort of classification. They have exotic names like boosted trees, support vector machines, and convolutional neural networks. But they’re all just generic algorithms. Before they can actually classify stuff, they need to be trained. This training process is at the heart of machine learning, probably the most important branch of artificial intelligence today. The result of training is a model – a numeric representation of the characteristics that can be used to identify spam, or cats, or whatever. To train one of these machine-learned models, you need to feed it lots of examples. An email classifier might be trained using a million messages that are already known to be either spam or not spam. Training a cat detection model might require hundreds of thousands of photographs which have already been identified as cat photos or something else. These pre-classified training examples are known as labeled data.
Model training is done by highly skilled researchers and engineers who work for corporations, universities, governments, and militaries.
To summarize, engineers build machine learned models to classify things by training them on massive quantities of labeled data. With me so far?
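To make the train-then-classify pipeline concrete, here is a minimal sketch: a toy naive Bayes spam classifier in plain Python. The messages, labels, and function names are all invented for illustration; a real system would train a far more sophisticated model on millions of labeled examples.

```python
from collections import Counter
import math

# Toy labeled training data: (message, label) pairs.
# In practice these sets contain millions of examples.
training_data = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "legit"),
    ("lunch plans this week", "legit"),
    ("project status update attached", "legit"),
]

def train(examples):
    """Build a naive Bayes model: word counts per class plus class counts."""
    word_counts = {"spam": Counter(), "legit": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, class_counts

def classify(model, text):
    """Score each class by log-probability and return the more likely one."""
    word_counts, class_counts = model
    total = sum(class_counts.values())
    vocab = set(w for counts in word_counts.values() for w in counts)
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Prior probability of the class...
        score = math.log(class_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # ...combined with the (smoothed) likelihood of each word
            # given the class.
            score += math.log(
                (word_counts[label][word] + 1) / (n_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(training_data)
print(classify(model, "claim your free money"))   # → spam
print(classify(model, "agenda for the meeting"))  # → legit
```

The point of the sketch is the shape of the process, not the algorithm: labeled examples go in, a model (here, word counts per class) comes out, and the model then assigns new, unseen items to one of the predefined classes.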
Power, Politics, and AI
Now identifying spam or cat photos seems pretty innocuous and might even be helpful. But, as Crawford points out, there are serious problems lurking just below the surface.
First, who decides what the classes are? Crawford argues that selecting the classes that a model looks for is a profound exercise of political power. To illustrate, suppose we’re attempting to classify photos of people into male and female. Why only those two classes? What about transgender and non-binary people? Who decides? What are their biases?
It gets even worse if the model is trying to classify people into racial groups. Who gets to define what those groups are? What sort of oversight is there? Attributes such as gender and race are widely recognized to be constructed rather than inherent, so are such models doing anything useful at all? Or do they just reflect the biases of the model engineers or the people who pay them?
Next, who labels the training data? You can imagine it might be expensive and time consuming to label all those cat photos. Typically, this work is outsourced. Sometimes we consumers participate in the labeling process. When you click the “Junk” button on an item in your mailbox, you’re contributing a label to your email provider’s training data. But very often, labeling is done by workers in developing countries who are paid a fraction of a penny for each item. Are they paid fairly? What are their working conditions like?
This leads immediately to another issue: How do we determine the accuracy of the labels? Does anyone consult with the subject in the photo to confirm whether they agree with the assigned label? Crawford asks what if the person self-identifies differently, or self-identifies as a class that’s not one of the classes selected for model training? What if you wanted the sorting hat to put you in a fifth house?
And by the way, did anyone ever grant consent for their photo to be used for model training? Sometimes these images are public, voluntarily published by people on social media, who may or may not be aware their images could be used for model training. Crawford points out that one of the most commonly used public datasets for training facial recognition models is the National Institute of Standards and Technology (NIST) Special Database 32 – Multiple Encounter Dataset. It consists of millions of police and FBI mugshots of deceased people who had multiple encounters with law enforcement. In this case the subjects are all dead so perhaps the issue of consent is moot. But Crawford rightly asks, should we really be training facial recognition models based heavily on mugshots without any consideration of the context within which they were taken? Or the extent to which these images fairly represent the population as a whole?
Finally, to what uses are we putting these models? Blocking spam seems like a good idea, at least I think so. But what if the classifier determines whether your loan application is accepted by the bank, or your job application gets past automated screening? What if the image classifier at airport security decides your face has the characteristics of someone likely to be a criminal or a terrorist? In all these cases, the questions about accuracy, consent, context, representation, and bias are critically important. And they’re not getting the attention they deserve.
Crawford quotes Anna Lauren Hoffman, a professor at the University of Washington’s Information School, who says:
“The problem here isn’t only one of biased datasets or unfair algorithms and of unintended consequences. It’s also indicative of a more persistent problem of researchers actively reproducing ideas that damage vulnerable communities and reinforce current injustices.” [p. 117]
Atlas of AI dives deeply into these profound and uncomfortable questions.
Unfortunately, the book offers nothing concrete in the way of solutions or even approaches to solutions.
Crawford calls for a “politics of refusal”, one that questions our current “technology first” approach to solving problems. We could be seeing some inklings of this, she says, in the growing resistance to the use of facial recognition software by police here in the US. She calls for a holistic approach to climate, labor, and racial justice issues, along with data protection, that would challenge the power structures that AI helps to reinforce.
Sorry, too vague and too wooly for me. Yes, it’s true we need to consider the intersectionality of the many forms of injustice confronting our society today. But the book doesn’t contain a single specific proposal or policy recommendation. Crawford doesn’t even call for something as obvious as carbon pricing to reflect the true environmental costs of running data centers or cargo ships. Nor does she discuss any legal reforms that could make people the owners of their images or the data collected about them, or that might require affirmative consent for that data to be used by governments and corporations.
I still think AI brings huge benefits. I wear glasses and have trouble reading street and highway signs from a distance, so turn-by-turn GPS navigation helps me enormously. No one likes spam. AI is sometimes better at identifying cancerous tumors than doctors. These are just a few examples.
But clearly there are serious problems with AI too.
I think Crawford deserves credit for helping us to see the end-to-end systems of resource, labor, and data extraction that power artificial intelligence today, and for raising important questions about the real social and political impacts of AI.
We’ll just have to look elsewhere for solutions.
This one was a little long, and a little technical — thanks for reading.