ChatGPT and other large language models are triggering a profound industrial reorganisation in the digital economy. Leading AI expert Kate Crawford looks at the risks posed by the extractive practices that lie behind these technologies.

Green European Journal: The release of ChatGPT for public use has sparked worldwide hype around artificial intelligence. What are the politics and interests driving this and other similar large language models (LLMs)?  

Kate Crawford: The first thing to note is that large language models are a very significant inflection point. In the AI field, I haven’t seen anything like this since the creation of the web, and possibly further back. This is causing a profound industrial reorganisation, where LLMs are not just a new interface, but the new medium through which we will receive and create information in the years to come. It is a very meaningful change, because it comes with a variety of technical and political questions. For me, the most important aspect is the material one.

In my book Atlas of AI, I look at the three main components that drive artificial intelligence: data, human labour, and environmental resources. Generative AI (referring to technology that can produce text, imagery, audio, and other kinds of content) increases the use of each of those components. It requires a vast amount of data, more people working behind the scenes as clickworkers and in factories, and much more energy. This leads to a larger carbon footprint and greater water consumption than ever before. 

Would there be a way to create large language models without this level of extraction? 

Part of the reason I describe AI as an extractive industry is that it has always relied on extracting data, labour, and natural resources at scale. Evidence from the first phase of this “generative turn” in the AI field suggests that it is heavily dependent on extraction. There is ongoing research trying to create generative AI models with less data and make them more energy efficient, but there is a widespread belief in the field that scale is the solution to making LLMs work better. This entails models with billions of parameters and extraordinary amounts of energy to sustain LLM-based searches. For this reason, rather than asking whether the AI industry could be different, we should look at how it works right now. And it is raising a lot of environmental red flags.

You argue that AI is neither artificial nor intelligent. Can you explain why? 

AI is often described in terms that make it sound like science fiction, as if it were just immaterial data in the cloud. But in fact, AI has enormous material implications connected to its energy use, water consumption, the extraction of minerals, the construction of data centres, and so on. In this sense, AI is not artificial at all, but deeply rooted in the material world. As for the intelligence part, I find it problematic that people approach LLM systems as if they were interacting with other humans. This anthropomorphisation generates the belief that there is a form of consciousness emerging from AI, but this is simply not the case. 

It is important that we demystify the way these models function. And that means looking at the technical and social layers, the hidden labour, and all the components that make a system work. They are forms of statistical probability and analysis at scale – something very different from human intelligence. In many cases, thousands of humans are needed to prop these systems up all along the supply chain, from dataset labelling to content moderation. In some cases, we even see people pretending to be AI. For all these reasons, AI is neither artificial nor intelligent.
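
To make that point concrete: stripped of everything else, text generation is repeated sampling from a learned probability distribution over possible next tokens. A minimal sketch in Python, with invented numbers standing in for what a real model computes with billions of parameters:

```python
# A minimal sketch (invented numbers, not any real model's code) of what
# "statistical probability at scale" means: a language model assigns a
# probability to each candidate next token and samples one. A real LLM
# does this over tens of thousands of tokens, using billions of learned
# parameters to compute the distribution.
import random

# Hypothetical distribution a model might assign after "The cat sat on the"
next_token_probs = {"mat": 0.62, "sofa": 0.21, "roof": 0.09, "moon": 0.08}

tokens, weights = zip(*next_token_probs.items())
print(random.choices(tokens, weights=weights, k=1)[0])  # e.g. "mat"
```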

In the book, you also discuss AI’s almost colonial logic, its attempt “to capture the planet in a computationally legible form”. Can you explain how that works? 

The large training sets behind LLMs are built by scraping the entire internet, drawing on data from sensors in cities, robotic vacuum cleaners, and every other possible source, all of which becomes input for training ever-larger models. In this sense, LLM systems harvest everything that can be made digital, and then use it to train corporate AI models.

Why are environmental aspects still downplayed in the discussion about the possible harms of AI? 

Because the true environmental costs of AI are among the most closely guarded corporate secrets. Despite the lack of reliable data from the industry, several studies have modelled the impact of AI. One study estimated that searches through LLMs consume five times more energy than traditional search engines. Another claims that every ChatGPT query is the equivalent of pouring half a litre of water onto the ground. That’s very disturbing, considering that fresh water is such a scarce resource.
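
For a sense of scale, here is a rough back-of-envelope sketch of the figures cited above. The 0.3 Wh baseline per traditional search and the one-billion-queries-a-day volume are illustrative assumptions, not measured values:

```python
# Back-of-envelope sketch of the estimates cited above. The 0.3 Wh
# baseline per traditional search and the one-billion-queries-per-day
# volume are illustrative assumptions, not measurements.
queries_per_day = 1_000_000_000        # assumed daily query volume

baseline_wh = 0.3                      # assumed energy per traditional search
llm_wh = 5 * baseline_wh               # the "five times" estimate

extra_mwh = queries_per_day * (llm_wh - baseline_wh) / 1e6   # Wh -> MWh
water_litres = queries_per_day * 0.5   # half a litre per query, as cited

print(f"Extra energy if all queries used LLMs: {extra_mwh:,.0f} MWh/day")
print(f"Water at half a litre per query: {water_litres:,.0f} litres/day")
```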

So there are studies looking at the empirical evidence, but it is difficult for this story to really catch on because it takes time to do this investigative work. I spent five years researching the mining that goes into the creation of these systems and assessing the true environmental cost of Bitcoin. It will take time until we know the true carbon footprint and environmental costs of generative AI. But we already know that it is more energy intensive than it should be, and that it is heading in the wrong direction at a time when we urgently need to curtail our energy use and carbon footprint.

What could be done to move this issue up the agenda? 

I think it has to be a regulatory issue. We should have transparency laws that force companies to release reliable data around the environmental cost of their systems, similarly to what has happened in the automotive industry. It is also very important that we start to have a public debate around where and when AI systems are useful. What we are seeing now is a situation in which generative AI is the hammer, and everything is a nail. We see generative AI being built into everything from search to document writing, image generation, video editing, and so on. We need to ask whether systems that are so energy intensive should be used so pervasively.  

Part of what is happening comes from a condition of ignorance: people are not aware that every time they use generative AI, they contribute to environmental pollution. But to make users more aware, we first need transparency.

It is often said that AI could be useful – or even essential – in the fight against climate change, but people who take this position rarely explain why they think this is the case. 

Taking action on climate change will require governments to do much more than they have done so far. We know that a small number of companies are responsible for a large share of the total carbon footprint. So more than artificial intelligence, the solution to climate change is political action, including real regulation of what is currently happening. AI is commonly described as a solution to everything, when often it is just part of a much more complex picture.

What kinds of developments can we expect within AI in the near future? Could it be approaching a plateau? 

There is a lot of debate around whether the current techniques for building LLMs will plateau at a certain point. The answer will depend on whether the industry can sustain the extraordinary amount of computing necessary to build these systems. Currently, there is a global shortage of the specialised chips used for AI training runs due to extraordinarily high demand. This shows that the choke points concern the materials and infrastructure needed to do AI at scale. The conversation about AI needs to look beyond algorithmic approaches and ask whether we are going to hit a plateau in terms of what the planet can sustain.

Great powers have different approaches to AI and are actively trying to manipulate each other’s access to the value chain. What lies ahead for the geopolitics of AI?

The geopolitics of AI is already heating up. The EU is now creating its own policies on the production of semiconductors. In April, the European Council and Parliament reached a political agreement on the so-called Chips Act. In the United States, the Biden administration has an increased focus on localising the production of technological infrastructure and is looking to break its reliance on China for rare earth minerals and other components. At the same time, Taiwan is a major exporter of the semiconductors that underpin the digital economy, so AI is already an important factor in today’s geopolitical conflicts. It remains to be seen how this will play out with the additional infrastructure demands linked to generative AI.