In a previous blog post, we posed the question, "Does generative AI run on thin air?" and discussed the associated challenges and strategies for mitigating energy use. We even put this question to ChatGPT, and it couldn't provide a clear answer.
In this blog post, we aim to delve deeper into the energy use involved in training and using large language models. It's important to note that there is limited detailed research available on this topic, and our analysis is based on data collected from various sources, providing an initial perspective that may require updates in the future.
The journey of artificial intelligence (AI) began in the 1950s, marked by significant milestones such as Alan Turing's proposal of the "Turing Test" and the Dartmouth Conference, where the term "artificial intelligence" was coined. In the late 1950s, Frank Rosenblatt and Bernard Widrow developed the perceptron and the LMS algorithm, laying the foundation for neural networks, albeit limited in capability.
Three decades later, David Rumelhart and others introduced the backpropagation algorithm, enabling the creation of multi-layer neural networks. This marked the onset of machine learning and the ability to train networks with thousands of parameters and multiple layers.
Another three decades passed, and in 2017, researchers described how to train super-large neural networks known as transformers. This ushered in the era of generative AI and large language models, with models containing billions to trillions of parameters, trained on vast datasets. The performance of these models has astounded researchers and the general public, leading to their widespread use by millions of people daily.
The Future of AI
The 30-year intervals in AI development are intriguing. What will the next 30 years bring?
The user growth rate of ChatGPT (GPT-3) has outpaced all other online services by a wide margin. It achieved one million users within five days of its release in November 2022. In comparison, Instagram took 2.5 months, Facebook took 10 months, and Twitter took two years to reach the same milestone after their respective launches.
To reach 100 million active users, ChatGPT took only two months, compared to nine months for TikTok and 30 months for Instagram. OpenAI has since trained the next-generation model, GPT-4, and introduced it behind a paywall. The growth rate of GPT-4's usage is not publicly disclosed.
Training a super-large language model like GPT-4, with 1.7 trillion parameters and using 13 trillion tokens (word snippets), is a substantial undertaking. OpenAI has revealed that it cost them $100 million and took 100 days, utilizing 25,000 NVIDIA A100 GPUs. Servers with these GPUs use about 6.5 kW each, resulting in an estimated 50 GWh of energy usage during training. If cloud costs were factored in at approximately $1 per A100 GPU hour, the cloud expenses alone would amount to around $60 million.
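As a sanity check, these figures follow from straightforward arithmetic. Here is a minimal Python sketch reproducing them; the 8-GPU server layout is our assumption, not an OpenAI figure:

```python
# Back-of-envelope estimate of GPT-4 training energy and cloud cost,
# using the figures quoted above.
NUM_GPUS = 25_000                 # NVIDIA A100 GPUs
GPUS_PER_SERVER = 8               # assumed DGX-style server
SERVER_POWER_KW = 6.5             # per server, under load
TRAINING_DAYS = 100
CLOUD_PRICE_PER_GPU_HOUR = 1.0    # USD, approximate A100 rate

hours = TRAINING_DAYS * 24
servers = NUM_GPUS / GPUS_PER_SERVER
energy_gwh = servers * SERVER_POWER_KW * hours / 1e6   # kWh -> GWh
cloud_cost_musd = NUM_GPUS * hours * CLOUD_PRICE_PER_GPU_HOUR / 1e6

print(f"Energy: ~{energy_gwh:.0f} GWh")          # ~49 GWh, i.e. the ~50 GWh above
print(f"Cloud cost: ~${cloud_cost_musd:.0f}M")   # ~$60M
```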
OpenAI has also stated that it takes about 6 FLOP (floating-point operations) per parameter per token to train GPT-4. This translates to a total of 133 billion petaFLOP for GPT-4. To put this in perspective, if the European supercomputer LUMI were employed, which operates at 550 petaFLOP/s and 8.5 MW, it would take approximately 8 years and 600 GWh to complete the training. This is merely an indication of the scale of GPU clusters needed for GPT-4's training.
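The compute total and the LUMI comparison can be reproduced the same way, using only the numbers quoted above:

```python
# Training compute from the ~6 FLOP per parameter per token rule,
# and how long the LUMI supercomputer would need at full speed.
PARAMS = 1.7e12          # GPT-4 parameters, as quoted above
TOKENS = 13e12           # training tokens
LUMI_FLOPS = 550e15      # 550 petaFLOP/s
LUMI_POWER_MW = 8.5

total_flop = 6 * PARAMS * TOKENS
seconds = total_flop / LUMI_FLOPS
years = seconds / (365 * 24 * 3600)
energy_gwh = LUMI_POWER_MW * (seconds / 3600) / 1000   # MWh -> GWh

print(f"Total: ~{total_flop / 1e24:.0f} billion petaFLOP")    # ~133
print(f"On LUMI: ~{years:.1f} years, ~{energy_gwh:.0f} GWh")  # ~7.6 years, ~570 GWh
```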
To provide context, all data centers in Sweden currently use about 3,000 GWh of energy per year. GPT-4's 50 GWh training run alone would correspond to roughly 2% of that annual usage for a single training session. When considering the training demands of companies and organizations worldwide, it becomes evident why the data center industry is experiencing a surge in demand for capacity.
The location of training also plays a crucial role in the environmental impact of large language models. For instance, if GPT-4 were trained in Northern Sweden, where the energy mix results in 17 gCO2eq/kWh, the emissions would be equivalent to driving an average car around the globe 300 times. Training GPT-4 in Germany, whose grid mix is roughly 30 times more carbon-intensive, would instead correspond to 30 such cars each circling the globe 300 times, which highlights the significance of training location.
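A minimal sketch of the carbon arithmetic, assuming the grid intensities above and a car emission factor of about 70 gCO2eq/km (the value implied by the 300-laps equivalence above, not an official figure):

```python
# CO2 estimate for a 50 GWh training run under two grid mixes.
# 17 gCO2eq/kWh is from the text; the German value (~30x) and the
# ~70 g/km car are assumptions backed out of the equivalences above.
TRAINING_ENERGY_KWH = 50e6
EARTH_CIRCUMFERENCE_KM = 40_075
CAR_GCO2_PER_KM = 70

for region, g_per_kwh in [("Northern Sweden", 17), ("Germany", 30 * 17)]:
    tonnes = TRAINING_ENERGY_KWH * g_per_kwh / 1e6           # g -> tonnes
    laps = tonnes * 1e6 / (CAR_GCO2_PER_KM * EARTH_CIRCUMFERENCE_KM)
    print(f"{region}: ~{tonnes:,.0f} tCO2eq, ~{laps:,.0f} laps of the globe")
    # Northern Sweden: ~850 tCO2eq, ~300 laps
    # Germany: ~25,500 tCO2eq, ~9,000 laps (30 cars x 300 laps)
```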
Comparatively, the training of GPT-3 is far less resource-intensive. With 175 billion parameters trained on 300 billion tokens, the compute required is about 0.2% of GPT-4's. On the same 25,000-GPU cluster, GPT-3 could be trained in about 6 hours, using roughly 114 MWh.
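Under the same 6-FLOP rule of thumb, the two models compare as follows, scaling from the GPT-4 cluster figures above:

```python
# Relative training cost of GPT-3 vs GPT-4 under the 6-FLOP rule,
# scaled against the 25,000-GPU cluster figures quoted for GPT-4.
def train_flop(params: float, tokens: float) -> float:
    return 6 * params * tokens

ratio = train_flop(175e9, 300e9) / train_flop(1.7e12, 13e12)
hours = ratio * 100 * 24        # GPT-4 took 100 days on the cluster
energy_mwh = ratio * 50_000     # GPT-4 used ~50 GWh

# ~0.24% of GPT-4's compute, ~6 hours, ~119 MWh (the text quotes 114)
print(f"GPT-3 vs GPT-4: {ratio:.2%}, ~{hours:.0f} h, ~{energy_mwh:.0f} MWh")
```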
While GPT-4 is currently a paid service, its usage is expected to increase in the future. Presently, the 100 million active users primarily engage with GPT-3 via the open ChatGPT service. However, if we hypothetically assume that all 100 million active users transition to GPT-4, the model would handle 3.4 petaFLOP per request, based on an assumed request size of 1,000 tokens and 2 FLOP per parameter per token, as per OpenAI's specifications. In contrast, a request to GPT-3 requires 0.35 petaFLOP, one-tenth of GPT-4's computational load.
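Per-request compute follows directly from the 2-FLOP rule; a short sketch under those assumptions:

```python
# FLOP per request at inference: ~2 FLOP per parameter per token.
def request_flop(params: float, tokens: int = 1_000) -> float:
    return 2 * params * tokens

print(f"GPT-4: {request_flop(1.7e12) / 1e15:.2f} petaFLOP")  # 3.40
print(f"GPT-3: {request_flop(175e9) / 1e15:.2f} petaFLOP")   # 0.35
```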
OpenAI has revealed that GPT-3 serving runs on server clusters with 128 GPUs each. Serving all requests with GPT-4 would accumulate to 0.68 billion petaFLOP per day and roughly 91 GWh per year. In contrast, serving with GPT-3 on 10 clusters of 16 6U NVIDIA A100 servers, with each request using about 0.13 Wh and completing in just 0.004 seconds, amounts to 0.07 billion petaFLOP per day and 9.5 GWh per year.
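Putting the serving numbers together: the sketch below assumes the 100 million users make about two requests per person per day, an assumption chosen so the totals match the daily petaFLOP figures above.

```python
# Daily serving compute and per-request energy. The two-requests-per-
# user-per-day load is our assumption; it reproduces the petaFLOP/day
# figures quoted above.
REQUESTS_PER_DAY = 100e6 * 2    # 100M users x ~2 requests/day

for model, pflop_per_req, gwh_per_year in [("GPT-4", 3.4, 91.0),
                                           ("GPT-3", 0.35, 9.5)]:
    daily_bn_pflop = REQUESTS_PER_DAY * pflop_per_req / 1e9
    wh_per_request = gwh_per_year * 1e9 / 365 / REQUESTS_PER_DAY
    print(f"{model}: {daily_bn_pflop:.2f} bn petaFLOP/day, "
          f"{wh_per_request:.2f} Wh/request")
    # GPT-4: 0.68 bn petaFLOP/day, ~1.25 Wh/request
    # GPT-3: 0.07 bn petaFLOP/day, ~0.13 Wh/request
```

These per-request figures, roughly 1.25 Wh for GPT-4 and 0.13 Wh for GPT-3, are what the Google comparison below builds on.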
To provide a comparison, Google states that one search uses 0.28 Wh. Thus, a GPT-4 request is approximately four times more energy-intensive than a Google search, while a GPT-3 request uses about half the energy of one. Google likens the energy for one search to leaving a 60 W light bulb on for 17 seconds.
In conclusion, the energy use associated with generative AI and large language models is substantial. As these models gain popularity and usage increases, understanding and addressing their environmental and energy implications becomes increasingly important.
We at ICE data center are happy to help if further questions about data centers, liquid cooling, and energy use for AI come up. Please get in touch!