Enterprises in for a shock when they realize the power and cooling demands of AI
Energy consumption set to become a key performance indicator by 2027
Most businesses rushing to adopt AI are unprepared for the energy demands it'll place on their infrastructure, and few have a handle on the power consumption of AI systems or the implications for their datacenters.
Research commissioned by AI chip biz SambaNova found that 72 percent of corporate leaders are aware AI models have huge energy requirements, and many are concerned about this, yet only 13 percent monitor the power consumption of the AI systems they have deployed.
The power draw is driven in most cases by reliance on power-hungry GPUs crammed into high-performance server systems to handle model training. SambaNova chief Rodrigo Liang said:
"Without a proactive approach to more efficient AI hardware and energy consumption, particularly in the face of increasing demand from AI workflows, we risk undermining the very progress AI promises to deliver."
He expects attitudes to change, forecasting that by 2027 most corporate leaders will be keeping a close check on energy consumption as a key performance indicator (KPI).
Adding impetus to this is the rise of so-called agentic AI models. These are developed to be capable of taking autonomous actions and solving multi-step problems, and their greater complexity will add to energy woes, according to SambaNova.
Naturally, SambaNova is pitching its own AI silicon, packaged in servers with the company's software stack, as a lower-power alternative to GPUs. However, not everyone will want to go down that route.
GPUs creepers
For organizations sticking with GPUs, dealing with the heat generated by the power-guzzling hardware is becoming another big issue, with Nvidia's Blackwell products rated at 1,200 W, for example. In many cases, this will involve more effective cooling systems, with liquid cooling becoming increasingly popular.
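To put that figure in context, a rough back-of-envelope estimate shows how quickly a GPU-dense rack outruns a traditional air-cooled hall. The sketch below assumes a hypothetical eight-accelerator server, four servers per rack, and a 30 percent allowance for CPUs, networking, fans, and power-supply losses; none of these numbers come from Nvidia or SambaNova, only the 1,200 W per-GPU rating is cited above.

```python
# Back-of-envelope rack power estimate -- illustrative assumptions only.
GPU_WATTS = 1_200        # per-accelerator rating cited for Blackwell-class parts
GPUS_PER_SERVER = 8      # hypothetical dense GPU server
OVERHEAD_FACTOR = 1.3    # assumed CPUs, NICs, fans, PSU losses, etc.
SERVERS_PER_RACK = 4     # hypothetical packing

server_watts = GPU_WATTS * GPUS_PER_SERVER * OVERHEAD_FACTOR
rack_kw = server_watts * SERVERS_PER_RACK / 1_000
print(f"Estimated rack draw: {rack_kw:.0f} kW")  # roughly 50 kW on these assumptions
```

Even with conservative packing, the result lands an order of magnitude above the 2-5 kW racks that many existing facilities were designed around, which is why the cooling question follows so quickly.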
Analyst firm Omdia estimated last year that datacenter liquid cooling revenue was set to top $2 billion by the end of 2024, and $5 billion by 2028.
However, not all facilities will be suitable for outfitting with liquid cooling, according to managed services provider Redcentric.
"Greater investment in AI development and implementation is likely to lead to an increased demand for datacenters," claimed chief technology officer Paul Mardling. Building new facilities, he added, is "a significant investment" that takes time, planning permission, power provisioning, and physical construction.
"In the short term, this will lead to increased demand for existing datacenters, many of which haven't been designed for the density or power draw that AI systems require."
While traditional facilities were built around halls with racks of 2-5 kW power density, new builds now have to accommodate much higher densities, he said.
"Liquid cooling is essential in racks with greater than 10 kW power density and is desirable in the 5-10 kW range. Efficient use of excess heat is also likely to emerge, either for thermal regenerative power generation or communal heating projects."
Omdia agrees that AI is driving energy demand and the need for more effective cooling.
"Yes, greater adoption of AI computing will drive up datacenter power density," senior research director Vlad Galabov told The Register.
"We are already seeing this and there are several implications: we have seen requests to utilities for more power and adoption of onsite self-generation either through gas engines or turbines," he added.
Power upgrades also involve pre-fab modules housing extra switchgear, UPS, and batteries deployed on-campus to enable higher power capacity, while some sites have been retrofitted with high-capacity busways in place of cables to distribute more power to the racks.
However, Galabov reckons this type of retrofit is less likely in older datacenters due to the costs.
"There is likely a ceiling to how much density can be raised. At one Equinix site I saw a retrofit project result in rack density increasing from 10 kW to 30 kW per rack."
That specific case involved new pipe work to support a coolant distribution unit (CDU) connected to rear-door heat exchangers in the racks, while power distribution was swapped from cables to a new busway, and within each rack new power distribution units (PDUs) were installed, Galabov told us.
On the topic of liquid cooling, he said some sites have adopted air-to-liquid CDUs as a way to avoid completely updating the pipe network within their datacenters, an approach Microsoft is a big proponent of.
Yet, according to the Omdia research director, adoption of these may be limited because the density they enable is not enough to support incoming AI infrastructure like Nvidia's Blackwell rack-level reference design (NVL72).
"In areas where operators did not want to update racks and install manifolds for direct-to-chip liquid cooling, we've also seen deployments of rear door heat exchangers, which are a good way to cope," Galabov said. Vendors believe they can go up to 100 kW with this arrangement. However, he thinks this is unlikely as the temperature of the ambient air would need significant lowering, which would be very expensive.
For UK businesses, colocation firm Telehouse recently unveiled a liquid cooling lab at its London Docklands campus, showcasing several of the available technologies. These include a waterless, two-phase system for next-gen server chips and air-assisted liquid cooling technology for up to 90 kW per cabinet. ®