Managing AI's Power Demands: Strategies for Incorporating Supercapacitors

Posted by Ludovico Barro on 29 February 2024

As AI's computational demands surge, the urgent need for robust energy storage, especially supercapacitors, becomes evident. These components are crucial for ensuring the seamless operation of data centers, which serve as a cornerstone for AI's future. Building on previous blog posts regarding AI's challenges to hardware and power distribution networks, Ludovico Barro Savonuzzi, Head of Application Engineering, explores the role and various methods of integrating supercapacitors.

Demands on computational power

The world is witnessing a fundamental transformation in the technical realm, with some experts celebrating Artificial Intelligence as the greatest invention since the discovery of fire. At the same time, others voice stark warnings about its potential as an existential threat to civilization, a concern that looms even larger than climate change.

While predicting the pace of technological progress is fraught with difficulty, certain truths are undeniable. Machine learning is characterized by software's ability to teach itself, creating an ever-accelerating positive feedback loop. The primary bottleneck in this acceleration is not the software, but the available computational power and the pace of hardware development to accommodate such growth of cognitive resources. If software can update itself in milliseconds across thousands of servers, adapting rapidly to external conditions, the same cannot be said for the hardware required to run it. The processes of power delivery, data center construction, processing card development, and fabrication lag significantly behind in speed.


Some view this delay as advantageous, serving as a safeguard against the swift advancement of AI. However, despite these divergent views on the pace of development and its implications, there's a broader acknowledgment within the expert community: the progression of AI is inevitable and unstoppable.

Power infrastructure challenges

The pace of this advancement will depend heavily on how the power infrastructure evolves to support it. In another article, we touched on the need for power grid inertia, which is growing ever more important with the rise of renewable energy sources. The increasing power demands of data centers threaten to exacerbate an already strained situation. The core issue is that neither the power grid nor backup generators can instantly adjust their output to follow rapid transient loads: for a data center with a 100MW capacity, the grid typically requires at least 10-20 seconds to adapt to a load change from 0 to 100%. Sharp fluctuations in demand are hazardous, potentially leading to voltage sags, disconnections of power sources, and, in the worst-case scenario, complete power loss.

By contrast, the power demand from a cluster of Graphics Processing Units (GPUs) performing an AI task or updating their datasets rises sharply, reaching its maximum in just a few milliseconds. It then stays at this peak power rating for several seconds before dropping off just as abruptly. Smoothing this profile by reshaping the workload is only possible up to a point, because these calculations require intensive parallel computing.
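To put the mismatch between a millisecond load step and a 10-20 second grid response into numbers, here is a rough sketch. It assumes the load jumps instantly from 0 to the full 100MW and that the source ramps linearly over the adaptation time; both are simplifying assumptions, and the function name is purely illustrative. Under these assumptions the shortfall is a triangle, so the energy some buffer would have to supply is half the step power multiplied by the ramp time.

```python
def ramp_energy_gap_joules(step_power_w: float, ramp_time_s: float) -> float:
    """Energy a buffer must supply while the source ramps linearly to step_power_w.

    Assumes the load jumps instantly to step_power_w, so the shortfall is a
    triangle of height step_power_w and base ramp_time_s.
    """
    return 0.5 * step_power_w * ramp_time_s


DATA_CENTER_POWER_W = 100e6  # 100MW capacity, as quoted in the text

for ramp_s in (10.0, 20.0):  # grid adaptation time range quoted in the text
    gap_mj = ramp_energy_gap_joules(DATA_CENTER_POWER_W, ramp_s) / 1e6
    print(f"ramp {ramp_s:.0f} s -> buffer energy {gap_mj:.0f} MJ")
```

A full 0 to 100% step is the worst case; real load steps are usually a fraction of site capacity, and the gap scales linearly with the step size.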


Path from grid to GPU

The power architecture of data centers is already designed to achieve high levels of reliability, capillarity, and efficiency. Power distribution resembles a structured network of roots, originating as thick channels from the tree's trunk and branching into increasingly finer channels to reach every corner of the soil. Similarly, in a data center, the cabling system and power distribution units (PDUs) serve a comparable function. However, while a tree's roots gather water and deliver it to the main plant, in a data center, power flows from the grid connection point to the myriad transistors across all processors in the building. Each of these transistors operates on an extremely low current at very low voltage.

The power architecture begins at the grid connection point, often located in a remote area, where a substation is erected to step down the voltage of the overhead distribution lines from 10-20kV to 400V AC and to provide the necessary electrical protections. From here on, power is at an industrial voltage level and can be managed safely and economically. However, GPUs require a very low DC voltage, sometimes as low as 5V, so there is still a long way to go before the power can be used by the hardware.

At the substation, the power system meets the backup diesel generators, a necessity for all data centers because being left without power is not an option. It is important to note that when the generators are running, the grid must be disconnected; otherwise, it will drain power from the generators. Hence, an Automatic Transfer Switch connects the generators and disconnects the grid at the same time. A UPS (Uninterruptible Power Supply) system at this level ensures a smooth transition between the two power sources. After this point, the feeders divide into two lines, A and B, each comprising an identical set of cables, switches, contactors, and more, so that power is delivered to the data racks through a 100% redundant system.

The next phase typically involves voltage step-down and rectification, from 400V AC (alternating current) to 200V DC (direct current). This conversion typically requires additional hardware such as fuses, isolators, and bypass switches. Now in close proximity to the server racks, a Power Distribution Unit supplies an entire aisle of racks through a 200V DC busbar positioned above the cabinets. Each rack is individually connected to this busbar, and further power conversion takes place inside the racks. Custom-designed power supplies fit into confined spaces and convert 200V DC to a more practical 50V DC, the final common voltage for all hardware connections. Finally, depending on the hardware, additional DC-DC converters based on high-efficiency MOSFETs step the 50V DC down to the voltage the components need; graphics cards typically operate on 5V.
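One way to see why the voltage is kept high for as long as possible is to look at the current each DC stage would have to carry for the same power. The sketch below uses the 200V, 50V and 5V stages described above with a hypothetical 50 kW rack load (an assumed figure chosen for illustration) and ignores conversion losses. The helper names are illustrative only.

```python
RACK_POWER_W = 50_000.0  # hypothetical rack load, assumed for illustration

# DC stages named in the text: aisle busbar, in-rack common bus, card input.
DC_STAGES_V = [
    ("aisle busbar", 200.0),
    ("rack common bus", 50.0),
    ("graphics card input", 5.0),
]


def stage_current_a(power_w: float, voltage_v: float) -> float:
    """Current needed to carry power_w at voltage_v (I = P / V), losses ignored."""
    return power_w / voltage_v


def relative_conduction_loss(voltage_v: float, reference_v: float) -> float:
    """I^2 * R loss relative to reference_v, for equal power and equal resistance."""
    return (reference_v / voltage_v) ** 2


for name, volts in DC_STAGES_V:
    amps = stage_current_a(RACK_POWER_W, volts)
    loss = relative_conduction_loss(volts, DC_STAGES_V[0][1])
    print(f"{name:>20}: {volts:5.0f} V -> {amps:7.0f} A, relative I^2R loss x{loss:.0f}")
```

Because conduction loss grows with the square of the current, the lowest-voltage conversion is pushed as close to the processors as possible, which matches the layout described above.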
Varied strategies for integrating supercapacitors

Rack level

While power fluctuations occur at low voltage levels, their effects propagate upstream and impact various devices even before reaching the grid connection point (a prime example being the diesel gensets). To address as many issues as possible, it is most effective to tackle the cause of the fluctuations at the source: near the GPUs and other processing units within the racks.

To accomplish this, supercapacitor modules are being developed to fit within 2 rack units (RUs) or less, engineered for maximum efficiency, minimal heat generation, and minimal footprint. In fact, the power fluctuations of an entire rack can range from 20 to 50 kW and can be managed by a single supercapacitor module from Skeleton Technologies, designed for 50 to 200V DC operation.
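As a back-of-the-envelope check on what such a module has to store, the sketch below sizes an ideal capacitor from the textbook relation E = 0.5 * C * (V_max^2 - V_min^2). The 50 kW figure and the 200V upper limit come from the text; the 2 second pulse duration and the 100V lower discharge limit are assumptions made for illustration and do not describe any Skeleton Technologies product. Internal resistance and converter losses are ignored.

```python
def required_capacitance_f(power_w: float, duration_s: float,
                           v_max: float, v_min: float) -> float:
    """Ideal capacitance needed to deliver power_w for duration_s
    while discharging from v_max down to v_min (losses ignored)."""
    if not 0 <= v_min < v_max:
        raise ValueError("need 0 <= v_min < v_max")
    energy_j = power_w * duration_s
    return 2.0 * energy_j / (v_max ** 2 - v_min ** 2)


def usable_fraction(v_max: float, v_min: float) -> float:
    """Share of the stored energy released between v_max and v_min."""
    return 1.0 - (v_min / v_max) ** 2


# 50 kW rack transient from the text; 2 s and the 200V-to-100V window are assumed.
c_f = required_capacitance_f(50_000.0, 2.0, v_max=200.0, v_min=100.0)
print(f"required capacitance: {c_f:.2f} F")
print(f"usable share of stored energy: {usable_fraction(200.0, 100.0):.0%}")
```

Discharging to half the maximum voltage already releases three quarters of the stored energy, which is why a wide operating voltage window matters more than squeezing out the last few volts.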

Addressing the issue here has pros and cons: while it safeguards all upstream equipment, custom-designing a new product to fit into an already compact and tailored data rack may run into space constraints and incur additional development costs.

Aisle level

Moving up a level, the issue can also be addressed at the aisle level: enhancing a Power Distribution Unit (PDU) with a supercapacitor-based peak shaving unit can effectively manage the load transients from 30 or more racks, which often exceed the 1MW threshold. The advantages include the ability to use off-the-shelf products to design a larger, more centralized unit. The disadvantages include the addition of MW-rated power electronics at the PDU level, and the fact that all components downstream of this point are still affected by power fluctuations.
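The idea behind peak shaving can be sketched with a very simple model: the upstream source is only allowed to change its output at a limited rate, and the storage unit supplies or absorbs whatever difference remains. The example below is a toy simulation, not a controller design. The 30 racks come from the text; the 40 kW step per rack, the 3 second pulse, the 120 kW/s ramp limit and the `peak_shave` function itself are assumptions chosen for illustration, and state-of-charge management is left out entirely.

```python
def peak_shave(load_w, dt_s, max_ramp_w_per_s):
    """Split a load profile into a ramp-limited upstream draw and a storage share.

    Returns (upstream_w, storage_w). storage_w > 0 means the storage is
    discharging; storage_w < 0 means it is absorbing power.
    """
    max_step = max_ramp_w_per_s * dt_s
    prev = load_w[0]  # assume the upstream source starts matched to the load
    upstream, storage = [], []
    for p in load_w:
        # Move the upstream draw toward the load, but no faster than the ramp limit.
        delta = max(-max_step, min(max_step, p - prev))
        prev += delta
        upstream.append(prev)
        storage.append(p - prev)
    return upstream, storage


DT_S = 0.01           # simulation time step (assumed)
RACKS = 30            # aisle size from the text
RACK_STEP_W = 40_000  # assumed step per rack, within the 20 to 50 kW range

# 1 s idle, 3 s at full aisle load, 5 s idle.
load = [0.0] * 100 + [float(RACKS * RACK_STEP_W)] * 300 + [0.0] * 500
upstream, storage = peak_shave(load, DT_S, max_ramp_w_per_s=120_000.0)

discharged_j = sum(s for s in storage if s > 0) * DT_S
print(f"peak load:          {max(load) / 1e6:.2f} MW")
print(f"peak upstream draw: {max(upstream) / 1e6:.2f} MW")
print(f"peak storage power: {max(storage) / 1e6:.2f} MW")
print(f"energy discharged:  {discharged_j / 1e6:.2f} MJ")
```

In this toy model the storage has to deliver almost the full aisle step at the first instant, which illustrates why the power electronics at this level need a MW rating.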

UPS room level

Continuing along this path leads to the UPS room, which is arguably the optimal location to integrate energy storage devices. Combining supercapacitor-based peak shaving units at 400-800V DC (and higher) with the existing lead-acid batteries and power converters offers numerous benefits, including lower cost and complexity through the consolidation of similar hardware. These units can be seamlessly integrated with the UPS's existing DC or AC bus.

Moving upward from this point, both gensets and the grid require protection. Thus, advancing to more centralized topologies beyond this point would necessitate duplicating the entire installation, which is likely not economically feasible. Protecting the grid at medium voltage would leave the gensets vulnerable to power transients, necessitating an additional set of supercapacitor modules and power electronics. Therefore, it appears that there is no advantage in surpassing the UPS level.

The power transients induced by AI computing pose a significant obstacle to the development of this technology. Many industry players are attempting to address the consequences at various levels, sometimes relying more on their own expertise than on a thorough analysis of the pros and cons. The truth is, there is currently a "race to the solution" underway, with numerous promising ideas being discussed. However, they all seem to share a common theme: supercapacitors are the most effective technology for handling this task. The question remains: at which level will they prove most effective?
