AWS opens cluster of 40K Trainium AI accelerators to researchers

Throwing novel hardware at academia. It's a tale as old as time


Amazon wants more people building applications and frameworks for its custom Trainium accelerators and is making up to 40,000 chips available to university researchers under a $110 million initiative announced on Tuesday.

Dubbed "Build on Trainium," the program will provide compute hours to AI academics developing new algorithms, looking to increase accelerator performance, or scale compute across large distributed systems.

"A researcher might invent a new model architecture or a new performance optimization technique, but they may not be able to afford the high-performance computing resources required for a large-scale experiment," AWS explained in a recent blog post.

And perhaps more importantly, the fruits of this labor are expected to be open-sourced by researchers and developers so that they can benefit the machine learning ecosystem as a whole.

As altruistic as this all might sound, it's to Amazon's benefit: The cloud giant's custom silicon, which now runs the gamut from CPUs and SmartNICs to dedicated AI training and inference accelerators, was originally designed to improve the efficiency of its internal workloads.

Developing low-level application frameworks and kernels isn't a big ask for such a large company. However, things get trickier when you start opening up the hardware to the public, which in large part lacks those resources and that expertise, necessitating a higher degree of abstraction. This is why we've seen Intel, AMD, and others gravitate toward frameworks like PyTorch or TensorFlow to hide the complexity associated with low-level coding. We've certainly seen this with AWS products like SageMaker.

Researchers, on the other hand, are often more than willing to dive into low-level hardware if it means extracting additional performance, uncovering hardware-specific optimizations, or simply getting access to the compute necessary to move their research forward. What is it they say about necessity being the mother of invention?

"The knobs of flexibility built into the architecture at every step make it a dream platform from a research perspective," Christopher Fletcher, an associate professor at the University of California, Berkeley, said of Trainium in a statement.

It isn't clear from the announcement whether all 40,000 of those accelerators are first-generation or second-generation parts. We'll update if we hear back on this.

The second-generation parts, announced roughly a year ago during Amazon's re:Invent event, saw the company shift focus toward everyone's favorite flavor of AI: large language models. As we reported at the time, Trainium2 is said to deliver 4x faster training performance than its predecessor and triple the memory capacity.

Since any innovations uncovered by researchers — optimized compute kernels for domain-specific machine learning tasks, for example — will be open-sourced under the Build on Trainium program, Amazon stands to benefit from its crowdsourcing of software development.

Naturally, throwing hardware at academics is a tale as old as university computer science programs, and to support these efforts, Amazon is extending access to technical education and enablement programs to get researchers up to speed. This will be handled through a partnership with the Neuron Data Science community, an organization led by Amazon's Annapurna Labs team. ®
