Alibaba Cloud details storage tech that's doubled its VMs per host

Using one disk as a write cache eases stresses created by manycore CPUs


Exclusive Alibaba Cloud has detailed the tech it developed to run local storage in its servers and bust bottlenecks created by new-generation manycore processors.

The tech was detailed last week in a paper titled "CSAL: the Next-Gen Local Disks for the Cloud" published in the April edition of Proceedings of the Nineteenth European Conference on Computer Systems. Eleven authors work at Alibaba Cloud, and another six work at Solidigm – Intel's old SSD business now mostly owned by SK hynix.

The paper sets the scene by reminding readers that cloud servers typically use local storage, and that local capacity determines how many VMs each cloudy host can handle. It then notes that modern manycore CPUs encourage clouds and users to run more VMs on each host.

The obvious way to run more VMs, the paper notes, is to pack cloud servers full of colossal hard disks, storage-class memory, or fast solid state disks. But hard disks have bandwidth limits, storage-class memory mostly failed (the paper mentions the Optane tech Intel snuffed), and fast SSDs have capacity problems and big price tags.

What's a cloud to do? Quad-level cell (QLC) SSDs are an obvious answer, the paper suggests, because they offer high capacity and decent prices.

Alibaba Cloud therefore tried QLC disks in three scenarios: as a drop-in replacement for other disks, as part of a layered system alongside high-speed SSDs, and behind dm-zoned, a kernel device mapper target.

The paper explains that QLC failed as a drop-in replacement because of "the two levels of write amplification caused by device-level address mapping with Indirection Unit and NAND-level garbage collection."
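To see why those two levels hurt, here is a rough back-of-the-envelope sketch. It assumes an aligned host write smaller than the drive's Indirection Unit forces the drive to rewrite the whole unit, and that garbage collection multiplies the result by a further factor. The 4KB write, 16KB unit, and garbage collection factor of 2 are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
import math


def device_write_amplification(write_bytes: int, iu_bytes: int) -> float:
    """Bytes the drive rewrites per host byte, for an aligned write.

    A write smaller than the Indirection Unit still costs a whole unit.
    """
    units_touched = math.ceil(write_bytes / iu_bytes)
    return units_touched * iu_bytes / write_bytes


def total_write_amplification(write_bytes: int, iu_bytes: int, gc_factor: float) -> float:
    """Combine both levels by multiplying them (a simplifying assumption)."""
    return device_write_amplification(write_bytes, iu_bytes) * gc_factor


# Assumed example: 4KB host writes, 16KB Indirection Unit, GC factor of 2.
small = total_write_amplification(4096, 16384, 2.0)
# Same drive, but the host writes whole 16KB units.
aligned = total_write_amplification(16384, 16384, 2.0)
print(small, aligned)
```

Under those assumptions, small writes cost four times as much NAND traffic as unit-sized ones, which is the sort of penalty that makes batching small writes attractive.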

A layered system that used a write-back cache to handle small writes in one SSD helped, but didn't match hard disk performance.

dm-zoned didn't help either, because under load it constantly needed to move data – which smashed performance.

Alibaba therefore devised the Cloud Storage Acceleration Layer (CSAL). The paper explains that CSAL keeps the most recently used data in DRAM and swaps it out to a fast SSD, which also handles all incoming writes. When possible and sensible, data from that SSD is shunted into the QLC disk.
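The flow described above can be modelled in a few lines. This is a toy sketch of the three tiers as this article describes them, not CSAL's actual implementation: the class name, the slot counts, and the flush-when-full policy are all hypothetical, and DRAM evictions are simply dropped here because every write already sits on the fast SSD or the QLC disk.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model: DRAM read cache, fast SSD write buffer, QLC capacity tier."""

    def __init__(self, dram_slots=2, flush_threshold=4):
        self.dram = OrderedDict()  # most recently used blocks, LRU-evicted
        self.fast_ssd = {}         # absorbs every incoming write
        self.qlc = {}              # bulk capacity tier
        self.dram_slots = dram_slots
        self.flush_threshold = flush_threshold  # assumed policy: flush when full
        self.qlc_flushes = 0       # batched writes sent to the QLC tier

    def _touch(self, lba, data):
        """Mark a block as most recently used, evicting the oldest if needed."""
        self.dram[lba] = data
        self.dram.move_to_end(lba)
        while len(self.dram) > self.dram_slots:
            self.dram.popitem(last=False)

    def write(self, lba, data):
        """All writes land on the fast SSD first; QLC only sees batches."""
        self.fast_ssd[lba] = data
        self._touch(lba, data)
        if len(self.fast_ssd) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Move buffered blocks to QLC in address order as one batch."""
        if not self.fast_ssd:
            return
        for lba in sorted(self.fast_ssd):
            self.qlc[lba] = self.fast_ssd[lba]
        self.fast_ssd.clear()
        self.qlc_flushes += 1

    def read(self, lba):
        """Check DRAM, then the fast SSD, then QLC."""
        if lba in self.dram:
            self.dram.move_to_end(lba)
            return self.dram[lba]
        data = self.fast_ssd.get(lba, self.qlc.get(lba))
        if data is not None:
            self._touch(lba, data)
        return data


store = TieredStore()
for lba in range(4):
    store.write(lba, f"block-{lba}")
print(store.qlc_flushes, sorted(store.qlc), store.read(0))
```

In this model four small writes reach the QLC tier as a single ordered batch, which is the behaviour the write cache is there to provide.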

The paper explains CSAL's workings in sufficient detail that even our storage-centric sibling site Blocks and Files might find its attention wavering.

The impact of CSAL on Alibaba Cloud ops is easier to understand and is outlined as follows:

Compared to last-gen HDD-based local disks (24× 2TB HDDs with a 48-core Xeon Cascade CPU), CSAL-ready servers (an 800GB HP-SSD and a 15.36TB QLC SSD with a 64-core Xeon Ice Lake CPU) can host twice more instances while achieving the same Service Level Objects.

That's a doubling of VM density from second-gen to third-gen Xeons, despite the extra 16 cores in the newer processor stressing storage more than the older silicon. Also, 64 is not double 48.
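For those who like their sums spelled out, here is a quick sketch of that arithmetic. It assumes "twice more instances" means exactly double, as read above, and uses only the core counts quoted from the paper.

```python
from fractions import Fraction

old_cores, new_cores = 48, 64
instance_ratio = Fraction(2)  # assumption: "twice more" read as a doubling

core_ratio = Fraction(new_cores, old_cores)   # growth in cores per host
per_core_ratio = instance_ratio / core_ratio  # growth in instances per core

print(f"cores: {float(core_ratio):.2f}x, instances per core: {float(per_core_ratio):.2f}x")
```

On those numbers the newer hosts have a third more cores yet carry half as many instances again per core.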

CSAL is in production across "thousands of Elastic Compute Service (ECS) nodes in Alibaba Cloud." Maybe you could run it too: Alibaba Cloud has open-sourced CSAL into the Storage Performance Development Kit.

Alibaba Cloud has racked up a few wins lately. After cutting prices, its homebrew Yitian 710 was recently rated the fastest Arm CPU in the cloud. We've also covered an in-house networking tool that slashed the number of personnel the Chinese concern needed to dedicate to troubleshooting, and research suggesting Alibaba Cloud's operations could be more efficient than Google's.

Which is great news for Chinese cloud users, who have no qualms about working with Alibaba Cloud. For the rest of us, the decision to consider Alibaba Cloud is doubtless more complicated. ®
