AWS follows Iceberg path to unite analytics platform

But other obstacles remain before developers get free choice of storage and analytics engines

Analysis: Last week, AWS jumped into Iceberg with both feet. S3 buckets, the near-ubiquitous storage containers for developers, got another layer. The dominant cloud platform provider introduced S3 Tables for storing data in Apache Iceberg, an open table format (OTF) that promises to let developers and data engineers bring their analytics engines of choice to their data, wherever it resides, instead of moving it.

The move is significant because it brings together analytics, machine learning, and S3-stored data in one environment, according to Tony Baer, principal analyst at dbInsight. In doing so, AWS has repositioned SageMaker, extending it from a workspace for AI developers into an environment where data and AI come together, he said.

SageMaker was now a platform that brought together AWS query engines, various data sources, and development tools. "Before this, you could in [AWS data warehouse] Redshift get dedicated access to models developed in SageMaker, but they were each on a point-to-point connection," he said. "With SageMaker, now they've put in an umbrella that brings us all together, but not just the skin. They've actually gotten down underneath the hood to do real integration against [a] single data source."

Baer also noted that SageMaker Data Lakehouse represents a "full implementation" of Apache Iceberg, the open table format for analytics originating in a Netflix project. Iceberg becomes the default data store on S3 Tables so users can push down queries to data where it lives, including Redshift's native managed storage.

AWS had previously signaled its support for Iceberg as the default table format. Iceberg is a rival to Delta Lake, the Linux Foundation project developed by Databricks. In August 2023, AWS said Redshift could query Apache Iceberg tables in AWS Glue Data Catalog, although it added some caveats.

AWS's further commitment to Iceberg last week was another indication that the Apache project would win out against Delta Lake, which is favored by Microsoft in its Fabric environment and by enterprise software giant SAP, although both vendors offer some ways of working between the two formats.

While Databricks CEO Ali Ghodsi is highly competitive, he is also a pragmatist, Baer said. Databricks spent $1 billion buying Tabular, which was co-founded by Ryan Blue and the other Iceberg developers from Netflix. There were fears the acquisition would fragment Iceberg, with Databricks pulling development away from the Apache project toward its own table format, but they do not appear well founded.

"The good news is that has clearly not happened," Baer said.

Although some vendors have lined up behind Iceberg – Snowflake and Cloudera in particular – there was little advantage for any vendor in competing over the format, Baer said.

"What they've all realized is they're not going to make or break their product on a table format; it's like competing on TCP/IP, it doesn't make a damn bit of difference."

Earlier this year, Blue said the long-term vision was to converge Iceberg and Delta, but that this would take a few years. In the meantime, Databricks offers UniForm, a product designed to allow data stored in Delta to be read as if it were Apache Iceberg, to help interoperability between the two formats.

With the newfound harmony among vendors over table formats, and AWS's strengthened support for Iceberg, the market may move closer to delivering on the promise of bringing any analytics engine to any data without moving it, an idea mooted by Cloudera and Snowflake as they promoted their support for the table format.

But there could be other stumbling blocks along the way, Baer said. "The action is going to be at the catalog level, what's going to go into the Iceberg REST API, which is the basic technical metadata catalog of Iceberg. The catalog is a place where query engine providers can differentiate, so I think that road there will be a bit rockier." ®
