MySQL HeatWave dives into object storage data lakes

Oracle joins the analytics anywhere bandwagon, promises future access to AWS S3


Oracle has launched MySQL HeatWave Lakehouse, an extension to its proprietary analytics platform that lets it query data held in object storage outside the database.

The analytics system, which was built on top of the open source MySQL database, can query data in the object store in a variety of file formats and combine it with data held in MySQL. Files in the object store are queried directly by HeatWave without being copied into the MySQL database, Oracle told us.

The data lake technology supports file formats including CSV, Parquet, and export files from other databases. At the same time, MySQL Autopilot promises to improve performance and scalability without requiring database tuning expertise.

On a 500TB TPC-H benchmark, Oracle claims queries took nine times longer on Amazon Redshift, AWS's data warehouse, and 17 times longer on Snowflake and Databricks than on the new HeatWave Lakehouse. Google's BigQuery was 36 times slower, Oracle reckons, though the company did not publish comparisons with Teradata, the data warehouse vendor founded in 1979.

The system is only available on Oracle Cloud Infrastructure (OCI), but Nipun Agarwal, senior vice president of MySQL HeatWave, told The Register that Oracle planned to extend it to query data held in object storage in other clouds, including AWS, Azure and GCP.

"One of the important things to note over here is that data in the object store remains in the object store," he said. "We do not copy data from the object store into the MySQL database. Secondly, the processing of this data, whether it's loading or queried, is done by HeatWave not by the MySQL engine. That's what gives it extreme scalability because the HeatWave cluster can scale up to 500 nodes."

Using analytics engines to query data outside their home database is not new. Snowflake, Cloudera and Google's BigQuery have taken this approach through their support for the Apache Iceberg table format. Similarly, Databricks, Microsoft and SAP have endorsed the Delta Lake table format, an open source format created by Databricks and now hosted by the Linux Foundation.

Commentators and vendors have suggested that most vendors will eventually support most formats, including Apache Hudi.

Agarwal said Oracle intends HeatWave to support these formats in the future, starting with Iceberg and Delta Lake.

The Autopilot feature offers schema inference, which helps users determine the data types of files in object storage before the data is analyzed by the query engine.

"We can come up with this mapping, even for files which don't have metadata," Agarwal said. "Autopilot can make these predictions in less than one minute. We invented this technique called adaptive data sampling, which very intelligently scans and samples the file without compromising on the accuracy."

Autopilot also predicts the in-memory representation for a specific data source, the optimal cluster size needed to process the data, and how long it will take to load the data, he said.

Holger Mueller, vice president and principal analyst at Constellation Research, said Oracle had added new features to HeatWave at a rapid pace over the last three years. "The HeatWave team has out-innovated all other cloud databases," he claimed.

The move into object storage was "huge," he added, because it "allows users to bring all the data of the enterprise together – into one single query. It is something enterprises have long waited for."

Meanwhile, the ability to query data in AWS, Azure and GCP object storage would appeal to users who want to work across all their enterprise data using HeatWave, he said.

Like any suite, Oracle HeatWave has the downside of competing with specialist players on each of its individual features. "But, at this point, Oracle is more than good enough," Mueller said. ®
