CIOs are (still) closer than ever to their dream data lakehouse

The data lakehouse battle is over. And open-source Apache Iceberg has won. Not even Databricks’ 10-figure acquisition of Tabular, the startup founded by Iceberg’s creators, will change that.

Even so, the bold move has confused and distracted some CIOs. They’re at a loss to explain why Databricks – a lakehouse pioneer and the architect of Delta Lake, Apache Iceberg’s primary competitor – would spend so much to buy three-year-old Tabular, a startup with great promise but barely $1 million in annual revenue.

Some speculate that Databricks wanted to slow the cruising Iceberg ecosystem with a dose of uncertainty. Others wonder whether the company plans to pile Delta Lake projects on the Tabular crew, which continues to play an integral role in steering and developing Iceberg. That would help its own platform, the theory goes, while simultaneously sapping resources from the alternative lakehouse table format.

Another hypothesis: Databricks execs were billion-dollar stoked to stick it to Snowflake by drowning out its event with a buyout its rival reportedly sought. Or maybe they just wanted a quick way to set themselves apart in the active Iceberg space in hopes of soothing Wall Street jitters ahead of its perennially imminent IPO.

Whatever the reason, Databricks is saying all the right things about the openness and portability the acquisition will bring – albeit in terms just vague enough to keep the speculation alive.

“I do think the acquisition has been a bit of a distraction, but that’s probably true anytime that kind of money starts moving around,” David Nalley, director of open-source strategy and marketing at Amazon Web Services, told me. AWS, which has integrated Iceberg into analytics services like AWS Glue and Amazon Athena, has been actively involved in Iceberg’s development for the past three years. “That said, all the signals I’ve seen is that more people are getting involved. The velocity is actually increasing. And we’re excited about that.”

Indeed, for all the handwringing, much of the work now being done isn’t even on the Iceberg table format, which insiders say is relatively stable. And now that it’s established as the default table format, the REST catalog layer above – that is, the APIs that help define just how far and wide Iceberg can stretch, and what management capabilities data professionals will have – is becoming the new battleground.
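
To make that concrete: in practice, an engine reaches Iceberg tables through whatever catalog it has been pointed at. The sketch below is a minimal, hypothetical example of that wiring from the engine side. It assumes a Spark 3.5 cluster, an Iceberg REST catalog service running at http://localhost:8181, and illustrative catalog, namespace, and table names; the runtime package version is likewise an assumption.

    # Minimal sketch: register an Iceberg REST catalog with Spark and query it.
    # Catalog name ("lake"), endpoint, and table name are illustrative.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("iceberg-rest-sketch")
        # Iceberg runtime package; the version must match your Spark/Scala build.
        .config("spark.jars.packages",
                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        # A catalog named "lake", backed by a REST catalog service.
        .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lake.type", "rest")
        .config("spark.sql.catalog.lake.uri", "http://localhost:8181")
        .getOrCreate()
    )

    # Any engine configured against the same catalog sees the same tables.
    spark.sql("SELECT count(*) FROM lake.analytics.page_views").show()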

It’s also where Databricks can still make an impact by marrying data under its purview with information stored on competitive platforms. In fact, it’s already working toward that. In June, the week after Databricks bought Tabular, it made the Databricks Unity Catalog, its own governance tool, open source.

“The data catalog is critical because it’s where business manages its metadata,” said Venkat Rajaji, Senior Vice President of Product Management at Cloudera. Cloudera also has been investing in both Iceberg and REST catalog capability in its platform. “There’s been a ton of innovation lately around the Iceberg REST catalog because the data turf war is over. But the metadata turf war is just getting started.”

Lakehouse appeal

The pitch for data lakehouse table formats sounds almost too good to be true. The formats are basically abstraction layers that give business analysts and data scientists the ability to mix and match whatever data stores they need, wherever they may lie, with whatever processing engine they choose.
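
As one hypothetical illustration of that engine-agnostic idea, the sketch below reads an Iceberg table from plain Python with the PyIceberg library, against the same illustrative REST catalog endpoint and table used earlier; no warehouse engine is involved, and the names are assumptions rather than anything prescribed by the format.

    # Minimal sketch: read the same Iceberg table without Spark, via PyIceberg.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "lake",                       # catalog name (illustrative)
        type="rest",
        uri="http://localhost:8181",  # same REST catalog the engines above use
    )

    table = catalog.load_table("analytics.page_views")

    # Scan the table into an Arrow table; the underlying data files are read
    # in place rather than copied into an engine-specific store.
    arrow_table = table.scan().to_arrow()
    print(arrow_table.num_rows)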

There’s a record of everything – including metadata changes – which paves the way for a host of management and governance capabilities. The data itself remains intact, uncopied and unaltered. So any number of projects can tap into the data at once. And the table formats will keep track of all of it.
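
What that record looks like in practice, reusing the illustrative Spark session and table names from the earlier sketch: Iceberg exposes its commit history as a queryable metadata table, and Spark SQL can time-travel to any snapshot. The snapshot ID below is a placeholder.

    # Every commit (append, overwrite, schema change) appears in the snapshot log.
    spark.sql("SELECT snapshot_id, committed_at, operation "
              "FROM lake.analytics.page_views.snapshots").show()

    # Query the table exactly as it looked at an earlier snapshot; the live
    # data is never copied or altered.
    spark.sql("SELECT count(*) FROM lake.analytics.page_views "
              "FOR VERSION AS OF 1234567890123456789").show()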

CIOs give the thumbs-up because the formats do away with the needless per-project data copies that compound storage fees and swell security, reliability, and manageability headaches. And in theory, at least, it all happens without vendor lock-in.

That last part – the lack of Hotel California-style gotchas like proprietary enhancements and steep egress fees that conspire to pen enterprises into proprietary data warehouses – played a key role in how Iceberg’s creators, who worked at Netflix at the time, shaped the format. The vendor-agnostic approach is also what helped draw large data-centric companies like Apple, Citibank, and Pinterest to the project. And it continues to fuel its rising popularity.

Delta Lake is technically open as well. Databricks donated Delta Lake to the Linux Foundation at about the same time that Netflix handed over the Iceberg project to the Apache Software Foundation. But some CIOs worry that Databricks’ outsized influence over the platform affords the company the opportunity to maintain and augment proprietary hooks, like in Databricks Runtime.

“There’s definitely a feeling out there that Delta Lake is the brainchild of one company,” said Russell Spitzer, Principal Engineer at Snowflake. Spitzer, who in June joined Snowflake from Apple, is on the Iceberg project management committee (PMC) as well as the podling (incubating) PMC for Apache Polaris, a REST-compatible API that Snowflake donated to Apache in June. He also contributes code to both.

“You know, it’s open source,” Spitzer said, “But it’s really a Databricks product.”

If you can’t beat ‘em

The first wave of Iceberg adoption kicked into high gear around 2020, when it first became a top-level Apache project. In addition to AWS, more open-centric vendors like Cloudera and Dremio began building services around Iceberg. Google hopped in toward the end of the wave.

Most proprietary data platform providers sat on the sidelines during the initial wave, likely because Iceberg’s any-data-any-engine construct posed a threat to their existing business models. Snowflake was a notable exception. The data platform provider started investing in Iceberg during this period, likely because it needed a counter to Delta Lake, the lakehouse format from its most formidable competitor.

But as it became apparent that enterprises were going to combine data from competitive warehouses with Iceberg, proprietary platform providers began adding support in earnest. That put them in a better position to keep data under management – and possibly to host processing as well.

Just in the past year, Confluent, Oracle, and Salesforce all added support for Iceberg. Snowflake doubled down on Iceberg with Polaris. Microsoft, the last cloud service provider holdout – likely due to its investment in Delta Lake – joined Snowflake’s coming out party in June. And, of course, Databricks has been expanding coverage rapidly as well.

“It’s just amazing to me how far Iceberg has come,” said Snowflake’s Spitzer. “I used to have to explain why people should care about (Iceberg). And now, everyone knows. And everyone knows that everyone’s moving towards it.”

It’s all about the metadata

Iceberg creates a great foundation for combining and working with different data stores across projects. And now that the enterprise data analytics community is basically bought in, the next stage of work is happening at the catalog layer. That’s where AWS, Cloudera, Databricks, Snowflake, and others are all working to help Iceberg work as well as possible with as much data as possible.

“Catalogs are about more than table formats. They’re about governance as well,” said Roni Burd, Director of Open Data Analytics Engines at AWS. Burd also manages the company’s Iceberg contributions. “So there’s another really great opportunity to innovate on the catalog API, the abstraction layer above the table format. It’s what our customers are asking for. Because it’s opening up a new frontier of solving problems for them.”
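
For a feel for that abstraction layer, the Iceberg REST catalog specification is plain HTTP and JSON, which is what lets so many engines and vendors build on top of it. The sketch below assumes an unauthenticated catalog service at http://localhost:8181 and an illustrative namespace; real deployments add authentication and an installation-specific prefix.

    # Minimal sketch: poke at an Iceberg REST catalog directly over HTTP.
    import requests

    BASE = "http://localhost:8181"

    # Catalog-level configuration (defaults and overrides handed to clients).
    print(requests.get(f"{BASE}/v1/config").json())

    # List namespaces, then the tables inside one of them.
    print(requests.get(f"{BASE}/v1/namespaces").json())
    print(requests.get(f"{BASE}/v1/namespaces/analytics/tables").json())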

