The need for a central data repository for enterprise analytics and gen AI has made the data lakehouse the default choice for enterprise data. Meanwhile, the emergence of open table standards makes the shift easier and reduces vendor lock-in for enterprises while also allowing for better integration between lakehouses and other enterprise systems and service providers.
Data lakehouses combine the structure of data warehouses with the flexibility of data lakes, making them versatile tools to make the most of any data the enterprise collects, whether it’s for business analytics, integration with other systems, or providing relevant context to LLMs.
The idea behind the data lakehouse is to merge the best of what data lakes and data warehouses have to offer, says Gartner analyst Adam Ronthal.
Data warehouses enable companies to store large amounts of structured data with well-defined schemas, and they’re designed to support a large number of simultaneous queries and deliver results quickly to many simultaneous users.
Data lakes, on the other hand, enable companies to collect raw, unstructured data in many formats for data analysts to hunt through. These vast pools of data have recently grown in prominence thanks to the flexibility they provide enterprises to store massive data streams without first having to define the purpose of doing so.
According to Gartner, data lakehouses are the next step in the evolution of data architectures, merging these two capabilities into a single platform to overcome limitations of previous architectures, reducing complexity, streamlining data management, and supporting diverse workloads.
In late 2025, Gartner also released the first market guide for data lakehouse platforms. “The lakehouse is now firmly established as the architecture that most organizations will seek to standardize on,” wrote Ronthal and his co-authors in the report.
Meanwhile, data lakehouses themselves are also standardizing on the Apache Iceberg data table format, first created by Netflix in 2017, and donated to the Apache Software Foundation the following year. It hit the tipping point in 2024 with adoption by companies like Apple, LinkedIn, Adobe, and all the major cloud vendors. Even Databricks, which created the competing Delta Lake standard, now supports Iceberg natively.
Lakehouse vendors are opening their architecture more to allow better access to the data by third parties, says Gerry Szatvanyi, chief AI officer at consulting firm OSF Digital. “That wasn’t the case a few years ago,” he says.
And other enterprise service providers have been taking advantage of this, he says. For example, Salesforce Data Cloud can connect directly to Iceberg-formatted data.
“Salesforce has a Zero Copy access format, so it can connect its own data platform to another data lake without copying data into Salesforce,” says Szatvanyi.
And of course, as with everything else in the enterprise today, gen AI is having a big effect. Data lakehouses are particularly good for LLMs because they can provide critical business context for RAG embeddings and MCP access, the two most common ways to feed data into LLMs.
“Lakehouses are being accessed more by AI agents,” Szatvanyi says. “It’s the main thing I see happening.”
Even traditional business analytics is now increasingly handled via AI interfaces, he adds, democratizing user access to enterprise data.
According to a recent IDC report, the leading vendors in the data platform space are Databricks, Google, Oracle, and Snowflake, with other major players including Microsoft, IBM, and Cloudera. IDC also listed Amazon SageMaker as a lakehouse platform to watch, but because it only became widely available in early 2025, it wasn’t yet included among the top vendors.
Gartner includes Databricks, Google, Oracle, Snowflake, Microsoft, IBM, Cloudera and Amazon SageMaker on its list of representative vendors for data lakehouse platforms, among other firms.
The business benefits of data lakehouses
Docusign opted to go with Snowflake for the data platform used to train an internal agent for sales, and is training its ML models in order to serve customers more accurately. Information is pulled from Salesforce, and they’re also exploring Atlassian and ServiceNow, as well as other internal custom tools.
The information also goes out to LLMs using RAG embedding pipelines, and MCP connectivity is also being explored as the technology matures.
Other companies use data lakehouses for the flexibility of the data sources they support and the volume of data they can handle.
Sega Europe, for example, began using the Amazon Redshift data warehouse to collect event data from its Football Manager video game back in 2016. At first, this event data consisted simply of players opening and closing games.
“But there was so much more data we could collect,” says Felix Baker, the company’s head of data services. “Like what teams people were managing, or how much money they were spending.”
Because of the data structures needed for inclusion in the data warehouse, data was coming in batches and it took too much time to analyze.
“We wanted to analyze the data in real-time,” Baker adds, but this functionality wasn’t available in Redshift at the time. “Databricks offered an out-of-the-box managed services solution that did what we needed without us having to develop anything,” he adds. In addition, the data lakehouse architecture enabled Sega Europe to ingest unstructured data, such as social media feeds.
The cost efficiencies enabled by providing a source for all of an organization’s structured and unstructured data are a value driver for data lakehouses, says Steven Karan, AI transformation lead at Capgemini, which has helped implement data lakehouses at leading organizations in financial services, telecom, and retail.
Moreover, data lakehouses store data in a way that it’s readily available for use by a wide array of technologies, from traditional business intelligence and reporting systems to ML and AI. “Other benefits include reduced data redundancy, simplified IT operations, a simplified data schema to manage, and easier to enable data governance,” Karan says.
Helping data emerge
One particularly valuable use case for data lakehouses is in helping companies get value from data previously trapped in legacy or siloed systems. For example, one Capgemini enterprise customer, which had grown through acquisitions over a decade, couldn’t access data related to resellers of their products.
“By migrating the siloed data from legacy data warehouses into a centralized data lakehouse, the client could understand at an enterprise level which of their reseller partners were most effective, and how changes such as referral programs and structures drove revenue,” Karan says.
One company capitalizing on the benefits of data lakehouses is life sciences, analytics, and services company IQVIA, which began using data lakehouses several years ago.
Before the pandemic, pharmaceutical companies running drug trials used to send employees to hospitals and other sites to collect data about things such as adverse effects, says Wendy Morahan, senior director of product management for clinical data analytics at IQVIA. “That’s how they make sure the patient is safe.”
Once the pandemic hit and sites were locked down, however, pharmaceutical companies had to scramble to figure out how to get the data they needed — and to get it in a way that was compliant with regulations, and fast enough to enable them to quickly spot potential problems.
Snowflake and Databricks gave the company the ability to store the raw data in any format, including images and audio, all in a single platform.
Lakehouse adoption growth
According to a February report from Research and Markets, the data lakehouse market has been growing exponentially.
In 2025, it totaled $10.3 billion and is predicted to hit $12.6 billion by the end of this year, a compound annual growth rate of 22%. By 2030, the market will reach $27.3 billion, the research firm projects.
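The growth figures above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the `cagr` helper and variable names are made up for this example, and it assumes that "this year" in the report refers to 2026.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, returned as a fraction (0.22 == 22%)."""
    return (end / start) ** (1 / years) - 1


# Market sizes in billions of US dollars, as quoted from the report.
market_2025 = 10.3
market_2026 = 12.6  # assumption: "this year" means 2026
market_2030 = 27.3

# Growth over the single year from 2025 to 2026.
one_year_rate = cagr(market_2025, market_2026, 1)

# Implied annual growth over the four years from 2026 to 2030.
implied_rate_to_2030 = cagr(market_2026, market_2030, 4)

print(f"2025 to 2026: {one_year_rate:.1%}")
print(f"2026 to 2030 (implied): {implied_rate_to_2030:.1%}")
```

Under these assumptions, the one-year rate comes out at roughly 22%, matching the quoted figure, and the 2030 projection implies a slightly lower annual rate of about 21%.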
And according to a recent survey from Dremio, a lakehouse vendor, 63% of companies run most analytics on a lakehouse rather than a traditional warehouse, up from 55% in 2024.
Data lakehouses are also increasingly being used for IT and security workloads, says Ed Bailey, field CISO at telemetry vendor Cribl. “Previously, lakehouse providers were the realm of business data with a focus on structured data and SQL.”
Lakehouses can handle IT and security data at a lower cost, and this is a critical issue given the volumes of data in this space. “Even mid-sized companies produce much more IT and security data than business data,” he says. “Lakehouse vendors are finally starting to push into the market.”
But it’s still early, and initial solutions are immature and an awkward fit for this kind of data. “IT and security data are very different from business data,” he adds. For example, business data tends to be more predictable and well-structured. Plus, business users are more familiar with using data analytics tools than IT and security users. “This mismatch has been a serious obstacle to adoption,” he says.
Data lakehouses are also evolving in another way. Gartner says the lakehouse isn’t the ultimate solution but a transitional architecture on the way to more advanced systems, such as data fabric. The difference between the two is that data lakehouses contain data from disparate systems, while data fabrics simply contain pointers to where the data is natively located.
An advantage of data fabrics is that the original security access controls and metadata are preserved and used, there’s no duplication of data, and no need to reconcile disparate standards.
“But it comes with some performance issues, and access isn’t that seamless,” says OSF Digital’s Szatvanyi.
A data fabric might be a good place for smaller companies to start, he says, or you can use both. “You can have a big chunk of data in a lakehouse and have a fabric for two or three secondary systems,” he says. “But I’d say the data lakehouse is the gold standard.”
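The distinction between a lakehouse that holds copies of data and a fabric that holds only pointers can be illustrated with a toy model. This is a minimal sketch, not any vendor's implementation: the `Lakehouse` and `DataFabric` classes, their methods, and the sample reseller rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Lakehouse:
    """Toy lakehouse: keeps its own copy of rows ingested from source systems."""
    tables: Dict[str, List[Row]] = field(default_factory=dict)

    def ingest(self, name: str, rows: List[Row]) -> None:
        # Copy the rows so later changes in the source are not reflected here.
        self.tables[name] = [dict(row) for row in rows]

    def query(self, name: str) -> List[Row]:
        return self.tables[name]


@dataclass
class DataFabric:
    """Toy fabric: keeps only pointers (fetch functions) to the source systems."""
    pointers: Dict[str, Callable[[], List[Row]]] = field(default_factory=dict)

    def register(self, name: str, fetch: Callable[[], List[Row]]) -> None:
        self.pointers[name] = fetch

    def query(self, name: str) -> List[Row]:
        # Every query goes back to the source, so there is no duplicated data.
        return self.pointers[name]()


# A hypothetical source system holding reseller records.
source_rows: List[Row] = [{"reseller": "A", "revenue": 100}]

lakehouse = Lakehouse()
lakehouse.ingest("resellers", source_rows)

fabric = DataFabric()
fabric.register("resellers", lambda: source_rows)

# The source system changes after the lakehouse ingested its copy.
source_rows.append({"reseller": "B", "revenue": 250})

lakehouse_rows = lakehouse.query("resellers")  # the copy taken at ingest time
fabric_rows = fabric.query("resellers")        # the current source contents

print(len(lakehouse_rows), len(fabric_rows))
```

In this toy model the fabric always reflects the source as it is now, at the price of a round trip to the source on every query, which loosely mirrors the performance trade-off Szatvanyi describes.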

