The rise of the data lakehouse: A new era of data value

With 65 million vaccine doses to administer at the height of the COVID-19 pandemic, Luigi Guadagno, CIO of Walgreens, needed to know where to send them. To find out, he queried Walgreens’ data lakehouse, implemented with Databricks technology on Microsoft Azure.

“We leveraged the lakehouse to understand the moment,” the CIO says. For Guadagno, the need to match vaccine availability with patient demand came at the right moment, technologically speaking. The giant pharmaceutical chain had put its lakehouse in place to address just such challenges in its quest to, as Guadagno puts it, “get the right product in the right place for the right patient.”

Previously, Walgreens had attempted to perform that task with its data lake but faced two significant obstacles: cost and time. Those challenges are well known to many organizations that have sought to extract analytical insights from their vast amounts of data. The result is an emerging paradigm shift in how enterprises surface insights, one that sees them leaning on a new category of technology architected to help organizations maximize the value of their data.

Enter the data lakehouse

Traditionally, organizations have maintained two systems as part of their data strategies: a system of record on which to run their business and a system of insight such as a data warehouse from which to gather business intelligence (BI). With the advent of big data, a second system of insight, the data lake, appeared to serve up artificial intelligence and machine learning (AI/ML) insights. Many organizations, however, are finding this paradigm of relying on two separate systems of insight untenable.

The data warehouse requires a time-consuming extract, transform, and load (ETL) process to move data from the system of record to the data warehouse, where the data is normalized and queried to obtain answers. Meanwhile, unstructured data is dumped into a data lake, where skilled data scientists analyze it using tools such as Python, Apache Spark, and TensorFlow.
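The two-system pattern above can be compressed into a toy sketch. Everything here is invented for illustration (the record fields, the normalization rules); the point is only the shape of the flow: ETL transforms records before they reach the warehouse, while the lake receives them untouched and structure is imposed later, at read time.

```python
# Toy illustration of the two traditional "systems of insight".
# All names and records are invented for illustration.

# System of record: raw operational events, inconsistently formatted.
system_of_record = [
    {"store": "DC-001", "product": "vaccine", "qty": "120"},
    {"store": "dc-002", "product": "Vaccine", "qty": "80"},
]

def etl_to_warehouse(records):
    """ETL: normalize (transform) BEFORE loading into the warehouse."""
    return [
        {"store": r["store"].upper(), "product": r["product"].lower(),
         "qty": int(r["qty"])}
        for r in records
    ]

# Data warehouse: clean, typed rows ready for BI queries.
warehouse = etl_to_warehouse(system_of_record)

# Data lake: the same records dumped as-is; structure is imposed later,
# by data scientists, at read time ("schema on read").
lake = list(system_of_record)

print(sum(row["qty"] for row in warehouse))  # BI query on normalized data
print(lake[1]["qty"])                        # the lake still holds raw strings
```

The cost Walgreens ran into lives in that `etl_to_warehouse` step: at real scale it is a batch pipeline that must run, and be paid for, before anyone can query.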

Under Guadagno, the Deerfield, Ill.-based Walgreens consolidated its systems of insight into a single data lakehouse. And he’s not alone. An increasing number of companies are finding that lakehouses — which fall into a product category generally known as query accelerators — are meeting a critical need.

“Lakehouses redeem the failures of some data lakes. That’s how we got here. People couldn’t get value from the lake,” says Adam Ronthal, vice president and analyst at Gartner. In the case of the Databricks Delta Lake lakehouse, structured data from a data warehouse is typically added to a data lake. To that, the lakehouse adds layers of optimization to make the data more broadly consumable for gathering insights.
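In Delta Lake's case, the optimization layer Ronthal describes centers on a transaction log kept alongside the data files: readers replay the log to see a consistent snapshot of the table, which is what lets warehouse-style queries run reliably over lake storage. Here is a deliberately tiny model of that idea, in plain Python with no Delta Lake code and invented file names:

```python
# Toy model of a lakehouse-style transaction log (inspired by, but far
# simpler than, Delta Lake's _delta_log). File names are invented.

log = []  # append-only list of actions, one entry per commit

def commit(action, path):
    """Record an 'add' or 'remove' of a data file as one atomic commit."""
    log.append({"action": action, "path": path})

def snapshot():
    """Replay the log to find which data files are currently live."""
    live = set()
    for entry in log:
        if entry["action"] == "add":
            live.add(entry["path"])
        elif entry["action"] == "remove":
            live.discard(entry["path"])
    return live

commit("add", "part-000.parquet")
commit("add", "part-001.parquet")
commit("remove", "part-000.parquet")  # e.g. the file was rewritten by compaction

print(sorted(snapshot()))
```

Because readers only trust files the log says are live, a half-finished write is invisible until its commit lands, giving lake storage the transactional behavior a warehouse user expects.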

The Databricks Delta Lake lakehouse is but one entry in an increasingly crowded marketplace that includes such vendors as Snowflake, Starburst, Dremio, GridGain, DataRobot, and perhaps a dozen others, according to Gartner’s Market Guide for Analytics Query Accelerators.

Moonfare, a private equity firm, is transitioning from a PostgreSQL-based data warehouse on AWS to a Dremio data lakehouse on AWS for business intelligence and predictive analytics. When the implementation goes live in the fall of 2022, business users will be able to perform self-service analytics on top of data in AWS S3. Queries will include which marketing campaigns are working best with which customers and which fund managers are performing best. The lakehouse will also help with fraud prevention.

“You can intuitively query the data from the data lake. Users coming from a data warehouse environment shouldn’t care where the data resides,” says Angelo Slawik, data engineer at Moonfare. “What’s super important is that it takes away ETL jobs,” he says, adding, “With Dremio, if the data is in S3, you can query what you want.”

Moonfare selected Dremio in a proof-of-concept runoff with AWS Athena, an interactive query service that enables SQL queries on S3 data. According to Slawik, Dremio proved more capable thanks to very fast performance and a highly functional user interface that allows users to track data lineage visually. Also important were Dremio’s role-based views and access control for security and governance, which help the Berlin, Germany-based company comply with GDPR regulations.

At Paris-based BNP Paribas, scattered data silos were being used for BI by different teams at the giant bank. Emmanuel Wiesenfeld, an independent contractor, re-architected the silos to create a centralized system so business users such as traders could run their own analytics queries across “a single source of truth.”

“Trading teams wanted to collaborate, but data was scattered. Tools for analyzing the data also were scattered, making them costly and difficult to maintain,” says Wiesenfeld. “We wanted to centralize data from lots of data sources to enable real-time situational awareness. Now users can write their own scripts and run them over the data,” he explains.  

Using Apache Ignite technology from GridGain, Wiesenfeld created an in-memory computing architecture. Key to the new approach is moving from ETL to ELT, where transformation is carried out during computation, streamlining the entire process; Wiesenfeld says the result was to reduce latency from hours to seconds. He has since launched a startup called Kawa to bring similar solutions to other customers, particularly hedge funds.
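The ETL-to-ELT shift Wiesenfeld describes can be sketched in a few lines. This is plain Python with invented records, not Ignite's actual API: the contrast to notice is that ELT loads raw data immediately and defers transformation into the computation itself, so fresh records are queryable without waiting for a batch transform window.

```python
# Sketch of ETL vs. ELT. Records and fields are invented for illustration.

raw = [{"price": "101.5"}, {"price": "99.0"}]

# ETL: a batch job transforms everything up front, then loads the result.
# Nothing is queryable until the whole batch finishes.
def etl(records):
    return [{"price": float(r["price"])} for r in records]

# ELT: "load" happens first, with no batch window...
loaded_raw = list(raw)

def average_price(records):
    # ...and the transformation (string -> float) runs inside the
    # computation that needs it, at query time.
    return sum(float(r["price"]) for r in records) / len(records)

print(average_price(loaded_raw))
```

In an in-memory grid, that query-time transformation runs next to the data, which is how latency can drop from the hours of a batch ETL cycle to seconds.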

Starburst takes a mesh approach, leveraging open-source Trino technology in Starburst Enterprise to improve access to distributed data. Rather than moving data into a central warehouse, the mesh enables access while allowing data to stay where it is. Sophia Genetics is using Starburst Enterprise in its cloud-based bioinformatics SaaS analytics platform. One reason: Keeping sensitive healthcare data within specific countries is important for regulatory reasons. “Due to compliance constraints, we simply cannot deploy any system that accesses all data from one central point,” said Alexander Seeholzer, director of data services at Switzerland-based Sophia Genetics, in a Starburst case study.
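The mesh pattern can be sketched minimally: each region answers a query locally and only aggregates cross the boundary, which is what keeps the raw records in-country. This is plain Python with invented region names and records, nothing Trino-specific; a real deployment would push the per-source work down through federated connectors.

```python
# Toy federated query: each region's data stays where it is; only
# aggregated results leave. Regions and records are invented.

regions = {
    "de": [{"patient": "p1", "flagged": True},
           {"patient": "p2", "flagged": False}],
    "ch": [{"patient": "p3", "flagged": True}],
}

def local_count(records):
    """Runs 'inside' a region: raw rows never cross the boundary."""
    return sum(1 for r in records if r["flagged"])

# The federation layer combines per-region results, never the raw data.
total_flagged = sum(local_count(rows) for rows in regions.values())
print(total_flagged)
```

Federated governance then amounts to controlling what each `local_count`-style function is allowed to compute and return, rather than centralizing the data itself.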

The new query acceleration platforms aren’t standing still. Databricks and Snowflake have introduced data clouds and data lakehouses with features designed for the needs of companies in specific industries such as retail and healthcare. These moves echo the introduction of industry-specific clouds by hyperscalers Microsoft Azure, Google Cloud Platform, and Amazon Web Services.  

The lakehouse as best practice

Gartner’s Ronthal sees the evolution of the data lake to the data lakehouse as an inexorable trend. “We are moving in the direction where the data lakehouse becomes a best practice, but everyone is moving at a different speed,” Ronthal says. “In most cases, the lake was not capable of delivering production needs.”

Despite the eagerness of data lakehouse vendors to subsume the data warehouse into their offerings, Gartner predicts the warehouse will endure. “Analytics query accelerators are unlikely to replace the data warehouse, but they can make the data lake significantly more valuable by enabling performance that meets requirements for both business and technical staff,” concludes its report on the query accelerator market.   

Noel Yuhanna, vice president and principal analyst at Forrester Research, disagrees, asserting the lakehouse will indeed take the place of separate warehouses and lakes.

“We do see the future of warehouses and lakes coming into a lakehouse, where one system is good enough,” Yuhanna says. For organizations with distributed warehouses and lakes, the mesh architecture such as that of Starburst will fill a need, according to Yuhanna, because it enables organizations to implement federated governance across various data locations.

Whatever the approach, Yuhanna says companies are seeking to gain faster time to value from their data. “They don’t want ‘customer 360’ six months from now; they want it next week. We call this ‘fast’ data. As soon as the data is created, you’re running analytics and insights on it,” he says. 

From a system of insight to a system of action

For Guadagno, vaccine distribution was a high-profile, lifesaving initiative, but the Walgreens lakehouse does yeoman work in more mundane but essential retail tasks as well, such as sending out prescription reminders and product coupons. These processes combine an understanding of customer behavior with the availability of pharmaceutical and retail inventory. “It can get very sophisticated, with very personalized insights,” he says. “It allows us to become customer-centric.”

To others embarking on a similar journey, Guadagno advises: “Put all your data in the lakehouse as fast as possible. Don’t embark on any lengthy data modeling or rationalization. It’s better to think about creating value. Put it all in there and give everybody access through governance and collaboration. Don’t waste money on integration and ETL.”

At Walgreens, the Databricks lakehouse is about more than simply making technology more efficient. It’s key to its overall business strategy. “We’re on a mission to create a very personalized experience. It starts at the point of retail — what you need and when you need it. That’s ultimately what the data is for,” Guadagno says. “There is no more system of record and system of insight. It’s a system of action.”

Analytics, Data Management



Category: News | August 18, 2022
