Skip to content
Tiatra, LLCTiatra, LLC
Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies
  • Home
  • About Us
  • Services
    • IT Engineering and Support
    • Software Development
    • Information Assurance and Testing
    • Project and Program Management
  • Clients & Partners
  • Careers
  • News
  • Contact
 
  • Home
  • About Us
  • Services
    • IT Engineering and Support
    • Software Development
    • Information Assurance and Testing
    • Project and Program Management
  • Clients & Partners
  • Careers
  • News
  • Contact

Data lakehouses give enterprises analytics edge

For enterprises looking to wrest the most value from their data, especially in real-time, the “data lakehouse” concept is starting to catch on.

The idea behind the data lakehouse is to merge together the best of what data lakes and data warehouses have to offer, says Gartner analyst Adam Ronthal.

Data warehouses, for their part, enable companies to store large amounts of structured data with well-defined schemas. They are designed to support a large number of simultaneous queries and to deliver the results quickly to many simultaneous users.

Data lakes, on the other hand, enable companies to collect raw, unstructured data in many formats for data analysts to hunt through. These vast pools of data have grown in prominence of late thanks to the flexibility they provide enterprises to store vast streams of data without first having to define the purpose of doing so.  

The market for these two types of big data repositories is “converging in the middle, at the lakehouse concept,” Ronthal says, with established data warehouse vendors adding the ability to manage unstructured data, and data lake vendors adding structure to their offerings.

For example, on AWS, enterprises can now pair Amazon Redshift, a data warehouse, with Amazon Redshift Spectrum, which enables Redshift to reach into Amazon’s unstructured S3 data lakes. Meanwhile, data lake Snowflake can now support unstructured data with external tables, Ronthal says.

When companies have separate lakes and warehouses, and data needs to move from one to the other, it introduces latency and costs time and money, Ronthal adds. Combining the two in one platform reduces effort and data movement, thereby accelerating the pace of uncovering data insights.

And, depending on the platform, a data lakehouse can also offer other features, such as support for data streaming, machine learning, and collaboration, giving enterprises additional tools for making the most of their data.

Here is a look at at the benefits of data lakehouses and how several leading organizations are making good on their promise as part of their analytics strategies.

Enhancing the video game experience

Sega Europe’s use of data repositories in support of its video games has evolved considerably in the past several years.

In 2016, the company began using the Amazon Redshift data warehouse to collect event data from its Football Manager video game. At first this event data consisted simply of players opening and closing games. The company had two staff members looking into this data, which streamed into Redshift at a rate of ten events per second.

“But there was so much more data we could be collecting,” says Felix Baker, the company’s head of data services. “Like what teams people were managing, or how much money they were spending.”

By 2017, Sega Europe was collecting 800 events a second, with five staff working on the platform. By 2020, the company’s system was capturing 7,000 events per second from a portfolio of 30 Sega games, with 25 staff involved.

At that point, the system was starting to hit its limits, Baker says. Because of the data structures needed for inclusion in the data warehouse, data was coming in batches and it took half an hour to an hour to analyze it, he says.

“We wanted to analyze the data in real-time,” he adds, but this functionality wasn’t available in Redshift at the time.

After performing proofs of concept with three platforms — Redshift, Snowflake, and Databricks — Sega Europe settled on using Databricks, one of the pioneers of the data lakehouse industry.

“Databricks offered an out-of-the-box managed services solution that did what we needed without us having to develop anything,” he says. That included not just real-time streaming but machine learning and collaborative workspaces.

In addition, the data lakehouse architecture enabled Sega Europe to ingest unstructured data, such as social media feeds, as well.

“With Redshift, we had to concentrate on schema design,” Baker says. “Every table had to have a set structure before we could start ingesting data. That made it clunky in many ways. With the data lakehouse, it’s been easier.”

Sega Europe’s Databricks platform went live into production in the summer of 2020. Two or three consultants from Databricks worked alongside six or seven people from Sega Europe to get the streaming solution up and running, matching what the company had in place previously with Redshift. The new lakehouse is built in three layers, the base layer of which is just one large table that everything gets dumped into.

“If developers create new events, they don’t have to tell us to expect new fields — they can literally send us everything,” Baker says. “And we can then build jobs on top of that layer and stream out the data we acquired.”

The transition to Databricks, which is built on top of Apache Spark, was smooth for Sega Europe, thanks to prior experience with the open-source engine for large-scale data processing.

“Within our team, we had quite a bit of expertise already with Apache Spark,” Baker says. “That meant that we could set up streams very quickly based on the skills we already had.”

Today, the company processes 25,000 events per second, with more than 30 data staffers and 100 game titles in the system. Instead of taking 30 minutes to an hour to process, the data is ready within a minute.

“The volume of data collected has grown exponentially,” Baker says. In fact, after the pandemic hit, usage of some games doubled.

The new platform has also opened up new possibilities. For example, Sega Europe’s partnership with Twitch, a streaming platform where people watch other people play video games, has been enhanced to include a data stream for its Humankind game, so that viewers can get a player’s history, including the levels they completed, the battles they won, and the civilizations they conquered.

“The overlay on Twitch is updating as they play the game,” Baker says. “That is a use case that we wouldn’t have been able to achieve before Databricks.”

The company has also begun leveraging the lakehouse’s machine learning capabilities. For example, Sega Europe data scientists have designed models to figure out why players stop playing games and to make suggestions for how to increase retention.

“The speed at which these models can be built has been amazing, really,” Baker says. “They’re just cranking out these models, it seems, every couple of weeks.”

The business benefits of data lakehouses

The flexibility and catch-all nature of data lakehouses is fast proving attractive to organizations looking to capitalize on their data assets, especially as part of digital initiatives that hinge quick access to a wide array of data.

“The primary value driver is the cost efficiencies enabled by providing a source for all of an organization’s structured and unstructured data,” says Steven Karan, vice president and head of insights and data at consulting company Capgemini Canada, which has helped implement data lakehouses at leading organizations in financial services, telecom, and retail.

Moreover, data lakehouses store data in such a way that it is readily available for use by a wide array of technologies, from traditional business intelligence and reporting systems to machine learning and artificial Intelligence, Karan adds. “Other benefits include reduced data redundancy, simplified IT operations, a simplified data schema to manage, and easier to enable data governance.”

One particularly valuable use case for data lakehouses is in helping companies get value from data previously trapped in legacy or siloed systems. For example, one Capgemini enterprise customer, which had grown through acquisitions over a decade, couldn’t access valuable data related to resellers of their products.

“By migrating the siloed data from legacy data warehouses into a centralized data lakehouse, the client was able to understand at an enterprise level which of their reseller partners were most effective, and how changes such as referral programs and structures drove revenue,” he says.

Putting data into a single data lakehouse makes it easier to manage, says Meera Viswanathan, senior product manager at Fivetran, a data pipeline company. Companies that have traditionally used both data lakes and data warehouses often have separate teams to manage them, making it confusing for the business units that needed to consume the data, she says.

In addition to Databricks, Amazon Redshift Spectrum, and Snowflake, other vendors in the data lakehouse space include Microsoft, with its lakehouse platform Azure Synapse, and Google, with its BigLake on Google Cloud Platform, as well as data lakehouse platform Starburst.

Accelerating data processing for better health outcomes

One company capitalizing on these and other benefits of data lakehouses is life sciences analytics and services company IQVIA.

Before the pandemic, pharmaceutical companies running drug trials used to send employees to hospitals and other sites to collect data about things such adverse effects, says Wendy Morahan, senior director of clinical data analytics at IQVIA. “That is how they make sure the patient is safe.”

Once the pandemic hit and sites were locked down, however, pharmaceutical companies had to scramble to figure out how to get the data they needed — and to get it in a way that was compliant with regulations and fast enough to enable them to spot potential problems as quickly as possible.

Moreover, with the rise of wearable devices in healthcare, “you’re now collecting hundreds of thousands of data points,” Morahan adds.

IQVIA has been building technology to do just that for the past 20 years, says her colleague Suhas Joshi, also a senior director of clinical data analytics at the company. About four years ago, the company began using data lakehouses for this purpose, including Databricks and the data lakehouse functionality now available with Snowflake.

“With Snowflake and Databricks you have the ability to store the raw data, in any format,” Joshi says. “We get a lot of images and audio. We get all this data and use it for monitoring. In the past, it would have involved manual steps, going to different systems. It would have taken time and effort. Today, we’re able to do it all in one single platform.”

The data collection process is also faster, he says. In the past, the company would have to write code to acquire data. Now, the data can even be analyzed without having to be processed first to fit a database format.

Take the example of a patient in a drug trial who gets a lab result that shows she’s pregnant, but the pregnancy form wasn’t filled out properly, and the drug is harmful during pregnancy. Or a patient who has an adverse event and needs blood pressure medication, but the medication was not prescribed. Not catching these problems quickly can have drastic consequences. “You might be risking a patient’s safety,” says Joshi.

Analytics, Data Architecture, Data Management


Read More from This Article: Data lakehouses give enterprises analytics edge
Source: News

Category: NewsJuly 6, 2022
Tags: art

Post navigation

PreviousPrevious post:CIO Leadership Live with Toby Granwal, Global CIO at FonterraNextNext post:Why CIOs are increasingly vital to African supply chains

Related posts

Salesforce expands beyond the front office with Agentforce Operations
April 29, 2026
Designing the AI-native cloud: What enterprise architects are learning the hard way
April 29, 2026
Incentive drift: Why transformation fails even when everything looks green
April 29, 2026
Oracle NetSuite announces AI coding skills for SuiteCloud developers
April 29, 2026
Your AI agent is ready to go. Is your infrastructure?
April 29, 2026
Why I, the CEO, am personally building our AI strategy
April 29, 2026
Recent Posts
  • Salesforce expands beyond the front office with Agentforce Operations
  • Designing the AI-native cloud: What enterprise architects are learning the hard way
  • Incentive drift: Why transformation fails even when everything looks green
  • Oracle NetSuite announces AI coding skills for SuiteCloud developers
  • Why I, the CEO, am personally building our AI strategy
Recent Comments
    Archives
    • April 2026
    • March 2026
    • February 2026
    • January 2026
    • December 2025
    • November 2025
    • October 2025
    • September 2025
    • August 2025
    • July 2025
    • June 2025
    • May 2025
    • April 2025
    • March 2025
    • February 2025
    • January 2025
    • December 2024
    • November 2024
    • October 2024
    • September 2024
    • August 2024
    • July 2024
    • June 2024
    • May 2024
    • April 2024
    • March 2024
    • February 2024
    • January 2024
    • December 2023
    • November 2023
    • October 2023
    • September 2023
    • August 2023
    • July 2023
    • June 2023
    • May 2023
    • April 2023
    • March 2023
    • February 2023
    • January 2023
    • December 2022
    • November 2022
    • October 2022
    • September 2022
    • August 2022
    • July 2022
    • June 2022
    • May 2022
    • April 2022
    • March 2022
    • February 2022
    • January 2022
    • December 2021
    • November 2021
    • October 2021
    • September 2021
    • August 2021
    • July 2021
    • June 2021
    • May 2021
    • April 2021
    • March 2021
    • February 2021
    • January 2021
    • December 2020
    • November 2020
    • October 2020
    • September 2020
    • August 2020
    • July 2020
    • June 2020
    • May 2020
    • April 2020
    • January 2020
    • December 2019
    • November 2019
    • October 2019
    • September 2019
    • August 2019
    • July 2019
    • June 2019
    • May 2019
    • April 2019
    • March 2019
    • February 2019
    • January 2019
    • December 2018
    • November 2018
    • October 2018
    • September 2018
    • August 2018
    • July 2018
    • June 2018
    • May 2018
    • April 2018
    • March 2018
    • February 2018
    • January 2018
    • December 2017
    • November 2017
    • October 2017
    • September 2017
    • August 2017
    • July 2017
    • June 2017
    • May 2017
    • April 2017
    • March 2017
    • February 2017
    • January 2017
    Categories
    • News
    Meta
    • Log in
    • Entries feed
    • Comments feed
    • WordPress.org
    Tiatra LLC.

    Tiatra, LLC, based in the Washington, DC metropolitan area, proudly serves federal government agencies, organizations that work with the government and other commercial businesses and organizations. Tiatra specializes in a broad range of information technology (IT) development and management services incorporating solid engineering, attention to client needs, and meeting or exceeding any security parameters required. Our small yet innovative company is structured with a full complement of the necessary technical experts, working with hands-on management, to provide a high level of service and competitive pricing for your systems and engineering requirements.

    Find us on:

    FacebookTwitterLinkedin

    Submitclear

    Tiatra, LLC
    Copyright 2016. All rights reserved.