Skip to content
Tiatra, LLCTiatra, LLC
Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies
  • Home
  • About Us
  • Services
    • IT Engineering and Support
    • Software Development
    • Information Assurance and Testing
    • Project and Program Management
  • Clients & Partners
  • Careers
  • News
  • Contact
 
  • Home
  • About Us
  • Services
    • IT Engineering and Support
    • Software Development
    • Information Assurance and Testing
    • Project and Program Management
  • Clients & Partners
  • Careers
  • News
  • Contact

Moving Enterprise Data From Anywhere to Any System Made Easy

Since 2015, the Cloudera DataFlow team has been helping the largest enterprise organizations in the world adopt Apache NiFi as their enterprise standard data movement tool. Over the last few years, we have had a front-row seat in our customers’ hybrid cloud journey as they expand their data estate across the edge, on-premise, and multiple cloud providers. This unique perspective of helping customers move data as they traverse the hybrid cloud path has afforded Cloudera a clear line of sight to the critical requirements that are emerging as customers adopt a modern hybrid data stack. 

One of the critical requirements that has materialized is the need for companies to take control of their data flows from origination through all points of consumption both on-premise and in the cloud in a simple, secure, universal, scalable, and cost-effective way. This need has generated a market opportunity for a universal data distribution service.

Over the last two years, the Cloudera DataFlow team has been hard at work building Cloudera DataFlow for the Public Cloud (CDF-PC). CDF-PC is a cloud native universal data distribution service powered by Apache NiFi on Kubernetes, ​​allowing developers to connect to any data source anywhere with any structure, process it, and deliver to any destination.

This blog aims to answer two questions:

  • What is a universal data distribution service?
  • Why does every organization need it when using a modern data stack?

In a recent customer workshop with a large retail data science media company, one of the attendees, an engineering leader, made the following observation:

“Everytime I go to your competitor website, they only care about their system. How to onboard data into their system? I don’t care about their system. I want integration between all my systems. Each system is just one of many that I’m using. That’s why we love that Cloudera uses NiFi and the way it integrates between all systems. It’s one tool looking out for the community and we really appreciate that.”

The above sentiment has been a recurring theme from many of the enterprise organizations the Cloudera DataFlow team has worked with, especially those who are adopting a modern data stack in the cloud. 

What is the modern data stack? Some of the more popular viral blogs and LinkedIn posts describe it as the following:

Ben Patterson/IDG

A few observations on the modern stack diagram:

  1. Note the number of different boxes that are present. In the modern data stack, there is a diverse set of destinations where data needs to be delivered. This presents a unique set of challenges.
  2. The newer “extract/load” tools seem to focus primarily on cloud data sources with schemas. However, based on the 2000+ enterprise customers that Cloudera works with, more than half the data they need to source from is born outside the cloud (on-prem, edge, etc.) and don’t necessarily have schemas.
  3. Numerous “extract/load” tools need to be used to move data across the ecosystem of cloud services. 

We’ll drill into these points further.  

Companies have not treated the collection and distribution of data as a first-class problem

Over the last decade, we have often heard about the proliferation of data creating sources (mobile applications, laptops, sensors, enterprise apps) in heterogeneous environments (cloud, on-prem, edge) resulting in the exponential growth of data being created. What is less frequently mentioned is that during this same time we have also seen a rapid increase of cloud services where data needs to be delivered (data lakes, lakehouses, cloud warehouses, cloud streaming systems, cloud business processes, etc.). Use cases demand that data no longer be distributed to just a data warehouse or subset of data sources, but to a diverse set of hybrid services across cloud providers and on-prem.  

Companies have not treated the collection, distribution, and tracking of data throughout their data estate as a first-class problem requiring a first-class solution. Instead they built or purchased tools for data collection that are confined with a class of sources and destinations. If you take into account the first observation above—that customer source systems are never just limited to cloud structured sources—the problem is further compounded as described in the below diagram:

Unisys

The need for a universal data distribution service

As cloud services continue to proliferate, the current approach of using multiple point solutions becomes intractable. 

A large oil and gas company, who needed to move streaming cyber logs from over 100,000 edge devices to multiple cloud services including Splunk, Microsoft Sentinel, Snowflake, and a data lake, described this need perfectly:

“Controlling the data distribution is critical to providing the freedom and flexibility to deliver the data to different services.”

Every organization on the hybrid cloud journey needs the ability to take control of their data flows from origination through all points of consumption. As I stated in the start of the blog, this need has generated a market opportunity for a universal data distribution service.

P Wei / Getty Images

What are the key capabilities that a data distribution service has to have?

  • Universal Data Connectivity and Application Accessibility: In other words, the service needs to support ingestion in a hybrid world, connecting to any data source anywhere in any cloud with any structure. Hybrid also means supporting ingestion from any data source born outside of the cloud and enabling these applications to easily send data to the distribution service.
  • Universal Indiscriminate Data Delivery: The service should not discriminate where it distributes data, supporting delivery to any destination including data lakes, lakehouses, data meshes, and cloud services.
  • Universal Data Movement Use Cases with Streaming as First-Class Citizen: The service needs to address the entire diversity of data movement use cases: continuous/streaming, batch, event-driven, edge, and microservices. Within this spectrum of use cases, streaming has to be treated as a first-class citizen with the service able to turn any data source into streaming mode and support streaming scale, reinforcing hundreds of thousands of data-generating clients.
  • Universal Developer Accessibility: Data distribution is a data integration problem and all the complexities that come with it. Dumbed down connector wizard–based solutions cannot address the common data integration challenges (e.g: bridging protocols, data formats, routing, filtering, error handling, retries). At the same time, today’s developers demand low-code tooling with extensibility to build these data distribution pipelines.

Cloudera DataFlow for the Public Cloud, a universal data distribution service powered by Apache NiFi

Cloudera DataFlow for the Public Cloud (CDF-PC), a cloud native universal data distribution service powered by Apache NiFi, was built to solve the data collection and distribution problem with the four key capabilities: connectivity and application accessibility, indiscriminate data delivery, streaming data pipelines as a first class citizen, and developer accessibility. 

IDG

CDF-PC offers a flow-based low-code development paradigm that provides the best impedance match with how developers design, develop, and test data distribution pipelines. With over 400+ connectors and processors across the ecosystem of hybrid cloud services including data lakes, lakehouses, cloud warehouses, and sources born outside the cloud, CDF-PC provides indiscriminate data distribution. These data distribution flows can then be version controlled into a catalog where operators can self-serve deployments to different runtimes including cloud providers’ kubernetes services or function services (FaaS). 

Organizations use CDF-PC for diverse data distribution use cases ranging from cyber security analytics and SIEM optimization via streaming data collection from hundreds of thousands of edge devices, to self-service analytics workspace provisioning and hydrating data into lakehouses (e.g: Databricks, Dremio), to ingesting data into cloud providers’ data lakes backed by their cloud object storage (AWS, Azure, Google Cloud) and cloud warehouses (Snowflake, Redshift, Google BigQuery).

In subsequent blogs, we’ll deep dive into some of these use cases and discuss how they are implemented using CDF-PC. 

Get Started Today

Wherever you are on your hybrid cloud journey, a first class data distribution service is critical for successfully adopting a modern hybrid data stack. Cloudera DataFlow for the Public Cloud (CDF-PC) provides a universal, hybrid, and streaming first data distribution service that enables customers to gain control of their data flows. 

Take our interactive product tour to get an impression of CDF-PC in action or sign up for a free trial.

Data Center


Read More from This Article: Moving Enterprise Data From Anywhere to Any System Made Easy
Source: News

Category: NewsJuly 14, 2022
Tags: art

Post navigation

PreviousPrevious post:Making the World a Better Place with DataNextNext post:Elevate Your Protection Strategy at CSO’s Future of Cybersecurity Summit

Related posts

SAS makes AI governance the centerpiece of its agent strategy
April 29, 2026
The boardroom divide: Why cyber resilience is a cultural asset
April 28, 2026
Samsung Galaxy AI for business: Productivity meets security
April 28, 2026
Startup tackles knowledge graphs to improve AI accuracy
April 28, 2026
AI won’t fix your data problems. Data engineering will
April 28, 2026
The inference bill nobody budgeted for
April 28, 2026
Recent Posts
  • SAS makes AI governance the centerpiece of its agent strategy
  • The boardroom divide: Why cyber resilience is a cultural asset
  • Samsung Galaxy AI for business: Productivity meets security
  • Startup tackles knowledge graphs to improve AI accuracy
  • AI won’t fix your data problems. Data engineering will
Recent Comments
    Archives
    • April 2026
    • March 2026
    • February 2026
    • January 2026
    • December 2025
    • November 2025
    • October 2025
    • September 2025
    • August 2025
    • July 2025
    • June 2025
    • May 2025
    • April 2025
    • March 2025
    • February 2025
    • January 2025
    • December 2024
    • November 2024
    • October 2024
    • September 2024
    • August 2024
    • July 2024
    • June 2024
    • May 2024
    • April 2024
    • March 2024
    • February 2024
    • January 2024
    • December 2023
    • November 2023
    • October 2023
    • September 2023
    • August 2023
    • July 2023
    • June 2023
    • May 2023
    • April 2023
    • March 2023
    • February 2023
    • January 2023
    • December 2022
    • November 2022
    • October 2022
    • September 2022
    • August 2022
    • July 2022
    • June 2022
    • May 2022
    • April 2022
    • March 2022
    • February 2022
    • January 2022
    • December 2021
    • November 2021
    • October 2021
    • September 2021
    • August 2021
    • July 2021
    • June 2021
    • May 2021
    • April 2021
    • March 2021
    • February 2021
    • January 2021
    • December 2020
    • November 2020
    • October 2020
    • September 2020
    • August 2020
    • July 2020
    • June 2020
    • May 2020
    • April 2020
    • January 2020
    • December 2019
    • November 2019
    • October 2019
    • September 2019
    • August 2019
    • July 2019
    • June 2019
    • May 2019
    • April 2019
    • March 2019
    • February 2019
    • January 2019
    • December 2018
    • November 2018
    • October 2018
    • September 2018
    • August 2018
    • July 2018
    • June 2018
    • May 2018
    • April 2018
    • March 2018
    • February 2018
    • January 2018
    • December 2017
    • November 2017
    • October 2017
    • September 2017
    • August 2017
    • July 2017
    • June 2017
    • May 2017
    • April 2017
    • March 2017
    • February 2017
    • January 2017
    Categories
    • News
    Meta
    • Log in
    • Entries feed
    • Comments feed
    • WordPress.org
    Tiatra LLC.

    Tiatra, LLC, based in the Washington, DC metropolitan area, proudly serves federal government agencies, organizations that work with the government and other commercial businesses and organizations. Tiatra specializes in a broad range of information technology (IT) development and management services incorporating solid engineering, attention to client needs, and meeting or exceeding any security parameters required. Our small yet innovative company is structured with a full complement of the necessary technical experts, working with hands-on management, to provide a high level of service and competitive pricing for your systems and engineering requirements.

    Find us on:

    FacebookTwitterLinkedin

    Submitclear

    Tiatra, LLC
    Copyright 2016. All rights reserved.