Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies

How FiveStars re-engineered its data engineering stack

Building and managing infrastructure yourself gives you more control — but the effort to keep it all under control can take resources away from innovation in other areas. Matt Doka, CTO of FiveStars, a marketing platform for small businesses, doesn’t like that trade-off and goes out of his way to outsource whatever he can.

It shows in his reluctance to run his own servers but it’s perhaps most obvious in his attitude to data engineering, where he’s nearing the end of a five-year journey to automate or outsource much of the mundane maintenance work and focus internal resources on data analysis.

FiveStars offers small businesses an online loyalty card service — the digital equivalent of “buy nine, get one free” stamp cards — that they can link to their customers’ telephone numbers and payment cards. Over 10,000 small businesses use its services, and Doka estimates around 70 million Americans have opted into loyalty programs it manages. More recently, it has moved into payment processing, an option adopted by around 20% of its clients, and offers its own PCI-compliant payment terminals.

Recording all those interactions generates a prodigious amount of data, but that’s not the half of it. To one-up the legacy payment processors that just drop off a terminal and leave customers to call for support if it stops working, FiveStars builds telemetry systems into its terminals, which regularly report their connection status, battery level and application performance information.

“The bulk of our load isn’t even the transactions, the points or the credit cards themselves,” he says. “It’s the huge amounts of device telemetry data to make sure that when somebody wants to make a payment or earn some points, it’s a best in class experience.”

Figuring that out from the data takes a lot of analysis, work the 10-person data team had little time for because just maintaining its data infrastructure was eating it all up.

The data team that built the first version of FiveStars’ data infrastructure started on the sales and marketing side of the business, not IT. That historical accident meant that while they really knew their way around data, they had little infrastructure management experience, says Doka.

When Doka took over the team, he discovered they had written everything by hand: server automation code, database queries, the analyses — everything. “They wrote bash scripts!” Doka says. “Even 10 years ago, you had systems that could abstract away bash scripts.”

The system was brittle, highly manual and based on a lot of tribal knowledge. The net effect was that the data analysts spent most of their time just keeping the system running. “They struggled to get new data insights developed into analyses,” he says.

Back in 2019, he adds, everyone’s answer to a problem like that was to use Apache Airflow, an open-source platform for managing data engineering workflows that is both written in and controlled with Python. It was originally developed at Airbnb to perform exactly the kinds of things Doka’s team was still doing by hand.

Doka opted for a hosted version of Airflow to replace FiveStars’ resource-intensive homebrew system. “I wanted to get us out of the business of hosting our own infrastructure because these are data analysts or even data engineers, not experienced SREs,” he says. “It’s not a good use of our time either.”

Adopting Airflow meant Doka could stop worrying about more than just servers. “There was a huge improvement in standardization and the basics of running things,” he says. “You just inherit all these best practices that we were inventing or reinventing ourselves.”

But, he laments, “How you actually work in Airflow is entirely up to the development team, so you still spend a lot of mind cycles on just structuring every new project.” And a particular gripe of his was that you have to build your own documentation best practices.

So barely a year after beginning the migration to Airflow, Doka found himself looking for something better to help him automate more of his data engineering processes and standardize away some of the less business-critical decisions that took up so much time.

He cast his net wide, but many of the tools he found only addressed part of the problem.

“DBT just focused on how to change the data within a single Snowflake instance, for example,” he says. “It does a really good job of that, but how do you get data into Snowflake from all your sources?” For that, he adds, “there were some platforms that could abstract away all the data movement in a standardized way, like Fivetran, but they didn’t really give you a language to process.”

After checking out several other options, Doka eventually settled on Ascend.io. “I loved the fact there was a standard way to write a SQL query or Python code, and it generates a lineage and a topology,” he says. “The system can automatically know where all the data came from; how it made its way to this final analysis.”
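The lineage idea Doka describes can be pictured as a walk over a dependency graph. The following is only a minimal sketch under stated assumptions: it is not Ascend's API, and the dataset names and the `TOPOLOGY`, `lineage` and `sources` helpers are hypothetical, invented for illustration.

```python
# Hypothetical topology: each dataset maps to the datasets it is derived from.
# None of these names come from FiveStars or Ascend; they are placeholders.
TOPOLOGY = {
    "postgres.transactions": [],
    "terminal.telemetry": [],
    "clean_transactions": ["postgres.transactions"],
    "device_health": ["terminal.telemetry"],
    "merchant_report": ["clean_transactions", "device_health"],
}


def lineage(dataset, topology):
    """Return the set of every upstream dataset that feeds `dataset`."""
    seen = set()
    stack = list(topology[dataset])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(topology[node])
    return seen


def sources(dataset, topology):
    """Return the root inputs (datasets with no upstream) behind `dataset`."""
    candidates = lineage(dataset, topology) | {dataset}
    return sorted(d for d in candidates if not topology[d])


report_lineage = lineage("merchant_report", TOPOLOGY)
report_sources = sources("merchant_report", TOPOLOGY)
```

Here `report_sources` answers the question "where did the data in this final analysis come from?" without anyone having written that down by hand, which is the property Doka says he valued.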

This not only abstracts away the challenge of running servers, but also of deciding how you do work, he says.

“This saves a ton of mental load for data engineers and data analysts,” he says. “They’re able to focus entirely on the question they’re trying to answer and the analysis they’re trying to do.”

Not only is it easier for analysts to focus on their own work, it’s also easier for them to follow one another’s, he adds.

“There’s all this documentation that was just built in by design where, without thinking about it, each analyst left a clear trail of crumbs as to how they got to where they are,” he says. “So if new people join the project, it’s easier to see what’s going on.”

Ascend uses another Apache project, Spark, as its analytics engine. Spark has its own Python API, PySpark.

Migrating the first few core use cases from Airflow took less than a month. “It took an hour to turn on, and two minutes to hook up Postgres and some of our data sources,” Doka says. “That was very fast.”

Replicating some of the workflows was as easy as copying the underlying SQL from Airflow to Ascend. “Once we had it working at parity, we would just turn the [old] flow off and put the [new] output connector where it needed to go,” he says.

The most helpful thing about Ascend was that it ran code changes so quickly that the team could develop and fix things in real time. “The system can be aware of where pieces in the workflow have changed or not, and it doesn’t rerun everything if nothing’s changed, so you’re not wasting compute,” he says. “That was a really nice speed up.”
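One common way to get this "don't rerun what hasn't changed" behavior is to fingerprint each step from its own code plus the fingerprints of its inputs. The sketch below assumes that approach; the article does not say how Ascend implements it, and `fingerprint`, `run_pipeline` and the toy steps are hypothetical names used only for illustration.

```python
import hashlib


def fingerprint(code, upstream_fingerprints):
    """Hash a step's code together with the fingerprints of its inputs."""
    h = hashlib.sha256()
    h.update(code.encode("utf-8"))
    for fp in upstream_fingerprints:
        h.update(fp.encode("utf-8"))
    return h.hexdigest()


def run_pipeline(steps, cache, log):
    """Run steps in order, reusing cached results whose fingerprint matches.

    steps: list of (name, code, upstream_names, fn), in dependency order.
    cache: dict of name -> (fingerprint, result), kept between runs.
    log:   list that collects the names of steps actually executed.
    """
    fps, results = {}, {}
    for name, code, upstream, fn in steps:
        fps[name] = fingerprint(code, [fps[u] for u in upstream])
        cached = cache.get(name)
        if cached is not None and cached[0] == fps[name]:
            results[name] = cached[1]  # unchanged: skip the compute
        else:
            results[name] = fn(*[results[u] for u in upstream])
            cache[name] = (fps[name], results[name])
            log.append(name)
    return results


# The "code" strings stand in for each step's source text.
steps_v1 = [
    ("raw", "load-v1", [], lambda: [3, 1, 2]),
    ("sorted", "sort-v1", ["raw"], lambda rows: sorted(rows)),
    ("top", "top-v1", ["sorted"], lambda rows: rows[-1]),
]
# Same pipeline, but the middle step's code has been edited.
steps_v2 = [
    steps_v1[0],
    ("sorted", "sort-v2", ["raw"], lambda rows: sorted(rows, reverse=True)),
    steps_v1[2],
]

cache = {}
first_log, second_log, third_log = [], [], []
first = run_pipeline(steps_v1, cache, first_log)    # cold cache: all steps run
second = run_pipeline(steps_v1, cache, second_log)  # nothing changed
third = run_pipeline(steps_v2, cache, third_log)    # only "sorted" was edited
```

Because a step's fingerprint includes its upstream fingerprints, editing `sorted` also invalidates `top`, while `raw` is served from the cache.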

Some things still involved an overnight wait, though. “There’s an upstream service you can only download from between 2 a.m. and 5 a.m., so getting that code just right, to make sure it was downloading at the right time of day, was a pain but it wasn’t necessarily Ascend’s fault,” he says.
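The timing constraint Doka mentions can be expressed as a small guard around the download. This is a sketch only: the article gives the 2 a.m. to 5 a.m. window but not the time zone or whether 5 a.m. itself is allowed, so the code assumes naive local datetimes and treats the end of the window as exclusive. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta

# Window from the article; the exclusive upper bound is an assumption.
WINDOW_START = time(2, 0)
WINDOW_END = time(5, 0)


def in_download_window(now):
    """Return True if `now` (naive local time) falls in [02:00, 05:00)."""
    return WINDOW_START <= now.time() < WINDOW_END


def next_window_start(now):
    """Return when the download may next begin (`now` if already inside)."""
    if in_download_window(now):
        return now
    start_today = datetime.combine(now.date(), WINDOW_START)
    if now < start_today:
        return start_today
    return start_today + timedelta(days=1)
```

A scheduler could call `next_window_start(datetime.now())` and sleep until that moment, rather than discovering a mistimed job the next morning.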

Mobilizing a culture shift

The move to Ascend didn’t lead to any major retraining or hiring needs either. “Building is pretty much zero now that we have everything abstracted,” Doka says. Three people now run jobs on top of the new systems, and around six analysts do reporting and generate insights from the data.

“Most of the infrastructure work is gone,” he adds. “There’s still some ETL work, the transforming and cleansing that never goes away, but now it’s done in a standardized way. One thing that took time to digest, though, was that shift from what I call vanilla Python used with Airflow to Spark Python. It feels different than just writing procedural code.” It’s not esoteric knowledge, just something the FiveStars team hadn’t used before and needed to familiarize themselves with.

A recurring theme in Doka’s data engineering journey has been looking for new things he can stop building and buy instead.

“When you build, own, and run a piece of infrastructure in house, you have a greater level of control and knowledge,” he says. “But often you sacrifice a ton of time for it, and in many cases don’t have the best expertise to develop it.”

Convincing his colleagues of the advantages of doing less wasn’t easy. “I struggled with the team in both eras,” he says. “That’s always part of a transition to any more abstracted system.”

Doka says he’s worked with several startups as an investor or an advisor, and always tells technically minded founders to avoid running infrastructure themselves and pick a best-in-class vendor to host things for them, and not just because it saves time. “You’re also going to learn best practices much better working with them,” he says. He offers enterprise IT leaders the same advice when dealing with internal teams. “The most consistent thing I’ve seen across 11 years as a CTO is that gravity just pulls people to ‘build it here’ for some reason,” he says. “I never understood it.” It’s a pull that has to be continually resisted, or teams wind up wasting time maintaining things that aren’t part of the core business.

CIO, Data Engineering, IT Leadership



Category: News
January 17, 2023
Tags: art

    Tiatra LLC.

    Tiatra, LLC, based in the Washington, DC metropolitan area, proudly serves federal government agencies, organizations that work with the government and other commercial businesses and organizations. Tiatra specializes in a broad range of information technology (IT) development and management services incorporating solid engineering, attention to client needs, and meeting or exceeding any security parameters required. Our small yet innovative company is structured with a full complement of the necessary technical experts, working with hands-on management, to provide a high level of service and competitive pricing for your systems and engineering requirements.


    Tiatra, LLC
    Copyright 2016. All rights reserved.