How FiveStars re-engineered its data engineering stack

Building and managing infrastructure yourself gives you more control — but the effort to keep it all under control can take resources away from innovation in other areas. Matt Doka, CTO of FiveStars, a marketing platform for small businesses, doesn’t like that trade-off and goes out of his way to outsource whatever he can.

It shows in his reluctance to run his own servers but it’s perhaps most obvious in his attitude to data engineering, where he’s nearing the end of a five-year journey to automate or outsource much of the mundane maintenance work and focus internal resources on data analysis.

FiveStars offers small businesses an online loyalty card service — the digital equivalent of “buy nine, get one free” stamp cards — that they can link to their customers’ telephone numbers and payment cards. Over 10,000 small businesses use its services, and Doka estimates around 70 million Americans have opted into loyalty programs it manages. More recently, it has moved into payment processing, an option adopted by around 20% of its clients, and offers its own PCI-compliant payment terminals.

Recording all those interactions generates a prodigious amount of data, but that’s not the half of it. To one-up the legacy payment processors that just drop off a terminal and leave customers to call for support if it stops working, FiveStars builds telemetry systems into its terminals, which regularly report their connection status, battery level and application performance information.

“The bulk of our load isn’t even the transactions, the points or the credit cards themselves,” he says. “It’s the huge amounts of device telemetry data to make sure that when somebody wants to make a payment or earn some points, it’s a best in class experience.”

Figuring that out from the data takes a lot of analysis, work the 10-person data team had little time for because simply maintaining its data infrastructure was eating up all its time.

The data team that built the first version of FiveStars’ data infrastructure started on the sales and marketing side of the business, not IT. That historical accident meant that while they really knew their way around data, they had little infrastructure management experience, says Doka.

When Doka took over the team, he discovered they had written everything by hand: server automation code, database queries, the analyses — everything. “They wrote bash scripts!” Doka says. “Even 10 years ago, you had systems that could abstract away bash scripts.”

The system was brittle, highly manual and based on a lot of tribal knowledge. The net effect was that the data analysts spent most of their time just keeping the system running. “They struggled to get new data insights developed into analyses,” he says.

Back in 2019, he adds, everyone’s answer to a problem like that was to use Apache Airflow, an open-source platform, written in and controlled with Python, for managing data engineering workflows. It was originally developed at Airbnb to perform exactly the kinds of things Doka’s team was still doing by hand.

Doka opted for a hosted version of Airflow to replace FiveStars’ resource-intensive homebrew system. “I wanted to get us out of the business of hosting our own infrastructure because these are data analysts or even data engineers, not experienced SREs,” he says. “It’s not a good use of our time either.”

Adopting Airflow meant Doka could stop worrying about more than just servers. “There was a huge improvement in standardization and the basics of running things,” he says. “You just inherit all these best practices that we were inventing or reinventing ourselves.”

But, he laments, “How you actually work in Airflow is entirely up to the development team, so you still spend a lot of mind cycles on just structuring every new project.” And a particular gripe of his was that you have to build your own documentation best practices.

So barely a year after beginning the migration to Airflow, Doka found himself looking for something better to help him automate more of his data engineering processes and standardize away some of the less business-critical decisions that took up so much time.

He cast his net wide, but many of the tools he found only addressed part of the problem.

“dbt just focused on how to change the data within a single Snowflake instance, for example,” he says. “It does a really good job of that, but how do you get data into Snowflake from all your sources?” For that, he adds, “there were some platforms that could abstract away all the data movement in a standardized way, like Fivetran, but they didn’t really give you a language to process.”

After checking out several other options, Doka eventually settled on Ascend.io. “I loved the fact there was a standard way to write a SQL query or Python code, and it generates a lineage and a topology,” he says. “The system can automatically know where all the data came from; how it made its way to this final analysis.”

This not only abstracts away the challenge of running servers, but also of deciding how you do work, he says.

“This saves a ton of mental load for data engineers and data analysts,” he says. “They’re able to focus entirely on the question they’re trying to answer and the analysis they’re trying to do.”

Not only is it easier for analysts to focus on their own work, it’s also easier for them to follow one another’s, he adds.

“There’s all this documentation that was just built in by design where, without thinking about it, each analyst left a clear trail of crumbs as to how they got to where they are,” he says. “So if new people join the project, it’s easier to see what’s going on.”
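The idea of a built-in trail from a final analysis back to its sources can be sketched in a few lines of Python. This is only an illustration of lineage tracking in general, not Ascend's implementation: the `DEPENDENCIES` registry, the dataset names, and the `lineage` function are all hypothetical.

```python
# Hypothetical registry: each dataset maps to the datasets it reads from.
# The names are invented for illustration and do not come from FiveStars.
DEPENDENCIES = {
    "raw_transactions": [],
    "raw_telemetry": [],
    "clean_telemetry": ["raw_telemetry"],
    "terminal_health": ["clean_telemetry", "raw_transactions"],
}


def lineage(dataset, deps):
    """Return every upstream dataset of `dataset`, nearest first, no duplicates."""
    seen = []
    queue = list(deps[dataset])
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            # Walk one level further upstream.
            queue.extend(deps[current])
    return seen


upstream_of_report = lineage("terminal_health", DEPENDENCIES)
```

Because every transform declares its inputs, a newcomer can ask where any result came from without reading each analyst's code.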

Ascend uses another Apache project, Spark, as its analytics engine. Spark has its own Python API, PySpark.

Migrating the first few core use cases from Airflow took less than a month. “It took an hour to turn on, and two minutes to hook up Postgres and some of our data sources,” Doka says. “That was very fast.”

Replicating some of the workflows was as easy as copying the underlying SQL from Airflow to Ascend. “Once we had it working at parity, we would just turn the [old] flow off and put the [new] output connector where it needed to go,” he says.

The most helpful thing about Ascend was that it ran code changes so quickly that the team could develop and fix things in real time. “The system can be aware of where pieces in the workflow have changed or not, and it doesn’t rerun everything if nothing’s changed, so you’re not wasting compute,” he says. “That was a really nice speed up.”
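One common way to get that behavior is to fingerprint each step and rerun it only when its fingerprint changes. The sketch below is a minimal illustration under assumptions, not a description of how Ascend works: the `Pipeline` class is hypothetical, and the version labels in `code_versions` stand in for whatever a real system would hash (the step's code, its configuration, its source data).

```python
import hashlib
import json


class Pipeline:
    """Toy pipeline that skips steps whose code and upstream steps are unchanged."""

    def __init__(self):
        self.steps = []     # (name, func, upstream names), in run order
        self.cache = {}     # name -> (fingerprint, result)
        self.executed = []  # steps actually run during the last call to run()

    def add_step(self, name, func, upstream=()):
        self.steps.append((name, func, tuple(upstream)))

    def run(self, code_versions):
        self.executed = []
        fingerprints = {}
        results = {}
        for name, func, upstream in self.steps:
            # A step's fingerprint covers its own version label plus the
            # fingerprints of everything it reads from.
            payload = json.dumps(
                [code_versions[name], [fingerprints[u] for u in upstream]]
            )
            fingerprint = hashlib.sha256(payload.encode()).hexdigest()
            fingerprints[name] = fingerprint
            cached = self.cache.get(name)
            if cached is not None and cached[0] == fingerprint:
                results[name] = cached[1]  # unchanged: reuse the stored result
            else:
                results[name] = func(*[results[u] for u in upstream])
                self.cache[name] = (fingerprint, results[name])
                self.executed.append(name)
        return results


pipeline = Pipeline()
pipeline.add_step("load", lambda: [3, 1, 2])
pipeline.add_step("sort", lambda rows: sorted(rows), upstream=["load"])
pipeline.add_step("total", lambda rows: sum(rows), upstream=["sort"])

versions = {"load": "v1", "sort": "v1", "total": "v1"}
results = pipeline.run(versions)
first_run = list(pipeline.executed)   # every step runs the first time

pipeline.run(versions)
second_run = list(pipeline.executed)  # nothing changed, so nothing reruns

versions["sort"] = "v2"               # simulate editing one step
pipeline.run(versions)
third_run = list(pipeline.executed)   # only the edited step and its dependents
```

Editing `sort` changes its fingerprint and, through it, the fingerprint of `total`, while `load` is served from the cache.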

Some things still involved an overnight wait, though. “There’s an upstream service you can only download from between 2 a.m. and 5 a.m., so getting that code just right, to make sure it was downloading at the right time of day, was a pain but it wasn’t necessarily Ascend’s fault,” he says.
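A guard for a restricted download window like that might look like the following sketch. The article does not say how FiveStars implemented it, so the function names are hypothetical, and two details are assumptions: the times are in the upstream service's local time zone, and the window closes at exactly 5 a.m.

```python
from datetime import datetime, time, timedelta

# Window from the article; treating 5 a.m. as exclusive is an assumption.
WINDOW_START = time(2, 0)
WINDOW_END = time(5, 0)


def in_window(now):
    """True if `now` (naive, upstream's local time) falls inside the window."""
    return WINDOW_START <= now.time() < WINDOW_END


def seconds_until_window(now):
    """Seconds to wait before the next window opens, or 0.0 if it is open now."""
    if in_window(now):
        return 0.0
    start = datetime.combine(now.date(), WINDOW_START)
    if now >= start:
        # Today's window has already closed, so wait for tomorrow's.
        start += timedelta(days=1)
    return (start - now).total_seconds()
```

A job scheduled just before 2 a.m. could sleep for `seconds_until_window(datetime.now())` seconds before starting the download, rather than failing and waiting a full day for the next attempt.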

Mobilizing a culture shift

The move to Ascend didn’t lead to any major retraining or hiring needs either. “Building is pretty much zero now that we have everything abstracted,” Doka says. Three people now run jobs on top of the new systems, and around six analysts do reporting and generate insights from the data.

“Most of the infrastructure work is gone,” he adds. “There’s still some ETL work, the transforming and cleansing that never goes away, but now it’s done in a standardized way. One thing that took time to digest, though, was that shift from what I call vanilla Python used with Airflow to Spark Python. It feels different than just writing procedural code.” It’s not esoteric knowledge, just something the FiveStars team hadn’t used before and needed to familiarize themselves with.

A recurring theme in Doka’s data engineering journey has been looking for new things he can stop building and buy instead.

“When you build, own, and run a piece of infrastructure in house, you have a greater level of control and knowledge,” he says. “But often you sacrifice a ton of time for it, and in many cases don’t have the best expertise to develop it.”

Convincing his colleagues of the advantages of doing less wasn’t easy. “I struggled with the team in both eras,” he says. “That’s always part of a transition to any more abstracted system.”

Doka says he’s worked with several startups as an investor or an advisor, and always tells technically minded founders to avoid running infrastructure themselves and pick a best-in-class vendor to host things for them, and not just because it saves time. “You’re also going to learn best practices much better working with them,” he says. He offers enterprise IT leaders the same advice when dealing with internal teams. “The most consistent thing I’ve seen across 11 years as a CTO is that gravity just pulls people to ‘build it here’ for some reason,” he says. “I never understood it.” It’s a pull that has to be continually resisted, or teams wind up wasting time maintaining things that aren’t part of the core business.

