Few sports are as closely associated with data analytics as baseball. For more than 160 years, statisticians have tried to represent the game numerically. In 2015, Major League Baseball revolutionized a sport already known for its sophisticated use of data with MLB Statcast, a tracking technology that collects enormous amounts of game data.
Alexander Booth, assistant director of R&D for the Texas Rangers, says the data from Statcast, the Rangers’ own data sources, and the team’s use of analytics, machine learning (ML), and AI were contributing factors to the Rangers’ World Series title in 2023.
From 2015 to 2019, Statcast combined camera and radar systems. In 2020, MLB partnered with Hawk-Eye Innovations to provide optical tracking systems. Every MLB ballpark now has 12 Hawk-Eye cameras arrayed around it. Five are dedicated to pitch tracking, while the other seven track players and batted balls. With the help of Hawk-Eye, Statcast tracks and quantifies all manner of data: pitching (velocity, spin rate and direction, and movement), hitting (exit velocity, launch angle, batted ball distance), running (sprint speed, base-to-base times), and fielding (arm strength, catch probability, catcher pop time).
“Not only do we have the traditional ball tracking metrics like velocity and spin rate, we’ve also got player positioning data,” Booth says. “We’re tracking the positions of everybody on the field at 30 frames a second for the entire game, which is a lot of information to process and parse.”
Booth notes that the new trove of data allowed the Rangers to start analyzing biomechanics: how the body moves when performing athletic actions.
“We’re looking at the pitching motion, we’re looking at the batting motion, and now we’re able to track these joint centers — your head, shoulders, knees, and toes — at up to 300 frames a second,” Booth says.
The data feeds AI predictions about everything from the optimal batting lineup against a particular starting pitcher and the optimal defensive positioning for a given batter-pitcher matchup to injury risk.
Streamlining teamwork
The Statcast revolution began to transform the Rangers’ analytics team. When Booth joined the Rangers in 2018, he was the team’s fifth member. He remembers relying heavily on spreadsheets. Much of the team’s work started with the GM reading or hearing about something and asking the analytics team to look into it, which generally meant writing SQL queries, inputting data into spreadsheets, and working with local instances.
“How do you maintain a single source of truth effectively if you have multiple people working on the same copy of a spreadsheet?” he says. “How do you know which version is the real one? We were the go-to guys for any ML or predictive modeling at that time, but looking back it was very primitive.”
The analytics team began shifting its operations from on-premises systems to the cloud, leveraging Apache Spark and Databricks. The team has since grown to about 25 people, but Booth says it’s still lean and maintains a startup-like mindset.
“We’re going to act and fail quickly, like a tech startup,” he says. “But there’ve been a couple of current deliverables from this migration that have led to where we are today, especially that World Series win. Positioning revolutionized a lot of our defensive models.”
A shift in rules and strategy
For roughly a century, teams used a strategy called the defensive shift (or infield shift). Deployed primarily against left-handed hitters, the shift moved the third baseman into the spot typically held by the shortstop, the shortstop to just right of second base, and the second baseman onto the grass in shallow right field, where he became a de facto fourth outfielder. The shift was not without vulnerabilities: It left the area around third base and left field sparsely defended, creating opportunities for left-handed hitters to exploit.
Booth and his team built models to predict not only the optimal times to deploy the shift but also where players should position themselves on the field. In 2023, MLB introduced new rules that limit the shift by requiring all four infielders to have both feet within the outer boundary of the infield before the pitch. The rules also require two infielders to be on each side of second base before the pitch.
“Even with a ban on the shift recently, we’re still able to have models that say [shortstop] Corey Seager should stand this close to second base, and in the outfield, how deep [center-fielder] Evan Carter should play some of these fly balls against specific players,” Booth says. “That really helped to lead to our playoff run.”
Another big storyline in MLB in 2023 was the Rangers’ defensive prowess, specifically their skill at turning double plays.
“[Second baseman] Marcus Semien and Seager are two very talented players who can turn a lot of really difficult double plays,” Booth says, “but I like to think we set them up for success by encouraging them to stand in the places where probabilistically they were most likely to be able to turn those double plays.”
AI and longevity
On the biomechanics side, much of the analytics team’s work focuses on predicting and understanding injury and fatigue. Booth notes that in recent years the Rangers have acquired pitchers Jacob deGrom and Max Scherzer, both now in their mid-to-late 30s.
“Both of these guys are really good, but they’re getting a little bit older and more injury prone,” he says. “We wanted to understand exactly how to manage their workloads.”
By combining biomechanics data from games and practices with data from players’ workouts, their nutritionists, and even sleep studies, the Rangers can better understand player health and performance. Booth notes that this new understanding of injury and player management has had an even bigger impact on the Rangers’ minor league affiliates.
“We use data to understand the whole journey of these players,” says Booth. “When they first come into rookie ball playing in the Dominican [Republic], for instance, how do we make them the best player they can be over three, four, or five years. This has gone hand-in-hand with our amateur scouting department too.”
Early on, Booth says, the primary consumer of his team’s analysis was the front office, which used the data and the team’s reports for player evaluations, trades, and similar decisions. But as data became more accessible, a wider swath of the organization started using it.
“Our coaches are asking for more data, trying to justify the gut feel of their domain expertise with the raw numbers,” he says. “We now have analysts that travel with the team who are our conduit of communication.”
This data democratization has played a significant role in helping the team become more data driven at all levels.
“One of our tenets was data availability can lead to disruption,” Booth says. “While anyone can use low-code and BI tools, and create awesome reports, they have to have the data clean and available first.”
That principle makes Booth excited about the possibilities of gen AI, since most of the Rangers’ potential data consumers aren’t technical users. He expects gen AI to let those non-technical users interact with the team’s trove of data and gain the insights they need to maximize performance.