Data and Baseball
As baseball fans, we regard the threat that big data arguably poses to our privacy, autonomy and way of life as 'no biggie', at least relative to our concerns about what big data is doing to our beloved game, and especially as the baseball season approaches. No one has ever accused true baseball fans of being normal people.
Most middle-aged and older fans were introduced to the role data can play in the modern game through Moneyball. For most fans, Moneyball showed how data analysis could yield surprising results that go against the grain of conventional baseball wisdom: the value of walks and on-base percentage, the relative unimportance of stolen bases and strikeouts.
The Moneyball results were defined by three predominant features. First, they were easy to express in categories already familiar from the baseball lexicon, which made the data easy to understand in common parlance. The data was not esoteric in the way that, say, spin rates are.
Second, the data-driven results changed the way the game was played primarily at the margins. Moneyball baseball may have led to a data analyst or two joining the front office, but it did not threaten to replace managers and their decision-making authority with computers, or scouts with computer scientists.
Finally, those data-driven results provided a way to restore a kind of competitive balance to the sport. Teams that couldn't afford ace pitchers and home run sluggers, as the Yankees could, became more likely to compete despite smaller payrolls by taking advantage of effective low-cost strategies, as the A's and the Rays have.
It also created a new baseball icon: the nerd.
It did all this and more without threatening the game as we knew it or our appreciation of it.
Data was the first fully legal performance-enhancing drug of the modern baseball era.
What changed?
Unsurprisingly, the data got more complex and esoteric. Rather than changing the game at the margins, it threatened to change the game at its core. Line-up cards were filled out by the analytics division, which in the mythological narrative of the era was composed of individuals squirreled away in dark rooms, wearing thick glasses and reading computer screens instead of watching baseball games.
Data-driven baseball robs the game of its emotional core, severing the ties that bond fans to teams. How many times have Yankee fans longed to bring back Lou Piniella to manage the team because of his fiery disposition, and not because of the results he produced on the field? Probably as often as Mets fans called for Showalter's head for his unwillingness to 'protect' his players, especially Alonso, who were being thrown at and hit by opposing pitchers.
Data hasn't quite turned the major league game into a glorified video game, but it sure feels like it has dehumanized the game and made all the moves within not just predictable but predetermined.
Let me offer an optimistic take on how to think about data in baseball
I don't doubt that big data and its analysis have transformed the way the pro game is played, and not always in a good way. But to me, we are still in the early, formative stages of determining what data is important in baseball and how it can be used most effectively.
While most fans fear that data fixation will lead to micromanagement of every pitch in every at-bat, removing all creativity and surprise, I don't see that as the likely best use of existing data.
Most fans also worry that heavy reliance on data implies that the human element (judgment, instinct, relative baseball IQ) will disappear from the game. I think the opposite is true.
* 'Data' is not synonymous with 'information.' Information informs; not all data does. This means that we need to distinguish data that is noise from data that is genuine information.
* Context makes data meaningful or comprehensible, but it doesn't make it useful.
* Data's usefulness depends on a framework for interpreting it.
* That framework has three components: a set of goals; a way of organizing the data coherently so that it can be assessed for its usefulness in reaching those goals; and a plan, based on the data, that puts the data to work in achieving the goals.
Identifying context and goals, creating a framework, organizing the data coherently, and then formulating plans to reach those goals are all human activities that require judgment, baseball IQ, and a degree of creativity and insight.
Data is not a substitute for judgment. You cannot make sense of data without human intervention, and you cannot use data effectively without human judgment. One would be foolish not to attend to what the data shows, but the data is compatible with a wide range of options, of paths one can follow. It's literally impossible to be a prisoner of data, since all data is compatible with a large number of different conclusions. Data informs judgments; it does not eliminate the need for them.
That's why in your organization you want people who understand the strengths and limits of data and are neither threatened nor seduced by it.
I work with two basic principles when I analyze baseball. One applies to the baseball organization as a whole; the other applies to the baseball team in games and over the course of a season.
Organizational Goals, Risk Management and Data
The Principle of Optimal Risk Management: When it comes to weighing risk and reward, the goal is to minimize the sum of the costs of the risks you take and the costs of avoiding or reducing those risks.
Data for an organization is most important to the extent that it bears on optimal risk management. The more you know about your goals, your players, their likelihood of success or injury, what you have in the minors, and how likely that talent is to project at the major league level, the better you will be at risk management. You're not looking at individual at-bats. You are looking at the landscape of risk and reward, and taking it on board to reach the best judgments you can make given your long-term goals. And it's this kind of thinking that helps you balance payroll, roster construction strategies, investments in assessing talent, and everything else I have discussed in previous posts.
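The principle can be made concrete with a little arithmetic. What follows is only a sketch: the options, probabilities and cost figures are invented for illustration and do not describe any real team or player, and the names `Option` and `best_option` are made up here. For each option, the total is the expected cost of the risk you still carry plus what you pay to avoid or reduce it, and the principle says to pick the option with the smallest total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One way of handling a roster risk. All figures are hypothetical."""
    name: str
    risk_probability: float  # chance the bad outcome happens
    risk_cost: float         # cost if it happens (say, millions of dollars)
    avoidance_cost: float    # what you pay up front to avoid or reduce the risk

    def total_cost(self) -> float:
        # Expected cost of the risk you still carry, plus the cost of precautions.
        return self.risk_probability * self.risk_cost + self.avoidance_cost


def best_option(options):
    # Optimal risk management: minimize the sum, not either piece alone.
    return min(options, key=lambda option: option.total_cost())


# Hypothetical choices facing a front office.
options = [
    Option("do nothing", 0.40, 50.0, 0.0),
    Option("add depth", 0.20, 50.0, 6.0),
    Option("sign a star", 0.05, 50.0, 30.0),
]

for option in options:
    print(f"{option.name}: {option.total_cost():.1f}")
print("choose:", best_option(options).name)
```

With these made-up numbers, 'add depth' comes out ahead: it neither ignores the risk nor overpays to eliminate it.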
Baseball Team Performance and Data
Every team is constructed to perform as well as it can over the course of a season, including the potential of postseason play. In my view, we have a kind of ordinary or folk language in which we all talk about our teams: good field, no hit; lots of homers, but no situational hitting, etc. But this is just a common way of expressing the much more complex notions of competitive and strategic advantage.
In every sport, you are looking for a team that has as many competitive advantages as possible at different positions or across a line-up or a pitching staff. And you are trying to create schemes that allow you to take advantage of those advantages. So if you have great team speed, that is likely to be an advantage over most teams you play. Can you exploit it? Not if you don't get men on base. So you construct a line-up and play a style of ball (small ball, baserunning, hit and run, etc.) designed around that competitive advantage. If you can do that, you are taking reasonable steps to turn your competitive advantage into a strategic one. And then you have to be able to exploit your strategic advantage.
Let me take an example from football. Suppose you have a fast, great route-running, ball-catching possession receiver. This person has, let's say, a strategic advantage against both zone and man-to-man coverages. Now the question is, can you exploit it? Well, you need a quarterback who is accurate and gets rid of the ball on time.
No sport has changed more than professional basketball, and in a way that reveals the lens through which teams are now constructed: an emphasis on competitive and strategic advantage. The game is dominated by the effort to create strategic advantages through matchups.
This is as true of baseball as it is of any sport. It is just easier to see in football, hockey and basketball than it is in baseball.
The Principle of Strategic Advantage: Teams should be constructed around identifying competitive advantages in all phases of the game, then scheming how to play the game to turn competitive advantages into strategic ones that can be exploited. Teams play the game in ways to create strategic advantages that they are capable of exploiting -- on both sides of the ball.
There are so many banalities in baseball that are insightful once you understand that they are just ways of expressing the importance of relying on data. My favorite banality is that the best managers see their job as putting each of their players in a position to succeed, and in doing so to contribute most to the team's success. How in the world is that accomplished without data and its analysis?
When managers put players in the best position to succeed, the players can respond instinctively. I want my decision makers to know what the data suggests they do, but to have enough faith in their own judgment to have an educated feel for the circumstances. The more one understands the data and appreciates its limitations as well as its power, the less control the data has over one's decisions.
Embrace the data because it can inform your judgment. If you are a slave to it, however, it will cripple you and kill your capacity to innovate and be creative. Ignore it and you'll keep sending righty batters to pinch hit for your scheduled lefty hitter whenever a lefty comes in from the bullpen, even if the guy you just took out of the game, though a lefty, hits .280 against lefties and the guy you are pinch hitting for him is in the middle of an 0-for-10 slump against all pitchers. And commentators will no doubt report that in doing so the manager is playing the odds, when in fact he is doing just the opposite.
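To see why the reflexive platoon move is not 'playing the odds', it helps to put rough numbers on it. The sketch below is an illustration under stated assumptions, not anyone's actual model: the .280 figure and the 0-for-10 slump come from the example above, while the .250 baseline and the weight of 50 at-bats are hypothetical values chosen only for illustration, and `blended_average` is a name invented here. It gives the slumping hitter the benefit of the doubt by blending his slump with the assumed baseline rather than treating him as a .000 hitter.

```python
def blended_average(hits, at_bats, baseline, baseline_weight):
    """Blend a small sample with a baseline so ten at-bats don't decide everything.

    Acts as if the hitter had also batted `baseline_weight` times at `baseline`.
    """
    return (hits + baseline * baseline_weight) / (at_bats + baseline_weight)


# From the example: the lefty due up hits .280 against left-handed pitching.
lefty_vs_lefties = 0.280

# From the example: the righty on the bench is 0-for-10.
# Hypothetical assumptions: a .250 baseline, weighted like 50 at-bats.
righty_estimate = blended_average(hits=0, at_bats=10,
                                  baseline=0.250, baseline_weight=50)

print(f"lefty stays in: {lefty_vs_lefties:.3f}")
print(f"righty pinch hits: {righty_estimate:.3f}")
print("pinch hit" if righty_estimate > lefty_vs_lefties else "leave the lefty in")
```

Even with a generous assumed baseline, the righty's estimate stays below .280, so the odds in this example favor leaving the lefty in. Different assumptions would move the numbers, which is exactly where judgment comes in.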
How Do the Mets Use Data?
When it comes to the Mets, I am confident that the leadership, Cohen and Stearns, understands the importance of data for proper risk management, which is itself fundamental to formulating strategies for long-term success.
When it comes to the team leadership in-game and over the course of the season, I am less confident, but hopeful. I think Willard understands how data drives a pitching staff's ultimate success: pounding the zone with your best stuff, and proper sequencing. Hefner, frankly, drove me crazy as the pitching staff became a bunch of nibblers, which led to far too many walks. And walks, as we presumably learned a long time ago from Moneyball, are a valuable advantage for the offense.
It remains to be seen what the other coaches will bring to the table. Most importantly, it remains to be seen how well Mendoza integrates data and how much faith and confidence he has in his own judgment.
I read his quick hook with starters differently than others. I did not see it as reflecting confidence in his own judgment. Quite the contrary, in fact: I saw him as a prisoner of a common but unsophisticated understanding of the data. He followed the conventional understanding of when the data tells you to pull pitchers. He was a prisoner of it because he wasn't confident enough in what he had seen and understood from 20+ years in the game.
On the offensive end, the data showed that the Mets' style of play was too fragile. They scored a lot of runs and hit their share of homers, but the distribution was poor over the course of a game, over a series of games and across the line-up. They wanted to approach run scoring differently, and the approach they put in place this year was driven by data. The totals of runs and home runs don't matter as much as their distribution over games and across the line-up. The prior approach was too fragile and top heavy. So they had to find players who they believed would make fewer unproductive outs and who were sufficient in number to create a line-up able to score throughout the game and be less vulnerable than it has been to off days from its best players.
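A small sketch shows what it means for totals to matter less than distribution. The run totals below are invented for illustration and are not real Mets results, and the four-run threshold for a 'good enough' game is likewise an assumption; `summarize` is a name made up here. Both hypothetical offenses score the same 30 runs over six games, but they get there very differently.

```python
from statistics import mean, pstdev

# Hypothetical runs per game over the same six games. Both lists total 30 runs.
top_heavy = [12, 0, 1, 11, 0, 6]
balanced = [5, 4, 6, 5, 4, 6]


def summarize(runs, enough=4):
    """Total, average, spread, and how often the offense scored `enough` runs."""
    return {
        "total": sum(runs),
        "average": mean(runs),
        "spread": round(pstdev(runs), 2),  # population standard deviation
        "games_with_enough": sum(1 for r in runs if r >= enough),
    }


for label, runs in (("top heavy", top_heavy), ("balanced", balanced)):
    print(label, summarize(runs))
```

Same 30 runs, but with these invented numbers the top-heavy offense reaches four runs in only three of six games while the balanced one does so in all six. That gap is the kind of fragility a season total hides.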
Defensively, the data revealed that their capacity to prevent runs was in fact a strategic liability. This shaped their approach to defense and to pitching, as constrained by the availability of players who fit the bill at reasonable cost. This is a great example of how the two principles I outlined above work together. The Mets chose not to sign players who would have fit either the defensive or the offensive plan because doing so would have reduced risks or harms to their potential long-term success at too high a cost, and I do not mean financial cost. Some of the players they could have signed would have stood in the way of players nearly ready to perform at the major league level. Ultimately, I believe that was a sufficient reason to turn their attention away from Bellinger, even if he fit the profile of offense and defense that the data indicated the Mets would be wise to pursue. With Benge and Ewing in the outfield and Clifford and Reimer knocking at the door, the first two of whom play good defense and augment the approach to the line-up the team has in mind, Bellinger fits the bill but at too high an opportunity cost.
The first key point here is that data is primarily valuable to the Mets, and should be to most teams, when it comes to responding to strategic issues, not to calling every pitch from the dugout. Second, data makes for better judgments but, like money, is no substitute for judgment, especially for the Mets.
It will take time for teams to create the right frameworks for analyzing data just as it will take time to figure out the proper balance between data and baseball judgment. There's nothing to fear, but lots to keep an eye on. I know I will be doing just that, and hoping to remain cautiously optimistic about the Mets' ability to appropriately rely on data and to be led by people who understand its virtues and limits and to manage the risks associated with both optimally!

11 comments:
Morning Jules
I'm just a reader with comments now so I read ... and comment
Jules, you are a highly intelligent PhD, former Ivy League Professor and Lecturer
We're a bunch of Moes: Mets fans, and Mets writers that are Mets fans
You approach this game and role as a Lecturer of theory. We just want to know what position Brett Baty is playing
There is nothing wrong with what you are writing except they are too long and don't relate directly to the Mets, stats, and players.
It is a proven fact posts should be minimal 500 words and max out at 1,000. That is the attention span of the reader.
What you wrote today had an into and four clear highlighted areas
They should be part of a 4-part series THAT ALL LEAD BACK TO DIRECT REFERENCES TO METS STUFF
Gotta be Mets
Line them up on four 9am slots the same week and readers will stay with them and respond
*intro and four parts
Mack,
Excellent comment and outstanding advice.
I love data, and analyzing data. Data theory, not so much.
I get the points
Data is the new OIL!!! What gets measured, gets done.
As Jules says: The key is collecting, organizing, analyzing AND applying the data better than the rest.
Cohen & Stearns know how to do this - well. This season will tell if Mendoza can apply it for advantage. We will see.
I have just shortened the post to reflect many of the points, but not all. I think people need to understand what data is and where it is most effectively used in the baseball context. I do discuss the Mets especially because it does seem to me that the leadership understands data and that it figures prominently in their ability to organize efficiently to pursue their long-term goals. I then discuss the leadership on the field because I am pretty sure that last year Mendoza did not strike the right balance between data and judgment. The success on the field depends on that, and the point of the post was to explain the connection between data and judgment, between quantitative and qualitative, etc.
Next post will be on who should play RF, which is great but predictive. My expertise is not predictive; it is analytical.
Then I have my Richard Neer interviews and after that I will post the biomechanical videos showing what Tong does and how it is different from what Myers does, but which if you just looked at the delivery you wouldn't notice. And then the next one will be a contrast between Alvarez sequencing also on video and Jacob Reimer's.
By the way, Reimer has a very quick bat. He has a short swing, as does Vientos, but Vientos's bat does not go through the zone quickly. If you want to think of bats that have gone through the zone quickly, the three power hitters that come to mind are Aaron, Rice and Bonds. If the bat comes through the zone quickly, it speaks volumes about two things. First and foremost, your sequencing. Secondly, the length of your swing.
I will go through all this in the next couple of weeks. But frankly, people talk about data all the time, but they don't understand what it is. And they dislike it because they think it eliminates baseball knowledge; and that it's just numbers and not thought. I tried in this piece to point out that they have it backwards.
Great stuff Doc
Understand
I think 🤔
I agree very much with your assessment of Mendoza. I hope that he is able to form a more trusting relationship with the new pitching coaches Willard and McKinney so they can help him make better decisions on when to pull a pitcher. Both of these new pitching coaches are well versed in analytics but also have the instincts to reduce it to practice.