
3/3/26

Cautious Optimist - Baseball and Data Analytics



 Data and Baseball


For us baseball fans, the threat to our privacy, autonomy and way of life that big data arguably presents is 'no biggie,' at least relative to our concerns about what big data is doing to our beloved game, and especially as the baseball season approaches.  No one has ever accused true baseball fans of being normal people.

Most middle-aged and older fans were introduced to the role data can play in the modern game through Moneyball.  For most fans, Moneyball showed how data analysis could yield surprising results that go against the grain of conventional baseball wisdom: the value of walks and on-base percentage, and the relative unimportance of stolen bases and strikeouts.

The Moneyball results were defined by three predominant features.  First, they were easy to express in the familiar categories already in use in the baseball lexicon, which made the data easy to understand in common parlance.  The data was not esoteric, say, in the way spin rates are.

Second, the data-driven results changed the way the game was played, but primarily at the margins.  Moneyball baseball may have led to a data analyst or two joining the front office, but it did not threaten to replace managers and their decision-making authority with computers, or scouts with computer scientists.

Finally, those data-driven results provided a way to restore a kind of competitive balance to the sport.  Teams that couldn't afford ace pitchers and home run sluggers the way the Yankees could might still be able to compete, despite smaller payrolls, by taking advantage of effective low-cost strategies, as the A's and the Rays have.

It also created a new baseball icon: the nerd.

It did all this and more without threatening the game as we knew it or our appreciation of it.

Data was the first fully legal performance enhancing drug of the modern baseball era.

What changed?

Unsurprisingly, the data has gotten more complex and esoteric.  Rather than changing the game at the margins, it threatened to change the game at its core.  Line-up cards were filled out by the analytics division, which in the mythological narrative of the era was composed of individuals squirreled away in dark rooms, wearing thick glasses and reading computer screens instead of watching baseball games.

At bottom, the traditional baseball fan worries that heavy reliance on data will squeeze the human element from the way the game is played, making teams virtually indistinguishable from one another, rendering the sport boring or, worse, ultimately severing the ties that bond fans to teams.

Most of us have experienced directly the ways in which easy access to information has led to questionable consequences for previous mainstays of our lives, how we have gone from performing in games with others to watching our kids and grandkids playing them alone on screens. 

This is a serious worry, not just about baseball but about the social fabric of our lives, and it should not be dismissed with a wave of the hand.

I am in no position to quell these larger fears, which are real, but I might be able to reduce some of the anxiety Mets fans (and even fans of other teams) have about seeing their game being taken over by 'data-fication' and transformed in ways that render it unrecognizable to them. 

A more optimistic take on how to think about data in baseball: Part I

The first thing you need to understand is that data cannot replace human judgment, intuition, and instinct.  Here's a brief explanation of why this is true.

    * We need information to act.  It's easy to think that 'data' is synonymous with 'information,' but it's not.  Information informs; data may or may not.  Some data is informative; some of it is just noise.  That doesn't mean that data that is noise is false or that its being false explains why it is not information.  No, the data may be true, but just not relevant.  Hold that thought about relevance for a moment.  The first point about data, however, is a simple but important one: we have to distinguish data that is information from data that is noise. 

    * Here's a piece of data: 'He's really fast.'  It may be true.  It may be false.  But we don't know yet, because we don't know exactly what is being conveyed by it.  Are we talking about his typing, his running, the time it takes him to perform a task?  If so, which task?  Here's the second point about data: Data requires context to make it comprehensible.

    * Ok, let's suppose that the piece of data that he's really fast is contextualized to completing a task, particularly, let's say the task of completing an at bat in a baseball game.  But we don't yet know whether this bit of data is helpful or not.  Here's the third point about data: Data does not interpret itself.  It doesn't tell us whether it is noise or information, valuable or not.

    * Interpretation of data is not just a technical exercise that we call data analytics.  It requires a framework that has at least two distinct elements: a goal or purpose, and a criterion for organizing the data given that purpose, so that the data, taken as a whole, is coherent enough to be assessed for its usefulness in achieving the goal or purpose.  Here's the fourth point about data: In order to distinguish information from noise (in other words, to determine the relevance of the data) you need an interpretive framework that specifies what you are looking for and why.

    * Finally, the data you have now determined is useful for a given purpose has to be put to work by developing strategies and plans for doing so.  And that is the fifth and most important point about data.

Identifying context and goals, creating a framework, organizing the data coherently, and then formulating plans to reach those goals are all human activities that require judgment, baseball IQ, and a degree of creativity and insight.

The takeaway: Not only can't data replace the human element, including human character, emotion and intelligence; data is useless without it.  We can't make sense of it, determine its relevance, or achieve goals using it otherwise. 

One would be foolish not to attend to what the data shows, but the data is compatible with a wide range of options, of paths one can follow. It's literally impossible to be a prisoner of data, since all data is compatible with a large number of different conclusions -- which isn't to say that people (and especially baseball managers) cannot be crippled or frozen by it. 

Data informs judgments; it does not eliminate the need for them. Anything but! That's why in your organization you want people who understand the strengths and limits of data and are neither threatened nor seduced by it.

A more optimistic view about data in baseball: Part II

I haven't said anything yet about the fear that data will ultimately make the game robotic and emotionally void, thus boring, ultimately severing the bonds between fans and teams.  

Above, I offered what I take to be a knockdown argument against the view that data will squeeze out baseball's human element.  I can't offer a knockdown argument capable of quelling these fears, which are genuine.

My view is that ultimately data is most effectively and efficiently applied to help solve more systemic or structural issues. My thoughts are preliminary, though I think plausible, but even if I am right, that doesn't eliminate the fan's worst fears.

Read the following with those caveats in mind.

Organizational Goals, Risk Management and Data

The Principle of Optimal Risk Management:  When it comes to weighing risk and reward, the goal is to minimize the sum of the costs of the risks you take and the costs of avoiding or reducing those risks.

Data for an organization is most important to the extent that it bears on optimal risk management.  The more you know about your goals, your players, their likelihood of success or injury, what you have in the minors, and how likely that talent is to project at the major league level, the better you will be at risk management.  You're not looking at individual at bats.  You are looking at the landscape of risk and reward, and taking it on board to reach the best judgments you can make given your long-term goals.  And it's this kind of thinking that helps you balance payroll, develop and implement roster construction strategies, and determine where to invest in talent assessment, development and creating environments that increase confidence in performance projections and so on.
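For readers who like to see a principle written down, here is one rough way to state the risk management principle in symbols. Everything here is an illustrative assumption rather than anyone's actual model: x stands for how much you spend on precaution (depth signings, shorter contracts, and so on), p(x) for the chance the bad outcome happens anyway, L for what it costs you if it does, and A(x) for the cost of the precaution itself.

```latex
\min_{x \ge 0} \; C(x) \;=\; \underbrace{p(x)\,L}_{\text{expected cost of the risk you take}} \;+\; \underbrace{A(x)}_{\text{cost of avoiding or reducing it}}

% If p is decreasing, A is increasing, both are differentiable,
% and the minimum is at some x^* > 0, then at that point
-\,p'(x^*)\,L \;=\; A'(x^*)
```

Read in plain terms, under those assumptions: keep buying insurance until the next dollar of protection saves you less than a dollar of expected loss, then stop.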

Baseball Team Performance and Data

The Principle of Strategic Advantage: Teams should be constructed around identifying competitive advantages in all phases of the game, then scheming how to play the game to turn competitive advantages into strategic ones that can be exploited.  Teams play the game in ways to create strategic advantages that they are capable of exploiting -- on both sides of the ball.

Data is essential to identifying the competitive advantages individual players possess and the conditions under which they are most likely to be able to turn those competitive advantages into strategic ones, as well as in figuring out the style of play on both sides of the ball that puts the team in the best position to create and exploit its strategic advantages while reducing the impact of its strategic vulnerabilities.

No sport displays more clearly than professional basketball the extent to which the front office constructs the team almost entirely in terms of creating competitive and strategic advantages. The game is dominated by looking for strategic advantages through matchups, and entire styles of play have changed over time with exactly this in mind.  Need an example?  Basketball is dominated by two distinct offensive styles of play: the pick and roll, and five players spreading the court, leaving the lane open.  Denver's offense is completely based on the first; the Thunder's on the second.

This is as true of baseball as it is of any sport.  It is just easier to see in football, hockey and basketball than it is in baseball.  It was easiest to see in baseball when teams were allowed to employ the infield defensive over-shift.  And it really says a lot about baseball that players and teams were unable to adjust their offensive strategies to take advantage of the vulnerabilities that deploying the shift created.  League leadership never should have intervened.  Nothing shows baseball's relative inability to exploit data to create strategic advantages as much as the Commissioner's Office's intervention to eliminate strategic advantages and potential liabilities.

The Mets and Data

So how are the Mets using data?  Are they any good at it?  What should fans feel about their current attitude toward risk?

As I said, we should think about data as responding to different kinds of issues.  When it comes to the front office and the baseball organization in general, I don't think there have been better articles written on the Mets than those RVH has produced on Mack's Mets, with his account of the systemic approach they have taken.  The only addition I have introduced to his framework is the principle of risk management and its centrality to the organization, given Steve Cohen's real job as head of a hedge fund.  Read RVH's posts.

When it comes to the Mets, I am confident that the leadership -- Cohen and Stearns -- understands the importance of data for proper risk management, which is itself fundamental to formulating strategies for long-term success.

If there is an issue that the Mets have with risk, it is not at the organizational level. 

When it comes to the team leadership in-game and over the course of the season, I am less confident, but hopeful.  I think Willard understands how data drives a pitching staff's ultimate success: pounding the zone with your best stuff, and proper sequencing. Hefner, frankly, drove me to drink as the pitching staff became a bunch of nibblers, leading to far too many walks, which, as we presumably learned a long time ago from Moneyball, are a valuable advantage for the offense -- in this case, the teams the Mets were playing!

It remains to be seen what the other coaches will bring to the table.  Most importantly, it remains to be seen how well Mendoza integrates data and how much faith and confidence he has in his own judgment.  

I read his quick hook with starters differently than many others do.  I do not see it as reflecting confidence in his own judgment.  Quite the contrary in fact.  I saw him as shackling himself to a common but unsophisticated understanding of the data.  He followed the conventional understanding of when the data tells you to pull pitchers.  He let the data determine his actions, not because the data is infallible or determinative (it is neither), but because he wasn't confident enough in what he had seen and understood from 20+ years in the game.  Hopefully this will change in the upcoming season.  But first he needs to understand data well enough so as not to be frozen into mechanistic deployment of it.  Full grasp of data puts it into proper perspective.  Anything less is likely to lead to overplaying it or refusing to learn from it.  It's like I used to say to my students: 'Now you know just enough philosophy to be dangerous.'

On the offensive side of the ball, the data showed that the Mets' style of play was too fragile.  They scored a lot of runs and hit their share of homers, but the total of runs and home runs doesn't matter as much as how those runs and homers are distributed over games and over the line-up. The Mets' totals were good; the distribution was lousy.
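To see how the same total can hide very different distributions, here is a small sketch in Python. The numbers are made up for illustration and are not Mets data; the function name `summarize` and the cutoff of two runs for a "low-scoring" game are likewise my own assumptions.

```python
from statistics import mean, pstdev

# Hypothetical runs scored per game over a ten-game stretch (made-up numbers).
feast_or_famine = [12, 0, 1, 11, 0, 2, 13, 1, 0, 10]
steady = [5, 4, 6, 5, 4, 6, 5, 5, 4, 6]


def summarize(runs_per_game, low_scoring_cutoff=2):
    """Return total runs, average, spread, and the count of low-scoring games."""
    return {
        "total": sum(runs_per_game),
        "mean": mean(runs_per_game),
        # Population standard deviation: how unevenly the runs are spread.
        "spread": pstdev(runs_per_game),
        # Games in which the offense scored at or below the cutoff.
        "low_scoring_games": sum(
            1 for runs in runs_per_game if runs <= low_scoring_cutoff
        ),
    }


for label, runs in (("feast or famine", feast_or_famine), ("steady", steady)):
    print(label, summarize(runs))
```

Both made-up teams score the same number of runs over the stretch, but the first does it in a handful of blowouts and is held to two runs or fewer in most of its other games. That is the sense in which a total can be good while the distribution is lousy.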

They took the data seriously and chose to adopt a different approach to creating scoring opportunities.  So they had to find players who they believed would make fewer unproductive outs, and enough of them to create a lineup able to score throughout the game and be less vulnerable than it had been to the off days of its best players.

Defensively, the data revealed that their capacity to prevent runs was in fact a strategic liability.  They came into this offseason hoping to turn this strategic vulnerability into a strategic asset.  But they may not have been able to achieve that lofty goal, and may have to settle for having neutralized their vulnerability.  

It is important to take a closer look at what they did and did not do this past offseason because it illustrates a number of the points I have been making in this and previous posts.  Stearns looked at the defensive data and saw a vulnerability.  I immediately assumed that his doing so made signing Bellinger his number one defensive target, having traded Nimmo and lost Alonso to free agency. Bellinger is an excellent left fielder and a far better than average first baseman. He's also a better than average hitter and clubhouse presence. An obvious target. Of course, I was both confident and mistaken.

Decisions can make sense at some prices but not at others.  Here are some of the factors that went into Stearns's decision to back away from Bellinger that I had not fully appreciated until after the fact.  He had a stop gap at 1st base in Polanco and two minor leaguers knocking at the door, Reimer and Clifford, one of whom, if not both, is likely to be ready by next year.  He also had two young vets on his team (three if you count Mauricio) who could in principle split time with Polanco and replace him this year if necessary, or perhaps next year if neither Clifford nor Reimer is ready to do so.  So Bellinger's insurance value at 1st base is real, but less significant.

None of us knew it at the time, but Soto was heading to left field, which is Bellinger's best position.  However, the Mets had two top tier minor league outfielders in Benge and Ewing, both of whom could play right field, which had been left empty by moving Soto to left.

Even more importantly, signing Bellinger to the long-term contract he wanted meant taking on some potential declining years while blocking Ewing, Benge and maybe others as well.

Also, Stearns could improve the defense at a much lower cost and for a shorter period of time while also taking on back-up insurance in the event Benge isn't ready for Opening Day.  Thus he brought in Robert, Tauchman and Melendez.

First, this decision-making strategy exhibits Stearns's commitment to the risk management principle I have discussed above and in other posts.  Next, it displays the primary importance of data as grounding strategies and plans as opposed to its role in making one-off decisions. Finally, as I have emphasized here, data makes for better judgments but, like money, is no substitute for judgment -- especially for the Mets.

It will take time for teams to create the right frameworks for analyzing data just as it will take time to figure out the proper balance between data and baseball judgment.  There's nothing to fear, but lots to keep an eye on.  I know I will be doing just that, and hoping to remain cautiously optimistic about the Mets' ability to appropriately rely on data and to be led by people who understand its virtues and limits and to manage the risks associated with both optimally!


 




12 comments:

  1. Morning Jules

    I'm just a reader with comments now so I read ... and comment

    Jules, you are a highly intelligent PhD, a former Ivy League Professor and Lecturer

    We're a bunch of Moes: Mets fans, and Mets writers that are Mets fans

    You approach this game and role as a Lecturer of theory. We just want to know what position Brett Baty is playing

    There is nothing wrong with what you are writing except they are too long and don't relate directly to the Mets, stats, and players.

    It is a proven fact posts should be minimal 500 words and max out at 1,000. That is the attention span of the reader.

    What you wrote today had an intro and four clear highlighted areas

    They should be part of a 4-part series THAT ALL LEAD BACK TO DIRECT REFERENCES TO METS STUFF

    Gotta be Mets

    Line them up on four 9am slots the same week and readers will stay with them and respond

    Replies
    1. Mack,
      Excellent comment and outstanding advice.

  2. I love data, and analyzing data. Data theory, not so much.

  3. Data is the new OIL!!! What gets measured, gets done.

    As Jules says: The key is collecting, organizing, analyzing AND applying the data better than the rest.

    Cohen & Stearns know how to do this - well. This season will tell if Mendoza can apply it for advantage. We will see.

  4. I have just shortened the post to reflect many of the points, but not all. I think people need to understand what data is and where it is most effectively used in the baseball context. I do discuss the Mets, especially because it does seem to me that the leadership understands data and that it figures prominently in their ability to organize efficiently to pursue their long term goals. I then discuss the leadership on the field because I am pretty sure that last year Mendoza did not strike the right balance between data and judgment. Success on the field depends on that, and the point of the post was to explain the connection between data and judgment, between quantitative and qualitative, etc.
    Next post will be on who should play RF, which is great but predictive. My expertise is not predictive; it is analytical.
    Then I have my Richard Neer interviews and after that I will post the biomechanical videos showing what Tong does and how it is different from what Myers does, but which if you just looked at the delivery you wouldn't notice. And then the next one will be a contrast between Alvarez sequencing also on video and Jacob Reimer's.

  5. By the way, Reimer has a very quick bat. He has a short swing, as does Vientos, but Vientos's bat does not go through the zone quickly. If you want to think of bats that have gone through the zone quickly, the three power hitters that come to mind are Aaron, Rice and Bonds. If the bat comes through the zone quickly, it speaks volumes about two things: first and foremost, your sequencing; second, the length of your swing.
    I will go through all this in the next couple of weeks. But frankly, people talk about data all the time, but they don't understand what it is. And they dislike it because they think it eliminates baseball knowledge; and that it's just numbers and not thought. I tried in this piece to point out that they have it backwards.

  6. I agree very much with your assessment of Mendoza. I hope that he is able to form a more trusting relationship with the new pitching coaches Willard and McKinney so they can help him make better decisions on when to pull a pitcher. Both of these new pitching coaches are well versed in analytics but also have the instincts to reduce it to practice.

  7. Hi everyone, and thanks for the comments. My first effort at this post was less than I had hoped for. No excuses really, but I have been dealing with serious house issues: five cracked pipes, falling ceilings and walls, a bevy of plumbers and construction workers. We're not done yet, but I can see the oncoming train at the end of the tunnel.
    In any case I made a few modifications when I had time yesterday, and awoke at dawn today to rework the piece to my satisfaction. I have done so and it has been updated accordingly. If you would like to, I would encourage you to consider rereading it or reading it for the first time. Saying it is up to my own expectations is not to say that it is any more plausible or convincing than it was before. It's just to say that I can live with it.
