Ok, I love Neil Young. I used to hold the view -- and expressed it too often to deny it now -- that the better musical talent in the Buffalo Springfield (and CSNY) was Stephen Stills. So instead of denying the obvious or pretending I wasn't wrong (I was), I hope to redeem myself by ripping off the title of one of Young's many painfully touching songs.
And now for something completely different
I had an opportunity to read a recent Bill Madden column in which he criticized the Mets' current approach to roster construction. The problem, according to Madden, stems from David Stearns' reliance on 'analytics' at the expense of other strategies -- strategies that are not specifically identified but that share the feature of not relying primarily on analytics. While I have not followed Madden's work over the years, I have no reason to believe that he is anything other than professional, takes his craft seriously, and is good at it.
I read the article carefully in search of an account of what Mr. Madden takes the 'analytic approach' to baseball or to roster construction to be, and an explanation of why following such an approach would be a bad thing. My search was not rewarded. To be sure, Madden ascribed such an approach to Stearns, but with only a name and not a characterization of the approach, I could only guess at what exactly he had in mind when using the term 'analytics.' He did give a couple of examples of ways in which the roster has failed to measure up to expectations, and other examples exist of players who were let go and have now found success elsewhere.
Excuse me for demanding more, but there is no evidence offered that the players Stearns let go were released because they failed to measure up on some analytics criterion, let alone any argument that they would not otherwise have been let go for the usual reasons players have always been let go -- long before there was such a thing as 'analytics' (if ever there was). So without an account of what analytics is and why it's bad, and more importantly, why it has led to the Mets' miserable performance this year, Madden's piece is little more than an example of the logical fallacy of 'post hoc ergo propter hoc,' which is roughly understood as inferring that A is the cause of B because A precedes B.
He may well be right in his conclusion, but his argument for that conclusion lacked the basic elements of a sound argument.
What was missing?
My goal here is to educate, not to criticize. Madden's piece is not an outlier. It's common to criticize the analytic approach to baseball generally or some particular use of it in baseball without specifying what the author means by 'analytics.' And it is equally commonplace to attribute a failure in a team's performance to its adoption of an analytic approach without defending the claim that the approach is the cause of the failure, which on some accounts of causation would amount to showing that the failure would not otherwise have occurred but for adopting the analytic approach.
I take the latter criterion of causation to be too strong, but it is reasonable to ask for some causal evidence to support the claim. Right?
So two things are missing, not just in Madden's piece, but in virtually everything I have read about analytics and its failings in baseball:
* An account of what makes an approach 'analytic'
* An account of how the analytic approach is responsible for the failures attributed to it.
I'm prepared to believe that the analytic approach may be inapt in baseball and other activities and that implementing it is responsible for some bad baseball performance (though I am also prepared to believe that it may well be responsible for some baseball success stories).
What I am pretty sure of is that we have no clear or shared conception of what the analytic approach is and that we have no agreed upon account of how we can determine which outcomes on the field can be attributed to it. Lots of fire in the criticisms but precious little light.
So let's try to shed some light, OK?
To be honest and fair, the only thing that most people associate with analytics in sports (and in other areas as well) is a (distinctive kind of) reliance on data, particularly quantitative information. This conception of the analytic approach identifies it with related terms like 'data analytics' and 'quantitative assessment'; and it invites the following kind of objection.
* There is an important difference between quantitative and qualitative information. The objection to the analytic approach is that it ignores or undervalues qualitative information.
This objection doesn't survive even modest scrutiny because, while there is a difference between quantitative and qualitative assessments, many, if not most, qualitative assessments are grounded in quantitative information. So, for example, if we are trying to figure out whether a risk is worth taking, i.e. justified or even mandatory, we would surely want to know, and thus be able to compare, the expected costs of taking the risk and the gains we could expect to realize.
So even if we were committed to assessing players by some qualitative standard, applying that standard to the facts at hand often requires developing data and analyzing it -- and not just in baseball. In many areas of life, our judgments about what should be done depend on the numbers. And as the saying goes, why guess at the numbers when you can measure?
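The expected-cost-versus-expected-gain comparison above can be sketched in a few lines of code. To be clear, every number here is hypothetical -- imagine a third-base coach deciding whether to send a runner home -- and is only meant to show the shape of the comparison, not any real run-expectancy data.

```python
# Illustrative sketch with made-up numbers: is a risk "worth taking"?
# We compare the expected value of attempting the play against the
# expected value of declining it.

def expected_runs(p_success: float, runs_if_safe: float, runs_if_out: float) -> float:
    """Expected runs from attempting the play: a probability-weighted average."""
    return p_success * runs_if_safe + (1 - p_success) * runs_if_out

# Hypothetical values: a 70% chance the runner scores (worth 1.6 expected
# runs for the inning) versus being thrown out (worth 0.3 expected runs).
send = expected_runs(p_success=0.70, runs_if_safe=1.6, runs_if_out=0.3)
hold = 0.9  # hypothetical expected runs for the inning if the runner holds

print(f"send: {send:.2f} expected runs, hold: {hold:.2f}")
print("send the runner" if send > hold else "hold the runner")
```

The point of the sketch is the structure: the qualitative judgment ("this risk is justified") rests on a quantitative comparison underneath it.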
On the other hand, I know of no serious data analytics person who believes that qualitative judgments are or can be eliminated in favor of data. Data needs to be interpreted, at the very least in the light of some goals or interests that help sort which data is relevant and why. Data is an instrument, a premise in an argument. It is insufficient on its own, without norms or standards, goals and interests, values and principles, to warrant a conclusion of any sort.
If the issue is simply that the data analytics folks eschew qualitative standards in favor of numbers then that's a straw man and should be of no real interest to anyone. There's nothing to argue about.
* One variation of the above objection is that the analytics geeks rely too heavily on the measurable and too little on the non-measurable.
Here the objection is a bit different. The response to the first objection is that both quantitative and qualitative information are interwoven with one another. Qualitative judgments require support and often that support takes the form of quantitative data. At the same time quantitative data doesn't tell us what to do absent qualitative standards, interpretive principles, goals and interests. So they are inextricably linked.
This objection takes a different form. It relies on the idea that commitment to analytics is not just commitment to the role of data as support or evidence for some independently specified criterion or goal. To be committed to data analytics is also to be committed to a certain quantitative kind of criterion. Here are some examples. When comparing risks and benefits, the relevant criterion is always going to be optimal risk reduction or maximizing benefit relative to risk. If comparing the probabilities of a righty hitter facing a lefty pitcher and a lefty hitter facing a lefty pitcher, the correct standard to apply is: higher probability of success, period. And so on. The criteria are themselves suggested by the data: higher probability of success, greatest expected gain, lowest expected risk or cost, and so on.
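The "higher probability of success, period" criterion described above reduces to a one-line maximization once you have the numbers. Here is a minimal sketch of the platoon-matchup example, with entirely hypothetical success rates; the interesting thing is how little room the criterion itself leaves for judgment.

```python
# Illustrative sketch (made-up numbers): the purely quantitative criterion
# -- pick whichever option has the higher probability of success, period.
# Choosing a pinch hitter against a left-handed pitcher, with hypothetical
# platoon-split success rates.

candidates = {
    "righty hitter vs LHP": 0.340,  # hypothetical success rate
    "lefty hitter vs LHP": 0.305,   # hypothetical success rate
}

# The data-suggested criterion: maximize the probability of success.
best = max(candidates, key=candidates.get)
print(f"the criterion says: {best} ({candidates[best]:.3f})")
```

Notice that nothing in the code weighs intuition, character, or leadership; that is exactly the gap the objection is pointing at.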
And so the objection is based on the idea that if you are a true believer in analytics, you also believe that the data call for a certain kind of standard of assessment: one that puts the emphasis on 'more,' 'less,' 'efficient,' 'optimal,' and the like. The problem is that there are genuinely different kinds of criteria that are applicable in baseball, like baseball IQ, intuition, instinct, creativity, character, discipline, leadership, and so on.
The criticism can now be understood as the claim that the data analytics folks rely too heavily on criteria of the first sort as compared with criteria of the second sort: in effect, criteria that seem suitable when dealing with numbers as opposed to criteria that rely on the less measurable attributes of players.
Now this objection is getting closer to a genuine concern, but it is not yet there, because as stated, the objection is really a call for the right balance between the two kinds of criteria. Finding that balance must be decided team by team. Its resolution depends on where the expertise lies in the organization and the results the team produces.
* A slightly different objection focuses on the difference between quantitative and qualitative information and calls into question the actual possibility of balancing criteria of the one sort against criteria of the other. Call the former 'hard variables' and the latter 'soft variables.' Height and weight are examples of the former; beauty and creativity are examples of the latter. Balancing the quantitative and qualitative requires the two to be commensurable. In other words, there has to be something common to both that allows me to trade one off against the other.
But there is no common ground between height and weight on the one hand, and beauty and creativity on the other. They are incommensurable, and what happens when you have an organization that is run by the analytics group is that the 'soft' variables, e.g. baseball IQ, intuition, instinct, creativity, character, team spirit, fellowship, etc., are 'squashed' by decision makers who are largely trained to assess the numbers and not the other traits that figure just as much, if not more, in ultimate performance and team success.
When drafting players, you may be able to find the right balance when trading off a five-tool player against a three-tool player who is exceptional at those three tools and league average otherwise. But how can you trade off or find a balance when choosing between great physical skills with no creativity or instincts on the one hand, and great leadership and discipline with a major-league-average physical skill set on the other?
Even if we could imagine ways of balancing or making trade-offs, the truth is that the leadership is likely to go with what they know best and feel most confident doing. The real objection here is that those who are trained largely in one way of organizing a baseball team or playing the game are very likely not just to adopt their approach in hard cases, but to see as easy -- not hard at all -- certain cases that strike neutral observers or ordinary fans as genuinely hard. It's always going to come down to being about the numbers.
I would not dismiss this objection or concern out of hand. We are all prone to endorse the way we do things or the way we have been trained to do them, and while this should not lead us to dismiss other approaches, it often naturally leads to discounting them and, at worst, suppressing them. This is not a problem with analytics so much as it is a human failing or shortcoming, and it frankly needs to be monitored and checked in every organization, whatever blueprint it follows -- from data analytics to the eye test.
But the objection to analytics runs deeper than this.
The problem with stopping here is fourfold.
* It is unsatisfying to say that the problem is a human shortcoming in favoring the methods with which one is familiar while discounting other approaches. Acknowledging the human condition may not give proper due to those who object to the modern approach.
* At the same time, stopping here -- reducing the objection to a human failing -- cuts the debate off at the knees and never really explains to anyone what analytics is.
* It doesn't give those who believe that analytics is an overall positive force in baseball an opportunity to explain how and why it is.
* Finally, it does not give me a chance to open up the discussion of analytics and broaden it by pointing out some of its most glaring shortcomings: shortcomings that even someone who sees the value of analytics -- someone like me -- would have to admit are serious.
In short, stopping the discussion here would prevent all of us from becoming a little more modest about how much we can accomplish through the tools we employ, recognizing where those tools can be most effectively deployed as well as where they are least likely to be helpful. In other words, we need to find the limits of our approaches as well as their respective promise.
In the meantime, I'd like to know where most of you stand on the analytics approach to baseball. Let me know in the comments.

Everyone wanted to play with Neil
I'll stop here.
Sorry for the lateness of the post. I'm currently on the west coast, and when I scheduled the publication for 9am -- my usual time -- that turned out to be 12pm eastern time. It's a numbers problem, honestly.
What do most people believe the analytic approach is, and why do they think there is something especially wrong with it when it comes to baseball? Baseball's use of analytics is nothing compared to the data-driven impact on soccer and American football. Every football team looks at the numbers primarily. No one drafts quarterbacks who can't complete 20-yard sideline passes.