12/18/11

What Are “Sabermetrics” FINAL Edition

Several months ago, Mack asked if anyone wanted to delve into the world of Sabermetrics and write a series of articles highlighting some of the more common formulas thrown around by today’s talking heads. I raised my hand, since I have a bit of a background in statistics and a general appreciation for mathematics (maybe that makes me a bit odd, but so be it; I am a Mets fan, after all).


Anyway, including this article, I have spent the past nine weeks going over some of the more common measurements, and I have learned a bit more than I bargained for (hopefully you have picked up a few things, too). Since this is the last article of the year (the site is going on hiatus for a few weeks over the holidays), I figured this would be a good opportunity to tie up the series on Sabermetrics with a nice, red bow (see what I did there?).

If any of you are interested in a statistical measurement that I have not covered, or maybe I covered it but you want more (I know you are out there), just ask Mack; he can relay the message to me, and I would be glad to write another article on the topic. Otherwise, once the new year is upon us, I would like to branch out a bit and cover a few different topics.


So, let’s get on with it already! Last week, in the eighth installment of this series, we discussed VORP (value over replacement player) and how it relates to player evaluation. Today, I would like to change our focus and explore a forecasting system called PECOTA, short for the “Player Empirical Comparison and Optimization Test Algorithm”. Some of you may have just exclaimed “WTF?”


As screwed up as that sounds, it is actually pretty cool, and the name honors a middling player from yesteryear, Bill Pecota (who was thought of as the average man’s “average ballplayer”).


More specifically, Nate Silver developed the system in the early 2000s, and its general intent was to serve as a forecasting tool for baseball player performance. In other words, it is a predictive model, sort of like the large crystal balls the creepy fortune tellers use at the local fair (not that I would know). Due to the system’s popularity, Baseball Prospectus “bought” the rights to the formula(s) and now owns the whole ball of wax (although Nate Silver stayed heavily involved for many years after that).


Consider the following quote for more insight:


“PECOTA forecasts a player's performance in all of the major categories used in typical fantasy baseball games; it also forecasts production in advanced Sabermetric categories developed by Baseball Prospectus. In addition, PECOTA forecasts several summary diagnostics such as breakout rates, improve rates, and attrition rates, as well as the market values of the players.”


Well, crap.....what do we need humans for, when computers can do all the work? Joking aside, the reason I mention this statistic is that you will often hear players’ projections referred to as “PECOTA projections”, and the system is pretty popular with the fantasy baseball crowd (again, I know you are out there). As a quick hint, I think a series of articles on fantasy baseball may be in our future for 2012.


Getting back to the point of this article, PECOTA is based on a method called “comparable players” or “similarity scores”. In essence, you take the player in question and create a statistical baseline for his career up to that point. According to the experts, of the many thousands of players who have played the game, at least one probably had a similar statistical output over a similar amount of time (this works for both hitters and pitchers).


The comparisons are made using four basic criteria:


  1. Production Metrics - which are your basic statistical measurements, like batting average, strikeout rates, etc.
  2. Usage Metrics - which are things like career length, innings pitched, plate appearances, etc.
  3. Phenotypic Attributes - which include height, weight, handedness, etc.
  4. Player’s Role - fielding position for position players, starter/reliever for pitchers.
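To make the idea of a similarity score a bit more concrete, here is a toy sketch in Python. It is not Baseball Prospectus’s actual formula (that is proprietary); it simply assumes each player can be boiled down to a handful of numbers drawn from the four criteria above, puts those numbers on a common scale, and turns the distance between two players into a 0 to 100 score. The player names, the feature list, and every number are made up for illustration.

```python
import math

# Hypothetical profiles with invented numbers. At least one feature per criterion:
# production (avg, k_rate), usage (pa), phenotype (height_in, weight_lb),
# role (is_catcher: 1 or 0).
FEATURES = ["avg", "k_rate", "pa", "height_in", "weight_lb", "is_catcher"]

players = {
    "Player A": {"avg": .281, "k_rate": .18, "pa": 1850, "height_in": 73, "weight_lb": 205, "is_catcher": 0},
    "Player B": {"avg": .276, "k_rate": .19, "pa": 1720, "height_in": 74, "weight_lb": 210, "is_catcher": 0},
    "Player C": {"avg": .241, "k_rate": .27, "pa": 900, "height_in": 71, "weight_lb": 190, "is_catcher": 1},
}


def standardize(profiles):
    """Convert each feature to a z-score so batting average (~0.250) and
    plate appearances (~1,500) end up on a comparable scale."""
    names = list(profiles)
    vectors = {n: [] for n in names}
    for f in FEATURES:
        vals = [profiles[n][f] for n in names]
        mean = sum(vals) / len(vals)
        # Fall back to 1.0 if every player has the same value for this feature.
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        for n in names:
            vectors[n].append((profiles[n][f] - mean) / sd)
    return vectors


def similarity(u, v):
    """Turn Euclidean distance into a 0-100 score (identical profiles = 100)."""
    return 100.0 / (1.0 + math.dist(u, v))


vecs = standardize(players)
print(round(similarity(vecs["Player A"], vecs["Player B"]), 1))
print(round(similarity(vecs["Player A"], vecs["Player C"]), 1))
```

Player B, who looks like Player A across the board, scores much higher than Player C, who differs on nearly every criterion.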


PECOTA then uses an analysis method called “nearest neighbor”, which matches the player in question to the players most similar to him (typically focusing on the most recent three-year stretch of playing time). What I have described is a basic version of what takes place. Additional factors are considered, such as the player’s home ballpark, but you get the overall idea, right?
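Here is a bare-bones nearest-neighbor search, again as an illustrative sketch rather than the real PECOTA code. It assumes every player has already been reduced to a standardized list of numbers summarizing his last three seasons (the three features and all of the “Comp” players are hypothetical), then ranks the historical pool by distance and keeps the closest few.

```python
import math

# Hypothetical, already-standardized profiles (z-scores) summarizing each
# player's most recent three seasons. Names and numbers are invented.
historical_pool = {
    "Comp 1": [0.9, -0.4, 0.6],
    "Comp 2": [1.2, -0.1, 0.3],
    "Comp 3": [-1.2, 1.5, -0.8],
    "Comp 4": [0.7, -0.6, 0.9],
    "Comp 5": [-0.3, 0.2, -1.1],
}


def nearest_neighbors(target, pool, k=3):
    """Return the k pool players whose profiles lie closest to `target`,
    nearest first, as (name, distance) pairs."""
    ranked = sorted(
        ((name, math.dist(target, vec)) for name, vec in pool.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


player_in_question = [1.0, -0.3, 0.5]
for name, d in nearest_neighbors(player_in_question, historical_pool):
    print(f"{name}: distance {d:.2f}")
```

Plain Euclidean distance and k=3 are arbitrary choices here; a real system would presumably weight the features and look at many more comps.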


Once acceptable “comparable players” are found, the formulae attempt to predict future performance for the player in question by looking at what the comparable player(s) went on to accomplish and building probability distributions for the different metrics in question (i.e., what is likely, or probable, to happen).
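Here is a sketch of that last step: turning a handful of comps into a range of outcomes instead of a single number. It assumes each comp has a similarity score and a known result for the season after his matching stretch (OPS, on-base plus slugging, is used purely as an example stat, and all values are invented). Closer comps get more weight, and the 10th, 50th, and 90th percentiles stand in for a pessimistic, typical, and optimistic projection. Again, this is an illustration of the idea, not the actual PECOTA math.

```python
# Hypothetical comps: (similarity score, the comp's OPS in the season after
# the matching stretch). All numbers are invented for illustration.
comps = [
    (92.0, 0.812),
    (88.0, 0.779),
    (85.0, 0.845),
    (80.0, 0.701),
    (76.0, 0.760),
]


def weighted_percentile(outcomes, q):
    """Smallest outcome whose cumulative similarity weight reaches fraction q."""
    ordered = sorted(outcomes, key=lambda pair: pair[1])
    total = sum(w for w, _ in ordered)
    running = 0.0
    for w, value in ordered:
        running += w
        if running / total >= q:
            return value
    return ordered[-1][1]


def forecast(outcomes):
    """Similarity-weighted mean plus a low/middle/high range of outcomes."""
    total = sum(w for w, _ in outcomes)
    mean = sum(w * v for w, v in outcomes) / total
    return {
        "weighted_mean": mean,
        "p10": weighted_percentile(outcomes, 0.10),
        "p50": weighted_percentile(outcomes, 0.50),
        "p90": weighted_percentile(outcomes, 0.90),
    }


for label, value in forecast(comps).items():
    print(f"{label}: {value:.3f}")
```

The spread between the low and high percentiles is the point: the projection says what is probable, not what is guaranteed.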


Is it perfect? No, because there are always exceptions in life, and that is true in baseball as well. Sort of like the 2012 Mets shocking all of baseball and winning the World Series (yes, I have had a few beers while writing this, so what?). Is that probable? No, it is highly unlikely, but it could happen.


But, statistically speaking, what is probable is much more likely to happen than what is merely possible. Smart planners, whether general managers or just fans managing their fantasy baseball rosters, can take solace in that, because statistics are pretty good at estimating probabilities.

So, as the holidays pass and we get into 2012 (Spring Training), if you hear someone refer to PECOTA, you will know what it means and that its projected statistics describe what is likely to happen, not what is guaranteed.


Random/Year Ending Thoughts


Before I finish up, on a completely unrelated topic, I want to ask whether any of you have ever broken an inanimate object on purpose. Maybe this is crazy, but I have, and I am usually pretty pleased with myself after the fact. You know, it needs to be taught a lesson, right? Sort of like when the hammer you are using needs to be shown who is boss, right after you accidentally hit your thumb with it. Oh, never mind......my wife doesn’t get it, either.


Ryan Braun, what the hell were you thinking? Scary to think about, really. I mean, if he’s using performance enhancers, then who else is? Dude’s a string bean, too. Very odd and very bad for baseball.


Beer recommendation of the week.......Sierra Nevada’s Celebration Ale!


The Mets are in the middle of some pretty crappy baseball. Hello, Captain Obvious! The current “streak” has pretty much encompassed the latter part of 2007 and everything since. Oddly enough, despite that, this offseason is the first time in quite a while that I am at ease. I have that much trust in our current leadership (despite the Wilpon factor) and in the process they have started for our favorite team. 2012 will be a bit on the rough side, but I am pretty stoked for 2013 and beyond. I think we will all look back at this time period and laugh some day (and not the maniacal laughter of the insane)!


Lastly, I want to take a moment to thank Mack for all he does for this site, for giving us amateur hacks an outlet to write, and for all of the excellent coverage he provides on our minor leagues. Even though I am a semi-regular contributor, Mack’s Mets is still required daily reading for me. Mack definitely deserves better from the Mets and the general blogosphere.


Merry Christmas to all of you and here’s to better times in the (hopefully) near future!

