1/7/13

Michael Friere - What Are Sabermetrics?



A decade ago, most fans would have looked at you like you were from another planet if you had asked that question. Now, most fans are at least aware that “Sabermetrics” means statistical analysis. More specifically, it is the use of countless (and sometimes confusing) math formulas to predict and explain athletic performance, which in turn leads to team success if done properly.

The new breed of General Manager has come to rely on this process, some more heavily than others. Players and even sports agents are now on board, using the formulas in reverse to bargain for additional money and longer contracts.

Think about the central struggle in the book (and now the movie) “Moneyball”: whether traditional scouting methods were more valuable (accurate) than the “newfangled” approach that valued statistics and computer projections. Our very own Sandy Alderson, and later Paul DePodesta (working as an assistant to A’s General Manager Billy Beane), were heavily involved in the genesis of this approach. While the book also focused on identifying and capitalizing on undervalued assets, it inadvertently documented the early stages of the statistical movement.

It is clear that regardless of which side you fall on (traditional scout or “stat geek”), the use of “Sabermetrics” is not going away (I personally think a blend of statistical analysis and the input of savvy baseball scouts is the way to go). So, instead of being resistant to change, it benefits all baseball fans to have a basic understanding of the statistical process, so that you can understand what sportscasters, analysts and even bloggers are talking about from time to time.

Per Mack’s previous request, I am going to try to produce a weekly series of articles that each highlight a statistical formula or two. In doing so, I will also try to explain the formula, what it measures and why it is relevant. I am in no way an expert, with just a college minor in statistics, but I will do my best. I encourage you to ask questions, contribute comments, or even introduce other statistical formulas that you would like to discuss.

Where to begin?

Part of me longs for the days when you could pick up a sports page and look at a traditional box score. It listed the basics, such as at bats, runs scored, hits and runs batted in, in the traditional format of AB - R - H - RBI. Now, the box scores are considerably more advanced, to say the least!

Some folks blame “fantasy baseball”, while others recognize it is a direct reflection of the aforementioned “statistical movement”.

I think it is important to understand the basics, before moving on to more advanced topics. Sort of like learning basic algebra, before moving on to “fun” classes like trigonometry and calculus. In that vein, my contributions will seem pretty simple at first, but will most likely get more complicated as we move along.

I think we should start with OPS and a newer derivative called OPS+. OPS stands for on-base plus slugging, meaning on base percentage plus slugging percentage. The plus sign added to the name signals an adjustment for specific ballpark factors, with the result scaled to the league average. This allows for a more in depth comparison between players plying their craft in different ballparks.

Looking a bit closer, it makes sense to define both on base percentage and slugging percentage, in order to understand OPS and OPS+.

On base percentage (OBP) is basically the number of times a player gets on base, divided by the total number of times a player could have gotten on base. You might think of it as the old batting average on steroids (OK, bad choice of words). Batting average is simply hits divided by at bats (walks, hit by pitches and sacrifices are excluded, since they are not official at bats).

To calculate OBP, you add the number of hits, walks and hit by pitches and divide that number by the total number of at bats, walks, hit by pitches and sacrifice flies. So, just on the surface, you can see how much more complete a picture that gives than batting average alone.

For example, Player A comes to the plate five times in one game. Let’s say he has one hit, draws two walks (we know it isn’t Jose Reyes then) and makes two outs. Subtract the walks and he was officially one for three on the day. Divide the one hit by three at bats and you have a batting average of .333 for that game.

That doesn’t really explain the overall impact of Player A’s day. Using on base percentage, you have to factor in the two walks, as well. So, you have one hit and two walks (three times on base) divided by the total number of official at bats (three), plus the two walks for a total of five. Three times on base divided by five chances equals an OBP of .600, or a much bigger impact on the team’s chances of scoring runs for the game.
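For anyone who likes to see the arithmetic spelled out, here is a minimal Python sketch of Player A's day. The function names are my own (hypothetical, not from any stats package), and the formulas are just the ones described above.

```python
def batting_average(hits, at_bats):
    """Hits divided by official at bats."""
    return hits / at_bats if at_bats else 0.0


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """(H + BB + HBP) divided by (AB + BB + HBP + SF)."""
    chances = at_bats + walks + hit_by_pitch + sac_flies
    return (hits + walks + hit_by_pitch) / chances if chances else 0.0


# Player A: one hit and two walks in five trips, so three official at bats.
avg = batting_average(hits=1, at_bats=3)
obp = on_base_percentage(hits=1, walks=2, hit_by_pitch=0, at_bats=3, sac_flies=0)

print(f"AVG {avg:.3f}, OBP {obp:.3f}")
```

The two walks disappear from the batting average entirely, but they show up in both the top and the bottom of the OBP fraction.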

Moving on to Slugging Percentage (SLG), the second component of OPS: SLG, simply put, is the total number of bases a player earns on hits, divided by the total number of official at bats. For this calculation, walks are not included in total bases, nor are they included in official at bats.

So, returning to Player A for one moment, we know that he had one hit in three at bats (minus the two walks... and if you are reading this, Jose, walks won’t hurt you, my friend).

We know that the player had one hit, but what type of hit was it? A single is not the same thing as a home run, which is why this statistic was created. To properly calculate SLG, you need to know what type of hit the player had, not just if the player had a hit.

So, a single is one base, a double is two bases, and so on. If Player A hit a double in three official at bats, then you take two bases and divide that by three at bats. Your slugging percentage for the day would be .667 (which would be great over the course of a season).
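The same idea in a short Python sketch, again with made-up function names of my own. It weights each hit by the number of bases it is worth and divides by official at bats.

```python
def total_bases(singles, doubles, triples, home_runs):
    """One base per single, two per double, three per triple, four per homer."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by official at bats (walks appear in neither)."""
    if at_bats == 0:
        return 0.0
    return total_bases(singles, doubles, triples, home_runs) / at_bats


# Player A: one double in three official at bats.
slg = slugging_percentage(singles=0, doubles=1, triples=0, home_runs=0, at_bats=3)

print(f"SLG {slg:.3f}")
```

Swap the double for a home run and the same three at bats produce four total bases instead of two, which is exactly the difference batting average cannot see.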

OBP and SLG are not difficult calculations, but you can see they reveal more about a player’s contributions than simply looking at a batting average. Adding the two together, as stated above, gives us OPS.

Player A had an OBP of .600 and a SLG of .667 in our example. The OPS then, would be 1.267, which would be fantastic for a season.
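Putting the two pieces together, a rough sketch of the whole OPS calculation from a raw stat line might look like this. The `ops` function and its argument names are my own invention for illustration.

```python
def ops(hits, doubles, triples, home_runs, walks, hit_by_pitch, at_bats, sac_flies):
    """On base percentage plus slugging percentage, from a raw stat line."""
    # Whatever hits are not doubles, triples or homers must be singles.
    singles = hits - doubles - triples - home_runs
    bases = singles + 2 * doubles + 3 * triples + 4 * home_runs

    chances = at_bats + walks + hit_by_pitch + sac_flies
    obp = (hits + walks + hit_by_pitch) / chances if chances else 0.0
    slg = bases / at_bats if at_bats else 0.0
    return obp + slg


# Player A: one hit (a double), two walks, three official at bats.
player_a_ops = ops(hits=1, doubles=1, triples=0, home_runs=0,
                   walks=2, hit_by_pitch=0, at_bats=3, sac_flies=0)

print(f"OPS {player_a_ops:.3f}")
```

Note that OBP and SLG use different denominators, so OPS is a convenient sum and not a true percentage of anything.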

In the modern era, a “good” OPS is usually anything over .800, but that depends on what position the player plays and where he hits in the batting order (expectations for middle infielders are not usually the same as for a first baseman or a corner outfielder, and your leadoff hitter is not looked at the same as your cleanup hitter).

What about OPS+?

The basic principle is to take a player’s OPS, adjust it for different ballpark factors and then put it on a percentage scale. When it comes to OPS+, 100 is the league average, 110 is 10 percent above league average, and 90 is 10 percent below league average.

The number of ballpark variables involved makes the full calculation too complicated to write out here, but the final arithmetic is simple:

OPS+ = 100 X (OBP/lgOBP* + SLG/lgSLG* - 1)

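What the hell is that? Here, lgOBP* and lgSLG* are the league average OBP and SLG, and the asterisk means they have been adjusted for the player’s home park. Basically, you are measuring a player’s OBP and SLG against league averages adjusted by the specific park factors. In other words, allowances have to be made for a place like Citi Field (harder to hit), when compared to the new Yankee Stadium (easier to hit).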
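Here is the formula above as a small Python sketch. It assumes you already have the park-adjusted league figures in hand (the code does not try to compute the park adjustment itself), and the sample numbers are invented purely for illustration, not taken from any real season.

```python
def ops_plus(obp, slg, lg_obp_adj, lg_slg_adj):
    """100 x (OBP / lgOBP* + SLG / lgSLG* - 1).

    lg_obp_adj and lg_slg_adj stand in for lgOBP* and lgSLG*: league
    averages that have ALREADY been park adjusted (an assumption here).
    """
    return 100 * (obp / lg_obp_adj + slg / lg_slg_adj - 1)


# Invented example: a .360 OBP / .480 SLG hitter measured against
# park-adjusted league marks of .320 and .400.
hitter = ops_plus(obp=0.360, slg=0.480, lg_obp_adj=0.320, lg_slg_adj=0.400)

# A hitter who exactly matches the adjusted league marks lands on 100.
average_joe = ops_plus(obp=0.320, slg=0.400, lg_obp_adj=0.320, lg_slg_adj=0.400)

print(f"OPS+ {hitter:.1f} vs. league average {average_joe:.1f}")
```

The invented hitter comes out at roughly 132.5, or about 32 percent better than league average once his home park is taken into account.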

So, you would expect Mark Teixeira to have a lower OPS in Citi Field than the one he produced in Yankee Stadium.

As far as the math is concerned, you can look up the listed park adjustments if you are interested in working through it yourself. My point is to show you the overall statistic (OPS+) and how it is generated, so when you see it, you know what they are referring to.

So, if a player has an OPS+ of 100, he is average for the league, taking into account where he plays. Anything over 100 is above average, and anything under 100 is below average.

For reference, Matt Kemp of the Dodgers posted an OPS of .986 for 2011, which is an excellent number. However, his OPS+ was a staggering 171 that year, which meant he was 71 percent above the league average, even when you consider that he played half his games in Dodger Stadium.

In closing, I hope that the basic calculations of OPS and OPS+ shed some additional light on the topic of Sabermetrics. I also hope that you can now look at those two statistics and further understand what they are meant to measure. It is one of many ways to look at a player’s impact on the game, or to see if a prospective player is worth more or less than his reputation.

6 comments:

Anonymous said...

Michael, good topic and good simple explanation of the basics. However, I need some more explanation of OPS+.

I assume lgOBP and lgSLG refer to the average OBP and SLG of all players in the league, right? If so, is the number of at bats and plate appearances of each player taken into account to arrive at a weighted average? So that a player with 500 ABs and 600 PAs would count twice as much as a player with 250 ABs and 300 PAs?

Alternatively, is each ball park assigned a "hitability factor" with pitcher's parks getting above 1.0 and hitter's parks getting less than 1.0, so that the player's SLG and the lgSLG are weighted by the number of ABs each player has in each park? This would seem incredibly complex, but would result in a much more accurate OPS+ figure.

The question of OBP is even more complex, because it involves both hits and walks. Is it any easier to draw a walk in Coors Field than at Citi Field? Can they possibly develop a "walkability factor" for each park?

I'm very interested in this issue because I have long been a believer in the importance of OPS, and usually look at OBP and SLG separately, because I feel that OBP is more important to a player whose speed is his best tool, while SLG is most important for a power hitter. To weigh whether a player would do well at Citi as opposed to how he has been doing in his previous home, I look closely at his home and away splits. Perhaps a valid OPS+ measure would obviate the need to look at so many other factors.

Mack Ade said...

Herb:

I'm not sure Mike is going to respond to this.

This is a re-print of a series he did around a year ago. There's very little to write about right now, I'm working at my other job today, and none of the other Mack's Mets writers have written anything in around a month.

Sorry.

Hobie said...

Michael, I have a question similar to enyherb’s.

How are the LgOBP* and LgSLG* calculated?

Are they based on total production, home team & visitor in a given park (e.g. CitiSLG = total bases in Queens/AB) then weighted with a Park Factor over the NL? If so, what determines the Park Factor outside of the local slugging data itself (the stuff you wish to “weight”)?

Hobie said...

Oops. Thought it sounded familiar.

Anonymous said...

Sorry, Mack. I thought Michael was one of your contributors. Are there more articles in the series that you plan to publish?

Herb

Anonymous said...

Mack and Hobie,
I did a little research and answered (at least partially) my own question. The formula Michael quoted in his article is not quite right. The number arrived at using Michael's formula is then divided by a Ball Park Factor (BPF) that represents the degree to which the player's home park is more or less hitter friendly than the average of all the parks in the league.