Attention stats/analytics-oriented community: I’m seeking your help for a project…
If you were to want more “mainstream” fans/media to be more aware of/better understand advanced metrics, can you rattle off 3-5 you think are most important. You can tweet them to me or email me at email@example.com …. Your help would be much appreciated. Thanks.
Great question from Evan Grant. I’m going to answer it, even if the tweet was posted through Sulia.
Actually, I’m going to cheat and change the question a bit. Instead of talking about 5 metrics, I’m going to talk about 5 concepts. There are often many stats that measure the same thing, and their usage is dictated by nuance. But since we’re not at the level of nuance, let’s focus on the big picture. My top 5 important “saber” concepts:
- Runs are runs are runs (are wins). Bartering isn’t so popular these days. Instead, we measure the value of goods and services in currency, at least in part because it gives us a common unit of comparison. In baseball, runs are the common currency. Hitting helps add runs. Baserunning helps add runs. Outfield assists help reduce runs. If you want to compare different players with different skills, you need a common currency. So, measure hitting in runs, baserunning in runs, and defense in runs. Then add it all together.
If we don’t do this, players who contribute a little in a lot of categories become underrated, such as Kenny Lofton (the only player ever to be 100+ runs above average in hitting, baserunning, and fielding). This doesn’t require a formula for everything, either. Feel free to eyeball defense, for example.
For more, check out the WARs at B-Ref, Fangraphs, and Baseball Prospectus.
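To make the "common currency" idea concrete, here is a minimal sketch of adding it all up. The player's numbers are made up, and the 10-runs-per-win conversion is a commonly used rule of thumb that I'm assuming here, not an exact constant.

```python
# Rule-of-thumb conversion (an assumption): about 10 runs is worth one win.
RUNS_PER_WIN = 10.0

def total_runs(batting, baserunning, fielding):
    """Add up runs above average from each skill; they share one unit."""
    return batting + baserunning + fielding

def runs_to_wins(runs, runs_per_win=RUNS_PER_WIN):
    """Convert runs above average into wins above average."""
    return runs / runs_per_win

# Hypothetical player who is a little above average at everything.
all_rounder = total_runs(batting=5, baserunning=4, fielding=6)
print(all_rounder, runs_to_wins(all_rounder))
```

None of the three components looks special on its own, but together they are worth more than a win.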
- Defense = position + fielding. There are lots of fielding stats these days. But they only measure fielding relative to position. Don’t forget to include the position. For example, a -5 fielder at SS is still much better than a +5 fielder at 1B.
It’s really easy to find a 1B with a decent bat, and much harder to find a CF with a decent bat (hence BJ Upton’s $75M contract.) Simply playing a difficult position provides value, allowing better hitters to play the easier positions. Playing a tough position well and/or hitting well while playing that position are often just icing on the cake.
For more, check out position adjustments throughout history.
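Here is a rough sketch of the SS vs 1B comparison above. The position values are approximate per-season run adjustments of the kind published by the big stat sites. Treat them as illustrative assumptions, not numbers from this post.

```python
# Approximate runs per full season just for playing each position.
# Illustrative assumptions; real adjustments vary by source and era.
POSITION_RUNS = {
    "C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5,
}

def defensive_value(position, fielding_runs):
    """Defense = position adjustment + fielding runs relative to that position."""
    return POSITION_RUNS[position] + fielding_runs

shortstop = defensive_value("SS", -5)      # below-average glove, hard position
first_baseman = defensive_value("1B", 5)   # above-average glove, easy position
print(shortstop, first_baseman)
```

Under these assumed values the -5 shortstop still comes out about 10 runs ahead of the +5 first baseman.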
- Shit happens. Or, more technically, variation and regression to the mean happen. A baseball season isn’t that long of a time. A month, a week, or a single game of a baseball season is nothing. Which means the stats produced in these short time frames are next to meaningless when predicting what will come next. Hitters guess right for a while, then guess wrong. Pitchers have the feel for a curveball, then lose it. We can always create plausible theories for why variation happens, but the default theory should always be “we don’t know”. When predicting the future, ignore recent happenings, and don’t overfit stories to data.
For more, enjoy this: http://xkcd.com/904/
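One simple way to see regression to the mean in action is to blend what a player has done with the league average, weighted by sample size. This is a sketch of a standard shrinkage estimate, not a full projection system. The league average and the `STABILIZER` constant below are hypothetical values I picked for illustration.

```python
def regress_to_mean(observed, sample_size, league_mean, stabilizer):
    """Weighted blend of observed rate and league mean.

    `stabilizer` acts like extra at-bats of league-average performance:
    the smaller the real sample, the more the estimate leans on the mean.
    """
    return (observed * sample_size + league_mean * stabilizer) / (sample_size + stabilizer)

LEAGUE_AVG = 0.260  # hypothetical league batting average
STABILIZER = 500    # hypothetical; the right value differs for every stat

# A .400 average over 50 at-bats vs. the same average over 600 at-bats.
hot_month = regress_to_mean(0.400, 50, LEAGUE_AVG, STABILIZER)
full_season = regress_to_mean(0.400, 600, LEAGUE_AVG, STABILIZER)
print(round(hot_month, 3), round(full_season, 3))
```

The hot month barely moves the estimate off league average, while a full season of the same performance moves it a lot more.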
- Linear weights >>> AVG. Getting hits is just one piece of the puzzle. There are other ways to get on base and some hits are better than others. With some math, we can measure the relative importance of singles, triples, GIDPs, etc. We can even measure their relative importance in different situations (runner on third, no outs vs bases loaded, two outs). That’s linear weights (and OPS and wOBA and TAv and RE24 and WPA, to varying extents.) Instead of measuring a hitter’s ability to get a hit, we should measure a hitter’s ability to help his team score more runs. It’s more complete, more accurate.
For more, check out wOBA and RC+ at Fangraphs. Or if you’re adventurous, I love this series of articles on how runs are really created. (Note: you never have to use or think about the term “linear weights” if it seems dumb or confusing.)
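As a rough sketch of the idea, here are two made-up hitters with the same batting average and very different run contributions. The run values per event are approximate, commonly cited figures that I'm assuming for illustration. Real weights change by season and run environment.

```python
# Approximate run value of each event relative to average (assumed values).
WEIGHTS = {"1B": 0.47, "2B": 0.77, "3B": 1.04, "HR": 1.40, "BB": 0.31, "OUT": -0.27}

def batting_average(line):
    """Hits divided by at-bats; walks are ignored, as in the real stat."""
    hits = line["1B"] + line["2B"] + line["3B"] + line["HR"]
    return hits / (hits + line["OUT"])

def linear_weights_runs(line, weights=WEIGHTS):
    """Sum of (event count x run value) over every event type."""
    return sum(weights[event] * count for event, count in line.items())

# Hypothetical stat lines: both hitters have 150 hits in 500 at-bats.
slap_hitter = {"1B": 150, "2B": 0, "3B": 0, "HR": 0, "BB": 20, "OUT": 350}
slugger = {"1B": 90, "2B": 30, "3B": 0, "HR": 30, "BB": 70, "OUT": 350}

for name, line in [("slap hitter", slap_hitter), ("slugger", slugger)]:
    print(name, round(batting_average(line), 3), round(linear_weights_runs(line), 1))
```

AVG calls these two hitters identical. Counting the walks and the extra bases in runs says otherwise.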
- Context matters. The league ERA in Sandy Koufax’s last season was 3.61. His home park helped pitchers. The league ERA in Pedro Martinez’ best season was 4.91. His park hurt pitchers. They posted 1.73 and 1.74 ERAs, respectively. Ergo, Pedro’s ERA is way more impressive. Not all context differences are this obvious, but even a bunch of small differences add up.
The AL is different from the NL. Every home park is different. The population able/allowed to play baseball has changed. Rules have changed. The ball has changed. Numbers require context, even if we can’t always measure the context (although we can, a lot of the time.)
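The Koufax and Pedro comparison can be sketched with a simple league-relative index, using only the ERAs quoted above. The function name is mine, and this deliberately leaves out park effects, so it is a simplification rather than the park-adjusted numbers the stat sites publish.

```python
def era_index(player_era, league_era):
    """League ERA relative to player ERA, scaled so 100 is league average.

    Higher is better. No park adjustment is applied (a simplification).
    """
    return 100 * league_era / player_era

koufax = era_index(1.73, 3.61)  # Koufax's last season
pedro = era_index(1.74, 4.91)   # Pedro's best season
print(round(koufax), round(pedro))
```

Two nearly identical ERAs end up far apart once the league is taken into account. Since Koufax's park helped pitchers and Pedro's hurt them, a park adjustment would push them even further apart.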
The numbers aren’t always right, and they don’t need or claim to be. It’s the concepts behind them that are important, and these concepts are pretty solid. In fact, I’d be happy to read articles that account for these concepts but that don’t use any specific stats or numbers.
Thanks for reading. Always happy to continue the discussion or answer more questions.
Update: some good suggestions for additional concepts via Twitter friends:
- Money matters. Players are a combination of their performance and salary.
- Run prevention != pitching.
- Don’t confuse team stats with individual stats. Much of what we measure for an individual player is heavily reliant on his team. For example, ERA and Wins for pitchers (dependent on defense and/or teammates’ hitting) and RBI and Runs for hitters (RBI require teammates on base to drive in; Runs require teammates to drive you in).