|Batting AVG||HOME RUNS||RBI|
Only one player finished in the Top 10 of all three traditional Triple Crown categories, Albert Pujols. Pujols actually finished in the Top 3 in all of them – that’s one reason why he’s expected to easily win the MVP. But are these three stats an accurate way to estimate a player’s batting contribution? Batting Average measures the rate at which a player gets a hit per official at bat. But it ignores several key factors. First, not all hits are equal, but batting average treats a single the same as a Home Run. Second, it does not include walks, which means that it completely ignores the fact that players like Pujols reach base over 100 times that way.
What about Home Runs? They are certainly very important, but on their own, they don’t tell you nearly enough about a player’s offensive contribution. And what about RBIs? They are the stat that has historically correlated most strongly with MVP voting, but RBIs are a very team-dependent stat. To get an RBI, there usually has to be a runner or runners on base already (except for HRs). But there are very large disparities in the RBI opportunities for different players (see table below). Although Andre Ethier and Hanley Ramirez both finished with 106 RBI last year, Ethier actually had 56 more runners on base in his at bats than Ramirez. And what about hits that move runners around, but don’t actually score the run? Those are ignored too. Finally, neither Home Runs nor RBI take into account the number of opportunities that a player had. So looking at HR/RBI probably isn’t the best way to assess a player’s contribution to an offense.
|Batter||RBI||Runners on Base|
Evaluating Offense - Some History
Runs Created
In the late 1970s, Bill James introduced the Runs Created stat in his Baseball Abstract. The basic concept is that scoring runs involves two things – getting runners on base, and then advancing base runners. At the team level, these two concepts can be quantified by On-Base Percentage (how often players reach base safely) and Slugging Percentage (total bases per at bat). The first Runs Created formula was simply OBP*SLG*AB, and remarkably, this simple formula usually predicts how many runs a team scores within 5%. Here are the results for 2009, where Runs Created is predicted by the simple formula RC = 0.98 * OBP * SLG * AB.
|Team||AB||OBP||SLG||RUNS||RC RUNS||% Error|
As the table shows, the simple formula comes remarkably close to the actual values. Combining Runs Created with the so-called Pythagorean Theorem (Team Win Percentage = RS^2/(RS^2 + RA^2), where RS = Runs Scored and RA = Runs Allowed) was a revelation for most baseball fans. Together, the two formulas allowed fans to make fairly accurate estimates of things like “How many more runs would the Braves have scored if Bob Horner had been healthy?” or “Suppose the Giants could replace Johnny LeMaster with Dave Concepcion?” Although this simple RC formula gets pretty close to the actual runs scored by a team, the formula has been refined over the years to have unique coefficients for singles, doubles, triples, and HR, and to include SB and CS. One variation is RC/27, which expresses Runs Created per 27 outs, i.e. per game.
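To make the two formulas concrete, here is a minimal Python sketch of the basic Runs Created estimate and the Pythagorean win percentage. The function names and the team totals are made up for illustration; only the formulas themselves come from the text above.

```python
def runs_created(obp, slg, ab, scale=0.98):
    """Basic Runs Created estimate: scale * OBP * SLG * AB."""
    return scale * obp * slg * ab


def pythagorean_win_pct(runs_scored, runs_allowed):
    """Pythagorean win percentage: RS^2 / (RS^2 + RA^2)."""
    rs_sq = runs_scored ** 2
    ra_sq = runs_allowed ** 2
    return rs_sq / (rs_sq + ra_sq)


# Hypothetical team totals, for illustration only (not a real 2009 team).
team_rc = runs_created(obp=0.340, slg=0.420, ab=5500)
win_pct = pythagorean_win_pct(runs_scored=800, runs_allowed=700)

print(f"Estimated runs scored: {team_rc:.0f}")
print(f"Expected wins over 162 games: {162 * win_pct:.1f}")
```

A "what if" question like the Bob Horner one then amounts to changing the OBP and SLG inputs, recomputing the runs, and feeding the new total into the win percentage.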
Linear Weights
A second method for quantifying a player’s offensive contribution, based on linear weights, was developed by George Lindsey in 1963. Using play-by-play data, Lindsey quantified the run-scoring value of each event. This technique was later expanded by Pete Palmer in the book Total Baseball, using game data as well as simulations.
LWTS = 0.46*1B + 0.80*2B + 1.02*3B + 1.40*HR + 0.33*(BB+HBP) + 0.30*SB - 0.60*CS - 0.25*(AB-H)
Linear Weight models are the basis for many current batting evaluators, including Equivalent Average (EqA) and weighted On-Base Average (wOBA).
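As a rough illustration, the linear weights formula can be written as a small Python function, with every event term summed and outs (AB-H) charged at -0.25 runs each. The coefficients are the ones quoted above; the function name and the batting line are hypothetical.

```python
def linear_weights(singles, doubles, triples, hr, bb, hbp, sb, cs, ab, hits):
    """Linear weights runs, using the coefficients quoted in the text."""
    return (
        0.46 * singles
        + 0.80 * doubles
        + 1.02 * triples
        + 1.40 * hr
        + 0.33 * (bb + hbp)
        + 0.30 * sb
        - 0.60 * cs
        - 0.25 * (ab - hits)  # each out costs about a quarter of a run
    )


# Hypothetical batting line, for illustration only.
lwts = linear_weights(
    singles=102, doubles=35, triples=3, hr=30,
    bb=80, hbp=5, sb=10, cs=4, ab=550, hits=170,
)
print(f"Linear weights runs: {lwts:.2f}")
```

Because outs carry a negative weight, the result is a number of runs relative to a baseline, not a raw run total like Runs Created.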
Evaluating Offense - Today
OPS
Since team runs can be estimated so well by only two parameters, OBP and SLG, people naturally looked at these two stats at the individual level. To simplify the math, adding the two numbers together became a common way to quantify a player’s offensive contribution: OPS = OBP + SLG. Here are the leaders in these three stats for the NL in 2009.
Although OPS correlates very well with Runs Scored, it’s not the best run estimator around; it is popular mainly because it is easy to calculate. More refined calculations have shown that OBP should be weighted more heavily than SLG, rather than simply adding the two equally. And the fact that SLG treats a HR as 4 times better than a single isn’t quite right; people have since found better coefficients.
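For readers who want to see where the 1-2-3-4 weighting in SLG comes from, here is a small Python sketch of OBP, SLG, and OPS. It uses the standard definitions (including sacrifice flies in the OBP denominator), which the article does not spell out, and the batting line is hypothetical.

```python
def on_base_pct(hits, bb, hbp, ab, sf):
    """OBP: times on base divided by plate appearances that count toward OBP."""
    return (hits + bb + hbp) / (ab + bb + hbp + sf)


def slugging_pct(singles, doubles, triples, hr, ab):
    """SLG: total bases per at bat, so a HR counts 4 times as much as a single."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


# Hypothetical batting line, for illustration only.
obp = on_base_pct(hits=170, bb=80, hbp=5, ab=550, sf=5)
slg = slugging_pct(singles=102, doubles=35, triples=3, hr=30, ab=550)
ops = obp + slg

print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
```

Comparing the 1, 2, 3, 4 weights here with the 0.46, 0.80, 1.02, 1.40 linear weights shows why SLG overvalues extra-base hits relative to singles.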
Win Shares were presented by Bill James a few years ago in his book of the same name. Four notable things about Win Shares are: 1) The sum of the individual players’ Win Shares is tied directly to the team’s total Wins. 2) The stat incorporates “clutch” stats such as hitting with RISP and hitting HR with runners on base. 3) Play-by-play data is not used for the defensive evaluations; rather, the totals for assists, putouts, DPs, errors, etc. are used. 4) There are no “Loss Shares” or negative Win Shares for players who play below replacement level.
These are significant because the other stats (WAR, WARP) are based on the components (2B, HR, BB, etc.), not the actual team wins. So teams and players who win more games than predicted (presumably for better clutch performance) get extra credit for it. James assigned 3 Win Shares per team win. The first step is to divide the team’s win shares between offense and defense, based on the team’s relative strengths from a Marginal Runs calculation (including park effects). Then, within this, Win Shares are awarded to each player based on their contribution.
Offense – Once we know how many Win Shares the team has in total, this total is divided among the players based on the fraction of the team’s runs they created using the latest Runs Created formula. James also includes some “clutch” and RISP numbers in his formulas.
Defense - There is a different set of formulas for fielding Win Shares at each position. The stats used are things like Assists, Putouts, Errors, Double Plays, etc. Fielding Win Shares are based on cumulative stats, not play-by-play data.
Pitching - For pitchers, the Win Shares are allocated using a component ERA method – calculating a prediction of how many runs the pitcher gave up based on the number of singles, doubles, etc. that he allowed – sort of an inverse Runs Created formula. Again, there are correction factors for things like Saves tacked on at the end. More details on Win Shares can be found here.
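The offensive part of the allocation described above can be sketched in a few lines of Python. This is a heavy simplification and not James’s actual procedure: the offense/defense split is passed in as a plain number (in the real system it comes from the Marginal Runs calculation), the clutch and RISP adjustments are left out, and all names and figures are hypothetical.

```python
def offensive_win_shares(team_wins, offense_fraction, runs_created_by_player):
    """Split a team's offensive Win Shares by each player's share of Runs Created.

    offense_fraction is an assumed input standing in for the Marginal Runs
    split between offense and defense.
    """
    team_win_shares = 3 * team_wins  # 3 Win Shares per team win
    offense_pool = team_win_shares * offense_fraction
    team_rc = sum(runs_created_by_player.values())
    return {
        name: offense_pool * rc / team_rc
        for name, rc in runs_created_by_player.items()
    }


# Hypothetical 90-win team, for illustration only.
shares = offensive_win_shares(
    team_wins=90,
    offense_fraction=0.48,
    runs_created_by_player={"Player A": 120, "Player B": 90, "Rest of roster": 590},
)
for name, ws in shares.items():
    print(f"{name}: {ws:.1f} offensive Win Shares")
```

The key property is visible even in this toy version: the shares always add up to a pool fixed by actual team wins, so a team that outperforms its component stats hands out extra credit to its players.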
Here are the 2009 NL Leaders in Win Shares (from Bill James Online):
2009 NL Win Shares Leaders
WARP (Wins Above Replacement Player)
WARP is the stat used by Baseball Prospectus to combine all of a player’s stats into Wins above a Replacement Player. WARP is not directly connected to the actual number of wins that a team had, but is based on summing up the individual performance of each player.
Offense - Offense is based on BP’s stat Equivalent Average. This is a park-adjusted linear weights-type formula that converts all of a player’s stats into a number on the same scale as batting average, so that an average player is at .260 EqA. This can then be converted to Equivalent Runs, and to Runs Above a Replacement player.
Defense – BP uses “Seasonal Totals” rather than Play-by-Play data. Adjustments are made for the nature of the pitching staff (LH/RH, GB/FB).
Pitching – BP bases its pitching wins on park-adjusted ERA and Innings Pitched, as discussed in the VORP section of LINK.
2009 NL WARP Leaders
Wins Above Replacement (WAR)
WAR is the stat used at FanGraphs to rank players. The key differences to note are: 1) Defense is evaluated based on play-by-play data, not seasonal data, using Ultimate Zone Rating (UZR). 2) Pitching is evaluated based on Fielding-Independent stats (see 2009 NL Cy Young article). 3) A positional adjustment bonus is given to difficult defensive positions. 4) Catchers’ defense is not yet rated, so all catchers are rated equal defensively.
Offense – Offense is evaluated based on weighted on-base average (wOBA). This is also a park-adjusted, linear weights system, scaled to match on-base percentage.
Defense – Defense is quantified using Ultimate Zone Rating. UZR divides the field into 64 zones, and counts how often the player makes plays in his nearby zones. Park factors, the speed of the batted ball, the handedness of the pitcher, and the flyball/groundball nature of the pitcher are also considered.
Pitching – Pitchers are evaluated based on FIP, rather than actual ERA. The logic here is that FIP is a better assessment of the pitcher’s contribution than ERA, which involves the interaction of the defense and relief pitchers.
2009 NL WAR Leaders
Summary
Win Shares, WARP, and WAR all attempt to combine all of a player's stats into a single number, on the scale of Wins. Win Shares is the only one that uses actual Team Wins, while the other two assign Wins based on the individual component terms. However, the way that Win Shares divides shares between offense, fielding, and pitching is very complicated, and often quite arbitrary. All three use roughly equivalent methods to evaluate batting. However, WAR uses fielding-independent pitching, rather than the real ERA or hits allowed. Is this a better way to isolate the pitcher's performance? The jury is still out on this one, as we saw in some controversial ballots in the Cy Young voting. WAR is also the only one to evaluate fielding based on actual play-by-play data, rather than seasonal fielding totals. However, since Win Shares and WARP use seasonal defensive totals, they can be calculated throughout baseball history, while WAR is restricted to modern data. The table below summarizes the NL leaders in the stats of Win Shares (from Bill James Online), WARP (from Baseball Prospectus), and WAR (from FanGraphs), with their rank in parentheses.
My NL MVP Ballot
|1||Albert Pujols||8.4 (1)||12.1 (1)||13.0 (1)|
|2||Chase Utley||7.6 (3)||8.6 (3)||10.5 (6)|
|3||Hanley Ramirez||7.3 (4)||7.3 (10)||11.4 (4)|
|4||Prince Fielder||6.8 (6)||7.9 (6)||11.9 (2)|
|5||Tim Lincecum||8.2 (2)||7.4 (9)||7.5 (18)|
|6||Matt Kemp||5 (15)||7.9 (5)||8.7 (9)|
|7||Adrian Gonzalez||6.3 (8)||9.2 (2)||11.3 (5)|
|8||Troy Tulowitzki||5.4 (12)||6.3 (18)||8.0 (11)|
|9||Pablo Sandoval||5.2 (14)||5.8 (24)||9.0 (7)|
|10||Ryan Howard||4.8 (18)||5.4 (28)||8.8 (8)|
|11||Ryan Braun||4.8 (19)||6.8 (15)||11.3 (3)|
Pujols is the clear #1 in every ranking system. I have the middle infielders, Utley and Ramirez, at #2 and #3, with Utley's defense just edging Ramirez's offensive advantage. Prince Fielder gets the #4 slot, while Lincecum is the highest-ranked pitcher at #5. I have the 6-11 slots filled with Kemp, Adrian Gonzalez, Tulowitzki, Sandoval, Howard, and Braun, although I expect that Tulo will finish much higher in the actual voting.
|Jorge De La Rosa, COL||9.00||4.38||16-9|
|Braden Looper, MIL||8.97||5.22||14-7|
|Max Scherzer, ARI||8.35||4.12||9-11|
|Derek Lowe, ATL||8.14||4.67||15-10|
Many pitchers were also on the unlucky side of the run support stat. Take Clayton Kershaw, for example, who finished 8-8 with a 2.79 ERA in 171 IP. The conventional wisdom is that Kershaw’s low Win total was because he ran up high pitch counts and couldn’t go deep into games. While that was true to some extent, Kershaw was also remarkably unlucky in his starts. For example, on 7/29 against St. Louis, Kershaw threw 8 scoreless IP, but left without a decision. Pitchers in that situation received Wins in 68 out of 77 such occurrences (88%) in 2009. On two other occasions, Kershaw threw 7 scoreless IP without a Win (starting pitchers earned a Win 81% of the time in such starts last year), and two more times he went 6 shutout innings without a Win. For the season, Kershaw had 11 starts with at least 6 IP and 1 ER or fewer, yet was only awarded wins in 5 of them.
Aside from Run Support, another big factor in Pitcher Wins is the Bullpen. Take another Dodger, Chad Billingsley, for example, who finished 12-11 with a 4.03 ERA and was eventually dropped from the rotation in the playoffs. Yet there were 5 games in 2009 where Billingsley left the game with the lead, and saw the bullpen give the game away after he left. I am pretty sure that if Billingsley’s record had been 16-8 or 17-9 instead of 12-11, the fans and Joe Torre would think more highly of Billingsley.
ERA and ERA+
Since it’s clear that Wins can be deceptive, it may be better to look at a pitcher’s ERA, which eliminates the Run Support factor and reduces the bullpen effect. Here are the NL leaders in ERA in 2009:
2009 NL ERA Leaders
Chris Carpenter, 2.24
Tim Lincecum, 2.48
Jair Jurrjens, 2.60
Adam Wainwright, 2.63
Clayton Kershaw, 2.79
This is clearly a much better group of pitchers than the Win leaders, which included De La Rosa and Lowe in the Top 5. But there are several things that ERA doesn’t account for. One of these is the pitcher’s home park, since some stadiums are easier to score runs in than others. ERA+, or Adjusted ERA, tries to account for this, by scaling a pitcher’s ERA with a Park Factor for each stadium. ERA+ is also normalized to league average (ERA+ = 100*(lgERA/ERA), adjusted for ballpark), so a score of 100 is average. This makes it useful for comparing players across different seasons and eras. Here are the leaders in ERA+ in 2009:
2009 NL ERA+ Leaders
Chris Carpenter, 183
Tim Lincecum, 176
Jair Jurrjens, 158
Adam Wainwright, 157
Matt Cain, 151
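Here is a quick Python sketch of the ERA+ calculation. The article only says the ratio is "adjusted for ballpark", so the way the park factor enters below (multiplying the league ERA, with 1.0 meaning a neutral park) is an assumption, and the league ERA is a hypothetical figure.

```python
def era_plus(era, league_era, park_factor=1.0):
    """ERA+ = 100 * (lgERA / ERA), with a simple assumed park adjustment.

    park_factor > 1.0 means a hitter-friendly home park, which raises the
    league baseline the pitcher is compared against.
    """
    return 100 * (league_era * park_factor) / era


# Hypothetical numbers, for illustration only.
neutral = era_plus(era=3.00, league_era=4.20)
hitters_park = era_plus(era=3.00, league_era=4.20, park_factor=1.05)

print(f"Neutral park ERA+: {neutral:.0f}")
print(f"Hitter-friendly park ERA+: {hitters_park:.0f}")
```

The same 3.00 ERA earns a higher ERA+ in the hitter-friendly park, which is the whole point of the adjustment.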
VORP
Another factor to consider with ERA is that it is a rate stat, and not a counting stat. A pitcher who gives up 1 ER in 6 IP has an ERA of 1.50, but is clearly not as valuable as one who pitches 250 IP with an ERA of 2.50. But would a pitcher with 200 IP and an ERA of 2.30 be worth more than the 250 IP-2.50 ERA pitcher? VORP (Value Over Replacement Player, developed by Keith Woolner at Baseball Prospectus) is a stat that combines the pitcher quality (runs allowed) with the quantity of innings pitched. The idea is to calculate how many runs this pitcher saved compared to a “replacement-level” pitcher. So the formula for VORP is VORP = (Replacement_Level - RA)/9*IP, where RA is the pitcher’s runs allowed per 9 innings and Replacement_Level is generally defined as a run average around 40% higher than league average for starting pitchers. Here are the 2009 VORP leaders in the NL:
2009 NL VORP Leaders
Tim Lincecum, 69.8
Chris Carpenter, 68.7
Adam Wainwright, 67.1
Matt Cain, 61.3
Jair Jurrjens, 60.5
Dan Haren, 60.2
Again, the Top 3 of Carpenter, Lincecum, and Wainwright are very close.
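The 200 IP vs. 250 IP question can be worked through with a short Python sketch of the VORP formula. Two assumptions are baked in: the league run average of 4.50 is a hypothetical figure, and each pitcher’s ERA is used as a stand-in for his run average (RA), which ignores unearned runs.

```python
def pitcher_vorp(run_average, innings, league_ra, replacement_multiplier=1.40):
    """VORP = (Replacement_Level - RA) / 9 * IP, in runs saved.

    Replacement level is taken as 40% worse than league average, per the text.
    """
    replacement_level = league_ra * replacement_multiplier
    return (replacement_level - run_average) / 9 * innings


LEAGUE_RA = 4.50  # hypothetical league run average

# ERA used as a stand-in for RA (an assumption for this illustration).
vorp_200 = pitcher_vorp(run_average=2.30, innings=200, league_ra=LEAGUE_RA)
vorp_250 = pitcher_vorp(run_average=2.50, innings=250, league_ra=LEAGUE_RA)

print(f"200 IP at 2.30: {vorp_200:.1f} runs above replacement")
print(f"250 IP at 2.50: {vorp_250:.1f} runs above replacement")
```

Under these assumptions the 250 IP pitcher comes out ahead: the extra 50 innings of well-above-replacement pitching outweigh the 0.20 difference in run average.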
SNWL
SNWL (Support-Neutral Wins and Losses) looks at a pitcher’s performance on a game-by-game basis, rather than over the season total of ER and IP. For each game pitched, it calculates the probability that the team would win the game, assuming a league average offense and bullpen. So given the IP and Runs Allowed by the pitcher in that game, we can find the probability that the team should win (this is similar to the discussion above with Clayton Kershaw). SNWL is reported either as a Win-Loss record or converted to a “Value over Replacement” scale, SNVAR. Here are the SNWL and SNVAR leaders in 2009 for the NL:
FIP (and xFIP)
A big split in the evaluation of pitchers came with stats like DIPS (Defense Independent Pitching Stats, by Voros McCracken) and FIP (Fielding Independent Pitching, by Tom Tango). The stats listed above (ERA, VORP, SNWL) are all based on the actual runs given up by a pitcher. However, it is clear that ERA involves many players besides the pitcher – namely the defense behind him. A good defense will obviously make more outs and give the pitcher a lower ERA. So how can we isolate the contributions of the pitcher from the defense behind him? One attempt to isolate the impact of the pitcher alone is FIP. FIP removes the effects of the fielders, and only looks at the things that the pitcher has control over – strikeouts, walks, HBP, and Home Runs allowed. The formula for FIP is:
FIP = (HR*13 + (BB+HBP-IBB)*3 - K*2)/IP (plus a scaling constant to match the scale to that of ERA or RA)
Why do analysts think this is more useful than looking at things like ERA? One, because it eliminates the huge variability of fielders from the equation. And two, because it turns out that FIP is a better predictor of future performance than ERA (or ERA+). That is, a pitcher who has managed a low ERA despite high BB and HRs (hence a high FIP) is much more likely to see that ERA rise in the future than one with the same ERA and a low FIP. So it may be a better evaluator of a pitcher’s true performance and skill than ERA.
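For anyone who wants to try the FIP formula on a pitcher’s stat line, here is a minimal Python sketch. The scaling constant of 3.10 is a placeholder assumption (the real value is set each season so that league FIP matches league ERA), and the stat line is hypothetical.

```python
def fip(hr, bb, hbp, ibb, k, ip, constant=3.10):
    """FIP = (13*HR + 3*(BB + HBP - IBB) - 2*K) / IP + constant.

    ip should be true innings (e.g. 200 + 1/3), not box-score notation
    like 200.1. The default constant is an assumed placeholder.
    """
    return (13 * hr + 3 * (bb + hbp - ibb) - 2 * k) / ip + constant


# Hypothetical stat line, for illustration only.
pitcher_fip = fip(hr=15, bb=50, hbp=5, ibb=5, k=200, ip=200)
print(f"FIP: {pitcher_fip:.2f}")
```

Hits allowed on balls in play never appear in the calculation, so two pitchers with identical strikeout, walk, and home run totals get the same FIP no matter how their defenses performed.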
Here are the leaders in NL FIP in 2009:
2009 NL FIP Leaders
Tim Lincecum, 2.48
Javier Vazquez, 2.77
Chris Carpenter, 2.78
Josh Johnson, 3.06
Clayton Kershaw, 3.08
Adam Wainwright, 3.11
xFIP adds one more level of correction. The rate at which pitchers give up Home Runs is primarily a function of their fly ball rate and the home park. So is a pitcher who gives up a lot of warning-track fly balls showing a skill or just getting lucky? The research indicates that it’s probably just luck, and isn’t likely to continue. So xFIP takes out the pitcher’s actual HR rate, and uses the fly ball rate instead, assuming that an average percentage of fly balls will result in Home Runs.
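The xFIP substitution can be sketched in Python by swapping the pitcher’s actual home runs for an expected total based on his fly balls. The league HR-per-fly-ball rate of 10.5% and the scaling constant of 3.10 are both assumed placeholder values, and the stat line is hypothetical.

```python
def xfip(fly_balls, bb, hbp, ibb, k, ip, league_hr_per_fb=0.105, constant=3.10):
    """xFIP: the FIP formula with HR replaced by fly_balls * league HR/FB rate.

    league_hr_per_fb and constant are assumed placeholder values.
    """
    expected_hr = fly_balls * league_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp - ibb) - 2 * k) / ip + constant


# Hypothetical pitcher who allowed 200 fly balls, for illustration only.
pitcher_xfip = xfip(fly_balls=200, bb=50, hbp=5, ibb=5, k=200, ip=200)
print(f"xFIP: {pitcher_xfip:.2f}")
```

A pitcher who allowed fewer home runs than his fly ball total would suggest ends up with an xFIP higher than his FIP, which is the "it was probably luck" correction described above.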
Here are the leaders in xFIP for the NL in 2009 (from The Hardball Times):
2009 NL xFIP Leaders
Javier Vazquez, 2.89
Tim Lincecum, 2.94
Dan Haren, 3.16
Ricky Nolasco, 3.29
Josh Johnson, 3.42
Adam Wainwright, 3.45
Chris Carpenter, 3.45
By the way, Zack Greinke, who deservedly won the AL Cy Young award despite only 16 wins, is a big fan of FIP. After winning the award, Greinke was quoted in the New York Times as saying "That’s pretty much how I pitch, to try to keep my FIP as low as possible."
tRA
True Run Average (tRA, from Graham MacAree) is similar to FIP, in that it attempts to isolate pitching from the defense. The primary difference is that it divides batted balls into ground balls, fly balls, and line drives. By looking at the vast amounts of data available, the expected outcome for each type of batted ball has been determined:
|Strikeout||Line Drive||Ground Ball||Fly Ball (OF)||Fly Ball (IF)|
This table shows that (for 2008), 83% of fly balls to the outfield were converted to outs, while 98.5% of fly balls to the infield became outs. From this data, the expected run outcome of each type of batted ball can be calculated. Then, since we have the batted-ball breakdown that each pitcher allowed, we can calculate the expected, or true, Run Average, given an average defense behind the pitcher.
2009 NL tRA Leaders (from FanGraphs)
Tim Lincecum, 2.83
Chris Carpenter, 3.02
Clayton Kershaw, 3.36
Josh Johnson, 3.41
Adam Wainwright, 3.56
Javier Vazquez, 3.67
One interesting fact from the tRA data is that Javier Vazquez goes from the best in xFIP down to #6 in tRA. This is because his line drive rate of 23.6% was one of the worst in the league.
Conclusions
2009 had several pitchers who were very close in performance:
Carpenter has the lowest ERA of the group, but also the second fewest innings pitched. Wainwright has the most Wins and IP. Lincecum is 2nd in ERA, but led the league in strikeouts, and led the group in all of the fielding-independent stats. It's very close, but I would rank them:
1. Tim Lincecum, SF
2. Chris Carpenter, StL
3. Adam Wainwright, StL