What the Phillies and Rays Learned About BABIP

As strange as it may sound, Cole Hamels and James Shields have a lot in common. While one earned a championship ring and World Series MVP honors in 2008 and the other did not, both have benefited and suffered greatly from the effects of batting average on balls in play (BABIP). Many analysts use BABIP to infer how much of a pitcher’s success or failure is due to factors outside of his control, such as randomness and quality of defense. Generally speaking, the average BABIP is around .300, and a pitcher’s BABIP tends to regress toward that mark over longer periods of time. For example, Adam Eaton’s career BABIP is only .005 higher than Roy Halladay’s, .298 to .293. Over 1,000 balls in play, Eaton will give up only five more non-home run hits than Halladay.
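
As a rough illustration of the arithmetic above, here is a minimal Python sketch. It assumes the commonly used BABIP formula, (H - HR) / (AB - K - HR + SF); the function names and the sample stat line are hypothetical and are not taken from either pitcher's record. Only the .298 and .293 figures come from the article.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def extra_hits(babip_a, babip_b, balls_in_play=1000):
    """Expected extra non-HR hits pitcher A allows compared with pitcher B."""
    return (babip_a - babip_b) * balls_in_play


# Made-up stat line, purely to show the formula in use.
sample = babip(hits=150, home_runs=20, at_bats=600, strikeouts=120, sac_flies=5)

# Career BABIP figures quoted in the article.
eaton, halladay = 0.298, 0.293
gap = extra_hits(eaton, halladay)
print(f"sample BABIP: {sample:.3f}")
print(f"extra non-HR hits per 1,000 balls in play: {gap:.1f}")
```

The gap is computed per ball in play, not per batter faced, so walks, strikeouts, and home runs do not enter into it.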

For Hamels, 2008 was a magical season. At 24 years old, he was the ace of the Phillies’ rotation. His team had broken a lengthy playoff drought the previous year, but was swept out of the NLDS by the Colorado Rockies with surprising speed. If the Phillies were to take the next step, Hamels needed to continue his progression. He logged over 227 innings during the 2008 regular season, finishing with an ERA barely above 3.00. He started the playoffs off in style, tossing eight shut-out innings against the Milwaukee Brewers, and finished the post-season with a 1.80 ERA in 35 innings, helping his team win both the first and last games of the World Series against the Rays.

With all of his success in 2008, though, there was reason for pessimism going into 2009. His K/9 had declined from 8.7 in 2007 to 7.8 while his walk and ground ball rates stayed relatively static. Moreover, he benefited significantly from a .259 BABIP. ERA retrodictors such as xFIP and SIERA had him pitching at an ERA level more than a half-run higher, around 3.60.

Hamels spent a lot of time in the off-season on the media circuit, appearing on talk shows and showing up at many media events to promote his and his team’s enormous success. Who could blame him? Unfortunately, he spent less time than usual getting in baseball shape in the winter, and it showed in spring training and at the start of the 2009 season. In his first start on April 10, Hamels could not get out of the fourth inning, surrendering seven runs on 11 hits. Most importantly, he struck out only one of 22 batters faced.

While Hamels was not quite that bad over the course of the 2009 season, that start certainly got him off on the wrong foot and his inconsistency snowballed. He pitched through the seventh inning in only 10 of his 32 starts and did so on back-to-back occasions only three times. Nevertheless, the Phillies reached the World Series for the second consecutive season, but were ushered out by the New York Yankees in six games. Hamels’ post-season was awful compared to his showing in 2008. In four starts, he posted an ugly 7.58 ERA and never got through six innings. After a rough start in Game Three of the World Series, Hamels was brutally honest with the media, saying, “I can’t wait for it to end. It’s been mentally draining.” Unsurprisingly, that didn’t sit well with Phillies fans, especially not after the Phillies lost in the final round.

In the off-season, Hamels received an enormous amount of criticism, quite surprising given how much praise he had been given exactly one year earlier. Fans and media accused the young lefty of being soft and of letting his prior success go to his head. The Phillies were in trade talks during the off-season and fans were hoping Hamels could be used to pry Roy Halladay from the Toronto Blue Jays. Ultimately, the Phillies received Halladay and kept Hamels, but sent Cliff Lee to the Seattle Mariners for a handful of prospects in what amounted to two separate trades.

Going into 2010, there was reason for optimism with Hamels, despite the amount of vitriol fans sent in his direction. Performance-wise, his 2009 was almost identical to his 2008, but the results couldn’t have been more different. Hamels’ K/9 stayed at 7.8 and his BB/9 at 2.0 while his ground ball rate remained at 40 percent. ERA retrodictors had him in the same spot as in 2008; in fact, his FIP of 3.72 was identical in both years. Unfortunately, Hamels was done in by bad BABIP luck. His 2009 mark of .317 was nearly 60 points higher than in ’08. As a result, Hamels allowed more base runners and stranded fewer of them, finishing with a 4.32 ERA.

Sabermetrically-oriented analysts called for a return to form for Hamels, while Phillies fans dissatisfied with his ’09 performance gave up on him. Hamels took the criticism to heart, spending more time in the off-season keeping himself in baseball shape and even working on a new pitch, a cut fastball. His 2010 started off on the wrong foot, as he finished April with a 5.28 ERA after five starts. Still, Saberists urged patience.

As if on cue, Hamels turned the corner, tossing eight innings of one-run ball against the St. Louis Cardinals on May 4. By the end of July, his ERA was under 3.50 and Hamels had taken it to the next level. Not only did Hamels’ BABIP regress (to .289), he improved in two areas: K/9 (9.1) and ground ball rate (45 percent). The cut fastball gave him another weapon in his already-potent fastball-change-curve arsenal. Where, in the previous two seasons, ERA retrodictors had him around 3.60, they had Hamels below 3.30 in 2010.

Most importantly, he showed up in top form for the playoffs, dominating the Cincinnati Reds with a complete game shut-out in Game Three of the NLDS and keeping the San Francisco Giants at bay in Game Three of the NLCS. The Phillies were not able to reach the World Series for the third consecutive year, but fans took solace in the fact that their young ace was back. They happily included him in “four aces” discussions along with Halladay, Lee, and recent acquisition Roy Oswalt.

The Tampa Bay Rays went through something similar with right-hander James Shields. Shields’ 2008 was phenomenal, both in the regular and post-season, but he appeared to take a step back in ’09. Unlike Hamels, though, Shields’ struggles amplified in 2010. He finished with a 5.16 ERA and made one ugly start in the ALDS against the Texas Rangers, allowing four runs and failing to make it out of the fifth inning. The Rays lost the series in five games to the eventual World Series runner-up.

Shields was blamed for many of the team’s woes. As a result of his awful regular season, fans were very unhappy when he got the nod against the Rangers in the ALDS. They had already referred to him as Big Blast James (as opposed to the Big Game James moniker he earned in 2008) and James Yields. One fan was so unhappy with him that she took umbrage with his heritage.

Unbeknownst to many people, however, Shields had actually pitched significantly better in 2010. On a per-nine scale, he averaged nearly 1.5 more strikeouts, and ERA retrodictors identified him as a third of a run better than in the previous two seasons and more than a run and a half better than his 2010 regular season ERA. He was undone by a lofty .341 BABIP. If anything, fans should have been quite optimistic about the right-hander, but such is the chasm between performance and results.

Shields has since returned to form. Thus far in 2011, he has increased his K/9 from 8.3 to 8.6 and induced grounders at a slightly higher rate. In 27 starts, he has a 2.96 ERA and a whopping 10 complete games, even challenging the esteemed Halladay and Lee in that regard. His SIERA is an astounding 2.99, telling us that his performance this year is quite real.

Many who entrench themselves firmly in the anti-Sabermetrics camp disregard BABIP because they feel it vastly underrates how much control a pitcher has over his fortune. As Hamels and Shields have illustrated, though, BABIP is actually an excellent tool that can help us more accurately assess a pitcher’s strengths and weaknesses. While there are certainly some pitchers that aren’t properly accounted for using BABIP (e.g. Matt Cain), it does its job well for its purpose. The more people take the time to understand and properly use this statistic, the fewer players like Hamels and Shields will be unfairly hounded for events entirely out of their control.

24 comments

  1. gfw

    August 29, 2011 09:25 AM

    I still don’t understand BABIP’s value if it has Halladay = Eaton. Exactly what does BABIP tell you if it can’t differentiate between these two poles of performance?

  2. bill

    August 29, 2011 09:30 AM

    That BABIP is not controlled by the pitcher, no matter how good or bad they are.

  3. Nick

    August 29, 2011 09:30 AM

    It’s saying that all things (luck-related) equal, Halladay is a better pitcher than Eaton based on statistics he can control like K/9 and BB/9.

  4. Nick

    August 29, 2011 09:32 AM

    Luckily the Phillies front office were irrational like so many of the fans when it came to Hamels’ 2009 season. Not trading Hamels was as big of a decision as some of the high profile trades/acquisitions the team has made over the last few years. Now hopefully (and I think they will) they lock Hamels up for many years to come.

  5. Nick

    August 29, 2011 09:33 AM

    oops, should have read “Phillies front office was NOT irrational..”

  6. gfw

    August 29, 2011 10:20 AM

    Thanks.
    Another question or two.
    How consistent is a team’s BABIP across its pitching staff?

    Is there some place I can look this stuff up myself?

    Thanks
    Gfweb

  7. Eric

    August 29, 2011 10:42 AM

    Bill, check your first paragraph. The eaton/halladay example you chose doesn’t mean that over 1000 batters, 5 more will get hits. It means that given 1000 balls in play for each pitcher, 5 more of halladay’s will go for hits, a very important distinction given the vast difference in their career walk rates.

  8. Tim

    August 29, 2011 10:59 AM

    Interestingly enough, changing it from batters faced to balls put in play doesn’t make a big difference in the final number of non-HR hits, since with the differing walk, HR, and K rates, they have similar rates of batters who put the ball in play. For every 1,000 batters faced over his career, Halladay has given up 216 non-HR hits, while Eaton has given up only 210 non-HR hits per 1,000 BF. Of course, Eaton’s extra walks and HRs to those batters mean a world of difference in effectiveness.

    Also, a higher percentage of Eaton’s non-HR hits that he gives up are for extra bases (27% of non-HR hits) than for Halladay (21% of non-HR hits for extra bases), which will make a difference on the bottom line as well.

  9. sean

    August 29, 2011 11:31 AM

    zack greinke has experienced a hamels 09 type season this year. he had the injury to start the year and he’s had terrible batted ball luck though a lot of runs scored in his appearances so he has wins. He hasn’t pitched deep into many games unlike he normally does, so if you are in a fantasy league try to buy greinke this offseason, if he returns to form like 2010 or 2009 you have an ace.

  10. Rob

    August 29, 2011 11:39 AM

    The part I can’t figure about the article is the reference to Hamels’ 2008 offseason – increased appearances, less preparation, less baseball ready. If that was true, and was an impact to his performance, then why was his ERA predictor the same in 2009 as 2008? If it was really just bad luck and not a product of pitching more poorly in 2009, why include this piece of information as you would be concluding his preparation was not a factor?

    Though I subscribe heavily to advanced metrics in baseball, the mere suggestion that he pitched as well in 2009 as 2008 is almost too much to swallow. If the cause was truly just “luck”, wouldn’t this have been more variant during the year – as opposed to consistently having poor outings? There doesn’t seem to be any reason I can see that a string of luck (good or bad) would continue for an entire season and postseason, but that is what you are suggesting happened in both 2008 and 2009 for Hamels. Hard to understand.

  11. Bill Baer

    August 29, 2011 12:00 PM

    I think I make several caveats that BABIP isn’t the only factor that influences a pitcher’s season, it’s just that a pitcher has comparatively little control over the conversion of batted balls into outs when compared to walk rate or batted ball splits.

    Many Sabermetric stats make the mistake of rounding out to too many decimal points, inferring accuracy that isn’t there. I think that’s the case with retrodictors (xFIP/SIERA) and WAR. While Hamels’ FIP was exactly the same in both years, I mentioned that only as an illustrative effect; elsewhere, I used general terms to describe his xFIP/SIERA (“around 3.60”). So, if Hamels had been in better baseball shape going into 2009, who knows how good he could have been? Maybe his xFIP would’ve been in the 3.30’s.

    As for luck, think about it this way: the odds of you flipping a coin and it landing tails is 50%. The odds of you flipping it five times and getting tails all five times is three percent, which is pretty low. When you do get tails five times in a row, you tend to focus on that event as a singular occurrence, without considering how many other events have been run prior. It’s ex post facto analysis. “There’s no way I could have flipped five tails in a row, that’s so rare!” But if you run 100 sets of five flips, you should come close to three sets of five consecutive tails, like everyone else.

    Bringing that back to Hamels, in 2009, don’t you think that, of the 67 NL pitchers who threw 100+ innings, we would find a few with abnormally high or low BABIP? Indeed, if you make a histogram of 2009 BABIP frequency buckets, you find 21 from .250-.289 and 15 from .320-.359 with an additional 31 from .290-.319.

    EDIT: Just realized that those buckets are lopsided because I’m a dum-dum and grabbed one row extra accidentally in Excel. Still, the point should be clear otherwise.

    Essentially, we tend to look at outliers and try to rationalize them back into the norm, rather than simply viewing them as outliers.
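
    The coin-flip arithmetic in the comment above can be checked with a short Python sketch. It assumes a fair coin and independent flips; the variable names are illustrative only. It also computes one figure the comment does not state, the chance of seeing at least one all-tails set among 100 sets.

```python
# Probability that five fair, independent flips all land tails.
p_five_tails = 0.5 ** 5

# Expected number of all-tails sets when 100 sets of five flips are run.
sets = 100
expected_all_tails = sets * p_five_tails

# Chance that at least one of the 100 sets comes up all tails.
p_at_least_one = 1 - (1 - p_five_tails) ** sets

print(f"P(five tails in a row) = {p_five_tails:.5f}")
print(f"expected all-tails sets in {sets} tries = {expected_all_tails:.3f}")
print(f"P(at least one all-tails set) = {p_at_least_one:.3f}")
```

    The same logic applies to a league of 67 pitchers: a result that is unlikely for any single pitcher is likely to happen to somebody.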

  12. Rob

    August 29, 2011 01:03 PM

    Appreciate the thoughtful response. Understanding the spread of BABIP is helpful. Based on your sample of 67 pitchers, 36 landed in the abnormally high or low range. So, two consecutive seasons of unusual results (one high, one low) is roughly 25% likely to happen. Was thinking this would be a much smaller number – closer to your coin flip percentage, in which case two consecutive seasons of being an outlier would have been truly unlikely (0.03^2=0.0009). Helps put it in perspective. Guess this normalizes over the course of a career, as illustrated by Halladay vs. Eaton, but varies quite a bit season to season.

    Understand of course that he could have been better if he had worked out harder in the offseason – just wasn’t seeing it reflected in the xFIP results as a reason for poorer results in 2009 as compared to 2008.

  13. Brad.

    August 29, 2011 03:00 PM

    Just an illustration of local fans’ lingering suspicion of Hamels:

    DNL ran a poll question last week about who you’d rather have pitching a big game, Halladay, Lee, or Hamels. Halladay, rightly, got 50something percent, Lee got 30something, and Hamels 10 percent. Hamels has pitched great in big games several times, especially in 2008, but it’s like 2009 wiped that out of people’s minds. Lee, on the other hand, struggled in the postseason just last year. But then again, he hits the long ball.

  14. FanSince09

    August 29, 2011 03:41 PM

    Shields > Hammels

  15. LTG

    August 29, 2011 03:56 PM

    I didn’t realize “Shields” and “Hammels” stand for universal constants (like “g” or “pi”). Well, learn something new everyday. Maybe tomorrow I’ll learn that 3 has a color.

  16. SABR

    August 29, 2011 09:28 PM

    Bill – This is a blog, so it has to be entertaining, and for that I tip my cap, as I could not do what you do. But I was reading Rob’s post and it dawned on me that even someone as statistically inclined as you writes the same type of narratives on Hamels’ seasons as the mass media. I am not sure if you were doing this to give the mass media perspective on the season or if these are your thoughts, but your entire paragraph starting “Hamels spent a lot of time in the off-season on the media circuit” seems to be writing a narrative to the results (even though you later agree that his peripherals were identical) and cherry picking starts in April, citing his inconsistency throughout the season. I feel like this was almost Joe-baiting.

    Perhaps it is the only way of making it interesting enough to read – I am certainly not a writer. I just thought it to be perhaps the most interesting part of this post.

    It seems to me that the development of the cut fastball is what elevated Cole Hamels from very good to elite pitcher. He only had 2 pitches for most of 2008-2009 that he could throw effectively (with a sporadically used curve), and I believe that his success really took off when he learned a new pitch to keep batters guessing. While Hamels’ BABIP has come back to normal levels, his xFIP and SIERA have steadily fallen with a rising k/9, which to me says that he has grown for a reason outside of simply a regression in BABIP.

  17. Bill Baer

    August 29, 2011 09:44 PM

    I wouldn’t say it’s a narrative, but I do think Hamels’ own admission to lack of off-season preparation played a role, however large or small, in his lack of success in 2009. As I mentioned above, Hamels’ peripherals were the same, but what if he was in normal shape? Maybe it’s better. We can’t ever know, but we do know by his own admission he spent too much time granting media access and not enough time staying in shape.

    Not that that’s a direct criticism of Hamels, of course. Personally, I’d take the whole winter off to play video games, so I would never begrudge someone for not spending the majority of their off-season in the weight room.

  18. SABR

    August 29, 2011 10:21 PM

    Fair enough. If you are citing what he has said himself, then it certainly carries weight/merit that he could have done better to prepare himself for the season, although who knows what may have happened even if he did work out 50 hours more that offseason.

    I find it questionable in general to use perceived narratives to slight athletes, as is so often used as a crutch of bad journalism. Lack of caring is by far my favorite one – nothing is more entertaining than reading that a guy, who his whole life has performed better than 99.9999% of people at his sport doesn’t REALLY care about that sport according to . Don’t think you are doing this, just in general

  19. Bill Baer

    August 29, 2011 10:34 PM

    Totally agree, and I have gone after writers who have done as much. Mandy Housenick in particular takes cheap shots at players just for not being good interviews.

    As to his shape, I’ll look for more and better sources, but here’s one from Tom Verducci:

    Hamels is in far better shape than he was at this time last year, when he showed up flipping fastballs at 81 mph after he shut down his offseason throwing because of his huge innings jump in 2008. Hamels said he talked with buddy and fellow Year-After Effect victim Mark Prior about what happened to Prior after his innings jump in 2003; both agreed they paid a price for the extra work. This offseason Hamels threw all winter and his arm strength is back. The Phillies hope it translates into a better curveball for a decent third pitch. If not, they plan either to tighten the curve into something more like a slurve or just give him a cutter.
