Making Consistency Count

Believe it or not, the much-maligned starting pitching of the Phillies was fairly consistent in 2009, despite:

  • Chan Ho Park quickly being removed from the starting rotation
  • Injuries to Brett Myers and Jamie Moyer
  • Relying on Rodrigo Lopez for a few starts from the back of the rotation
  • Cole Hamels’ struggles

You’ve probably heard broadcasters and writers alike talk about how important consistent starting pitching is to winning baseball games. Similarly, I would wager that most baseball fans would prefer a pitcher who goes six innings and allows five runs every time (a 7.50 ERA) over one who alternates between six scoreless innings and six innings with ten runs allowed (also a 7.50 ERA).

I haven’t seen any studies verifying a relationship between success and consistency, so I used the Play Index on Baseball Reference to sort through starting pitcher performances in 2009. Does starting rotation consistency play a role in regular season success?

Consistency, of course, is a bit of a vague term. Generally, when we call a player “consistent,” we mean that he isn’t Jekyll (a metaphor for good or bad, your pick) one day and Hyde (the opposite) the next. The same can be said about teams. In other words, a consistent player’s (or team’s) performance has low variance. Thankfully, variance is easy to measure with statistics.
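As a minimal sketch of that idea, the Python below computes a pitcher's average game score and the standard deviation of his game scores. The two pitchers and their scores are hypothetical, and the use of the sample standard deviation (like a spreadsheet's STDEV) is an assumption; the post does not say which variant it used.

```python
from statistics import mean, stdev


def consistency(game_scores):
    """Return (average game score, sample standard deviation) for a pitcher.

    A smaller standard deviation means the starts looked more alike, which is
    what this post calls "consistent".
    """
    if len(game_scores) < 2:
        raise ValueError("need at least two starts to measure spread")
    return mean(game_scores), stdev(game_scores)


# Hypothetical pitchers: same average game score, very different spread.
steady = [50, 52, 48, 50, 51, 49]
erratic = [20, 80, 35, 65, 25, 75]

for name, scores in [("steady", steady), ("erratic", erratic)]:
    avg, sd = consistency(scores)
    print(f"{name}: average {avg:.1f}, standard deviation {sd:.1f}")
```

Both hypothetical pitchers average 50, but the erratic one's standard deviation is far larger, so by this measure he is much less consistent.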

First, here are the 2009 Phillies starters’ average game scores, with standard deviations. The smaller the standard deviation, the more consistent the pitcher. Game score, admittedly, is an imperfect metric, but it’s good enough to help us reach a conclusion. If you’re not familiar with game score, it is a Bill James metric that rates a single start with one number; the higher, the better.
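For readers new to the metric, here is a minimal sketch of Bill James's original game score formula. Later revisions of the formula exist; the function name and parameters here are illustrative, not taken from any library or from the Play Index.

```python
def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Bill James's original game score for a single start."""
    score = 50
    score += outs                              # +1 per out recorded
    full_innings = outs // 3
    score += 2 * max(0, full_innings - 4)      # +2 per full inning after the 4th
    score += strikeouts                        # +1 per strikeout
    score -= 2 * hits                          # -2 per hit allowed
    score -= 4 * earned_runs                   # -4 per earned run
    score -= 2 * unearned_runs                 # -2 per unearned run
    score -= walks                             # -1 per walk
    return score


# A hypothetical six-inning, five-run start: 18 outs, 4 K, 8 H, 5 ER, 0 UER, 2 BB.
print(game_score(18, 4, 8, 5, 0, 2))  # 38
```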


Of the starters with at least nine starts (an arbitrary cutoff, but I wanted to include Pedro Martinez), Brett Myers was actually the most consistent and Cliff Lee the least consistent (note that both have slightly smaller sample sizes than the other mainstays). That doesn’t mean that Myers was better than Lee, of course; it just means that Myers’ performances were more similar to one another than Lee’s were.

From a logical standpoint, we would prefer that our worse pitchers have the most variance and our better pitchers the least. Myers (one of the worst) and Lee (one of the best) are the opposite of this ideal. Consider:

  • If Adam Eaton, a terrible pitcher, put up a 9.00 ERA (6 runs in 6 innings) every single time he pitched, the Phillies’ offense would need to score at least seven runs to win, a tall task. The Phillies scored seven or more runs in 50 of their 162 games in 2009, or 31%. Roughly speaking, the Phillies would win only three out of ten games started by Eaton. Over 32 starts, that comes out to about a 10-22 record.
  • If Eaton instead started 32 games and alternated between 6 IP, 0 ER and 6 IP, 12 ER performances (still a 9.00 ERA), the Phillies would win a lot more games. They were shut out only 7 times in ’09 (4%) and scored more than 12 runs four times (2.5%). The Phillies would lose about 4% of the 16 good starts (about 15.4 wins) and win about 2.5% of the 16 bad starts (about 0.4 wins). That adds up to roughly 16 wins, a 16-16 record, six games better than the consistent Eaton.
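The arithmetic in the two bullets above can be sketched as follows. It uses only the 2009 run-scoring counts quoted in the post and, like the post, assumes the Phillies win whenever they outscore the starter's runs allowed (no bullpen, no extra innings; a 0-0 game is counted as a loss here, which is an assumption).

```python
# 2009 Phillies run-scoring counts quoted in the post (162 games).
GAMES = 162
P_SCORED_7_PLUS = 50 / GAMES    # scored 7+ runs: beats a 6-run start
P_SHUT_OUT = 7 / GAMES          # scored 0 runs: cannot win a 0-run start
P_SCORED_13_PLUS = 4 / GAMES    # scored 13+ runs: beats a 12-run start

STARTS = 32


def consistent_eaton_wins():
    """Eaton allows 6 runs every start; the offense must score 7 or more."""
    return STARTS * P_SCORED_7_PLUS


def inconsistent_eaton_wins():
    """Eaton alternates 0-run and 12-run starts (16 of each)."""
    good_starts = bad_starts = STARTS // 2
    # 0-run starts are won unless the offense is shut out;
    # 12-run starts are won only when the offense scores 13 or more.
    return good_starts * (1 - P_SHUT_OUT) + bad_starts * P_SCORED_13_PLUS


for label, wins in [("consistent", consistent_eaton_wins()),
                    ("inconsistent", inconsistent_eaton_wins())]:
    print(f"{label} Eaton: about {wins:.1f} wins and {STARTS - wins:.1f} losses")
```

This gives about 9.9 and 15.7 expected wins, which round to the 10-22 and 16-16 records above.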

The reverse logic applies for great pitchers.

Note that the above examples leave out some factors, primarily bullpen performance. Also note that the Phillies had a well-above-average offense, so they were more likely than most teams to win despite poor starting pitching performances.

This chart illustrates how often the Phillies’ starters posted game scores in each range.

Compared to the other 29 teams, the Phillies were in the top third in average game score (50.5) and in the middle in standard deviation (17). They were also in the top third in both the -1 STDEV game score (33.5) and the +1 STDEV game score (67.5). Assuming game scores are roughly normally distributed, that means about 68.2% of the Phillies’ game scores fell between 33.5 and 67.5. The Washington Nationals, on the other hand, had about 68.2% of their starts between 31.5 and 60.5.
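A minimal sketch of where those bands come from, assuming game scores are approximately normal (the post's 68.2% figure relies on the same assumption). A normal curve puts about 68% of values within one standard deviation of the mean. The Nationals' mean and standard deviation below are back-solved from the 31.5-60.5 band quoted above, not taken from the post's data.

```python
from statistics import NormalDist


def one_sd_band(avg, sd):
    """Return the game-score range within one SD of the mean, plus the share
    of starts a normal distribution with that mean and SD puts inside it."""
    dist = NormalDist(mu=avg, sigma=sd)
    low, high = avg - sd, avg + sd
    return low, high, dist.cdf(high) - dist.cdf(low)


for team, avg, sd in [("Phillies", 50.5, 17), ("Nationals", 46, 14.5)]:
    low, high, share = one_sd_band(avg, sd)
    print(f"{team}: {low}-{high} holds about {share:.1%} of starts")
```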

But is there a relationship between consistency in starts and regular season success?

Note: The standard deviations were rounded to the nearest 0.5, which does not affect the conclusion; it’s merely done for convenience.

In 2009, there was no relationship between starting pitching consistency and regular season success. Running the data for past years would make the conclusion more robust, but those numbers would have to differ sharply from 2009’s to produce a meaningful r-squared. An r-squared of 0.0002 means consistency explains essentially none (about 0.02%) of the variation in team success.
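For reference, here is a minimal sketch of how an r-squared like that is computed for a simple linear fit. The post does not say which measure of success it used, so winning percentage in the usage comment is an assumption, and the numbers in the demo line are toy values, not the 2009 team figures.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear fit of ys on xs.

    With a single predictor this equals the squared Pearson correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need matching lists with at least three points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when a variable has no spread")
    return sxy * sxy / (sxx * syy)


# Hypothetical usage, one entry per team:
#   r_squared(rotation_game_score_sd, team_win_pct)
# Toy check: a perfectly linear relationship gives 1.0.
print(r_squared([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

A result near 1 would mean consistency tracks success closely; a value like 0.0002 means there is essentially no linear relationship.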

This finding, along with the logical approach above, can provide insight into which types of starting pitchers teams should target in free agency and in trades. Teams can make consistency count (a little) by acquiring very inconsistent back-of-the-rotation starters and very consistent front-line starters.

Roy Halladay is a great example of a consistent ace. His average game score in 2009 was 60.5, with a standard deviation of 15.5. Cliff Lee, of course, is an example of an inconsistent ace, with an average game score of 56 and a standard deviation of 23 (in his Phillies starts). To put that in perspective, assuming roughly normal distributions, about 16% of Lee’s starts would fall below a game score of 33, while 16% of Halladay’s would fall below 45. In other words, Halladay’s bad starts are much less damaging than Lee’s: the 12-point gap at the low end is far larger than the 4.5-point gap between their averages.

Meanwhile, Luke Hochevar of the Kansas City Royals had a terrible 2009, posting the worst ERA (6.55) among starters with at least 120 IP. His average game score was 43, with a standard deviation of 21.5. There are various reasons why Hochevar himself would not be an attractive target for a team looking to get the most out of the back of its rotation (his 2011 arbitration eligibility being one of them), but a team looking to fill the back of the rotation on the cheap should look for free agents in Hochevar’s mold.

Compare ’09 Hochevar (AVG = 44, STDEV = 21.5) to ’07 Adam Eaton (AVG = 42, STDEV = 15). Among pitchers of that skill level, Hochevar is far preferable.
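The Halladay/Lee and Hochevar/Eaton comparisons can be sketched with the same normal approximation used for the team bands. For the aces, the interesting number is the share of poor starts; for the back-end starters, it is the share of good ones. The cutoffs of 40 and 60 are hypothetical stand-ins for "a start that usually loses" and "a start that usually wins"; the averages and standard deviations are the figures quoted in the post.

```python
from statistics import NormalDist

# (average game score, standard deviation) figures quoted in the post.
pitchers = {
    "Halladay '09": NormalDist(mu=60.5, sigma=15.5),
    "Lee '09 (PHI)": NormalDist(mu=56, sigma=23),
    "Hochevar '09": NormalDist(mu=44, sigma=21.5),
    "Eaton '07": NormalDist(mu=42, sigma=15),
}

POOR_START = 40   # hypothetical cutoff for a start that usually loses
GOOD_START = 60   # hypothetical cutoff for a start that usually wins

for name, dist in pitchers.items():
    p_poor = dist.cdf(POOR_START)        # share of starts below the poor cutoff
    p_good = 1 - dist.cdf(GOOD_START)    # share of starts above the good cutoff
    print(f"{name}: {p_poor:.0%} poor starts, {p_good:.0%} good starts")
```

Under these assumptions, Lee posts well over twice Halladay's share of poor starts, while Hochevar posts roughly twice Eaton's share of good ones.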

If all of the numbers and statistical mumbo-jumbo were too much, here’s a CliffsNotes summary of the above:

  • Consistency within the starting rotation doesn’t matter much, as it does not correlate at all with regular season success
  • Teams can make consistency within the starting rotation work in their favor (thus making it matter a little) by…
    • Utilizing inconsistent starters in the back of the rotation (e.g., Luke Hochevar)
    • Utilizing consistent starters at the front of the rotation (e.g., Roy Halladay)



  1. Peter

    November 30, 2009 12:07 PM

    I wonder if game score standard deviation correlates (negatively, I would assume) with free agent salaries. In other words, do teams pay for consistency, or do they just pay for production and leverage it as they see fit?

  2. Jeff Zimmerman

    November 30, 2009 11:06 PM

    I have actually done some work on pitcher consistency and need to get around to publishing the results, but I found that the keys to the game outcome are run support and game score. By sorting game scores into Win, Maybe, and Lose buckets based on a team’s run support, the team’s chance of winning the game can be determined. The bucket boundaries are set at the threshold of a 90% chance of a win or a loss. I have the 2009 Phillies at around 5.0 runs per game. From what I have found so far, at that level of run support, game scores of 35 or less end in a loss 90% of the time, and game scores of 55 or more end in a win 90% of the time.

  3. Patrick

    December 02, 2009 11:42 PM

    Actually, more or less this same thing was studied in brief a few months ago on Baseball Think Factory, in a post by Dan Szymborski.

    Ahhh… The “Stochastinator,” my favorite made-up word from the world of sabermetrics. 🙂

    From the looks of this, he found, in a theoretical study, essentially the same thing you found in your real-world one.

    You essentially won’t win any games when a pitcher gives up 14 runs per nine, but you’ll hardly win any games when a pitcher gives up 7 either, so you’d much rather have a pitcher who gives up 1 run, then 14, than one who gives up around 7 every time.

  4. Patrick

    December 02, 2009 11:55 PM

    One other thought:

    I wonder if inconsistency is a “skill,” i.e., is it repeatable? Does it correlate year to year?

    So, in your chosen measure, is the SD in a player’s game score something that correlates from year to you?

    I bet that correlation exists but could be very, very weak.

  5. Patrick

    December 02, 2009 11:55 PM

    That should be “year to year”, not “year to you”… Oops.

  6. Bill Baer

    December 03, 2009 02:35 AM


    The ubiquitous Eric Seidman e-mailed me after I posted this. He, too, studied the same question in a similar fashion.

    To quote his message about his findings:

    1) Consistency doesn’t really correlate to success on any level; inconsistent guys were just as likely to have good numbers as consistent guys

    2) Consistency itself is inconsistent. Over a multi-year span, an intraclass correlation showed that pitchers who were consistent one year were in no way guaranteed to be consistent moving forward, meaning consistency is not a repeatable skill.

  7. Patrick

    December 04, 2009 12:45 AM

    Ahh, thanks Bill!

    That’s exactly what I was wondering. Interesting…

    So, no point in trying to sign inconsistent bad pitchers! They might just have to settle for trying to sign GOOD pitchers…

  8. Bill Baer

    December 04, 2009 06:58 AM

    Well, if you have two bad pitchers of equivalent skill, you’re actually better off (theoretically) signing the inconsistent one.
