Please Stop Calling Cliff Lee Streaky

A week after I wrote that the NL Cy Young might be a two-man race, Cliff Lee appears to have pitched himself back into contention. The night after that post went live, Lee capped off a brilliant August with 8 2/3 shutout innings against the Reds, and on Monday he followed that up with his league-leading sixth complete-game shutout of the year, against the Braves. His numbers now fall right in line with those of Halladay and Kershaw, and it’s impossible to exclude him from any discussion about the NL’s best pitcher in 2011.

Lee’s dominance this season, at least to me, has seemed under-advertised by media and fans. It’s understandable, to an extent. Everyone breathlessly waits to see what Roy Halladay can do next, and for good reason. Cole Hamels is presently filling in the zeroes on his next contract with each gem of a start, and, anticipating that he’ll stick around, we want to feel out just how devastating his new repertoire can be, as if 2010 wasn’t evidence enough. Vance Worley is the new rookie surprise story, on a staff that was hardly wanting for reasons to watch. Added to all that, Lee began the season with some starts that were a strange mix of high strikeout and earned run totals, just when the expectations for the new mega-rotation were fresh and uncompromising.

All of these factors have contributed to a strange notion that I’ve seen in more than a few places: Cliff Lee is “streaky,” or “inconsistent.” I’m not making this up:

Dare to say it; Lee has been somewhat inconsistent this season, with two historic months of dominance surrounded by some fairly modest months of performance.

He is at times the best pitcher in the world, and during others he’s just another pitcher. If you look at his monthly splits this season, Lee has put in two months of ridiculous, epic and historic work.

I want to stress here that this is not meant as a dig at Philadelphia Sports Daily or Jim McCormick. He’s a very good beat writer, one of the best and one of my favorites, actually. This was just the easiest example to cite. Bill Petti, a writer at Beyond the Boxscore whose work I also enjoy, wrote about it too. They’re simply elaborating on something that a lot of other people on blogs, Twitter, and radio, in newspapers, and in broadcasts have said at some point or other this season. In most cases they’ve said it because of this:

It’s easy to see why someone would look at this data and conclude that Cliff has had, at the very least, a strange year. The June through August stretch is particularly schizophrenic, at least when measured by ERA. Of course, ERA never tells the whole story. His BABIP fluctuated wildly over that period, from .191 to .359 to .237. His strikeout rate was actually at its lowest during his incredible June, and was lower in his 0.45 ERA August than it was in his 4.18 ERA April. A simple results-based evaluation is insufficient; it’s much more complicated than the number of earned runs he has allowed from one month to the next. If we fade ERA out a bit, and add FIP and xFIP to the above graph, this becomes all the more obvious:

That smooths things out quite a bit, doesn’t it? For one thing, the big split between his FIP and xFIP in July indicates that his home run per fly ball rate (HR/FB) was the source of his outcome woes that month, and indeed that rate was severely inflated, at 18.8%. If his ERA had fallen in line with his FIP or xFIP, no one would seriously accuse him of streakiness. The first graph now looks like a superficial take, at best. We can’t say for sure that there weren’t some perfectly good reasons for his BABIP and HR/FB fluctuations month-to-month (in particular, his pop-up rate spiked heartily in his low-BABIP August), but, really, isn’t that the point? When you chunk data out into such small samples, you’ll end up with the murkiest of portraits no matter what brushes you use.
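For readers who want the arithmetic behind those lines, here is a minimal Python sketch of BABIP, HR/FB, FIP and xFIP using the commonly published FanGraphs-style formulas. The FIP constant (3.10) and the league-average HR/FB rate (10.5%) are placeholder assumptions, since both are recalculated from league totals every season, and the stat line in the example is hypothetical, not Lee’s actual July.

```python
# Sketch of the rate stats discussed above (FanGraphs-style definitions).
# ASSUMPTIONS: the FIP constant and league HR/FB rate below are placeholders;
# the real values are recomputed from league totals each season.
FIP_CONSTANT = 3.10
LEAGUE_HR_PER_FB = 0.105


def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)


def hr_per_fb(home_runs: int, fly_balls: int) -> float:
    """Fraction of fly balls allowed that became home runs."""
    return home_runs / fly_balls


def fip(home_runs: int, walks: int, hit_by_pitch: int, strikeouts: int, innings: float) -> float:
    """FIP counts only HR, BB, HBP and K. `innings` is a true decimal (8 2/3 -> 8.667, not 8.2)."""
    return (13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts) / innings + FIP_CONSTANT


def xfip(fly_balls: int, walks: int, hit_by_pitch: int, strikeouts: int, innings: float) -> float:
    """Like FIP, but actual HR are replaced by fly balls times a league-average HR/FB rate."""
    expected_home_runs = fly_balls * LEAGUE_HR_PER_FB
    return (13 * expected_home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts) / innings + FIP_CONSTANT


if __name__ == "__main__":
    # Hypothetical month: 40 IP, 6 HR on 32 fly balls, 8 BB, 1 HBP, 35 K.
    print(f"HR/FB: {hr_per_fb(6, 32):.3f}")
    print(f"FIP:   {fip(6, 8, 1, 35, 40.0):.3f}")
    print(f"xFIP:  {xfip(32, 8, 1, 35, 40.0):.3f}")
```

With that made-up line, a pitcher who allowed home runs on 18.75% of his fly balls posts a FIP of roughly 3.98, while xFIP, which swaps in a league-average HR/FB, comes in at roughly 3.12. A gap like that is exactly what points to home runs per fly ball as the culprit.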

This is especially true when the criterion for that chunking is as entirely arbitrary as calendar months. The Gregorian calendar was introduced by the head of the Catholic Church more than 400 years ago, primarily because the Julian calendar it replaced had a nasty habit of letting the spring equinox drift further from its fixed date of March 21, which threw off the calculation of Easter. Baseball evolved gradually in 19th-century America from a variety of ancestral stick-and-ball games. The two have nothing to do with one another. There is no reason that Cliff Lee’s pitching ability should have anything to do with the ambitions of Pope Gregory XIII, or the orbital mechanics of the Moon. As Twitterati member @Everybody_Hits noted a while back, you can redefine the calendar months and Cliff Lee’s “consistency” problem disappears. What if each month began on the 25th instead of the 1st?

Now, instead of the wild month-to-month sine wave, Cliff has had a great four-month stretch from April 25th to August 24th, bookended by a decent March/April and two fantastic August/September starts. It’s impossible to call this an up-and-down season. Even in moving the endpoints, though, we are still submitting to the tyranny of the Moon, sticking with roughly 30-day periods to define our months. Again, there is no reason why we should do this. It has just as much to do with baseball as migratory bird patterns and seasonal wheat harvests. So, hey, let’s break out Lee’s performance according to the rotation of Lambda Andromedae, a G-type giant binary star located approximately 84 light-years from Earth. It happens to have a rotational period of 54 days.
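The endpoint problem is easy to see mechanically. The Python sketch below groups a list of starts into periods three different ways: calendar months, “months” that begin on the 25th, and fixed 54-day windows counted from an arbitrary origin. The starts, the origin date, and the helper names are all hypothetical, made up purely for illustration; the only point is that identical games produce different period ERAs depending on where the lines are drawn.

```python
from collections import defaultdict
from datetime import date


def month_key(d: date, start_day: int = 1) -> tuple:
    """(year, month) label when each 'month' begins on start_day instead of the 1st."""
    year, month = d.year, d.month
    if d.day < start_day:
        # Dates before start_day belong to the period that began last month.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return (year, month)


def window_key(d: date, origin: date, period_days: int) -> int:
    """Index of the fixed-length window (e.g. 54 days) containing d, counted from origin."""
    return (d - origin).days // period_days


def era_by_period(starts, key):
    """Group (date, innings, earned_runs) tuples by key(date) and return ERA per period.

    Innings must be true decimals (6 1/3 -> 6.333), and every period must have innings > 0.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for day, innings, earned_runs in starts:
        bucket = totals[key(day)]
        bucket[0] += innings
        bucket[1] += earned_runs
    return {k: 9 * er / ip for k, (ip, er) in sorted(totals.items())}


# Hypothetical starts (date, innings pitched, earned runs); not real game logs.
STARTS = [
    (date(2011, 4, 20), 7.0, 5),
    (date(2011, 4, 26), 8.0, 1),
    (date(2011, 5, 1), 7.0, 2),
    (date(2011, 5, 24), 6.0, 4),
    (date(2011, 5, 29), 9.0, 0),
]

if __name__ == "__main__":
    # Calendar months.
    print(era_by_period(STARTS, month_key))
    # 'Months' that start on the 25th.
    print(era_by_period(STARTS, lambda d: month_key(d, start_day=25)))
    # 54-day windows from an arbitrary origin.
    print(era_by_period(STARTS, lambda d: window_key(d, date(2011, 4, 1), 54)))
```

With calendar months, these five starts read as a 3.60 April and a 2.45 May. Starting each month on the 25th turns the same games into a 6.43 “March,” a 3.00 “April,” and a 0.00 “May,” and 54-day windows give 3.86 followed by 0.00. Same pitcher, same games, three different stories.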

Now, if I were to claim that Cliff Lee draws his pitching abilities from the machinations of a distant star, strengthening his powers with each full rotation, I’d have just as strong a set of empirical legs to stand on as those who look at his monthly splits and call him streaky. In analyzing baseball, we’re constantly limited by the fuzziness introduced by small sample sizes, even when working with a full season of data (especially for pitchers). Splitting it up further only amplifies the problem. Anyone who chooses endpoints, be it a fan, writer, or broadcaster, does so with a certain agenda in mind, whether they know it or not. Even endpoints that seem perfectly natural on their face (monthly splits being an excellent example) can produce great thickets of coincidence that masquerade as narratives. If we fail to apply the utmost scrutiny to these, we may allow single-season gems like Cliff Lee’s 2011 to be muddied with baseless criticisms, and that would be a true shame.
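To put a rough number on the small-sample problem, here is a Python sketch under a deliberately simple, hypothetical model: every batter faced strikes out independently with the same fixed probability, so the pitcher’s true talent never changes. The 25% strikeout rate and the 150 batters per “month” (roughly six starts) are assumptions chosen for illustration. Because the binomial standard error of an observed rate shrinks only with the square root of the sample, a one-month rate is about √6 ≈ 2.45 times noisier than a six-month season rate: about 3.5 percentage points of strikeout rate per month versus about 1.4 for the season.

```python
import math
import random


def k_rate_standard_error(true_rate: float, batters_faced: int) -> float:
    """Binomial standard error of an observed strikeout rate over batters_faced trials."""
    return math.sqrt(true_rate * (1 - true_rate) / batters_faced)


def simulate_monthly_k_rates(true_rate: float, batters_per_month: int, months: int, seed: int = 0) -> list:
    """Observed K rate per 'month' for a pitcher whose true rate never changes (toy model)."""
    rng = random.Random(seed)
    rates = []
    for _ in range(months):
        # Each batter strikes out independently with probability true_rate.
        strikeouts = sum(rng.random() < true_rate for _ in range(batters_per_month))
        rates.append(strikeouts / batters_per_month)
    return rates


if __name__ == "__main__":
    TRUE_K_RATE = 0.25       # hypothetical constant talent level
    BATTERS_PER_MONTH = 150  # assumed: roughly six starts of ~25 batters
    MONTHS = 6

    month_se = k_rate_standard_error(TRUE_K_RATE, BATTERS_PER_MONTH)
    season_se = k_rate_standard_error(TRUE_K_RATE, BATTERS_PER_MONTH * MONTHS)
    print(f"monthly SE: {month_se:.3f}, season SE: {season_se:.3f}")

    rates = simulate_monthly_k_rates(TRUE_K_RATE, BATTERS_PER_MONTH, MONTHS, seed=2011)
    print("simulated monthly K rates:", [f"{r:.3f}" for r in rates])
```

Running the simulation with different seeds is a quick way to see how often a pitcher of perfectly constant ability still produces a month that looks like a slump or a hot streak.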

WAR Back in the News

At fellow Sweet Spot blog It Is About the Money, Stupid, Hippeaux has a post up critiquing the sabermetric statistic Wins Above Replacement (WAR) and its widespread use (or, in his estimation, misuse). Naturally, this spurred a lot of debate on the Internet. Among many others, Rob Neyer and Tom Tango have rebutted the IIATMS article.

I don’t want to rehash the debate, as most of it has been said before. However, I’d like to share a comment from the Baseball Think Factory thread, written by the user “PreBeaneAsFan,” that I thought was quite good.

I think this is a problem that I see a lot not just in relatively unimportant venues like sports, but also in more important arenas (popular discussions of science, economics, etc.) People correctly point out that we don’t have precise answers and that our best quantifications have error bars that are [larger] than the number of decimal places reported. That’s a valuable insight and worth discussing, but then people take it a step further and use that as an excuse to remain completely agnostic on things. By denigrating the best efforts of others to quantify difficult questions and insisting that “I don’t need all that fancy stuff, just give me the basics and I’ll take my own guess since no one knows” they give themselves a feeling of smugness and superiority to those bookish nerds vainly searching for answers they can’t pin down, but they also throw away valuable information that the effort to quantify those things tells us and in most cases behave as though the uncertainty is much greater than it actually is.