On The Future of Baseball Research

Earlier this morning, Adam Felder and Seth Amitin posted at The Atlantic, in part, the results of a much-awaited study on the potential understated bias in the language of baseball television coverage. When I made my thoughts on the subject clear here a few days ago, I was wishing desperately that this study had already been published, but now that it has, you can go read a little empirical justification for that thesis.

I don’t know Felder at all, and my interactions with Amitin have been limited to trading Dodgers jokes on the internet, so I’m not saying this out of a desire to pump up a friend, but you need to read that article. It’s important not only because of what it says, but because it represents an underserved portion of baseball writing.

Most of you probably know this about me, but I spent three years as a political science grad student. In that time I probably learned more about statistics, game theory and research methods than I actually did about politics, and I learned a great deal about what separates actual research from conjecture and speculation.

I think one of the best things about the advanced analytics movement in baseball is that it’s brought the rigor of social science research to sportswriting. It’s not perfect, but the average baseball fan knows way more about how to read statistics than he or she did ten or even three years ago. We’re slowly stamping out falsehoods based on preconceived notions whose factual underpinnings are either obsolete or nonexistent, and the positive effects of this movement cannot be overstated.

As scouting information gets democratized, as we debunk concepts like “clutch” and “small ball,” we’re replacing mythology with empirical study. I think this is, in part, why many former athletes and traditional sports media personalities hate advanced metrics and bloggers–they know the mythology and we’re killing God, so to speak. As someone who believes religion and science can co-exist in the real world, I think that creates a false choice when it comes to baseball, but that’s another story.

So why is this study so important? Because it’s empirical baseball research based on something other than game data. You can find enormous amounts of research based on game statistics, pitch f/x and BIS coding. And as much as is out there, and as many conclusions as have been drawn by the public, you can bet that teams have even more.

But where we’re lacking, in my mind, is in qualitative analysis. Felder and Amitin’s study is still quantitative, but it’s based on coding of commentary, not box scores. That’s how we’re going to effect change–if media analysis is backed up with large-sample data from which we can draw meaningful conclusions.

Now, this study isn’t perfect. Even if all the concerns I have about their methodology (which is detailed in the post enough for a magazine article but not for a work of social science) are unfounded, what happens when you expand the sample? Or when you turn your attention to print media? Pre-game and post-game analysis? I buy the basic premise (partially, I fear, because I believed in their conclusions before the study), but it raises more questions than it answers. Which is kind of the point–you want knowledge that’s going to generate more knowledge.

So why don’t we have more work like this? Well, it’s absolutely not cost-effective. Game data leads to research that’s either valuable commercially (to ESPN, FanGraphs, Baseball Prospectus or whoever) or competitively to a team. But the only kind of qualitative data or media data that’s valuable (that we know of) is scouting data, and as much as I respect people who can evaluate young players and write coherently about them, I don’t think we’re drawing any scientifically rigorous conclusions there.

On top of that, doing this kind of research right is expensive (it took upwards of $3,000 to fund this study) and requires people who know what they’re doing. As often as not, those people are doing real social science instead, or their work is stuck in academic journals and either unavailable to the public or off the beaten path. Make no mistake, it exists, but its effects aren’t showing up in places where the average baseball fan is going to see them. I’m not sure what the solution is, but even though baseball produces more and better numbers than any other sport, we shouldn’t restrict serious baseball research to what we can count.

The Kevin Frandsen Illusion

Kevin Frandsen went 2-for-4 with two singles in yesterday’s series finale with the Washington Nationals, raising his average to .351 in his short time in Philadelphia. Spending all of 2011 with Triple-A Lehigh Valley and brandishing a career 68 OPS+ in his 626 Major League PA, the 30-year-old agreed to another Minor League contract with the Phillies, hoping to perform well enough to earn a promotion. He did just that, hitting .302 in 418 PA with the Iron Pigs, and the Phillies added him to the roster at the end of July. Since then, he has been one of the Phillies’ most productive players along with Erik Kratz.

I covered Frandsen briefly in a post on Wednesday, and Baumann did the same on Friday, but he is causing quite a stir and I figure explaining his performance is worth its own post.

As you may infer from his high batting average, Frandsen is sitting on a sky-high BABIP as well: .364. His career average BABIP is .272. While hitters have a lot more control over their BABIP than pitchers, they are still prone to the single-season flukes. Of the eight qualified MLB hitters with a .360 or higher BABIP in 2011, seven of them had a lower BABIP in 2012.

Name               ’11 BABIP   ’12 BABIP   DIFF
Michael Young      .367        .295        .072
Hunter Pence       .361        .296        .065
Adrian Gonzalez    .380        .327        .053
Emilio Bonifacio   .372        .325        .047
Alex Avila         .366        .322        .044
Miguel Cabrera     .365        .325        .040
Michael Bourn      .369        .361        .008
Matt Kemp          .380        .387        -.007

In 2010, only four MLB hitters had a .360 or better BABIP: Austin Jackson, .396; Josh Hamilton, .390; Carlos Gonzalez, .384; and Joey Votto, .361. Each regressed the following year: Jackson, .340; Hamilton, .317; Gonzalez, .326; and Votto, .349. So you can bet on most if not all of the players at the top of the hitter BABIP leaderboard to regress the following year, Frandsen included.
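For anyone who wants to check the year-to-year comparison, here is a minimal Python sketch that recomputes the drop for each hitter mentioned above. The BABIP figures are copied from this post; the names `HIGH_BABIP_HITTERS` and `babip_drops` are made up for illustration.

```python
# (first-year BABIP, following-year BABIP), copied from the figures in this post.
HIGH_BABIP_HITTERS = {
    # 2011 -> 2012
    "Michael Young": (0.367, 0.295),
    "Hunter Pence": (0.361, 0.296),
    "Adrian Gonzalez": (0.380, 0.327),
    "Emilio Bonifacio": (0.372, 0.325),
    "Alex Avila": (0.366, 0.322),
    "Miguel Cabrera": (0.365, 0.325),
    "Michael Bourn": (0.369, 0.361),
    "Matt Kemp": (0.380, 0.387),
    # 2010 -> 2011
    "Austin Jackson": (0.396, 0.340),
    "Josh Hamilton": (0.390, 0.317),
    "Carlos Gonzalez": (0.384, 0.326),
    "Joey Votto": (0.361, 0.349),
}


def babip_drops(hitters):
    """Map each hitter to (first-year BABIP - next-year BABIP), rounded to 3 places."""
    return {
        name: round(before - after, 3)
        for name, (before, after) in hitters.items()
    }


drops = babip_drops(HIGH_BABIP_HITTERS)
# A positive drop means the hitter's BABIP fell the following season.
regressed = [name for name, drop in drops.items() if drop > 0]
average_drop = sum(drops.values()) / len(drops)

print(f"{len(regressed)} of {len(drops)} hitters saw their BABIP fall the next year")
print(f"average drop: {average_drop:.3f}")
```

With the numbers quoted here, 11 of the 12 hitters come back down, and Matt Kemp is the lone exception.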

Relative to his career averages, Frandsen hasn’t changed his batted ball splits all that much. The big difference is he has hit 7.5 percent fewer fly balls and six percent more line drives. The latter fall for hits about 60 percent more often than the former, so that explains a lot of Frandsen’s BABIP. But he’s also been lucky on fly balls and ground balls.

BABIP      LD     FB     GB
2012       .688   .250   .267
Career     .624   .128   .208
NL 2012    .712   .139   .236

If he had his career average BABIP this season rather than .364, he would have two fewer line drive hits, three fewer fly ball hits, and two fewer ground ball hits for a total of seven fewer hits. That would drive his average from .351 all the way down to .252, which says a lot about his luckiness but also about the small sample — 103 plate appearances.

It isn’t like Frandsen has suddenly made incredibly good contact with a majority of the pitches he’s swung at, either. Most of the balls he is putting in play are landing in front of the outfielders, as his hit chart illustrates.

The idea of Frandsen at third base in 2013 has been thrown around a lot lately, but his production thus far is almost entirely a fluke and very unlikely to be repeated next year. He will go back to being a guy with an OPS in the mid-.600s and the Phillies will still be left looking for a legitimate third baseman. Placido Polanco’s combined OPS in 2011-12, by the way, is .658. He and his Gold Glove-caliber defense have a $5.5 million mutual option just begging to be picked up in the face of an abhorrent free agent class.

Hopelessness and You

The Phillies are on a four-game winning streak coming off of a three-game series sweep of the Washington Nationals at home, just their second sweep of the season. They may be 16.5 games out of first place in the NL East, but they are 9.5 games back of the second Wild Card. Recalling the 2011 St. Louis Cardinals, who went on a memorable late-season rampage into and through the post-season, Phillies fans are finding a small glimmer of hope in what has long been considered a failure of a season.

If you will, allow me to be a wet blanket. Baseball Prospectus still has the Phillies’ playoff odds at zero percent and they haven’t had a non-zero chance since July 26. After the division-leading Washington Nationals, Cincinnati Reds, and San Francisco Giants, there are four teams with significant playoff chances eyeing either of the two Wild Card slots beyond their division crown: the Atlanta Braves (87 percent), St. Louis Cardinals (63 percent), Los Angeles Dodgers (46 percent), and the Pittsburgh Pirates (25 percent). Two other teams have a small but non-zero shot: the Arizona Diamondbacks (3 percent) and the Milwaukee Brewers (0.1 percent).

Here’s a look at the standings:

Team   W    L    PCT    WCGB   WCE #   G Left
ATL    73   55   .570   -      -       34
STL    70   57   .551   -      -       35
LAD    69   59   .539   1.5    34      34
PIT    68   59   .535   2.0    34      35
ARI    64   64   .500   6.5    29      34
PHI    61   67   .477   9.5    26      34
MIL    59   67   .468   10.5   26      36
NYM    59   69   .461   11.5   24      34
SDP    59   70   .457   12.0   23      33
MIA    58   71   .450   13.0   22      33

The Phillies would have to topple all but one of those teams just for a shot at a one-game Wild Card playoff. They are 61-67 now, and let’s assume that 89 wins gets you in the second spot since the Cardinals’ .551 winning percentage yields 89 wins over 162 games. The Phillies would need to win 28 of their remaining 34 games, an .824 winning percentage. Even the red-hot 2011 Cardinals, to which many point as evidence of hope for the Phillies, went 18-8 in September, a meager .692 winning percentage. In 2007, the Phillies impossibly went 13-4 to close the season and steal the division crown from the New York Mets, yielding a .765 winning percentage. Just to have a realistic shot at making the playoffs, the Phillies would have to go on a historically-great month-long run.
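The arithmetic in that paragraph is easy to reproduce. The sketch below assumes, as the paragraph does, that the Cardinals’ current pace sets the bar for the second Wild Card; the function names are hypothetical and only the records come from the standings above.

```python
SEASON_GAMES = 162


def projected_wins(wins, losses, season_games=SEASON_GAMES):
    """Extrapolate a team's current winning percentage over a full season."""
    return round(wins / (wins + losses) * season_games)


def required_run(current_wins, games_left, target_wins):
    """Return (wins still needed, winning percentage needed) to reach target_wins."""
    needed = target_wins - current_wins
    return needed, needed / games_left


# Cardinals are 70-57; their pace is the assumed cutoff for the second Wild Card.
target = projected_wins(70, 57)

# Phillies are 61-67 with 34 games left.
needed, needed_pct = required_run(61, 34, target)

# Reference points mentioned in the post.
cardinals_sept_2011 = 18 / (18 + 8)
phillies_close_2007 = 13 / (13 + 4)

print(f"target: {target} wins; Phillies need {needed} of 34 ({needed_pct:.3f})")
print(f"2011 Cardinals in September: {cardinals_sept_2011:.3f}")
print(f"2007 Phillies to close the season: {phillies_close_2007:.3f}")
```

Changing the target by a win or two in either direction does not change the picture much, since the required pace stays well above either of those famous runs.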

But wait, that’s not all. All but one of the teams ahead of the Phillies would have to falter, finishing at 88 wins or below. Let’s assume the 73-55 Braves get the first spot. That means that the Cardinals would have to win no more than 18 of their remaining 35 games, the Dodgers 19 of 34, the Pirates 20 of 35, and the Diamondbacks 24 of 34. Additionally, none of the teams behind the Phillies could go on a historically-great run either, with the Brewers and Mets maxing out at 29 remaining wins. So you’re betting on the Phillies winning at least 82 percent of their games and the Cardinals less than 52 percent, Dodgers 56 percent, Pirates 58 percent, and the Diamondbacks 71 percent. You’d have a better time trying to make four of a kind on the river in Texas Hold’em.
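Here is a rough Python sketch of the other half of the parlay: how many more games each rival can win and still finish at 88 or fewer. The 88-win cap is the assumption from the paragraph above, the records come from the standings table, and the names `RIVALS` and `max_allowed` are invented for this example.

```python
WIN_CAP = 88  # assumed ceiling: every rival must finish at 88 wins or below

# team: (current wins, games left), taken from the standings table in this post
RIVALS = {
    "STL": (70, 35),
    "LAD": (69, 34),
    "PIT": (68, 35),
    "ARI": (64, 34),
    "MIL": (59, 36),
    "NYM": (59, 34),
}


def max_allowed(wins, games_left, cap=WIN_CAP):
    """Return (most remaining wins allowed, that total as a share of games left)."""
    allowed = min(cap - wins, games_left)
    return allowed, allowed / games_left


limits = {team: max_allowed(wins, left) for team, (wins, left) in RIVALS.items()}

for team, (allowed, pct) in limits.items():
    games_left = RIVALS[team][1]
    print(f"{team}: at most {allowed} of {games_left} ({pct:.1%})")
```

Every one of those ceilings has to hold at the same time as the Phillies’ own run, which is why the combined odds round to zero.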

It’s great that the Phillies have found their way up to third place in the division while rattling off three separate winning streaks of at least three games since August 12 (going 10-5 in the process), but it is too little, too late. They fell too far behind early, particularly with a 9-19 June that caused them to sell on Shane Victorino, Hunter Pence, and Joe Blanton. Besides, even if the Phillies were to somehow win a Wild Card spot, they would simply be put into a one-game playoff that is essentially a coin flip. And then if the Phillies won that, they would have to beat a team like the Reds, Giants, Dodgers, or Nationals, a tall task despite the recently-completed three-game sweep of the Nationals. It will be a while until the Phillies face mathematical elimination, but for all intents and purposes they are already eliminated.