On The Future of Baseball Research

Earlier this morning, Adam Felder and Seth Amitin posted at The Atlantic, in part, the results of a much-awaited study on the potential understated bias in the language of baseball television coverage. When I made my thoughts on the subject clear here a few days ago, I was wishing desperately that this study had already been published, but now that it has, you can go read a little empirical justification for that thesis.

I don’t know Felder at all, and my interactions with Amitin have been limited to trading Dodgers jokes on the internet, so I’m not saying this out of a desire to pump up a friend, but you need to read that article. It’s important not only because of what it says, but because it represents an underserved portion of baseball writing.

Most of you probably know this about me, but I spent three years as a political science grad student. In that time I probably learned more about statistics, game theory and research methods than I actually did about politics, and I learned a great deal about what separates actual research from conjecture and speculation.

I think one of the best things about the advanced analytics movement in baseball is that it’s brought the rigor of social science research to sportswriting. It’s not perfect, but the average baseball fan knows way more about how to read statistics than he or she did ten or even three years ago. We’re slowly stamping out falsehoods based on preconceived notions whose factual underpinnings are either obsolete or nonexistent, and the positive effects of this movement cannot be overstated.

As scouting information gets democratized, as we debunk concepts like “clutch” and “small ball,” we’re replacing mythology with empirical study. I think this is, in part, why many former athletes and traditional sports media personalities hate advanced metrics and bloggers–they know the mythology and we’re killing God, so to speak. As someone who believes religion and science can co-exist in the real world, I think that creates a false choice when it comes to baseball, but that’s another story.

So why is this study so important? Because it’s empirical baseball research based on something other than game data. You can find enormous amounts of research based on game statistics, pitch f/x and BIS coding. And as much as is out there, and as many conclusions as have been drawn by the public, you can bet that teams have even more.

But where we’re lacking, in my mind, is in qualitative analysis. Felder and Amitin’s study is still quantitative, but it’s based on coding of commentary, not box scores. That’s how we’re going to effect change–if media analysis is backed up with large-sample data from which we can draw meaningful conclusions.

Now, this study isn’t perfect. Even if all the concerns I have about their methodology (which is detailed in the post enough for a magazine article but not for a work of social science) are unfounded, what happens when you expand the sample? Or when you turn your attention to print media? Pre-game and post-game analysis? I buy the basic premise (partially, I fear, because I believed in their conclusions before the study), but it raises more questions than it answers. Which is kind of the point–you want knowledge that’s going to generate more knowledge.

So why don’t we have more work like this? Well, it’s absolutely not cost-effective. Game data leads to research that’s either valuable commercially (to ESPN, FanGraphs, Baseball Prospectus or whoever) or competitively to a team. But the only kind of qualitative data or media data that’s valuable (that we know of) is scouting data, and as much as I respect people who can evaluate young players and write coherently about them, I don’t think we’re drawing any scientifically rigorous conclusions there.

On top of that, doing this kind of research right is expensive (it took upwards of $3,000 to fund this study) and requires people who know what they’re doing. As often as not, those people are doing real social science instead, or their work is stuck in academic journals and either unavailable to the public or off the beaten path. Make no mistake, it exists, but its effects aren’t showing up in places the average baseball fan is going to see them. I’m not sure what the solution is, but even though baseball produces more and better numbers than any other sport, we shouldn’t restrict serious baseball research to what we can count.