A Game for the Armchair Scouts

If there’s one group Saberists don’t like, it’s the people who back up their outlandish claims with “I watch the games”. Being able to pinpoint minute details in baseball players is a skill that takes years to master — that’s why professional scouts are so universally revered.

I came across an interesting game called “the eyeballing game”. You are asked to accomplish various tasks using your mouse and your eyeballs. For instance, the first task is to slightly adjust a shape to make a parallelogram. Click here to play the game. When you finish, you should have a greater appreciation for just how imperfect your eyeballs are, and why we should always defer to the facts and figures when possible.

(h/t Back She Goes)

My results:

  • Parallelogram: 2.8
  • Midpoint: 2.2
  • Bisect angle: 8.4
  • Triangle center: 5.5
  • Circle center: 4.0
  • Right angle: 10.5
  • Convergence: 4.1
  • Average error: 5.36
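The "average error" figure appears to be a plain arithmetic mean of the individual task errors. That is an assumption about how the game scores, but it is consistent with the seven numbers listed above. A quick sketch in Python (the names are illustrative, not anything from the game itself):

```python
# Per-task errors copied from the results listed above.
errors = {
    "Parallelogram": 2.8,
    "Midpoint": 2.2,
    "Bisect angle": 8.4,
    "Triangle center": 5.5,
    "Circle center": 4.0,
    "Right angle": 10.5,
    "Convergence": 4.1,
}


def average_error(values):
    """Arithmetic mean; assumed (not confirmed) to be how the game scores."""
    values = list(values)
    return sum(values) / len(values)


mean_error = average_error(errors.values())
print(f"Average error: {mean_error:.2f}")  # prints "Average error: 5.36"
```

Lower is better, so a single bad task (here the right angle) pulls the whole average up noticeably.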

  1. Rob

    April 22, 2011 08:58 PM

    Mine average error was 5.39. Can I be a scout now?

  2. Rob

    April 22, 2011 08:59 PM

    Mine error? oops.

  3. feeox

    April 22, 2011 09:05 PM


    If we are so flawed that we invariably come to the wrong conclusion about what we are watching (which seems to be your position) remind me again why we watch? It seems you are asking us to watch a game, and say, I have no idea what I just saw, I can’t even be sure who won or not until I crunch the saber numbers. Ben Francisco just hit a home run to break a 0-0 tie. Was it meaningful? I won’t know until I crunch the numbers.

    Just poking a little fun. I really do enjoy the saber analysis, but I think the saberheads do take themselves a little too seriously at times.

  4. Bill Baer

    April 22, 2011 09:13 PM

    I’m talking more about the people who say they can make inferences about a player just by watching. For instance, some people think they know that Ryan Madson can’t handle the 9th inning because they watched him blow a couple saves last year. It’s the people who watch the game and nothing else, and think that gives them carte blanche.

  5. feeox

    April 22, 2011 09:20 PM

    Why did you delete his comments? There was nothing wrong in them. It makes you look bad, Bill.

  6. Hugs not Boos for Hamels

    April 22, 2011 09:30 PM

    I’ll do the right angles and bisection of angles from here on out Bill. You can do the rest. We’d kick some tail.

    Your inaccuracy by category:

    Parallelogram 11.0 14.6 5.0
    Midpoint 12.6 3.0 9.1
    Bisect angle 0.7 4.0 1.8
    Triangle center 6.0 2.7 3.8
    Circle center 12.0 11.4 9.2
    Right angle 2.9 1.8 0.5
    Convergence 4.1 3.6 14.1

    Average error: 6.38 (lower is better)
    Time taken: 132.7

  7. Bill Baer

    April 22, 2011 09:37 PM

    @ feeox

    Check out the commenting policy in the “About” page. There were a lot of things wrong with those comments, and I am very happy with the current commenting atmosphere. No need to let one miscreant muck it up.

    @ HNBFH

    Sounds like a deal. I can’t believe how poorly I performed on those.

  8. feeox

    April 22, 2011 09:41 PM

    I think you’ve got the better of the Ryan Madson debate. And even though I am more inclined to side with the saber prediction (he’ll be great) over the small sample prediction (he’ll wet his pants) of how Madson would perform in the ninth over the long haul, we don’t know for sure until he is the closer. You’ve argued conclusively that a high leverage situation is a high leverage situation no matter which inning it happens in, but does Madson know that? Will he treat the ninth differently than the eighth? Whether rational or not, the baseball establishment does treat the ninth differently (see outrageous closer salaries). Is Madson’s mental makeup tainted by the baseball establishment? Only time will tell. 🙂

  9. feeox

    April 22, 2011 09:46 PM

    Okay. Will check the “About” page. Thanks.

  10. Scott G

    April 22, 2011 10:02 PM

    Your inaccuracy by category:
    Parallelogram 4.1 2.2 14.0
    Midpoint 2.2 1.4 1.4
    Bisect angle 3.4 3.4 5.3
    Triangle center 1.6 3.2 3.3
    Circle center 3.2 1.4 3.6
    Right angle 0.4 3.5 4.0
    Convergence 0.0 1.4 2.8
    Average error: 3.13 (lower is better)
    Time taken: 123.9

    No big deal, gentlemen

  11. Matt

    April 22, 2011 10:33 PM

    I “watch the game” and it doesn’t take Sabermetric analysis to tell me that Raul Ibanez is done being a productive player. It is amazing that a contender can have so much dead weight and still be a contender (Ibanez, Herndon, Baez, and Kendrick….heck throw in Gload, Orr, and Valdez). To think that we are 12-6 and leading in game 19 as I type and the best hitter in the lineup is a guy named Placido Polanco.

  12. Jack

    April 22, 2011 11:37 PM

    I think you have to temper a rigid devotion to the data with the understanding that the data do not tell the whole story and can’t possibly do so, no matter how much you try to adjust for high/low leverage situations, things beyond a player’s control, etc… One of my big gripes with the SIERA metric is that it appears to assume 1) all pitchers are trying to miss bats all the time, and 2) pitching to contact isn’t something a pitcher can be particularly “good” at, rather it’s a matter of luck. Not on your life. Halladay isn’t primarily a strikeout pitcher. But just watch him. Every pitch runs to the plate screwy. There’s contact, and then there’s contact.

    Also, I continue to view pure sabermetrics with some incredulity because its results tend to revere “toolsy” guys who can fill up a stat sheet but might not make a real impact in a game, or over a season. Stat-heads gush over Jayson Werth (or they did, till he turned 32), but in the world outside the box score, I don’t view him as a guy who can be the center cog in an offense. Stat-heads turn their noses up at Ryan Howard, but he is a guy that the other team has to game plan for, and he scares pitchers. There is no stat for this.

    But anyone who thinks sabermetrics are useless is simply being an ostrich and shouldn’t be taken seriously.

  13. Bill Baer

    April 22, 2011 11:46 PM

    @ Jack

    Re: SIERA

    1. A pitcher’s strikeout rate is inherently good because it always leads to outs and rarely leads to base-advancement. It is the best predictor of a pitcher’s success, more than any other statistic, slightly ahead of his walk rate.

    2. SIERA specifically accounts for a pitcher’s batted ball skill. FIP and xFIP do in a very vague sense, which is why I prefer to use SIERA to FIP and xFIP when possible.

    Re: Jayson Werth

    Not sure about this criticism. Do you have any specific quotes? Because it seems like a strawman. Werth has been objectively great over the past 3.5 seasons going into 2011, and is off to a slow start in a small three-week sample.

    I would argue that he has been a “center cog in an offense” while with the Phillies.

    Saberists don’t “turn their noses up at Ryan Howard”; rather, they “turn their noses up” at paying him $125 million over five years in his age 32-36 seasons while playing a non-premium position.

    “he is a guy that the other team has to game plan for, and he scares pitchers. There is no stat for this.”

    That’s an empty statement, in my opinion. Teams “game plan” for every hitter. They don’t skip looking over Raul Ibanez because Ryan Howard is this scary figure.

    In fact, it’s more likely that they “game plan” less for Howard, since opposing managers have been neutralizing him with LOOGYs with great success.

  14. Dustin

    April 22, 2011 11:48 PM

    I happen to read your blog a lot. I find a lot of these sabermetrics to be frustrating. I believe there are a lot of things that you see by watching and knowing what to watch for that you don’t see in the numbers, but I still find these new analyses to be interesting. It’s sort of a good point to use this game as an example. However I played it twice and noticed a significant improvement from the first try to the second. I’m sure that if I played this game as many times as baseball games I’ve watched in my lifetime then I could get a near perfect score. So I think the point is moot. Granted, baseball is significantly more complicated than this geometry eyeball game. With that said I think there are many intelligent people out there that have watched a lot of baseball and know the difference between gut judgment and critical visual analysis. I know we tend to think that science is objective but the reality is that nothing is absolutely objective. I would encourage you to read a book titled “The Structure of Scientific Revolutions” by Thomas Kuhn. Kuhn makes a strong argument about the non-concrete nature of science. Ultimately it is human reactions and a human eye that puts a bat on a 96mph fastball. The human eye and human judgment are much more sophisticated than sabermetrics gives them credit for. And as cliché as it sounds the game is won on the field not on the stat sheet.

  15. Bill Baer

    April 22, 2011 11:59 PM

    @ Dustin

    It’d be an interesting study — to see if people improve on multiple trials. I’ve actually gotten mixed results. My latest “average error” was 5.60. :-\

    EDIT: Just got a 6.52. Eek!

  16. Jack

    April 23, 2011 12:01 AM

    I don’t think the Werth comment is a strawman. His “objective greatness” is mutually preconditioned by loads of factors, not least of which is the fact that he got great protection in a great lineup. If he protected Howard, then that protection was a two-way street. You probably can pull this stat up in seconds, but (and this is coming from my eyes…) Werth saw a lot of fastballs while he was in the 5 hole, more than he would have seen in a weaker lineup (I know, he sees a lot of pitches because he’s terrific at working counts. But I still believe he saw more pitches to hit because of where he was). I won’t say anything about these initial weeks with the Nats. As you said, small sample size, doesn’t mean anything.

    I think your “teams game-plan for every hitter” is a bit of a purposeful misdirection. Obviously teams game plan for every hitter. All I meant by that statement is a lineup of 9 Nick Swishers isn’t enough, in my opinion, never mind gaudy OPS. Howard is one of the premier run-producers in the game. I know you probably think the RBI is effectively meaningless, but I don’t. And it would be a real distortion of what you said to start comparing Werth to Howard apples to apples, so don’t take this next bit as me doing that. But for all Howard’s faults, he drives in tons of runs. All those faults (all the strikeouts) gave Werth lots of chances to drive in runs as well. But am I right that Werth only drove in 100 once?

    I’m not trying to be contrarian or even play devil’s advocate. I just think it is foolish to discount statistics and equally foolish to discount the value of human judgment in context. Love the blog.

  17. Bill Baer

    April 23, 2011 12:13 AM

    Do you have any evidence that lineup protection exists, outside of “I watch the games and feel it in my gut?” Because study after study shows either zero evidence or as close-to-zero-as-possible.


    Werth saw 58 percent fastballs last year, which was within one percentage point (plus-minus) of Brian Schneider, Jimmy Rollins, Raul Ibanez, Carlos Ruiz, and Ross Gload.

    “But for all Howard’s faults, he drives in tons of runs. All those faults (all the strikeouts) gave Werth lots of chances to drive in runs as well. But am I right that Werth only drove in 100 once?”

    RBI opportunities are a function of the on-base percentage of the hitters in front. Werth will have fewer RBI opportunities, especially if, as you maintain, Howard is such a great run-producer.

    “I’m not trying to be contrarian or even play devil’s advocate. I just think it is foolish to discount statistics and equally foolish to discount the value of human judgment in context.”

    I’m not discounting it. I’m saying that a lot of people try to validate their assertions with “I watch the game” but don’t realize just how flawed their eyes are and how biased they themselves are. I’ll trust a seasoned scout when the numbers come up short, but I’ll never trust Joe Schmoe watching on his TV at home, even if it is in HD.

  18. Mratfink

    April 23, 2011 12:45 AM

    Your inaccuracy by category:

    Parallelogram 9.5 2.0 5.4
    Midpoint 5.8 4.0 6.3
    Bisect angle 1.3 2.1 2.2
    Triangle center 18.8 1.9 7.1
    Circle center 1.0 6.7 5.8
    Right angle 6.3 17.0 6.9
    Convergence 7.8 3.6 2.2

    Average error: 5.89 (lower is better)
    Time taken: 143.2

    I was completely guessing at the right angles. I know what they are but damn if I can see it.

    I mean the thing about the error of human eyes is that we remember what stands out to us. Joe Posnanski had a series of pieces over the past few years about how some people thought Yuniesky Betancourt was a good-to-great fielder because every once in a while he made a spectacular play, but he also didn’t make a lot of routine plays, so the good sticks out in those cases.
    Similarly with Madson, the bad is all people remember with him.
    That said, I do think there are things that can be observed, but I think Bill’s point here is that to observe trends takes a trained eye and most people at home do not have this training. So when we say Ibanez’s bat has slowed down, it’s much less likely the statement has meaning coming from me than it does from a scout. But the numbers will be true no matter who looks at them, so for those of us without the eyeball training of a lifetime the numbers make up 90% of our argument.

  19. Chris

    April 23, 2011 02:19 AM

    “I would encourage you to read a book titled “The Structure of Scientific Revolutions” by Thomas Kuhn. Kuhn makes a strong argument about the non-concrete nature of science. Ultimately it is human reactions and a human eye that puts a bat on a 96mph fastball. The human eye and human judgment are much more sophisticated than sabermetrics gives them credit for. And as cliché as it sounds the game is won on the field not on the stat sheet.”

    Actually the ideas in that book provide more support for sabermetrics than against it. Our interpretation of events we personally see is skewed by what we already think. For example, every time I watch Victorino strike out or pop up, I cringe and curse about how terrible he is. In reality, he’s probably not as bad as I think he is (although he probably is overrated by the majority of baseball fans), but the only way I’d be able to see that is to view the statistics about him. Sabermetrics isn’t relativity; the statistics present what they were designed to present, regardless of who is viewing them, and it is up to individuals to interpret them. This, however, doesn’t diminish the objectivity of the statistics themselves, and in the case of sabermetrics, there are years’ worth of research behind the statistics, which are designed to try to determine players’ true talent levels. They’re not arbitrarily designed to favor certain players over others, as sometimes seems to be people’s complaint with them.

    For example, most Phillies’ fans’ views of Ryan Howard are skewed by the fact that he was a monster in ’06, and helped bring us a championship in ’08. His HR totals up until last year had been impressive, but his RBI totals said more about Utley and the other guys in front of him than they did about Howard. In reality, the league adjusted to Howard a while ago and he hasn’t really shown any signs of adjusting back, although we shall see how he does through this year. He’s not as good as he was in ’06, and some fans (maybe most? I don’t really know) just can’t see it because of how they think of Howard. He’s still an above average hitter, but the truth is he’s realistically barely above average for a 1st baseman, if that, although that again could change over the course of this season.

  20. goat

    April 23, 2011 02:55 AM

    Your inaccuracy by category:

    Parallelogram 1.4 2.8 6.0
    Midpoint 1.4 0.0 5.1
    Bisect angle 0.3 4.0 0.8
    Triangle center 8.5 2.9 4.9
    Circle center 5.8 5.0 6.3
    Right angle 2.7 2.2 2.6
    Convergence 5.4 8.1 2.2

    Average error: 3.73 (lower is better)
    Time taken: 146.4

    And I’m currently drunk.

    Seeing if Ryan Howard has his head toward or away from the pitch still counts when he has a hot or cold streak. Noticing if Baez is in the game also counts as eyesight.

  21. Here

    April 23, 2011 09:40 AM

    Parallelogram 7.8 5.1 6.1
    Midpoint 3.0 2.2 8.1
    Bisect angle 6.6 0.4 3.2
    Triangle center 2.7 2.4 2.9
    Circle center 1.4 2.2 2.0
    Right angle 0.9 0.6 0.3
    Convergence 7.1 1.4 5.0
    Average error: 3.40 (lower is better)
    Time taken: 178.4

  22. Css228

    April 23, 2011 10:02 AM

    Yeah mine only got worse with each view

  23. Css228

    April 23, 2011 10:07 AM

    Wait nevermind that last statement
    Your inaccuracy by category:
    Parallelogram 4.2 6.4 13.0
    Midpoint 5.1 3.2 4.5
    Bisect angle 8.8 7.8 0.5
    Triangle center 4.9 1.1 3.4
    Circle center 3.2 2.2 6.3
    Right angle 4.4 6.8 0.7
    Convergence 1.4 5.1 6.7
    Average error: 4.75 (lower is better)
    Time taken: 107.4

  24. Will Stouffer

    April 23, 2011 11:30 AM

    Are scouts trying to judge the exact degree of the arc of rotation of a player’s turn around third base in order to judge his baserunning ability? I don’t think this game, or the precise accuracy of the human eye, is really all that analogous to your point, which is that by watching games we give too much weight to the small sample size that we see, rather than the mountain of data that we didn’t see for ourselves.

    That is definitely true, but it has little to do with the benefits of visual scouting. For instance, Ibanez has clearly been struggling at the plate, and while statistical analysis will only tell you his current sample size is too small to judge his long term potential, just watching him swing the bat can help you *predict* whether or not we’re seeing a short term slump or a player who’s permanently lost a step or two.

  25. TESS

    April 23, 2011 11:37 AM

    It’s easy to stare at stats on a page.

    Your inaccuracy by category:

    Parallelogram 2.2 2.2 2.2
    Midpoint 0.0 3.2 4.1
    Bisect angle 1.4 2.2 3.5
    Triangle center 3.6 3.7 3.9
    Circle center 1.4 4.1 3.6
    Right angle 1.8 3.3 6.2
    Convergence 0.0 1.4 1.4

    Average error: 2.64 (lower is better)
    Time taken: 215.7

  26. LTG

    April 23, 2011 01:05 PM

    To Will’s point about Ibanez:

    If you meant to suggest that it is clear to the naked eye that Ibanez is in decline due to age and not a common slump, I disagree. To draw such a conclusion over a small sample size, we have to be able to distinguish properties of Ibanez’s swing that are clear indicators of age-related decline rather than a common slump and then detect them. What are the clear indicators? Perhaps bat speed: as players move into the mid and late 30s bat speed often declines and their numbers lag. But, 1) bat speed can decrease as the result of mechanical issues not related to age, 2) bat speed can decrease due to a temporary hindrance (injury, illness, need for new contact prescription), and most importantly 3) the naked eye of the average fan cannot pick up differences in bat speed without the aid of timing devices and lots of video, unless it is egregious (which in Ibanez’s case it isn’t). For any other candidate property you name I think the arguments against it would be similar.

    On the other hand, if you meant to suggest that *expert scouts* can just see whether Ibanez’s current slump is due to decline, then you are just agreeing with Bill when he says that professional scouts are revered in the game because they have such a specialized skill. We fans are not they.

    I too doubt that there is a significant analogy between the game and observing baseball-related details. But that’s because I suspect professional scouts would be just as bad as the rest of us at the game. How good we are at seeing things is object-related. We can train ourselves to see certain kinds of objects really well but that doesn’t mean we are just as good at seeing the details of every kind of object. (Object here can be anything we see: a ball, a motion, a feeling, etc.)

  27. Mike B.

    April 23, 2011 02:35 PM

    I don’t care what the computer says; that point was right in the middle of the damn circle. I’ve been looking at circles my whole life. I don’t need a computer to tell me where the middle is.


  28. Scott G

    April 23, 2011 05:32 PM

    Your inaccuracy by category:

    Parallelogram 2.2 1.0 1.0
    Midpoint 1.4 0.0 4.1
    Bisect angle 1.2 2.9 1.3
    Triangle center 1.3 4.0 3.5
    Circle center 5.1 2.0 1.4
    Right angle 6.5 0.8 2.0
    Convergence 1.4 2.0 3.2

    Average error: 2.30 (lower is better)
    Time taken: 172.0

    I’m addicted

  29. Dan

    April 23, 2011 07:01 PM

    I forgot to mark down the numbers, but I had a 2.43 average error which was inflated a bit by accidentally letting go on the right angle before I was ready.

    You made a very good point, although somewhat wasted on me. I’ve always had good eyesight and I’m pretty precise when it comes to these things. I wonder if I could be a scout one day, I’d love that job.

  30. Western Dave

    April 23, 2011 08:20 PM

    I can’t believe we are discussing Kuhn here, but it’s one of the reasons I read this blog.

    I just co-taught Kuhn with a mathematician in a History of Math class on Thursday so the text is pretty fresh in my mind. OTOH, Mathematics resists Kuhnian analysis. There have been few paradigm shifts in math (compared to Science) and even those haven’t really been true paradigms. Math seems to exist outside of human understanding (unlike Science).

    However, statistics isn’t real math (at least in the sense that mathematicians use the term). It’s applied math and has a human element to it and therefore it is verifiable. The Kuhnian piece fits well here in that Sabermetricians believe they are capable of identifying and naming what talented scouts perceive intuitively. However, the old guard are incapable of understanding what Sabermetrics are or what they do, or do not, do.

    The thing is, were I statistically literate enough, I am sure I could manufacture a stat that proves that Ryan Howard is a more productive player than Jayson Werth. And vice-versa. Sadly, I was the history half of the history of math class not the math half, but I do wonder why the stats heads don’t use terms like significant and standard deviation more in their stats to show how meaningful they are.

  31. Scott G

    April 23, 2011 10:11 PM

    Exhibit A that teams aren’t too scared of Howard. IBB Rollins to face Ryan Howard. Woof.

  32. jauer

    April 23, 2011 10:48 PM

    lineup protection? what is this, amateur hour?

  33. LTG

    April 24, 2011 12:41 AM

    Alright, I have to get into the Kuhn discussion. The claim that mathematics is a counter-example to Kuhn’s claims about paradigm shifts and revolutionary science has to be better grounded than just pointing to the apparent eternality of mathematical truths. For example, the appearance that mathematics exists outside of human understanding could be explained by things like Platonic forms of numbers and shapes and even axioms; or it could be explained by the claim that mathematics is about things that are produced by the human mind and so their constancy is a result of our continuing to find those things relevant. Only the former explanation denies Kuhn’s results, not the latter.

    As for the non-purity of statistics: the mathematical theorems of statistics are perfectly pure. However, when those theorems are tailored to a particular set of phenomena in order to make predictions about them, they become applied. Lots of pure math has been impurified in the pursuit of science. The mathematical disciplines are no less pure for that reason. String theory depended for its development on the discovery of new pure maths.

    To back up Chris, Kuhn’s results show that perception is theory-laden. This means that there is no self-evident seeing the truth of the matter. Everything we see depends on a background theory for the identification of what we see. So, whenever we make observations we have to ask whether the background theory provides the best account of what we are seeing or whether there is a better one. This requires close attention to detail through… quantificational experiment. To exaggerate, Kuhn was the first sabermetrist.

  34. Bill Baer

    April 24, 2011 12:49 AM

    I haven’t read Kuhn’s book, so I don’t feel fit to comment on it, but I would just like to say that I absolutely love the conversation that’s taken place here about Kuhn.

  35. jauer

    April 24, 2011 01:42 AM

    “but his RBI totals said more about Utley and the other guys in front of him than they did about Howard.”

    rob charry disagrees

  36. Cutter

    April 24, 2011 09:27 AM

    On the other hand, one thing that annoys non-saber fans is the sabermetrics crowd’s insistence that their analysis of the game is superior because they have statistics that tell them so.

    I realize that this is not the mindset of every stat-based fan, but it feels like in their attempt to get their points across, many of the more prominent sabermetric experts have made it seem that way.

    It seems ridiculous that anyone can point to a statistic to absolutely prove anything considering the way that statistical analysis is continually evolving.

    It’s very possible that today’s accepted statistic could very well be laughed at by the sabermetricians of tomorrow. Sure, OPS and SIERA seem like good ways of analyzing players now. But it wasn’t that long ago that everyone thought that wins and RBIs were good statistics too.

    And remember, as has been mentioned before, people can find statistics to prove just about anything.

    I could probably go through each team’s numbers and find that the teams that have the most RBIs are the teams that score the most runs. And the teams that have the fewest RBIs against give up the fewest runs. I could then conclude that RBIs are directly connected to scoring runs, and as a result winning games. Therefore, RBIs must be a very important statistic.

    At the risk of getting way off-topic here, it reminds me of a scientist saying, “Prove to me that God exists, since my calculations say that he does not.”

    Can we prove that Ryan Howard is a great player by using statistics like OBP? Probably not.

    But look at last night’s game. 1-5 with 4 strikeouts looks like a bad game statistically. And yet, most people would say that Howard was the star of the game.

  37. Scott G

    April 24, 2011 10:33 AM

    Most people would be wrong then. Howard should have been 0 for 5. Ludwick butchered that fly ball. Rollins was better offensively and defensively.

  38. Cutter

    April 24, 2011 07:16 PM

    Last I checked, “should have” doesn’t count for much in baseball.

    I mean, I figure the Phillies “should have” beaten the Giants in the NLCS last season. The Phillies “should have” been able to hit the Giants relievers. Uribe’s homer “should have” been a routine fly ball.

    But the parade was held in San Fran, not Philly.

  39. hk

    April 24, 2011 07:27 PM


    This is totally off topic, but I was wondering why inside-the-park HR’s are counted like other HR’s in the BABIP calculation. I know that they are infrequent enough to be statistically irrelevant, but they still should not count as HR’s in the calculation. Are they similarly counted as HR’s in the other calculations like HR/9 or HR/FB?

  40. LarryM

    April 24, 2011 08:09 PM

    You won’t convince the doubters on this, but …

    There are a number of areas where educated observers will provide information that statistical analysis won’t – but the emphasis is on the word educated. Not “I watch a lot of games, I’m a big fan” educated, but professional scout educated.

    The thing about statistical analysis in this context isn’t so much that it is ultimately “better” than subjective analysis*, but that ANYONE who isn’t innumerate can intelligently evaluate the statistical arguments, whereas non-professionals are almost never qualified to evaluate players on a subjective basis.

    That explains why sabermetrically inclined fans are (correctly) convinced that their analysis is superior to the subjective opinions of the casual fan.

    * As a side note, there are things which statistical analysis does well, and things which traditional scouting does well. They complement each other.

  41. LarryM

    April 24, 2011 08:14 PM

    I would think that a quick look at Howard’s splits would settle the RBI argument, at least as it applies to him, beyond all dispute. Now, to be fair, the statement that “his RBI totals said more about Utley and the other guys in front of him than they did about Howard” is not entirely fair. They also say something about his raw power, which is prodigious. But what they DON’T say anything about is his “clutch” ability, which is … just about precisely average*. Except his BB numbers, but I’ll leave it as an exercise for the reader to untangle that knot.

    *I’m not making an argument about whether there is such a thing as clutch ability in the abstract, but if there is, looking at his career splits (runners on base, runners in scoring position, late/close, etc.), Howard does not exhibit it.

  42. Bill Baer

    April 24, 2011 08:34 PM

    @ hk

    The ITPHR is technically a ball in play — a ball that gives fielders an opportunity to convert an out.

    They are counted in HR/9, and depending on how they are classified as a batted ball (GB/FB/LD), they are also included in HR/FB as far as I know.

  43. Western Dave

    April 24, 2011 10:26 PM

    Well, I was hoping to catch some of the ambiguity you bring up with “seems to.” The seminar split evenly over the question of is math invented or discovered and I’m not really competent enough to make an argument not based on personal preference here. But my understanding for why Bill Baer’s preference that SIERA is a better measure of pitchers than xFIP comes down to he thinks it works better because it values some aspects of pitching differently not because someone has proven it’s better in a way that would hold up.

    Essentially, sabermetrics developed as a way to do arbitrage (find imperfections in the market and exploit them). They aren’t objective measurements per se; they are ways of measuring how people value players and then comparing those values to other values. That’s a lot of human inputs for something that has a veneer of objectivity.

    I did just do some more reading up on WAR and the graphs appear to work out to something that looks like a decent distribution, like most players don’t start and so on although the differences at the extremes are probably over-emphasized. But WAR doesn’t think very much of relief pitchers; is this a feature or a bug? I’m inclined to view the latter.

  44. Bill Baer

    April 24, 2011 11:30 PM

    “But my understanding for why Bill Baer’s preference that SIERA is a better measure of pitchers than xFIP comes down to he thinks it works better because it values some aspects of pitching differently not because someone has proven it’s better in a way that would hold up.”

    Not sure if I am interpreting this correctly, but I do prefer SIERA to xFIP because it has indeed been proven to better predict pitchers’ performance.


    Matt Swartz:

    Squeezing the data every which way, it remains true that 2010 continues to show SIERA to be the best ERA estimator. It is clear that xFIP is almost as good, though if left with one, I would prefer SIERA (perhaps obviously). Interestingly, running a regression of park-adjusted ERA on the previous year’s SIERA and xFIP shows that not only does SIERA do a better job, you should actually lower the expected ERA of a pitcher with a higher xFIP and the same SIERA. The formula given is:

    ERA (pk-adj) = 1.60 + .914*SIERA – .277*xFIP

    Both coefficients are statistically significant (p=.000, p=.013 for SIERA and xFIP respectively). This means that xFIP is not giving extra information beyond what SIERA does. This peculiar result of a negative coefficient is probably a result of sampling bias, but it is still worth reporting.
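To make the negative xFIP coefficient concrete, here is a minimal sketch that plugs numbers into the quoted regression. The function name and the sample SIERA/xFIP values are made up for illustration; only the three coefficients come from the formula above.

```python
def predicted_era(siera: float, xfip: float) -> float:
    """Park-adjusted ERA from the quoted regression:
    1.60 + 0.914 * SIERA - 0.277 * xFIP.
    """
    return 1.60 + 0.914 * siera - 0.277 * xfip


# Two hypothetical pitchers with the same SIERA but different xFIP.
same_siera = 3.50
low_xfip = predicted_era(same_siera, 3.50)
high_xfip = predicted_era(same_siera, 4.00)

print(f"SIERA 3.50, xFIP 3.50 -> expected ERA {low_xfip:.2f}")
print(f"SIERA 3.50, xFIP 4.00 -> expected ERA {high_xfip:.2f}")
```

With SIERA held fixed, the pitcher with the higher xFIP comes out with the lower expected ERA, which is the peculiar result the quote describes.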

  45. LTG

    April 25, 2011 12:37 AM

    @Western Dave,
    The question of the objective status of sabermetrics is difficult because, as you point out, the knowledge of the statistics can change the way the game is played and this can change how the statistics ought to be evaluated. However, unlike the use of statistics in the social sciences and the market, the game of baseball has one value that is not optional: runs. If you don’t value runs, then you aren’t playing baseball. So, if we figure out the likelihood of runs being produced as the result of certain events and then find ways to measure players’ probabilities of creating those events, it seems to me we would have a way of making informed predictions about which players will produce what value (assuming they perform in the same way). This looks something like an objective measurement system, even though it depends on our valuing runs. What allows for the apparent objectivity is that valuing runs is non-optional in baseball. As I understand sabermetrics, they developed in precisely this way, even if they were originally intended as a form of arbitrage. So, sabermetrics has a little more than a veneer of objectivity.

    And, sorry for ignoring your hedge above.

  46. Western Dave

    April 25, 2011 07:59 PM

    @LTG and Bill,
    Thanks for helping to educate me more about these issues. I went and read the links at Baseball Prospectus and I am confused about the term estimator, and I couldn’t find a definition. Can y’all help?

    Also, I was thinking last night about the way relievers are valued/not valued. It would seem that since the reliever’s job is so specific, wouldn’t the proper valuation be Losses Prevented Above Replacement and not Wins Above Replacement? What would that metric look like?

  47. Bill Baer

    April 25, 2011 08:34 PM

    “I am confused about the term estimator, and I couldn’t find a definition.”

    SIERA removes luck from the equation in evaluating a pitcher. For instance, the fact that Cole Hamels was really BABIP-unlucky in 2009 had no effect on his SIERA. It only looked at his strikeout and walk rates, and his batted ball profiles. Pitchers have by far the most control over those three areas, which makes SIERA rather accurate in predicting future performance.

    As for relievers, I do think we don’t do a good enough job of evaluating them properly. There have been attempts (example) but nothing impressive yet.

  48. Chareth

    April 25, 2011 10:21 PM

    Your inaccuracy by category:

    Parallelogram 2.0 —- —-
    Midpoint 3.6 —- —-
    Bisect angle 1.3 —- —-
    Triangle center 2.5 —- —-
    Circle center 4.5 —- —-
    Right angle 1.1 —- —-
    Convergence 0.0 —- —-

    Average error: 2.14 (lower is better)
    Time taken: 76.3

    I seem to have the lowest score, also in the lowest time taken. That means I’m allowed to just “watch the games” right? 😉
