For better and for worse, it’s projection season in baseball. Spring training has yet to begin, but rosters are gradually taking their final shape, or as close to the final shape as we’ll know before a single pitch is thrown. Consequently, computers are sorting through the past few seasons of statistical data and spitting out their best projections for what players will do in 2016. Roll those player projections in with roster projections and we begin to get an idea of how computers “think” teams will perform this year. Unsurprisingly, the computers aren’t impressed with the Phillies, but what is perhaps somewhat surprising is the degree to which they’re unimpressed.
Along with small sample sizes, one of the more common statistical traps we fall into with baseball analysis is arbitrary endpoints. When you think about it, every season stat line you’ve ever looked at is defined by arbitrary endpoints. Are stats from the 162 games played between April and October truly more meaningful than the stats for a 162-game set played between July and the following June? In most ways, the answer is no. But season starts and finishes are extraordinarily convenient endpoints, and so they regularly appear in analyses. There’s nothing wrong with season-to-season analysis as long as we’re aware that there’s an arbitrary nature to the statistics we’re dealing with. I say all of this to set up a quick look at an even more arbitrary endpoint: first and second halves.