This article is part of our The Z Files series.
Whether it's a team's penchant for making the routine play, the outstanding play or simply better positioning, the best measure of team defense for our purposes is BABIP (batting average on balls in play). This is especially apropos for those generating pitching projections using component metrics. Each hurler's BABIP is usually regressed toward the league mean for that type of pitcher (groundball, fly ball). Tempering towards team defense seems like a worthwhile endeavor.
Unfortunately, there's one catch. For BABIP to be reliable, it needs to be predictable. Today, team BABIP from the past five seasons will be investigated to gauge just how predictable, hence useful, it is in projecting pitcher performance.
The method will look at the correlation between projected team BABIP and actual, both in total and broken into components. The statistics will be kept relatively simple, gauging the linear relationship between projected and actual using the Pearson correlation coefficient (r). By way of explanation, r=1 means there is a perfect direct relationship: actual is perfectly predicted by the past data. If r=0, there is no linear relationship between the two, so the past data tells us nothing about what follows. When r=-1, there's a perfect inverse relationship. In this case, it would entail the worst BABIP becoming the best BABIP and vice versa.
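For those who want to check the math at home, here is a minimal Python sketch of the Pearson calculation. The function name `pearson_r` and the toy inputs are illustrative assumptions; the numbers are made up to show the r=1 and r=-1 reference points and are not taken from the tables.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement of the two series around their own means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Toy inputs (made up) for three hypothetical teams.
projected = [0.290, 0.300, 0.310]
same_order = [0.295, 0.300, 0.305]      # best stays best, worst stays worst
reversed_order = [0.305, 0.300, 0.295]  # worst becomes best and vice versa

print(round(pearson_r(projected, same_order), 6))      # close to +1
print(round(pearson_r(projected, reversed_order), 6))  # close to -1
```

Note that r only measures how well a straight line describes the relationship. It says nothing about how large the differences between teams are.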
To get things started, here's a table with each team's overall BABIP for the last five seasons:
Team | 2017 | 2016 | 2015 | 2014 | 2013 |
Arizona Diamondbacks | 0.292 | 0.297 | 0.323 | 0.297 | 0.316 |
Atlanta Braves | 0.281 | 0.306 | 0.296 | 0.309 | 0.306 |
Baltimore Orioles | 0.312 | 0.303 | 0.301 | 0.299 | 0.283 |
Boston Red Sox | 0.295 | 0.304 | 0.295 | 0.307 | 0.303 |
Chicago Cubs | 0.287 | 0.288 | 0.257 | 0.290 | 0.308 |
Chicago White Sox | 0.293 | 0.282 | 0.300 | 0.314 | 0.309 |
Cincinnati Reds | 0.303 | 0.300 | 0.294 | 0.301 | 0.281 |
Cleveland Indians | 0.299 | 0.305 | 0.291 | 0.290 | 0.312 |
Colorado Rockies | 0.302 | 0.309 | 0.321 | 0.323 | 0.311 |
Detroit Tigers | 0.292 | 0.321 | 0.302 | 0.301 | 0.314 |
Houston Astros | 0.284 | 0.302 | 0.308 | 0.287 | 0.302 |
Kansas City Royals | 0.311 | 0.305 | 0.301 | 0.288 | 0.294 |
Los Angeles Angels | 0.295 | 0.291 | 0.303 | 0.288 | 0.287 |
Los Angeles Dodgers | 0.286 | 0.284 | 0.291 | 0.299 | 0.296 |
Miami Marlins | 0.293 | 0.302 | 0.307 | 0.297 | 0.315 |
Milwaukee Brewers | 0.282 | 0.301 | 0.303 | 0.307 | 0.293 |
Minnesota Twins | 0.304 | 0.298 | 0.320 | 0.303 | 0.317 |
New York Mets | 0.298 | 0.322 | 0.312 | 0.292 | 0.299 |
New York Yankees | 0.302 | 0.282 | 0.294 | 0.303 | 0.301 |
Oakland Athletics | 0.271 | 0.296 | 0.301 | 0.291 | 0.274 |
Philadelphia Phillies | 0.306 | 0.307 | 0.307 | 0.319 | 0.300 |
Pittsburgh Pirates | 0.299 | 0.310 | 0.311 | 0.305 | 0.295 |
San Diego Padres | 0.308 | 0.301 | 0.298 | 0.307 | 0.294 |
San Francisco Giants | 0.297 | 0.312 | 0.290 | 0.285 | 0.286 |
Seattle Mariners | 0.296 | 0.285 | 0.294 | 0.300 | 0.277 |
St. Louis Cardinals | 0.297 | 0.301 | 0.307 | 0.302 | 0.289 |
Tampa Bay Rays | 0.279 | 0.284 | 0.300 | 0.286 | 0.288 |
Texas Rangers | 0.301 | 0.289 | 0.293 | 0.296 | 0.311 |
Toronto Blue Jays | 0.308 | 0.304 | 0.283 | 0.280 | 0.295 |
Washington Nationals | 0.289 | 0.290 | 0.292 | 0.304 | 0.298 |
Three different tests will be run, determining how well the past three years, two years and previous campaign predicted the ensuing BABIP. The correlation coefficient is for the season following the years listed.
Three year
Year | r |
2015-2017 | 0.23 |
2014-2016 | 0.25 |
Two year
Year | r |
2016-2017 | 0.21 |
2015-2016 | 0.24 |
2014-2015 | 0.33 |
One year
Year | r |
2016 | 0.26 |
2015 | 0.34 |
2014 | 0.36 |
2013 | 0.26 |
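The three tests can be sketched in Python as follows. This is a rough illustration under stated assumptions: the "projected" BABIP is taken to be a simple unweighted average of the prior seasons (the exact weighting isn't spelled out above), the helper names `pearson_r` and `window_test` are hypothetical, and only five clubs from the table are included to keep it short. With so few teams, the printed r values will not match the figures in the tables.

```python
import math
from statistics import mean

# Overall team BABIP for five clubs, copied from the table above.
BABIP = {
    "Chicago Cubs":        {2013: 0.308, 2014: 0.290, 2015: 0.257, 2016: 0.288, 2017: 0.287},
    "Colorado Rockies":    {2013: 0.311, 2014: 0.323, 2015: 0.321, 2016: 0.309, 2017: 0.302},
    "Los Angeles Dodgers": {2013: 0.296, 2014: 0.299, 2015: 0.291, 2016: 0.284, 2017: 0.286},
    "Minnesota Twins":     {2013: 0.317, 2014: 0.303, 2015: 0.320, 2016: 0.298, 2017: 0.304},
    "Tampa Bay Rays":      {2013: 0.288, 2014: 0.286, 2015: 0.300, 2016: 0.284, 2017: 0.279},
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def window_test(data, window, target_year):
    """Correlate each team's average BABIP over the `window` seasons
    before `target_year` with its actual BABIP in `target_year`."""
    prior_years = range(target_year - window, target_year)
    predictors = [mean(team[y] for y in prior_years) for team in data.values()]
    actuals = [team[target_year] for team in data.values()]
    return pearson_r(predictors, actuals)

results = []  # (window, target_year, r)
for window in (3, 2, 1):
    # The first usable target is the season right after the first full window.
    for target in range(2013 + window, 2018):
        r = window_test(BABIP, window, target)
        results.append((window, target, r))
        print(f"{window}-year window predicting {target}: r = {r:+.2f}")
```

Five seasons of data yield two three-year tests, three two-year tests and four one-year tests, the same counts as in the tables above.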
While none of the tests demonstrates a strong correlation, it appears the previous year's defense best predicts what will occur the following season. That said, we've come a long way since Voros McCracken first introduced DIPS theory (Defense Independent Pitching Statistics). Most notably, the BABIP differs by type of batted ball. The BABIP on line drives is highest, followed by grounders and then fly balls. Let's look at the predictability of each, using the same five seasons as above.
GROUND BALLS
Team | 2017 | 2016 | 2015 | 2014 | 2013 |
Arizona Diamondbacks | 0.216 | 0.236 | 0.251 | 0.236 | 0.257 |
Atlanta Braves | 0.223 | 0.249 | 0.256 | 0.251 | 0.270 |
Baltimore Orioles | 0.252 | 0.249 | 0.230 | 0.243 | 0.232 |
Boston Red Sox | 0.263 | 0.254 | 0.242 | 0.237 | 0.248 |
Chicago Cubs | 0.228 | 0.220 | 0.196 | 0.221 | 0.242 |
Chicago White Sox | 0.243 | 0.222 | 0.258 | 0.262 | 0.264 |
Cincinnati Reds | 0.247 | 0.244 | 0.246 | 0.239 | 0.235 |
Cleveland Indians | 0.247 | 0.233 | 0.219 | 0.225 | 0.251 |
Colorado Rockies | 0.226 | 0.249 | 0.251 | 0.247 | 0.251 |
Detroit Tigers | 0.248 | 0.286 | 0.266 | 0.257 | 0.272 |
Houston Astros | 0.228 | 0.236 | 0.236 | 0.219 | 0.241 |
Kansas City Royals | 0.270 | 0.265 | 0.268 | 0.232 | 0.255 |
Los Angeles Angels | 0.252 | 0.248 | 0.257 | 0.252 | 0.249 |
Los Angeles Dodgers | 0.226 | 0.227 | 0.229 | 0.238 | 0.226 |
Miami Marlins | 0.235 | 0.250 | 0.245 | 0.246 | 0.270 |
Milwaukee Brewers | 0.232 | 0.228 | 0.238 | 0.234 | 0.240 |
Minnesota Twins | 0.248 | 0.259 | 0.259 | 0.253 | 0.257 |
New York Mets | 0.252 | 0.274 | 0.265 | 0.253 | 0.254 |
New York Yankees | 0.259 | 0.237 | 0.245 | 0.243 | 0.251 |
Oakland Athletics | 0.208 | 0.238 | 0.254 | 0.242 | 0.209 |
Philadelphia Phillies | 0.254 | 0.263 | 0.242 | 0.278 | 0.242 |
Pittsburgh Pirates | 0.257 | 0.235 | 0.235 | 0.219 | 0.223 |
San Diego Padres | 0.250 | 0.265 | 0.260 | 0.248 | 0.236 |
San Francisco Giants | 0.239 | 0.245 | 0.222 | 0.217 | 0.216 |
Seattle Mariners | 0.243 | 0.250 | 0.244 | 0.241 | 0.232 |
St. Louis Cardinals | 0.235 | 0.235 | 0.238 | 0.255 | 0.245 |
Tampa Bay Rays | 0.226 | 0.249 | 0.279 | 0.260 | 0.270 |
Texas Rangers | 0.246 | 0.228 | 0.236 | 0.226 | 0.275 |
Toronto Blue Jays | 0.271 | 0.242 | 0.228 | 0.221 | 0.240 |
Washington Nationals | 0.248 | 0.253 | 0.243 | 0.257 | 0.246 |
Three year
Year | r |
2015-2017 | 0.15 |
2014-2016 | 0.53 |
Two year
Year | r |
2016-2017 | 0.22 |
2015-2016 | 0.59 |
2014-2015 | 0.63 |
One year
Year | r |
2016 | 0.38 |
2015 | 0.60 |
2014 | 0.61 |
2013 | 0.36 |
The ground ball data shows more correlation, to the point it's actionable. The correlation is still modest, but in five of the nine tests it exceeds 0.5. That said, part and parcel of this type of analysis is the assumption that all other variables remain constant. With the number of teams employing shifts, and the extent to which they use them, still changing, the BABIP on grounders may not be as stable as is typically necessary to use it in a predictive manner.
FLY BALLS
Team | 2017 | 2016 | 2015 | 2014 | 2013 |
Arizona Diamondbacks | 0.114 | 0.087 | 0.106 | 0.095 | 0.101 |
Atlanta Braves | 0.107 | 0.100 | 0.088 | 0.098 | 0.076 |
Baltimore Orioles | 0.132 | 0.098 | 0.109 | 0.099 | 0.084 |
Boston Red Sox | 0.117 | 0.097 | 0.110 | 0.107 | 0.115 |
Chicago Cubs | 0.099 | 0.087 | 0.067 | 0.116 | 0.085 |
Chicago White Sox | 0.131 | 0.085 | 0.093 | 0.092 | 0.092 |
Cincinnati Reds | 0.109 | 0.091 | 0.083 | 0.087 | 0.063 |
Cleveland Indians | 0.107 | 0.090 | 0.113 | 0.092 | 0.113 |
Colorado Rockies | 0.129 | 0.098 | 0.129 | 0.103 | 0.081 |
Detroit Tigers | 0.113 | 0.086 | 0.074 | 0.089 | 0.095 |
Houston Astros | 0.116 | 0.106 | 0.112 | 0.112 | 0.104 |
Kansas City Royals | 0.113 | 0.079 | 0.083 | 0.087 | 0.080 |
Los Angeles Angels | 0.090 | 0.073 | 0.089 | 0.074 | 0.063 |
Los Angeles Dodgers | 0.102 | 0.073 | 0.077 | 0.091 | 0.101 |
Miami Marlins | 0.109 | 0.078 | 0.089 | 0.078 | 0.074 |
Milwaukee Brewers | 0.091 | 0.074 | 0.100 | 0.084 | 0.084 |
Minnesota Twins | 0.114 | 0.080 | 0.106 | 0.111 | 0.111 |
New York Mets | 0.115 | 0.084 | 0.090 | 0.083 | 0.085 |
New York Yankees | 0.130 | 0.065 | 0.089 | 0.093 | 0.077 |
Oakland Athletics | 0.119 | 0.085 | 0.093 | 0.076 | 0.075 |
Philadelphia Phillies | 0.119 | 0.083 | 0.078 | 0.088 | 0.077 |
Pittsburgh Pirates | 0.103 | 0.101 | 0.107 | 0.105 | 0.087 |
San Diego Padres | 0.111 | 0.089 | 0.081 | 0.084 | 0.082 |
San Francisco Giants | 0.131 | 0.097 | 0.101 | 0.103 | 0.094 |
Seattle Mariners | 0.105 | 0.056 | 0.090 | 0.108 | 0.072 |
St. Louis Cardinals | 0.104 | 0.076 | 0.109 | 0.082 | 0.077 |
Tampa Bay Rays | 0.096 | 0.067 | 0.071 | 0.063 | 0.082 |
Texas Rangers | 0.118 | 0.082 | 0.090 | 0.079 | 0.091 |
Toronto Blue Jays | 0.122 | 0.097 | 0.072 | 0.097 | 0.117 |
Washington Nationals | 0.122 | 0.074 | 0.089 | 0.097 | 0.097 |
Three year
Year | r |
2015-2017 | 0.39 |
2014-2016 | 0.51 |
Two year
Year | r |
2016-2017 | 0.37 |
2015-2016 | 0.48 |
2014-2015 | 0.32 |
One year
Year | r |
2017 | 0.35 |
2016 | 0.37 |
2015 | 0.34 |
2014 | 0.44 |
Again, the component BABIP is a bit more correlated than overall. Shifts also influence fly balls, though likely not to the extent of grounders. That said, better positioning independent of shifts could be a factor. It's only recently that a big deal has been made of players referring to cards or wristbands, reminding them where to play for specific hitters.
LINE DRIVES
Team | 2017 | 2016 | 2015 | 2014 | 2013 |
Arizona Diamondbacks | 0.634 | 0.669 | 0.679 | 0.645 | 0.655 |
Atlanta Braves | 0.579 | 0.679 | 0.621 | 0.671 | 0.625 |
Baltimore Orioles | 0.625 | 0.678 | 0.683 | 0.660 | 0.626 |
Boston Red Sox | 0.586 | 0.656 | 0.671 | 0.685 | 0.660 |
Chicago Cubs | 0.592 | 0.638 | 0.614 | 0.615 | 0.669 |
Chicago White Sox | 0.608 | 0.666 | 0.657 | 0.691 | 0.665 |
Cincinnati Reds | 0.611 | 0.670 | 0.645 | 0.640 | 0.602 |
Cleveland Indians | 0.634 | 0.662 | 0.655 | 0.655 | 0.667 |
Colorado Rockies | 0.630 | 0.658 | 0.666 | 0.669 | 0.641 |
Detroit Tigers | 0.609 | 0.672 | 0.706 | 0.667 | 0.679 |
Houston Astros | 0.595 | 0.665 | 0.686 | 0.645 | 0.659 |
Kansas City Royals | 0.601 | 0.705 | 0.677 | 0.655 | 0.641 |
Los Angeles Angels | 0.597 | 0.652 | 0.669 | 0.655 | 0.657 |
Los Angeles Dodgers | 0.622 | 0.638 | 0.664 | 0.663 | 0.661 |
Miami Marlins | 0.637 | 0.682 | 0.664 | 0.643 | 0.648 |
Milwaukee Brewers | 0.578 | 0.690 | 0.670 | 0.676 | 0.635 |
Minnesota Twins | 0.629 | 0.647 | 0.690 | 0.621 | 0.673 |
New York Mets | 0.638 | 0.683 | 0.662 | 0.647 | 0.655 |
New York Yankees | 0.612 | 0.656 | 0.662 | 0.669 | 0.673 |
Oakland Athletics | 0.581 | 0.659 | 0.647 | 0.661 | 0.643 |
Philadelphia Phillies | 0.630 | 0.676 | 0.690 | 0.653 | 0.664 |
Pittsburgh Pirates | 0.597 | 0.674 | 0.670 | 0.675 | 0.659 |
San Diego Padres | 0.626 | 0.649 | 0.659 | 0.662 | 0.645 |
San Francisco Giants | 0.593 | 0.683 | 0.679 | 0.657 | 0.637 |
Seattle Mariners | 0.616 | 0.645 | 0.655 | 0.662 | 0.619 |
St. Louis Cardinals | 0.618 | 0.678 | 0.667 | 0.644 | 0.628 |
Tampa Bay Rays | 0.617 | 0.642 | 0.686 | 0.632 | 0.630 |
Texas Rangers | 0.618 | 0.664 | 0.671 | 0.676 | 0.642 |
Toronto Blue Jays | 0.635 | 0.692 | 0.667 | 0.640 | 0.652 |
Washington Nationals | 0.610 | 0.654 | 0.655 | 0.639 | 0.625 |
Three year
Year | r |
2015-2017 | -0.01 |
2014-2016 | 0.10 |
Two year
Year | r |
2016-2017 | 0.18 |
2015-2016 | 0.26 |
2014-2015 | 0.23 |
One year
Year | r |
2016 | -0.03 |
2015 | 0.21 |
2014 | 0.06 |
2013 | 0.05 |
Groundball and fly ball BABIP each correlate better than overall BABIP, so it follows that the remaining component, line drive BABIP, shows little, if any, correlation. This makes intuitive sense, since line drives are the most difficult batted ball to defend.
APPLICATIONS
How one chooses to apply the above data revolves around the objective and the depth of statistical understanding. Here are some general considerations.
Even though it's not perfect, and is variable year-to-year, the best BABIP predictor is the previous season's data. Assuming the use of the shift stabilizes, the variance should shrink. That is, once all 30 clubs are comfortable with their deployment and use it consistently each season, the component BABIP should stabilize a bit.
A groundball pitcher also yields fly balls and vice versa, while both obviously surrender line drives. In order to incorporate this into a formulaic projection system, the GB/FB/LD distribution needs to be projected for each pitcher, with each component BABIP influencing overall BABIP in proportion to that pitcher's share of each batted-ball type.
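As a rough illustration of that weighting, here is a short Python sketch. The component BABIPs are Arizona's 2017 figures from the tables above; the two batted-ball mixes and the function name `overall_babip` are hypothetical. It also simplifies by treating every ball in play as a grounder, fly ball or line drive, ignoring pop-ups and bunts.

```python
def overall_babip(mix, component_babip):
    """Blend component BABIPs by a pitcher's batted-ball mix.

    mix: share of balls in play by type; shares must sum to 1.
    component_babip: BABIP for each batted-ball type.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("batted-ball shares must sum to 1")
    return sum(share * component_babip[kind] for kind, share in mix.items())

# Arizona's 2017 component BABIPs, from the tables above.
arizona_2017 = {"GB": 0.216, "FB": 0.114, "LD": 0.634}

# Hypothetical batted-ball mixes (illustrative only).
groundball_pitcher = {"GB": 0.55, "FB": 0.25, "LD": 0.20}
flyball_pitcher = {"GB": 0.35, "FB": 0.45, "LD": 0.20}

print(f"{overall_babip(groundball_pitcher, arizona_2017):.3f}")  # 0.274
print(f"{overall_babip(flyball_pitcher, arizona_2017):.3f}")     # 0.254
```

Same defense, same line drive rate, and the two hypothetical pitchers still land 20 points apart, which is why the hit distribution has to be projected before team defense can be layered on.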
Keep in mind a team's BABIP is already baked into the player's BABIP, so depending on how a projection engine works, it could be double-dipping. The same holds true for park factors, which also affect BABIP. To properly regress towards team BABIP, this impact must first be neutralized, then accounted for in the final result.
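One possible reading of "neutralize, then account for" is sketched below in Python. Everything about the mechanics is an assumption for illustration: the adjustments are simple additive differences from league average, park factors are left out, and the function name `project_babip`, the league mean, the pitcher's BABIP and both weights are hypothetical. The two team figures are the 2017 overall BABIPs for Toronto and the Dodgers from the first table.

```python
def project_babip(observed, past_team, league, next_team, skill_weight, team_trust):
    """Neutralize the old defense, regress, then apply the new defense.

    observed:     pitcher's historical BABIP
    past_team:    BABIP of the defense he has been pitching in front of
    league:       league-average BABIP
    next_team:    BABIP of the defense he will pitch in front of
    skill_weight: 0..1, share of the neutralized gap from league to keep
    team_trust:   0..1, share of the new team's gap from league to apply
    """
    for name, w in (("skill_weight", skill_weight), ("team_trust", team_trust)):
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Step 1: strip out the old team's defensive effect.
    neutral = observed - (past_team - league)
    # Step 2: regress the neutralized figure toward league average.
    regressed = league + skill_weight * (neutral - league)
    # Step 3: add back only as much of the new defense as we trust.
    return regressed + team_trust * (next_team - league)

# Hypothetical pitcher moving from Toronto (0.308) to the Dodgers (0.286).
projection = project_babip(
    observed=0.310,     # hypothetical pitcher BABIP
    past_team=0.308,    # Toronto, 2017 overall
    league=0.296,       # hypothetical league mean
    next_team=0.286,    # Dodgers, 2017 overall
    skill_weight=0.5,   # hypothetical
    team_trust=0.3,     # hypothetical, in the range of the overall r values
)
print(f"{projection:.3f}")  # 0.294
```

Skipping the first step would count Toronto's defense twice: once inside the pitcher's 0.310 and again when a team adjustment is applied.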
CONCLUSIONS
Here's my gut feel. While I recognize the advantage of starting with the most accurate baseline, it comes down to balancing practicality with the time and effort required to code the regression of component player BABIP to team BABIP. Quants will argue every degree of decimal point accuracy is beneficial; I'm not so sure. There's so much inherent cloudiness in player projections already that the small gain in accuracy is totally consumed by the haze. The regression would be towards a BABIP with some additional precision, but still not an especially strong correlation. Because line drive BABIP shows essentially no year-to-year correlation, luck in either direction throws off the overall player BABIP. As has been discussed previously, skills are only part of a projection; the playing time component is crucial. The effect of 10 percent more or fewer innings is a more significant factor than a slightly better ERA baseline.
From a projectionist perspective, a great deal of work needs to go into estimating GB/FB/LD distribution. Is it based on history? How much does the improved ability to track pitches, and hence changes in repertoire, take the process from objective to subjective? How should the addition or subtraction of an excellent or poor defender alter team BABIP?
Again, quants are thinking it's worth the effort. The more I play fantasy baseball, the more I realize it's what you do with the projection, not the projection itself. It's not a secret I do my own projections. Part of that is determining an expected pitcher BABIP, mostly based on their historical hit distribution. I do not directly incorporate team defense, at least not globally. As alluded to earlier, in part that's because it's already baked into the pitcher's BABIP, especially for those players toiling for the same team for multiple seasons. I will, however, make individual adjustments as needed, usually for individuals changing teams. I'll investigate the difference in the team defense between the clubs and massage subjectively as necessary.
Yes, defense matters, but so many things are more relevant.