More on regression, xFIP, HR/FB, BABIP, and the like

August 18th, 2010 by in Player Discussion, Prediction, Theoretical

Yesterday, Chris Liss penned a post at RotoWire's RotoSynthesis blog that mentioned my luck, randomness, and Dan Haren article.  This sparked a little debate in the comments section, which I wanted to respond to here since I'm incapable of writing up a succinct response that's appropriate for a comments section.

One commenter was Yahoo!'s Scott Pianowski, who said:

I love xFIP. It's a way for otherwise intelligent people to convince themselves that bad pitchers are really not that bad and great pitchers aren't so special. In other words, let's grade on a scale and find a way to bridge the gap between Aaron Harang and Adam Wainwright.

The best pitchers in baseball consistently beat the league average in HR/FB. Go look it up for yourself.

As Scott suggested, I looked up the HR/FB numbers, and here's what I found:

From 2004 to 2009, there were 43 pitchers (who spent at least half of their games starting – relievers are a different beast) who posted HR/OF rates below 10% (league average is 11%-ish) in at least 200 IP (one full season-ish).  If we require 400 IP, we get 29 pitchers.  At 600 IP, we get 22 pitchers.  At 800 IP, we’re down to 13.  Of course, there's a little bias here, but I think it's a pretty decent argument in favor of regression (if the endless, more rigorous studies aren’t enough).  To phrase these results differently, the fewer innings a pitcher has thrown, the easier it is for him to beat a league average HR/FB – luck!  The more he pitches, the more he regresses and falls off our list.  At the 3-year mark, we only have 5 pitchers below 9% (Cain, Kelvim Escobar, Clemens, Wainwright, and Wang).  At the 4-year mark, it's only Cain.
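To see how much of that shrinking list luck alone could explain, here is a rough sketch in Python. It assumes every pitcher has exactly the same true HR/FB rate of 11% and allows about one fly ball per inning (both are simplifying assumptions of mine, not measured figures), then computes the exact binomial chance of showing a rate below 10% anyway. The function name `prob_below` and its parameters are made up for this illustration.

```python
from math import comb

def prob_below(n_fly_balls, true_rate=0.11, cutoff_pct=10):
    """Exact binomial chance that an observed HR/FB lands below cutoff_pct percent."""
    # Largest home run count k that still satisfies 100 * k < cutoff_pct * n_fly_balls.
    max_k = (cutoff_pct * n_fly_balls - 1) // 100
    return sum(
        comb(n_fly_balls, k) * true_rate**k * (1 - true_rate)**(n_fly_balls - k)
        for k in range(max_k + 1)
    )

# ASSUMPTION: roughly one fly ball per inning, so innings stand in for fly balls.
for innings in (200, 400, 600, 800):
    print(innings, round(prob_below(innings), 3))
```

Under these assumptions the chance of beating 10% by luck alone keeps falling as the sample grows, which is the same direction as the counts above. The real list also reflects genuine talent differences and which pitchers get to throw 800 innings in the first place, so this is only one piece of the picture.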

Regression is real, but that's not to say that xFIP or LIPS or any other ERA estimator is the be-all and end-all, because it's not.  It's a shortcut, a quick way of seeing what a pitcher's peripherals tell us about him for that particular year.  It’s not meant to be a forecast.  It uses one year of data and implies 100% regression to the mean for BABIP, HR/FB, and LOB%, which is incorrect (but not terribly so for the vast majority of cases, which is why these things are usable if we know what we’re looking at).

Regression is real, but I think a lot of analysts give us the wrong impression of it.  Out of either laziness or ignorance, they assume that every BABIP, HR/FB, and LOB% should be league average at all times.  That’s not what regression is!  Regression means that the player’s numbers should move in the direction of a league or group average – how far towards that number depends on a number of factors (in some cases it may not move much at all from the player’s actual performance).

Aaron Harang has a .310 career BABIP and a .329 or so BABIP over the past three years.  For a guy like this, with this much data, it would be foolish to assume he’ll post a .300 BABIP going forward.  But it’s just as foolish to assume that he’ll post a .329 BABIP going forward (in the absence of some other data that says he deserves a high BABIP).  If we know something about Aaron Harang from scouting or other means, like I said about Haren, we can say that it’s best to regress Harang to, say, a .320 BABIP.  But if we don’t have these things, the best we can do is regress to league average (or some group average).

This doesn’t, however, mean that we assume Harang will have a league average BABIP.  That’s not what regression is.  It just means our estimate will move some distance toward that number.  We take all the data we have on him, and based on that sample size and the league-wide variance in BABIP, we can come up with a good estimation of his BABIP going forward.  This will be far more accurate than simply saying, “For three years in a row, Harang has had a high BABIP and an ERA higher than his xFIP, so xFIP is useless (not just for Harang, but for all players) and we should just use Harang’s ERA or our gut impression of him.”
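One common way to put "move some distance toward that number" into practice is to blend the observed rate with the group average, weighting each by sample size. The sketch below is a minimal version of that idea, not a full projection system. The function `regress`, the constant `K_BIP`, and the ball-in-play counts are all hypothetical values chosen for illustration; the .300 league figure and the .329 observed figure are the round numbers used above.

```python
def regress(observed, n, group_mean, k):
    """Shrink an observed rate toward a group mean.

    n is the sample size behind `observed`; k is how many observations of
    group-average performance get blended in (larger k = more regression).
    """
    return (observed * n + group_mean * k) / (n + k)

# ASSUMPTIONS: every number below is illustrative, not a fitted value.
LEAGUE_BABIP = 0.300   # round league-average figure
K_BIP = 2000           # hypothetical regression constant, in balls in play

# The same .329 observed BABIP, backed by one season vs. three seasons
# of (hypothetical) balls in play.
one_year = regress(0.329, 550, LEAGUE_BABIP, K_BIP)
three_years = regress(0.329, 1650, LEAGUE_BABIP, K_BIP)
print(round(one_year, 3), round(three_years, 3))
```

The three-year estimate stays further from league average than the one-year estimate, which is the whole point: more data, less regression. Swapping `group_mean` for a scouting-informed number or a similar-pitcher average changes the target without changing the method.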

So to answer Chris’s question, yes, I would make the exact same case for Harang or Masterson.  That is, yes, it’s possible that these guys truly deserve high BABIPs (or HR/FBs or whatever), but unless you have some sort of information that shows me that they should, I’m simply going to take the data I have, and regress the proper amount (and, ideally, treat vs. LHP and vs. RHP separately (but not independently), especially for Masterson).  We’re just looking at a different magnitude here.  For Haren’s BABIP, it’s one year and will be nearly completely erased when we account for previous seasons and regression.  For Harang, it’s several years and will show up somewhat in our projection.

Just because there are these guys that look like they are “exceptions” doesn’t mean that they don’t follow the rules of regression.  They do – they just regress less the more data we have on them.  And if we have some scouting or other information, they regress to a number other than league average.  Everyone regresses, but a lot of people assume that everyone regresses to league average, when in fact they don’t.  In fact, very few players regress to league average.  Everyone, truly, regresses to their own absolute true talent level (which is unknown), so we do the best we can to estimate that.  League average is the bare minimum acceptable guess we can make, but once we know some things about the player, we can regress him to a group of players similar to him (for example, small-framed lefties with underwhelming stuff and a fastball-slider-change repertoire) or to some unique number that better suits him than league average.

4 Responses to “More on regression, xFIP, HR/FB, BABIP, and the like”

  1. Mike l says:

    Harang is a flyball pitcher in a homer friendly ballpark.  He's going to give up a high quantity of home runs because of that, and the ballpark can help explain the higher HR/FB ratio.  I'd be interested to see his HR/FB ratios broken down by Home/Away splits.  I'd expect him to have a league average HR/FB ratio on the road.

  2. Derek Carty says:

    The thing with park effects that rarely gets mentioned, Mike I, is that they are often overstated.  A pitcher is going to throw half of his games at home and half on the road.  Assuming a neutral road schedule, knowing that GAB has a HR/FB factor of 1.19, and estimating that league average is 11%-ish, we can figure out that the expected HR/FB rate for a typical Reds pitcher should be just 12.05% for the year (and 13.1% when pitching in GAB).  That's far from drastic.  That Harang is a FB pitcher will increase the total number of HRs he allows but will have almost no bearing on his rate of HRs per FB.  He is right around that for his career (and thus his xFIP should be okay in terms of HR/FB), and his main problem seems to be BABIP.
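The arithmetic behind those two figures can be checked with a few lines of Python. This is only a sketch of the calculation described in the comment above: the function name `expected_hr_fb` and the `home_share` parameter are made up here, and the 50/50 home/road split and neutral road schedule are the same assumptions stated in the comment.

```python
def expected_hr_fb(league_rate, park_factor, home_share=0.5):
    """Return (home HR/FB, full-season HR/FB) for a pitcher of average talent."""
    # At home, the league rate is scaled by the park's HR/FB factor.
    home_rate = league_rate * park_factor
    # ASSUMPTION: road games are played in a neutral set of parks.
    season_rate = home_share * home_rate + (1 - home_share) * league_rate
    return home_rate, season_rate

# 11%-ish league average and a 1.19 HR/FB factor for GAB, as quoted above.
home, season = expected_hr_fb(0.11, 1.19)
print(f"home: {home:.4f}, season: {season:.4f}")
```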

  3. Kyle says:

    I agree with a lot of what you are saying, but Harang was one of the more underrated pitchers from 05-07.  In those three seasons, his fastball and slider were plus pitches, and he led the league in Ks once.  He actually pitches better at Great American Ballpark (his ERA there is 4.19 versus around 4.45 everywhere else).  His best seasons were when he was striking out 200 guys a season.  Dan Haren, meanwhile, is an example of a first-half ace.  He's a bum every second half… not only does his ERA jump about a run (3.29 to 4.22), but his WHIP goes from 1.10 to 1.31.
    First half ERAs from 2006 to 2009:  3.52, 2.30, 2.72, 2.01. Second half from 2006 to 2009:  4.91, 4.15, 4.18, 4.62… so maybe it's the fact that hitters just get used to seeing a guy (especially one who throws strikes and doesn't have a 96 mph fastball).
    Or maybe if Harang pitched in Petco, Safeco, or AT&T Park in San Francisco like Cain, he'd still be a quality pitcher.  Some pitchers (and hitters) let their ballparks get into their heads…  Oswalt dominated at Enron/Minute Maid Park while other pitchers like Lima and Elarton couldn't handle it.  Look at Jason Bay's power drop-off this season; it's obvious the ballpark plays a major factor.  Harang hasn't looked like a top-of-the-rotation pitcher since 2007, but that could just be pressure from a huge contract, his lack of conditioning, or the fact that his innings jumped too fast.  He went from throwing 78 innings to 76 in 2003, and in 2004 threw 161.  After that his innings jumped for 3 seasons, twice reaching 230-plus.
    Or maybe he's like Dan Haren (who I doubt will be an ace in years to come), because he doesn't have an overpowering fastball or a pitch that puts people away for a whole season, and his fastball has gone from 92-95 to 89-92.  Haren barely throws his fastball 42 percent of the time, often relying on his cutter, which has gone from a devastating pitch (when hitting 90-91) to his worst pitch that rarely hits 87 mph.  Both also give up far too many home runs.  I think that ballpark factors are usually overstated, but look at how Coors Field made mediocre hitters like Dante Bichette, Jeffrey Hammonds, Jay Payton, and others look like Mantle.  Or how Darryl Kile went there and watched his ERA jump from 2.57 to 5.20 then 6.61 only to go to the Cards and win 37 games with an ERA of 3.50.  Or Mike Hampton sees his ERA go from 3.14 to 5.41 but hits 7 home runs (while never having hit one in 365 at-bats).  He was hitting bombs every 11 at-bats (something A-Rod's barely done).

  4. Derek Carty says:

    I'm just going to touch on a few of these things, Kyle, because I think you're falling into a few of the traps that people often do.
    With Harang, he led the league in Ks and was K'ing 200 per year because he was throwing 230 innings per year.  That many innings will lead to a high volume of Ks.  His K rate has actually only declined a little bit (likely due to age) over the past few seasons, but he's pitched far fewer innings, and even at that, he had a K/9 over 8 just twice in his career anyway (2006 and 2007).  I don't think that either he or Haren "gives up too many HRs."  Harang is a flyball-ish pitcher, so he gives up more than other pitchers do, but his career HR/FB is right where it should be (so is Haren's).  He's no different than a guy like Matt Garza or Cole Hamels when it comes to HRs, and I doubt you'd be arguing that they are bad pitchers.

    You say that Haren is a bum every second half, but there's very little evidence to suggest that first half/second half splits tell us much about a pitcher.  And even if Haren has had four seasons with a worse second half (and you managed to leave out 2005 – when his 2nd half was much better – and this season – when his ERAs have been identical), we're looking at about 175 second-half innings.  That's not even two full seasons' worth of innings!  Do you know how much variation there is in a pitcher's ERA over two seasons?  A TON. You wouldn't write a pitcher off as a viable option after a season and a half of a less-than-stellar ERA, would you?  But because these are 1st half/2nd half splits, our human nature tries to convince us that there must be some important pattern here.

    As to park factors/Coors/etc., I'm much less of a fan of park factors for hitters, especially for HRs, because parks don't affect all hitters the same, or to the extent that they do pitchers, but that's another discussion entirely.  Still, you're cherry-picking here, using circumstantial and anecdotal evidence.  For every Darryl Kile, I'm sure I could find an opposite example for you. Let's see: Jorge Julio, Jorge de la Rosa, Jason Marquis, Mike DeJean, Joe Kennedy, Jason Hammel, Dan Miceli, etc.

    As for the Mike Hampton/A-Rod example, that only serves to prove my point – that these things are prone to lots of variance.  Surely you don't think that Hampton suddenly became A-Rod because of Coors Field, that he had that kind of power all along and it could only manifest itself in Coors Field.  Of course he didn't.  Coors likely helped him a bit, but an AB/HR of 11 is ridiculous and was surely a small sample size aberration, as most things like those discussed in our comments are.
