BC Defensive Preview for the Statistically Inclined
For those of you who have read some of my previous comments, you know I'm a big stats nerd. Stats can be a powerful tool for making inferences and drawing meaningful conclusions about the future. That being said, in this article we will use last year's data on Boston College's defense to make a prediction about how our offense will do this year.
In this article we will conduct a multiple linear regression analysis to build a statistical model that predicts how many points our offense should have scored against BC last season. Based on trends we have seen in the offseason, we will also use the model to predict how many points we may score this season.
First of all, don't be intimidated by the math terms I threw up above. This is pretty easy-to-grasp stuff. Basically, when I say a statistical model, all I mean is an equation that relates a response variable to one or more predictor variables. Here, the response variable is the number of points the BC defense allowed in each game last year. The predictor variables I am including are the FEI efficiency (http://bcftoys.blogspot.com/) and the time of possession of each of Boston College's opponents in last year's regular season games (note: I couldn't find Rhode Island's FEI score, so I excluded that game).
The equation will be of the following form:
y(X1, X2) = B0 + B1*X1 + B2*X2 + E
An explanation of these terms is:
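y(X1, X2) = The number of points allowed by the BC defense in a game (the model's prediction is the same expression without E).
B0 = The intercept term (the predicted points allowed when X1 and X2 are both zero; it anchors the equation but has no direct football meaning on its own).
B1 = The coefficient (a slope parameter) for FEI efficiency: the change in points allowed for a one-unit increase in X1, holding X2 fixed.
X1 = The opponent's FEI efficiency.
B2 = The coefficient for time of possession: the change in points allowed for each extra minute of possession, holding X1 fixed.
X2 = The opponent's time of possession, in minutes.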
E = Random error (we will account for this error by looking at a prediction interval).
OK! If you are still with me, let's build the model. Fitting it by ordinary least squares regression with some stat software I have at work gives:
y(X1,X2) = -12.6 + 75.9 * X1 + 0.844 * X2 + E
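For the curious, here is roughly what "ordinary least squares" means in practice: choose B0, B1 and B2 to minimize the sum of squared errors, which comes down to solving the normal equations (X'X)b = X'y. This is a minimal plain-Python sketch, not the stat software used for this post; the helper names `solve` and `fit_ols` are hypothetical, and since the game-by-game data aren't listed here, the usage lines run it on made-up numbers generated from the published coefficients, just to show that the fit recovers them.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_ols(x1, x2, y):
    """Return [B0, B1, B2] minimizing the squared errors of y ~ B0 + B1*x1 + B2*x2."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix X with an intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


# Made-up inputs (NOT last year's games); y is built exactly from the published coefficients
fei = [0.10, -0.20, 0.05, 0.30, -0.10, 0.15]
top = [28.0, 31.0, 25.0, 33.0, 29.0, 27.0]
pts = [-12.6 + 75.9 * f + 0.844 * t for f, t in zip(fei, top)]
coef = fit_ols(fei, top, pts)
print([round(c, 3) for c in coef])  # → [-12.6, 75.9, 0.844]
```

With real data the points won't sit exactly on a plane, so the fitted coefficients absorb the best compromise and whatever is left over is the E term.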
Based on our model, let's make a prediction about how many points we should have scored last year. Our FEI score was 0.205, and our time of possession was 21.9 minutes. Plugging in 0.205 for X1 and 21.9 for X2, the predicted number of points allowed by BC's defense is 21.4. We actually scored 17 points. Our model was close, but why was it off? This is where the E (error) term comes into play.
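The arithmetic in that paragraph is easy to reproduce. This sketch just plugs the numbers into the fitted equation (the function name is made up for illustration; E is left out because a point prediction uses only the fitted part):

```python
# Fitted coefficients as reported above (rounded)
B0, B1, B2 = -12.6, 75.9, 0.844


def predicted_points_allowed(fei, top_minutes):
    """Point prediction from the fitted model; the random error term E is omitted."""
    return B0 + B1 * fei + B2 * top_minutes


pred = predicted_points_allowed(0.205, 21.9)  # last year's FEI score and time of possession
residual = 17 - pred                          # actual points minus predicted points
print(round(pred, 1), round(residual, 1))     # → 21.4 -4.4
```

The residual of about -4.4 points is one realization of E for that game.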
Football is a game of great variability. On paper, we probably should have scored more than 17 points last year, but as we all know this wasn't the case. Why? There are probably a number of reasons for this, such as the inconsistency of a young offense or BC having a stronger day defensively. We can account for this kind of error by putting an interval around the prediction instead of relying on a single number. A confidence interval quantifies how uncertain the model's estimate of the average points allowed is, given the variation described above. Let's look at the 95% confidence interval associated with our prediction of 21.4 points:
95% CI:
(6.92, 36.04)
So, we are 95% confident that the average number of points BC's defense would allow to offenses with an FEI score of 0.205 that hold onto the ball for 21.9 minutes is somewhere between 6 and 37. Our prediction of 21 points (and our observation of 17 points) certainly falls in this interval. This is one rough way of validating our model.
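Here is a minimal sketch of how both kinds of interval are computed for ordinary least squares, using the standard textbook formulas (not necessarily what the stat software does internally). The function names are hypothetical, the per-game data aren't reproduced in this post (so the usage lines use made-up values), and `t_crit` is assumed to be supplied by you: the 97.5th percentile of a t distribution with n - 3 degrees of freedom, from a table or from something like `scipy.stats.t.ppf(0.975, n - 3)`.

```python
import math


def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def ols_intervals(x1, x2, y, new_x1, new_x2, t_crit):
    """Fit y ~ B0 + B1*x1 + B2*x2 by OLS and return
    (prediction, CI bounds, PI bounds) at the new inputs."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix X
    n, p = len(rows), 3                            # p = number of coefficients
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    # Residual standard error s, using n - p degrees of freedom
    sse = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    s = math.sqrt(sse / (n - p))
    x0 = [1.0, new_x1, new_x2]
    y_hat = sum(bj * xj for bj, xj in zip(beta, x0))
    # x0' (X'X)^-1 x0, obtained by solving (X'X) v = x0 rather than inverting X'X
    h0 = sum(xj * vj for xj, vj in zip(x0, solve(xtx, x0)))
    ci_half = t_crit * s * math.sqrt(h0)        # uncertainty in the average response
    pi_half = t_crit * s * math.sqrt(1.0 + h0)  # also includes one game's own scatter (E)
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Made-up numbers for illustration only (8 games -> 8 - 3 = 5 degrees of freedom)
fei = [0.10, -0.20, 0.05, 0.30, -0.10, 0.15, 0.00, 0.20]
top = [28.0, 31.0, 25.0, 33.0, 29.0, 27.0, 30.0, 26.0]
pts = [24, 10, 14, 43, 18, 21, 20, 25]
y_hat, ci_bounds, pi_bounds = ols_intervals(fei, top, pts, 0.205, 21.9, t_crit=2.571)
print(round(y_hat, 1), ci_bounds, pi_bounds)
```

Because of the extra `1.0` under the square root, the prediction interval is always the wider of the two at the same inputs.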
So what does all of this mean for next year? Some pretty solid offseason trends that are well documented on this site are:
1) Our offense should be better next year.
2) BC's defense should be worse next year.
So, let's use our model to predict next year's score. To do this, I will make two assumptions:
1) Our FEI score next season will improve from 0.205 to 0.23.
2) We will possess the ball for exactly half of the game (30 minutes).
Based on this, our model predicts that we will score about 30 points. Since we are predicting a single future game, we need to use a prediction interval instead of a confidence interval. A prediction interval covers a future, unobserved value, so on top of the uncertainty in the average it also includes the game-to-game scatter captured by E, which makes it wider. The 95% prediction interval is:
95% PI:
(5.28, 55.14)
Thus, we are 95% confident that an offense with an FEI score of 0.23 that held on to the ball for 30 minutes would score somewhere between 5 and 56 points on last year's BC defense. Since we think that BC's defense is taking a turn for the worse this season, I would imagine that our offense will probably land toward the higher end of this prediction interval.
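For reference, these are the standard OLS formulas behind the two intervals. Here x0 = (1, X1, X2) holds the input values, X is the matrix whose rows are (1, X1, X2) for each of last year's games, s is the residual standard error, and n is the number of games (three coefficients are estimated, hence n - 3 degrees of freedom). The only structural difference is the extra 1 under the square root, which accounts for the random error E in a single new game:

```latex
\begin{aligned}
\text{95\% CI:}\quad & \hat{y}_0 \pm t_{0.975,\,n-3}\, s \sqrt{\mathbf{x}_0^{\top} (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{x}_0} \\
\text{95\% PI:}\quad & \hat{y}_0 \pm t_{0.975,\,n-3}\, s \sqrt{1 + \mathbf{x}_0^{\top} (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{x}_0}
\end{aligned}
```

One caveat when comparing the two intervals in this post: the CI was computed at (0.205, 21.9) and the PI at (0.23, 30), so part of the difference in width comes from the different inputs, not only from the extra 1.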
Practically speaking, my guess is that 30 points would be about right; 56 seems a bit excessive. Chances are, we will be somewhere in the 30-56 range and not the 5-30 range. Feel free to use this model with your own input values (if you think our FEI score may be different from 0.23, or if you think we will have a different time of possession than 30 minutes).
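If you want to try your own settings, here is a small what-if sketch using the published coefficients (the function name and the grid of values are arbitrary choices for illustration). Keep in mind the model only knows about the range of FEI scores and possession times in last year's games, so settings far outside that range are extrapolation:

```python
B0, B1, B2 = -12.6, 75.9, 0.844  # fitted coefficients from the post


def predicted_points(fei, top_minutes):
    """Predicted points against last year's BC defense (error term E omitted)."""
    return B0 + B1 * fei + B2 * top_minutes


# Rows: FEI score; columns: time of possession of 25, 30 and 35 minutes
for fei in (0.20, 0.23, 0.26):
    row = [f"{predicted_points(fei, top):5.1f}" for top in (25, 30, 35)]
    print(f"FEI {fei:.2f}:", " ".join(row))
```

Because the model is additive, each extra minute of possession adds B2 = 0.844 points no matter the FEI score, and each +0.01 in FEI adds 0.01 x 75.9, or about 0.76 points.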
What are your predictions?
Comments
cool article
First of all, don’t be intimidated by the math terms I threw up above. This is pretty easy-to-grasp stuff. Basically, when I say a statistical model, all I mean is an equation that relates a response variable to one or more predictor variables.
That part made me laugh a little bit.
I am really getting into the statistical side of football analysis, thanks in large part to this site, Rock M Nation, and some others.
You seem to be more of a “pure” stats guy, meaning that you are actually trying to come up with concrete score projections, rather than simply validate an observation or theory. I think you have a much harder task, and one that is much more readily critiqued or proven correct.
The only trouble I have with the projections is that the 95% confidence level produces a huge range of scores, as you have observed. Is there a way to narrow the band down (without turning down the confidence level) by introducing another variable, in order to get a tighter score range? I’m not sure how that might be possible, or what that would entail, but I’m going to think about it for a while and see if I can come up with anything.
Wow
Thus, we are 95% confident that an offense with an FEI score of 0.23 that held on to the ball for 30 minutes would score somewhere between 5 and 56 points on last year’s BC defense.
Ok, where to start. Um, ya, um. There are a lot of numbers between 5 and 56 points. If we have the ball for 30+ min and only score 5 points, I'm going to be drunk and screaming because we most likely lost.
Why is the sky blue? Because, God Loves the Infantry
5-56 is a big interval..
..but I did mention that practically speaking, we will probably see a score closer to the predicted value of 30 (probably >30 but < 56 considering our offense and their defense). Just trying to get in the ball park with this.
I'm going to need to buy a new TV
because I will have shot up my flatscreen like Elvis used to do when Robert Goulet came on.
wife would kill me if I got us another TV
But that's a different story.
This brings up an interesting point.
The predictor variables that I used here are both numerical, or continuous. We could definitely use categorical, or attribute, factors as well. For example, we could investigate if conference is a statistically significant factor. We can investigate any number of predictors, and if we find influential ones, I would say that it would be possible to narrow these 95% confidence / prediction intervals. Finding influential ones is the key, and I agree that this makes for a difficult task.
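To sketch what adding a categorical factor such as conference would look like: each category except one baseline gets its own 0/1 indicator (dummy) column in the design matrix, and its coefficient measures the shift in points allowed relative to the baseline. The helper and the conference labels below are hypothetical. With only about a dozen games, every extra column also uses up a degree of freedom, which enlarges the t multiplier, so a new factor has to explain a fair amount of variation before the intervals actually narrow.

```python
def design_row(fei, top_minutes, conference, levels):
    """One row of a design matrix: intercept, the two numeric predictors,
    then a 0/1 dummy for every conference level except the first (the baseline)."""
    dummies = [1.0 if conference == level else 0.0 for level in levels[1:]]
    return [1.0, fei, top_minutes] + dummies


levels = ["ACC", "Big East", "Other"]  # hypothetical grouping; "ACC" is the baseline
print(design_row(0.205, 21.9, "ACC", levels))   # → [1.0, 0.205, 21.9, 0.0, 0.0]
print(design_row(0.10, 28.0, "Other", levels))  # → [1.0, 0.1, 28.0, 0.0, 1.0]
```

Rows built this way can be fed to any ordinary least squares routine; leaving out one level avoids a dummy column that would be perfectly collinear with the intercept.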
That's my take on the defense.
http://www.tomahawknation.com/2009/8/21/997539/nole-your-enemy-the-shrinking
Mike this is a great post. Thanks. Trying to digest. I am just amazed at how good BC’s defense was last year.
Just having...
B.J. Raji out of there will help immensely. As a Packers fan, I watched him push around some NFL linemen like scrubs Saturday versus Buffalo. Happy to have him on the Packers, glad he’s gone from BC…
i saw that...
it was ridiculous. great get for the packers.
Not an alcoholic, just an FSU grad.
by onebarrelrum on Aug 24, 2009 10:53 AM EDT
Found this video...
which has a couple plays from FSU-BC last year that stood out. One was of Raji bursting through the line and taking down Antone with one hand, another couple he bullrushed our guys right off their feet, starting around the 1:25 mark.
I forgot how painful that game was until I watched this… Amazing what you can block out ;-)
man
every tackle BC makes, the black uniforms just vanish under all that white. monsters. i know the video is just BC highlights but it seriously looked like a JV squad going up against varsity.
by onebarrelrum on Aug 24, 2009 4:08 PM EDT
Here's a thought...
What about developing one of these for each of Bradford, McCoy, and Harrell last season (maybe use QB rating instead of points scored, and use defensive FEI and TOP (any others?) as some of the independent variables).
Then apply to each of FSU’s opponents last year and tell what we would expect their QB rating to have been against that competition.
I dunno, is that worth exploring? We’ve had these recent ‘Ponder will be as good as McCoy/Bradford’ arguments. Maybe this would add some perspective on how productive Ponder needs to be this year to substantiate those arguments.
We could also
See how Ponder would have fared against the defenses that McCoy, Harrell, and Bradford faced. Between this type of analysis and the one you mentioned, we may find that he is not that far from them in the first place.
...But
Does anyone have a good link to 1) a breakdown of QB efficiencies for McCoy, Bradford, Harrell, and Ponder by game and 2) defensive efficiencies for all 08 teams?
We could always construct our own QB efficiency if this isn't already listed somewhere:
It’s fairly easy to construct if we have game-by-game attempts, completions, yards, TDs, and INTs.
Wiki Article on QB efficiency
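The NCAA passer efficiency rating uses exactly those five stats: (8.4 x yards + 330 x TDs + 100 x completions - 200 x INTs) / attempts. A minimal sketch (the function name and the stat line in the usage example are made up):

```python
def passer_efficiency(attempts, completions, yards, touchdowns, interceptions):
    """NCAA passer efficiency rating from raw passing totals."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return (8.4 * yards + 330 * touchdowns + 100 * completions - 200 * interceptions) / attempts


# Hypothetical game: 20 of 30 for 250 yards, 2 TD, 1 INT
print(round(passer_efficiency(30, 20, 250, 2, 1), 1))  # → 152.0
```

For a season (or any span of games), add up the raw totals first and then apply the formula, rather than averaging the per-game ratings.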
I looked at this and here is a summary of what I found...
I used defensive efficiencies to predict the QB efficiency of Bradford, McCoy, and Ponder for the 08 season. In doing so, I found the defensive efficiency to be a reasonable predictor for Ponder (he did better against teams that were not as good on defense). However, both McCoy and Bradford posted great QB efficiencies against good and bad defenses. Therefore, defensive efficiency is not a reasonable predictor for these two QBs (predictions made from my models are not legitimate).
I would like to compare Ponder’s 08 numbers to McCoy’s 06 numbers and Bradford’s 07 numbers, but I’ll be damned if I can find a good list of team defensive efficiencies for these seasons. I think this would be a fairer comparison anyway, so if anyone has a link to 07 and 06 defensive efficiencies I’d appreciate it!
Hmm, maybe I can find that.
http://spreadsheets.google.com/pub?key=pkpk-Zkv_WsK8yPxzZcyiXw&gid=0
:)
Oh, make sure you use AOE not OE (since it’s adjusted). They are aligned in different columns in this link.
Thanks FSUn!
These efficiencies are different than the ones I am using for 2008:
http://web1.ncaa.org/mfb/natlRank.jsp?year=2009&div=IA&rpt=IA_teamdefpasseff&sit
I want to make sure I do a fair, “apples-to-apples” comparison between the Ponder of 08, the McCoy of 06, and the Bradford of 07.
So do you have the ADE and AOE for 08 as well?