For those of you who have read some of my previous comments, you know I'm a big stats nerd. Stats can be a powerful tool for making inferences and drawing meaningful conclusions about the future. With that in mind, in this article we will use last year's data on Boston College's defense to predict how our offense will do against BC this year.
In this article we will run a multiple linear regression analysis to build a statistical model that predicts how many points our offense should have scored against BC last season. Based on trends we have seen in the offseason, we will also use the model to predict how many points we may score this season.
First of all, don't be intimidated by the math terms I threw out above. This is pretty easy-to-grasp stuff. Basically, when I say a statistical model, all I mean is an equation that relates a response variable to one or more predictor variables. Here, the response variable is the number of points the BC defense allowed in each game last year. The predictor variables are each opponent's FEI efficiency (http://bcftoys.blogspot.com/) and its time of possession, taken from last year's regular-season games against Boston College (note: I couldn't find Rhode Island's FEI data, so I excluded that game).
The equation will be of the following form:
y(X1, X2) = B0 + B1*X1 + B2*X2 + E
An explanation of these terms is:
y(X1, X2) = The predicted number of points allowed by the BC defense
B0 = The intercept term (the predicted points allowed when both X1 and X2 are zero; it equals the grand average of points allowed only if the predictors are centered on their means).
B1 = The coefficient (this is like a slope parameter) for the FEI Efficiency.
X1 = The FEI Efficiency.
B2 = The coefficient for the time of possession.
X2 = The time of possession.
E = Random error (we will account for this error by looking at a prediction interval).
OK! So if you are still with me, let's build the model. Fitting it by ordinary least squares (OLS) regression with some stat software I have at work, I get:
y(X1,X2) = -12.6 + 75.9 * X1 + 0.844 * X2 + E
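The game-by-game data isn't listed here, but for anyone who wants to fit the same kind of model on their own numbers, here is a minimal sketch of ordinary least squares in plain Python. It solves the normal equations (X^T X) B = X^T y directly; the function names `fit_ols` and `solve_linear_system` are hypothetical helpers written for this illustration, and this is not necessarily how commercial stat software does it (many packages use a QR- or SVD-based solver, which is more numerically stable).

```python
def solve_linear_system(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's lists are untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(x1, x2, y):
    """Return [B0, B1, B2] for y = B0 + B1*X1 + B2*X2 by ordinary least squares."""
    # Design matrix: a column of 1s for the intercept, then the two predictors.
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations: (X^T X) B = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)
```

Pass in one list entry per game (opponent FEI, opponent time of possession in minutes, points BC allowed) and you get back [B0, B1, B2]; if your inputs match the data I used, the result should land near -12.6, 75.9 and 0.844.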
Based on our model, let's predict how many points we should have scored last year. Our FEI score was 0.205, and our time of possession was 21.9 minutes. Plugging in 0.205 for X1 and 21.9 for X2, the predicted number of points allowed by BC's defense is 21.4. We actually scored 17 points. Our model was in the right ballpark, but why was it off by about four points? This is where the E (error) term comes into play.
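Written out term by term, the substitution is as follows (the error term E is dropped because the model assumes it averages out to zero):

```latex
\begin{aligned}
\hat{y}(0.205,\ 21.9) &= -12.6 + 75.9\,(0.205) + 0.844\,(21.9) \\
                      &= -12.6 + 15.56 + 18.48 \\
                      &\approx 21.4
\end{aligned}
```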
Football is a game of great variability. On paper, we probably should have scored more than 17 points last year, but as we all know, that wasn't the case. Why? There are probably a number of reasons, such as the inconsistency of a young offense or BC having a strong day defensively. Game-to-game swings like these are what the error term E represents, and an interval around the prediction helps quantify the resulting uncertainty. Let's look at the 95% confidence interval associated with our prediction of 21.4 points.
So, we are 95% confident that, against an offense with an FEI score of 0.205 that held onto the ball for 21.9 minutes, BC's defense would allow somewhere between 6 and 37 points on average. Our prediction of 21.4 points (and the 17 points we actually scored) falls inside this interval, which is a basic sanity check on the model.
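To see how a confidence interval differs from a prediction interval, here is a sketch assuming the textbook OLS formulas. At a new point x0, the confidence interval for the average is y_hat +/- t * s * sqrt(h0) and the prediction interval for a single game is y_hat +/- t * s * sqrt(1 + h0), where s is the residual standard error and h0 = x0^T (X^T X)^-1 x0. The extra 1 under the square root is one game's random error E, which is why a prediction interval is always wider. The function `ols_intervals` is a hypothetical helper written for this illustration, and you have to supply the t critical value yourself (from a t table with n - 3 degrees of freedom), since Python's standard library has no t-distribution quantile function.

```python
import math


def solve_linear_system(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols_intervals(x1, x2, y, new_x1, new_x2, t_crit):
    """Return (y_hat, confidence_interval, prediction_interval) at (new_x1, new_x2).

    t_crit is the two-sided t critical value with n - 3 degrees of freedom
    (for a 95% interval with 8 degrees of freedom it is about 2.306).
    """
    n, p = len(y), 3
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)

    # Residual standard error: s^2 = SSE / (n - p).
    fitted = [sum(coef * val for coef, val in zip(beta, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    s = math.sqrt(sse / (n - p))

    # Leverage h0 = x0^T (X^T X)^-1 x0, found by solving (X^T X) v = x0.
    x0 = [1.0, new_x1, new_x2]
    v = solve_linear_system(xtx, x0)
    h0 = sum(a * b for a, b in zip(x0, v))

    y_hat = sum(coef * val for coef, val in zip(beta, x0))
    ci_half = t_crit * s * math.sqrt(h0)        # uncertainty in the average only
    pi_half = t_crit * s * math.sqrt(1.0 + h0)  # adds one game's random error E
    return (y_hat,
            (y_hat - ci_half, y_hat + ci_half),
            (y_hat - pi_half, y_hat + pi_half))
```

Both intervals are centered on the same point prediction; only their widths differ, and the gap between them grows with the residual scatter s.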
So what does all of this mean for next year? Two pretty solid offseason trends, both well documented on this site, are:
1) Our offense should be better next year.
2) BC's defense should be worse next year.
So, let's use our model to predict next year's score. To do this, I will make two assumptions:
1) Our FEI score next season will improve from 0.205 to 0.23.
2) We will possess the ball for exactly half of the game (30 minutes).
Based on this, our model predicts that we will score 30 points. Because we want a range for the score of a single game, rather than for the average score, we need a prediction interval instead of a confidence interval. A prediction interval is built to capture one new, unobserved value, so it accounts for the game-to-game randomness in E as well as the uncertainty in the model's estimates, which makes it wider.
The 95% prediction interval says we can be 95% confident that an offense with an FEI score of 0.23 that held on to the ball for 30 minutes would score somewhere between 5 and 56 points on last year's BC defense. Since we think BC's defense is taking a turn for the worse this season, I would expect our offense to land somewhere toward the higher end of this prediction interval.
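The 30-point figure comes from the same substitution as before, with the new inputs X1 = 0.23 and X2 = 30:

```latex
\begin{aligned}
\hat{y}(0.23,\ 30) &= -12.6 + 75.9\,(0.23) + 0.844\,(30) \\
                   &= -12.6 + 17.46 + 25.32 \\
                   &\approx 30.2
\end{aligned}
```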
Practically speaking, my guess is that 30 points would be about right; 56 seems a bit excessive. Chances are, we will land somewhere in the 30-56 range rather than the 5-30 range. Feel free to use this model with your own input values (if you think our FEI score may be different from 0.23, or that our time of possession will be something other than 30 minutes).
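For anyone who wants to try their own inputs, here is a minimal Python sketch of the point prediction. It only uses the rounded coefficients reported above, so it gives the center of the range, not the prediction interval (that would need the game-by-game data). The function name `predicted_points` and the example scenarios are hypothetical placeholders; swap in your own numbers.

```python
# Rounded coefficients from the fitted model:
# y(X1, X2) = -12.6 + 75.9 * X1 + 0.844 * X2 + E
B0, B1, B2 = -12.6, 75.9, 0.844


def predicted_points(fei, possession_minutes):
    """Point prediction against last year's BC defense (error term E omitted)."""
    return B0 + B1 * fei + B2 * possession_minutes


# Hypothetical (FEI score, minutes of possession) scenarios; edit to taste.
scenarios = [(0.23, 30.0), (0.20, 30.0), (0.25, 32.0)]

for fei, minutes in scenarios:
    points = predicted_points(fei, minutes)
    print(f"FEI {fei:.3f}, {minutes:.1f} min of possession -> {points:.1f} points")
```

The first scenario reproduces the 30-point prediction (30.2 before rounding). Keep in mind that inputs far outside the range of last year's opponents are extrapolation, where the model is less trustworthy.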
What are your predictions?