Decision Trees as the Best Way to Predict Wins

Here at TN we pride ourselves on rational approaches to predicting wins. Much criticism has been showered upon those who simply count up all the games FSU will be favored in and predict that as the number of wins in a season. In the past, Bud and the gang here have emphasized proportional win shares as a means of predicting the outcome of a season, and this is a great tool. However, it is a limited tool and it has a flaw: it forces you to take a single view of the whole season when you make your predictions, and it allows for no flexibility. There is a way to add flexibility into the model.

Think about all the statements such as:

"If we can only beat Oklahoma, then look out!"

"How would your prediction of the UF game change if we were 10-1 going in?"

"If we lose to Clemson, then we could easily lose to UNC"

...and other similar statements. A decision tree allows you to incorporate all of these "what ifs" into your model. Decision trees are an important tool in business. They allow you to make decisions and predictions about strategic investments, competitor modeling, and research and development. Probably their most common use is in finance, to price options using stochastic interest rate models (side note: I studied finance in business school under one of the inventors of this option valuation method, which is used much more commonly on Wall Street than Black-Scholes). The point is to break a linear prediction into several branches as you hit important milestones. You can then get a more accurate view of the future by looking at the valuations along the various branches of the tree.

For our purposes, let’s assume we are back in August and we are trying to get a handle on the 2010 season (I have to use 2010 as this is a sequential model and we don’t know the order of next year’s schedule yet). I chose 3 milestone games that will have a big influence on how I predict the other games in the season: OU, Miami, and Clemson. At each of those games I branch our standard proportional win shares up for a win and down for a loss, and assign each branch a percentage based on which way I think the game goes. As we move along the season, one branch turns into two, which turns into four, which turns into eight. See the image below:


(NOTE: these were my predictions from back in August, before we played or saw any opponents play)

As you can see, using the standard proportional win shares method, I had to take a somewhat more pessimistic view of the season and predicted barely 8 wins. We knew there was potential, but the huge uncertainty didn’t allow us to capture that in the static model. Extra branches allow you to model upside better, and my prediction got a bump of about 0.2 wins.
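For anyone who would rather script this than build it in a spreadsheet, here is a minimal sketch of the static-versus-tree comparison. It assumes that proportional win shares means adding up per-game win probabilities, and that a milestone result shifts every later game's probability by a fixed amount. The schedule, the probabilities, and the `UP`/`DOWN` shifts are all made-up placeholders, not my actual August numbers.

```python
from itertools import product

# Hypothetical schedule in playing order: (opponent, base win probability, is_milestone).
# Every number here is a placeholder for illustration only.
SCHEDULE = [
    ("Opponent A", 0.95, False),
    ("Milestone 1", 0.30, True),
    ("Opponent B", 0.70, False),
    ("Milestone 2", 0.45, True),
    ("Opponent C", 0.60, False),
    ("Milestone 3", 0.55, True),
    ("Opponent D", 0.50, False),
]
# Assumed shift applied to all later games after a milestone win (UP) or loss (DOWN).
UP, DOWN = 0.10, 0.05


def clamp(p):
    """Keep a shifted probability inside [0, 1]."""
    return min(1.0, max(0.0, p))


def static_wins(schedule):
    """Static proportional win shares: the sum of the base win probabilities."""
    return sum(p for _, p, _ in schedule)


def tree_leaves(schedule, up=UP, down=DOWN):
    """Return one (milestone outcomes, probability, projected wins) tuple per branch."""
    n_milestones = sum(1 for _, _, is_milestone in schedule if is_milestone)
    leaves = []
    for outcomes in product([True, False], repeat=n_milestones):
        prob, wins, shift, k = 1.0, 0.0, 0.0, 0
        for _, base, is_milestone in schedule:
            p = clamp(base + shift)
            if is_milestone:
                won = outcomes[k]
                k += 1
                prob *= p if won else 1.0 - p   # chance of going down this branch
                wins += 1.0 if won else 0.0     # milestone result is fixed on the branch
                shift += up if won else -down   # re-rate every later game
            else:
                wins += p                       # ordinary game: add its win share
        leaves.append((outcomes, prob, wins))
    return leaves


def expected_wins(leaves):
    """Probability-weighted win total across all branches."""
    return sum(prob * wins for _, prob, wins in leaves)


print("static:", round(static_wins(SCHEDULE), 2))
print("tree:  ", round(expected_wins(tree_leaves(SCHEDULE)), 2))
```

With both shifts set to zero, the tree collapses back to the static total, which is a handy sanity check. Any gap between the two numbers comes entirely from how you choose to re-rate games after a milestone win or loss.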

The great thing about this is that it allows you to model a lot of different scenarios. For instance, adding up the top 4 branches, I could say that I thought there was a 35% chance we got to 9 wins. Or I could say that there was only a 5% chance that we’d beat Oklahoma but then lose to Miami and Clemson (4th branch down), or that there was an 18% chance that we’d lose all 3 in one season (bottom branch).
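Scenario questions like these boil down to adding up the probabilities of the branches that match. A rough sketch, assuming each branch is stored as a three-letter path (W or L for each milestone game in order), a probability, and a projected win total. The `LEAVES` numbers and the `prob_where` helper are hypothetical and do not match my tree.

```python
# Each leaf: (milestone results in order, branch probability, projected wins).
# All numbers are made up for illustration; they are not the values from the post.
LEAVES = [
    ("WWW", 0.10, 10.5),
    ("WWL", 0.08, 9.6),
    ("WLW", 0.07, 9.4),
    ("WLL", 0.05, 8.3),
    ("LWW", 0.20, 9.1),
    ("LWL", 0.15, 8.0),
    ("LLW", 0.15, 7.8),
    ("LLL", 0.20, 6.5),
]


def prob_where(leaves, predicate):
    """Total probability of every branch for which predicate(path, wins) is true."""
    return sum(prob for path, prob, wins in leaves if predicate(path, wins))


# Chance of reaching at least 9 projected wins.
p_nine_plus = prob_where(LEAVES, lambda path, wins: wins >= 9)

# Chance of winning the first milestone game, then losing the other two.
p_win_then_collapse = prob_where(LEAVES, lambda path, wins: path == "WLL")

# Chance of losing all three milestone games.
p_lose_all = prob_where(LEAVES, lambda path, wins: path == "LLL")

print(p_nine_plus, p_win_then_collapse, p_lose_all)
```

Because the branches are mutually exclusive and their probabilities sum to 1, any "what if" you can phrase as a filter on the path or the win total can be answered the same way.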

(NOTE: Here is a link to the spreadsheet so you can manipulate it yourself and estimate your own win total. I'm using the site given below, but if anyone knows a better way to post it for you to download, let me know.)

Binomial Decision Tree Spreadsheet

These aren’t the only types of branch splits that you can model. You could have branches having to do with your opponents too. For instance, you could split the UF game into two, with one prediction for a good UF team and one prediction for a bad UF team, and then assign them percentages based on how you think the new coaches will do in their first year. You also aren’t limited in how many splits you create; just remember that x two-way splits leave you with 2^(x) branches at the end, so you’d better be prepared to spend a lot of time if you start creating a lot of splits.
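Here is a small sketch of both ideas: blending an opponent split into a single win probability, and counting how quickly the branches pile up. The function names and the good/bad UF numbers are hypothetical, chosen only to show the arithmetic.

```python
def blended_win_prob(scenarios):
    """Combine opponent scenarios into one win probability.

    scenarios: list of (probability the scenario happens, win probability in that scenario).
    """
    total = sum(weight for weight, _ in scenarios)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(weight * p_win for weight, p_win in scenarios)


def leaf_count(binary_splits):
    """Number of end branches after a given number of two-way splits."""
    return 2 ** binary_splits


# Hypothetical: 40% chance UF is good (we win 35% of the time),
# 60% chance UF is bad (we win 70% of the time).
uf_win_prob = blended_win_prob([(0.4, 0.35), (0.6, 0.70)])
print(round(uf_win_prob, 2))

# Branch counts for 3, 5, and 10 two-way splits.
print([leaf_count(x) for x in (3, 5, 10)])
```

If you only care about the expected win total, the blended number is enough. You only need to keep the good-UF and bad-UF branches separate when that split should also change how you rate other games.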

How would you guys structure a tree, and how do your predictions change using this model?
