Decision Trees as the Best Way to Predict Wins
Here at TN we pride ourselves on rational approaches to predicting wins. Much criticism has been showered upon those who simply count up all the games FSU will be favored in and predict that as the number of wins in a season. In the past, Bud and the gang here have emphasized proportional win shares as a means of predicting the outcome of a season, and this is a great tool. However, it is a limited tool and it has a flaw: it forces you to take a view on the whole season when you make your predictions, and it allows for no flexibility. There is a way to add flexibility into the model.
Think about all the statements such as:
"If we can only beat Oklahoma, then look out!"
"How would your prediction of the UF game change if we were 10-1 going in?"
"If we lose to Clemson, then we could easily lose to UNC"
...and other similar statements. A decision tree allows you to incorporate all of these "what ifs" into your model.
Decision trees are an important tool in business. They allow you to make decisions and predictions about strategic investments, competitor modeling, and research and development. Probably their most common use is in finance, to price options using stochastic interest rate models (side note: I studied finance in business school under one of the inventors of this option valuation method, which is used much more commonly on Wall Street than Black-Scholes). The point is to break a linear prediction into several branches as you hit important milestones. You can then get a more accurate view of the future by looking at the valuations along the various branches of the tree.
For our purposes let’s assume we are back in August and we are trying to get a handle on the 2010 season (I have to use 2010 as this is a sequential model and we don’t know the order of next year’s schedule yet). I chose 3 milestone games that will have a big influence on how I predict the other games in the season: OU, Miami, and Clemson. At each of those games I branch our standard proportional win shares up and down for a win and a loss respectively, and assign each branch a percentage based on which way I think it goes. As we move along the season, one branch turns into two, which turns into four, which turns into eight. See the image below:
(NOTE: these were my predictions from back in August, before we played or saw any opponents play)
As you can see, using the standard proportional win shares method, I had to take a somewhat more pessimistic view of the season and predicted barely 8 wins. We knew there was potential, but the huge uncertainty didn’t allow us to capture that in the static model. Extra branches allow you to model upside better, and my prediction got a ~.2 bump.
The great thing about this is that it allows you to model a lot of different scenarios. For instance, adding up the top 4 branches, I could say that I thought there was a 35% chance we got to 9 wins. Or I could say that there was only a 5% chance that we’d beat Oklahoma but then lose to Miami and Clemson (4th branch down), or that there was an 18% chance that we’d lose all 3 in one season (bottom branch).
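For readers who would rather see the mechanics than open a spreadsheet, here is a minimal sketch of a three-milestone tree in Python. Every number in it is a made-up placeholder and not a value from the original spreadsheet: the win probabilities, the base win shares for the other games, and the up/down bumps are all assumptions. It also simplifies things by keeping each milestone's win probability the same on every path, whereas in the real tree you can set a different percentage on each branch.

```python
from itertools import product

# All numbers below are hypothetical placeholders, not the spreadsheet's values.
MILESTONES = ["OU", "Miami", "Clemson"]
P_WIN = {"OU": 0.30, "Miami": 0.50, "Clemson": 0.55}

BASE_OTHER_WINS = 6.5  # assumed win shares summed over the non-milestone games
UP_BUMP = 0.4          # assumed gain in those win shares per milestone win
DOWN_BUMP = 0.2        # assumed drop in those win shares per milestone loss


def enumerate_branches():
    """Return (outcomes, probability, total wins) for all 2**3 end branches."""
    branches = []
    for outcomes in product([True, False], repeat=len(MILESTONES)):
        prob = 1.0
        for game, won in zip(MILESTONES, outcomes):
            prob *= P_WIN[game] if won else 1.0 - P_WIN[game]
        wins = sum(outcomes)
        losses = len(MILESTONES) - wins
        total = wins + BASE_OTHER_WINS + UP_BUMP * wins - DOWN_BUMP * losses
        branches.append((outcomes, prob, total))
    return branches


branches = enumerate_branches()

# Probability-weighted wins over the end branches vs. the static win-share sum.
tree_wins = sum(prob * total for _, prob, total in branches)
static_wins = BASE_OTHER_WINS + sum(P_WIN.values())

# Scenario questions are just sums over the matching branches.
p_nine_plus = sum(prob for _, prob, total in branches if total >= 9)
p_lose_all_three = next(prob for outcomes, prob, _ in branches if not any(outcomes))

print(f"static: {static_wins:.2f}, tree: {tree_wins:.2f}")
print(f"P(9+ wins): {p_nine_plus:.3f}, P(lose all three): {p_lose_all_three:.3f}")
```

The end-branch probabilities always sum to 1, so any "what if" question can be answered by adding up the branches that match it.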
(NOTE: Here is a link for you to manipulate the spreadsheet yourself to estimate your own win outcome. I'm using the site given below, but if anyone knows a better way to post it for you to download let me know)
Binomial Decision Tree Spreadsheet
These aren’t the only types of branch splits that you can model. You could have branches having to do with your opponents too. For instance, you could split the UF game into two, with one prediction for a good UF team and one prediction for a bad UF team, and then assign them percentages based on how you think the new coaches do in their first year. You also aren’t limited in how many branches you create; just remember that with x splits you will end up with 2^x branches at the end, so you had better be prepared to spend a lot of time if you start creating a lot of splits.
How would you guys structure a tree, and how do your predictions change using this model?
Comments
try this
http://sheet.zoho.com/excelviewer
"History I believe furnishes no example of a priest-ridden people maintaining a free civil government." — Thomas Jefferson to Baron von Humboldt, 1813
"MacGyver is the Jesus Christ of Science" — me
Love this.
let me think about it more before I post a real response.
I like this.
Just took a class on this stuff and this application of it is great!
>>>-----------;;;-->"I guess they have a reputation of being more of a tricky team and not being tough. You hit 'em in the mouth, and they don't like it. Other teams that have beat them just hit them in the mouth, so that's what we started out with.'' -Florida State safety Nick Moody >>>-----------;;;-->
>>>-----------;;;-->"It means so much to me. Just beating those guys. They were recruiting me so heavy. I remember when I didn’t go there, they said, ‘You will never beat us.' For me to do it, it just shows them that they were wrong, you know? Words can’t really explain the way I feel right now. This is why I came here. I had an opportunity to go to Florida, but I chose to come here because I felt it was my home. I haven't seen this since I was in middle school. Words can't explain the way I feel right now." -Nigel Bradham>>>-----------;;;-->
Excellent idea.
I routinely, as I’m sure most do, adjust my expectations throughout the season. With the tree I’d probably add branches along the way. Like say, when a key injury happens (Ponder) or an upcoming opponent is ripe for a letdown. I’d probably get way too carried away though.
"You should always swing as hard as you can...Just in case you hit the ball." - Dale Murphy
by Dr.KennethNoisewater on Dec 16, 2010 9:58 AM EST reply actions
Business ?
Was the model you were taught used more frequently on Wall Street b/c it is easier to price American options?
I was only taught Black-Scholes in college, which I believe is used to price European options. I always wondered what the best way would be to price American options, since they can be exercised at any time, unlike Euro options, but I was always too lazy/stupid to find out.
What was the name of his model?
PS – kudos to applying this to FSU
Almost no one uses
B-S in real life. It’s an elegant theory that is easy to calculate but is a bit impractical and inaccurate due to some assumptions (like dynamic hedging). Most people use some form of binomial tree like I have above (except a lot more complex). Popular examples are the Black-Derman-Toy model or the Ho-Lee model.
These models are used for both European and American, but are especially useful for American because you can make assumptions along various branches that it will be exercised if the expected hold value falls below the exercise value.
by TuckNole on Dec 16, 2010 11:31 AM EST via mobile up reply actions
To clarify
Black-Derman-Toy and Ho-Lee are specific to interest-rate derivatives. For an equity derivative, you would probably use something more proprietary based on your assumptions about the movement of the underlying asset.
by TuckNole on Dec 16, 2010 11:45 AM EST via mobile up reply actions
I love this
If someone took the time to do this for a whole season, the server might explode on account of how awesome that would be.
by Sem1nole on Dec 16, 2010 12:46 PM EST reply actions 1 recs
This is actually quite elegant
This gives the prediction maker more control over the variance that comes with winning and losing.
I’ll argue that winning/losing key games does not have the cumulative magnitude that you add/subtract in the branches of your tree, except in the “wheels come off the bus” type of losing streak.
FSU Defense 2010: Taking back 1st down.
I think maybe he is setting this up
as in “if FSU loses to BC, X scenario will occur because before I was overvaluing FSU (if I think BC isn’t that great in this example).”
I could be wrong here, but that’s the assumption I am making.
I'm NOT a stats guy, but have a question:
does the decision tree method treat games as independent?
Accountabilty is back in Tallahassee....
Correct me if I am wrong but
The proportional win share method treats games as independent because each % that you assign to a game does not change based upon what happened prior to it. If you update it throughout the season, then your new outlook, which was affected by what you saw during the previous games, is no longer independent from the games before it (but looking forward the games are still independent of each other).
The decision tree can do the opposite and treat games as dependent whenever you throw another branch into the mix. This is because the percentage you assign to the game before the branch affects the amount of weight given to each of the two paths.
That's what I thought
and I think dependence is an implausible assumption for a predictive model for CFB wins and losses.
I definitely agree about dependence
You can always throw arguments for/against doing better/worse after a win/loss. “They will be pumped up from a win.” “They will be let down after a loss.” “They will be complacent after a win.” “They will be fired up after a loss.” etc.
However, what if you would end up with two different proportional win share models depending on how well EJ takes over the reins (or how well the offensive line reloads, or how well the defense progresses over the off-season, etc.)? You could account for these variables by creating multiple proportional win share models and assigning each one a different probability (and therefore a different weight). Ex. If my two models predict 9.3 wins under the assumption that EJ succeeds and 7.5 wins under the assumption that he doesn’t, and I have a 75% certainty that he succeeds, I can multiply my 9.3 wins by 0.75 (netting 6.975 expected wins) and 7.5 wins by 0.25 (netting 1.875 expected wins) and add them together to get 8.85 expected wins overall. This is basically what the decision tree does, and it doesn’t have to be based solely upon the outcomes of games (I wasn’t really that clear on this in my prior post).
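The weighting in that example is a probability-weighted average of scenario totals. A minimal Python sketch, using the 9.3 / 7.5 win totals and the 75% figure from the example above (the scenario labels and the function name `blended_wins` are just illustrative):

```python
# Each scenario: (label, probability, projected wins from its own win-share model).
scenarios = [
    ("EJ succeeds", 0.75, 9.3),
    ("EJ does not succeed", 0.25, 7.5),
]


def blended_wins(scenarios):
    """Probability-weighted average of the scenario win totals."""
    total_prob = sum(prob for _, prob, _ in scenarios)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(prob * wins for _, prob, wins in scenarios)


expected = blended_wins(scenarios)
print(f"{expected:.2f}")  # 0.75 * 9.3 + 0.25 * 7.5 = 8.85
```

The same function works unchanged for more than two scenarios, as long as the probabilities still sum to 1.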
Regarding games, some people believe that dependence is a valid assumption. While the outcome of a game does affect future games (by changing attitudes, practice routines, etc.), I do not think it has any predictive value (especially this far in advance). However, I think each game gives you another data sample towards understanding how good or bad the team actually is (or how likely it is that EJ succeeds, to use the same example). If we win against Oklahoma, what would that tell you about your underlying assumption about EJ’s success? What if we beat Miami, but lose to Oklahoma? etc. The decision tree as applied to CFB games allows you to use each game’s information to better adjust your prediction. At least, that is my take on it.
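One way to make "each game gives you another data sample" concrete is a Bayes' rule update on the EJ assumption. This is only a sketch of the idea, not anything from the original spreadsheet: the 75% prior comes from the example above, but the two conditional win probabilities against Oklahoma are invented for illustration.

```python
def posterior_success(prior, p_win_if_success, p_win_if_failure, won):
    """Bayes' rule: updated P(EJ succeeds) after seeing one game result."""
    like_success = p_win_if_success if won else 1.0 - p_win_if_success
    like_failure = p_win_if_failure if won else 1.0 - p_win_if_failure
    evidence = prior * like_success + (1.0 - prior) * like_failure
    return prior * like_success / evidence


PRIOR = 0.75                 # 75% certainty that EJ succeeds (from the example)
P_BEAT_OU_IF_SUCCESS = 0.40  # hypothetical: chance of beating OU if EJ succeeds
P_BEAT_OU_IF_FAILURE = 0.20  # hypothetical: chance of beating OU if he does not

after_win = posterior_success(PRIOR, P_BEAT_OU_IF_SUCCESS, P_BEAT_OU_IF_FAILURE, True)
after_loss = posterior_success(PRIOR, P_BEAT_OU_IF_SUCCESS, P_BEAT_OU_IF_FAILURE, False)
print(f"P(EJ succeeds | beat OU) = {after_win:.3f}")
print(f"P(EJ succeeds | lost to OU) = {after_loss:.3f}")
```

A win nudges the belief up and a loss nudges it down, and how far it moves depends entirely on how different the two conditional probabilities are.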
That is just a binomial tree
with one branch at the beginning of the year.
This is great stuff
I’m in my first semester of b-school right now and just covered this in my econ class (take finance in spring) and I never thought about applying it to sports until I saw your comment the other day.
In addition to the share method we should have a parallel competition for folks to try this method next fall to predict our season.
Hi Urban, meet Jimbo.
It seems to me that maybe there should be more splits earlier?
Or larger?
I just think we can learn a lot about teams in the first 2-3 weeks of the season, especially when there are unanswered questions about improvement, etc.
by BenDNole on Dec 17, 2010 1:18 PM EST via mobile reply actions
It's possible that would yield better results
but remember that you are going to have 2^x end branches so the more you split the more contingent win shares you will need to think about. Gets hairy pretty quick.
Also, it’s better to branch at “important” events, as these have a bigger mathematical effect on the end value than fairly certain outcomes. If I branched at Wake, for instance, the entire value of the “lose” branch and all its subsequent splits would be multiplied by 10%, causing you to do a lot of estimation work for a highly unlikely series of events.
But in theory you could be right.
by TuckNole on Dec 17, 2010 1:53 PM EST via mobile up reply actions
I definitely agree about branching at important events
Especially since they often coincide with the greatest gain in the ewp model (meaning winning a game you’re expecting to win will likely only gain .2-.3 ‘wins’, whereas an unexpected win will gain anywhere from .5-.75 ‘wins’).
But I also think there can be branches at non-important events early in the season as you see how the team is performing.
It's a novel idea, but I'm not a fan.
Winning against teams we predict will be tough definitely correlates to additional victories, but there is no reason why this can’t be taken into account in the simpler model. Since winning against Clemson meant a lot less than I thought it would in August, you add increased error by giving us more credit for a given win. For instance, even though we thought OK was a top 10 team they could have turned out to be terrible (e.g., the gators). If OK was bad, we may have beaten them and it wouldn’t have said anything about our team. Despite this, the model would give us on average .7 more wins against future opponents. See how this adds error?
I think it would be more logical to assign a “tolerance” to each win based on how certain you are in your guess. If you believe there are things you don’t know about Clemson or FSU you can assign .6 +/- .1 wins. For instance, you might make your projections based on EJ being only as good as we’ve seen him, but you might believe there is a chance he will be much better. Based on this, all of your selections could have a tolerance of +.1 / -0. Then at the end you end up with a projected number of wins and a tolerance based on how good or bad we might be. My projection of 7.85 wins may have changed to 7.85 +/- 1.15, indicating a projected range of 6.70 to 9.00 wins.
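A rough sketch of how per-game tolerances could roll up into a season range, assuming the tolerances are simply added game by game (which gives the widest possible range). The three games and their numbers are hypothetical placeholders, and the asymmetric third entry mirrors the "+.1 / -0" idea above.

```python
# (game, win share, downside tolerance, upside tolerance); all values hypothetical.
games = [
    ("Game A", 0.60, 0.10, 0.10),
    ("Game B", 0.90, 0.05, 0.05),
    ("Game C", 0.30, 0.00, 0.10),  # asymmetric: upside only
]


def project_range(games):
    """Sum win shares and tolerances into (low, center, high) projected wins."""
    center = sum(share for _, share, _, _ in games)
    low = center - sum(down for _, _, down, _ in games)
    high = center + sum(up for _, _, _, up in games)
    return low, center, high


low, center, high = project_range(games)
print(f"{center:.2f} wins, range {low:.2f} to {high:.2f}")
```

Adding tolerances directly assumes every game misses in the same direction at once; if the misses were treated as independent, the combined range would be narrower.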
You add a novel thought
But your model is still static. Error terms do not make it dynamic. Basically boils down to: If we beat Oklahoma, will you revise any subsequent win shares? If the answer is yes, then you need to account for that dynamism somehow.
My answer is "Perhaps." That's my point.
You don’t necessarily learn anything about FSU if they beat OK. You certainly don’t learn any more about FSU than you learn about OK. When I made my projections I didn’t realize NCST would be nearly as good as they were, nor did I imagine Miami would be so inconsistent. By the time we played Clemson we had learned a lot about FSU, but we had also learned a lot about Clemson. To only consider what we learned about FSU and not what we learned about Clemson (or OK or Miami) seems silly. If OK had gone the way of Texas and completely fallen apart we would be crazy to say that our beating them signaled the fact that we were a better team.
If you want to say that the more wins we have the more likely we are to win the next game I will grant you that. My disagreement is in considering our record against projected quality opponents without weighing it against the record those opponents actually achieve. Adding branches to the tree to account for various potential strengths of our opponents and our success against those opponents would lead to more accurate estimating, but the model becomes so complex that it starts to lose its intended purpose: entertainment. I believe that adding branches only to games against specific opponents, based only on pre-season assessments of the quality of that win, makes the model less accurate because of the added error.
But if it makes it more fun for you, go at it. I’m only pointing out my disagreement because it’s fun to debate. I thought we’d struggle for 8 wins, so what do I know! :)
Thank you
for saying what I was trying to say more eloquently.
HOW we play against OK next September is probably more important than the actual outcome when it comes to accurately predicting our final record. (A well played game that we lose in overtime would mean infinitely more than if we win after knocking Landry out of the game in the 1st quarter.)
FTR, not saying I disagree with the branches, but maybe just how you decide why and when to branch.
