FanPost

Beating an Opponent a Second Time in One Season

With Miami's victory over UNC, there has been more talk of a possible rematch between FSU and Miami in the ACC Championship Game (ACCCG). This invariably leads to comments about how difficult it is for a team to beat another team a second time in one season. Usually these comments are propped up with nebulous statements like "everyone knows" or "statistics show".

These statements have been repeated so many times that they have passed into conventional wisdom. After all, when was the last time you saw a team beat another team twice in one season?

Most of the debate, if any, around the conventional theory seems to involve discussions of why this should be. Some of the reasons given seem sensible:

  • The losing team has time to shore up any weaknesses that were exposed in the first game.
  • A team learns more from a loss than a win.
  • The losing team is more motivated than the winning team.

The list goes on from there. Most of these explanations seem plausible or even likely to me, but only if the original premise is actually sound.

The more I thought about it, however, the less sense the original premise made to me. Sure, in some games the motivation from a previous loss or adjustments made by the previous loser might sway the outcome, but on average, aren't most games simply won by the better team? In which case, wouldn't the better team be more likely to win a second game (again, on average)?

And what about margin of victory? If the winner absolutely destroyed the loser in the first game, wouldn't you expect them to win the second? Besides, whenever someone says "statistics show" without citing a source I tend to get suspicious.

So I decided to look for myself.

The Analysis

James Howell has compiled a comprehensive database of game scores going back to the 19th century. I used his data for this analysis. You can find his data here. I have not verified the accuracy of this data, but a quick spot check of some of last year's scores showed them to be correct.

I am far too lazy to sort this data by hand, so I wrote a short Ruby script to parse the data files and count the repeat matchups and the corresponding double victories. I'll post the code for this script at the end of this writeup for anyone who might want to repeat the calculation.

I first analyzed the games from the 2000 season through the 2009 season for various margins of victory. The results are given in the following table.

2000 - 2009

Margin of Victory    # of rematches    # of repeat wins    %
1 pt or more         24                14                  58%
3 pts or more        24                14                  58%
7 pts or more        18                12                  67%
14 pts or more       10                8                   80%
21 pts or more       8                 7                   88%
28 pts or more       2                 1                   50%

The first column indicates the range of the margin of victory. The second column gives the number of rematches involving that original margin of victory. The third column is the number of times the original winner won the rematch. The fourth column is the percentage of second victories based on the previous two columns.

So, from the first row we see that from 2000 to 2009 there were 24 rematches between teams where the original margin of victory was at least one point, i.e., the first game was not a tie. Of those 24 rematches, the original winner won the second game 14 times, or 58% of the time.
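The percentage column is simply the third column divided by the second. As a quick check of that arithmetic, here is a minimal Ruby sketch that recomputes it from the 2000 - 2009 counts in the table above. The helper name `repeat_win_pct` is made up for this illustration and is not part of the script at the end of the post.

```ruby
# Counts copied from the 2000 - 2009 table: margin => [rematches, repeat wins]
COUNTS_2000_2009 = {
  1  => [24, 14],
  3  => [24, 14],
  7  => [18, 12],
  14 => [10, 8],
  21 => [8, 7],
  28 => [2, 1]
}

# Hypothetical helper: share of rematches won by the original winner,
# as a percentage rounded to the nearest whole number
def repeat_win_pct(rematches, repeat_wins)
  (100.0 * repeat_wins / rematches).round
end

COUNTS_2000_2009.each do |margin, (rematches, repeat_wins)|
  puts "#{margin} pt or more: #{repeat_win_pct(rematches, repeat_wins)}%"
end
```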

As the margin of victory goes up, the likelihood of the original winner winning the second game goes up. For instance, when the original winner won by two touchdowns or more, they won the second game 80% of the time. The exception to this is when the original margin of victory was four touchdowns or more (the sixth row). Here the original winner won the second game only 50% of the time. I think the explanation for this dip is straightforward: there were only two such rematches from 2000 to 2009, which is far too small a sample to draw any conclusion from.
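One way to see how little two games can tell us is to put an uncertainty range around each percentage. The sketch below uses a Wilson score interval at roughly 95% confidence. This is an addition for illustration, not something the analysis in this post used, and it assumes each rematch is an independent trial with the same chance of a repeat win, which real games only approximate. The method name `wilson_interval` is hypothetical.

```ruby
# Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence).
# Assumption: every rematch is an independent trial with the same win probability.
def wilson_interval(successes, trials, z = 1.96)
  p_hat = successes.to_f / trials
  z2 = z * z
  denom = 1.0 + z2 / trials
  center = (p_hat + z2 / (2.0 * trials)) / denom
  half_width = z * Math.sqrt(p_hat * (1.0 - p_hat) / trials +
                             z2 / (4.0 * trials * trials)) / denom
  [center - half_width, center + half_width]
end

# Two rows from the 2000 - 2009 table: [label, rematches, repeat wins]
rows = [["14 pts or more", 10, 8], ["28 pts or more", 2, 1]]

rows.each do |label, rematches, repeat_wins|
  low, high = wilson_interval(repeat_wins, rematches)
  puts "#{label}: #{(low * 100).round}% to #{(high * 100).round}%"
end
```

For 1 win in 2 rematches the interval covers most of the range between 0% and 100%, which is another way of saying that the 50% figure tells us almost nothing.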

Extending the analysis to include games from 1980 to 2009 yields the following:

1980 - 2009

Margin of Victory    # of rematches    # of repeat wins    %
1 pt or more         35                21                  60%
3 pts or more        34                20                  59%
7 pts or more        24                16                  67%
14 pts or more       14                11                  79%
21 pts or more       8                 7                   88%
28 pts or more       2                 1                   50%

The results are remarkably consistent with those from 2000 to 2009. Admittedly, the last two rows involve exactly the same games in both tables, so those rows are bound to match.

Extending the analysis all the way from 1900 to 2009 yields the following:

1900 - 2009

Margin of Victory    # of rematches    # of repeat wins    %
1 pt or more         226               155                 69%
3 pts or more        218               148                 68%
7 pts or more        173               129                 75%
14 pts or more       101               85                  84%
21 pts or more       67                62                  93%
28 pts or more       40                37                  93%

Using this extended date range we see even stronger evidence to dispel the myth. In rematch games where the original game did not end in a tie, i.e., the margin of victory was 1 point or greater, the original winner won the second game 69% of the time. When the margin of victory was four touchdowns or more, the original winner won 93% of the time.

The evidence seems pretty clear to me. I think it comes down to this: more often than not, the better team wins.

So why does the myth persist? For one thing, I think it is because at first blush it seems plausible. For another, I think rematches are so rare that people tend to confuse the rarity of even playing the same team twice in one season with the likelihood of the first winner winning again. Most importantly, though, I think it comes down to the fact that people are generally really bad at understanding conditional probability. People tend to confuse the prior likelihood of a team winning both games with the conditional probability of a team winning the second game once they have won the first.

The prior likelihood of a given team winning twice against the same opponent in a season is small. After all, every time one team won both games, the other team lost both. Once a team has won the first game, however, it has won the second 69% of the time. The historical evidence is pretty compelling.
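To make the difference between the two probabilities concrete, here is a small Ruby sketch using the "1 pt or more" row of the 1900 - 2009 table. It rests on one assumption that is not in the data: the "prior" case means naming one of the two teams by coin flip before either game is played, so that the named team wins the first game half the time.

```ruby
# Counts from the 1900 - 2009 table, "1 pt or more" row
rematches   = 226   # rematches whose first game was not a tie
repeat_wins = 155   # rematches won by the winner of the first game

# Conditional: given that a team won the first game, how often did it win the second?
p_second_given_first = repeat_wins.to_f / rematches

# Prior (assumption): pick one of the two teams by coin flip before either game.
# Half the time the picked team is the one that wins the first game.
p_picked_team_wins_both = 0.5 * p_second_given_first

puts "Won the rematch, given a win in the first game: #{(p_second_given_first * 100).round(1)}%"
puts "Picked team wins both games: #{(p_picked_team_wins_both * 100).round(1)}%"
```

The second number is half the first, which is why winning twice can look unlikely in advance even though the first winner usually wins again.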

 

The Code

I have tested my analysis on a small sample file, but I have not done rigorous testing. It's quite possible I have completely screwed up somewhere, so please feel free to give me some peer review here. To make this easier, the script I used is given below. The data files must be downloaded from the original site. I ran the analysis on a MacBook Pro, but any Unix variant with Ruby installed should work. I'm not sure about Windows machines.

#!/usr/bin/env ruby

# total number of rematches for various margins of victory
rematch_count_by_threshold = {1=>0, 3=>0, 7=>0, 14=>0, 21=>0, 28=>0}

# total number of rematches won by the original winner for various margins of victory
double_winner_count_by_threshold = {1=>0, 3=>0, 7=>0, 14=>0, 21=>0, 28=>0}

# iterate over all the data files - one file per year
Dir["./*.txt"].each do |file|
  year = file[4,4]
  
  # map of games played by each team in the given year
  team_games = {}
  
  # open the data file and get its data
  IO.foreach(file) do |line|
    
    teamA = line[11,28].strip
    teamB = line[43,28].strip
    teamAScore = line[39,2].strip.to_f
    teamBScore = line[71,2].strip.to_f
    
    
    if team_games[teamA].nil?
      team_games[teamA] = {teamB=>(teamAScore-teamBScore)}
    else
      if team_games[teamA][teamB].nil?
        team_games[teamA][teamB] = teamAScore - teamBScore
      else
        # found a repeated game
        # uncomment the following line for verbose information about rematches
        #puts "#{teamA} played #{teamB} twice in #{year}"
        
        original_victory_margin = team_games[teamA][teamB]
        
        # did the team that won the first game also win the rematch?
        # (a tied rematch does not count as a repeat win)
        same_winner = (teamAScore > teamBScore && original_victory_margin > 0) ||
                      (teamAScore < teamBScore && original_victory_margin < 0)
        
        # count the rematch under every threshold that the original margin reached
        rematch_count_by_threshold.keys.each do |threshold|
          next if original_victory_margin.abs < threshold
          rematch_count_by_threshold[threshold] += 1
          double_winner_count_by_threshold[threshold] += 1 if same_winner
        end
        
      end
    end
    
    # insert entries for team B - no need to count the rematches here as they will have already been detected above
    if team_games[teamB].nil?
      team_games[teamB] = {teamA=>(teamBScore-teamAScore)}
    else
      if team_games[teamB][teamA].nil?
        team_games[teamB][teamA] = teamBScore - teamAScore
      end
    end
    
  end
  
end

# print one summary line per margin-of-victory threshold
rematch_count_by_threshold.keys.each do |threshold|
  wins = double_winner_count_by_threshold[threshold]
  total = rematch_count_by_threshold[threshold]
  label = threshold == 1 ? "1 point" : "#{threshold} points"
  puts "The original winner won in #{wins} of #{total} rematches where the original victory was by #{label} or greater (#{wins.to_f / total.to_f * 100.0} %)."
end
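
For anyone who wants to check the script against a case with a known answer, here is a minimal sketch that builds a few fixed-width lines using the same column offsets the script reads (team A at 11, its score at 39, team B at 43, its score at 71). The team names, scores, and dates are invented, and putting a date in the first 11 columns is an assumption on my part; the script never reads those columns. The helper name `sample_line` is hypothetical.

```ruby
# Build one fixed-width line matching the offsets the script reads:
#   columns 11-38 team A, 39-40 team A score, 43-70 team B, 71-72 team B score.
# Assumption: columns 0-10 hold a date; the script ignores them.
def sample_line(date, team_a, score_a, team_b, score_b)
  format("%-11s%-28s%2d  %-28s%2d", date, team_a, score_a, team_b, score_b)
end

# Invented season: Team A beats Team B by 7, Team C and Team D meet once,
# then Team B wins the rematch against Team A.
lines = [
  sample_line("09/05/2009", "Team A", 21, "Team B", 14),
  sample_line("09/12/2009", "Team C", 10, "Team D", 3),
  sample_line("12/05/2009", "Team B", 17, "Team A", 13)
]

# Read each line back with the script's own slices to confirm the layout
lines.each do |line|
  team_a  = line[11, 28].strip
  team_b  = line[43, 28].strip
  score_a = line[39, 2].strip.to_f
  score_b = line[71, 2].strip.to_f
  puts "#{team_a} #{score_a.to_i}, #{team_b} #{score_b.to_i}"
end
```

Saved to a .txt file next to the script, these three lines contain one rematch whose first game was decided by 7 points and whose second game went to the original loser, so the script should report 0 of 1 for the 1, 3, and 7 point thresholds and no rematches for the larger ones.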

Fanposts are a section for the fans and do NOT reflect the views of Tomahawk Nation.