
How Good are Preseason Rankings?

Around this time of year, the various preseason college football publications start appearing on the shelves. For some time now, I have wondered if there is a good way to evaluate how good or bad the various preseason rankings really are.  This year, I decided that I would try to figure it out.  Now, it would be straightforward to simply compare the various preseason rankings to the post-season CFB playoff ranking, AP ranking, or coaches poll.  But that only tells the story for about 1/3 of all of Division 1, and I was looking for something a bit more comprehensive.

From time to time, I have discussed and posted data based on an algorithm that I have developed to generate my own power rankings.  Since my method assigns a ranking to all 128 Div. 1 teams, is typically a reasonable predictor of Vegas spreads (more on that later), and since I also tabulate preseason predictions from various sources to support my annual preseason analysis (coming soon to a message board near you), it occurred to me that I had all the data that I needed to make this comparison. So, I went back over the data from the last 10 years or so, compared the various full 128-team preseason rankings (from sources such as Phil Steele, Athlon's, Lindy's, ESPN (FPI), and SP+), and tabulated the average absolute difference between their rankings and my algorithm's post-season rankings for all Division 1 teams.  The results are shown below:
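The comparison itself is easy to reproduce in code. Here is a minimal sketch in Python that computes the average absolute rank error for one publication in one season; the file names and column names are just assumptions for illustration, not my actual data files.

```python
# Minimal sketch: average absolute rank error for one publication in one season.
# File names and column names ("team", "rank") are assumptions for illustration.
import pandas as pd

def mean_abs_rank_error(preseason_csv, postseason_csv):
    """Average |preseason rank - post-season rank| across all teams."""
    pre = pd.read_csv(preseason_csv)    # assumed columns: team, rank
    post = pd.read_csv(postseason_csv)  # assumed columns: team, rank
    merged = pre.merge(post, on="team", suffixes=("_pre", "_post"))
    return (merged["rank_pre"] - merged["rank_post"]).abs().mean()

# Hypothetical usage:
# print(mean_abs_rank_error("steele_2016_preseason.csv", "my_postseason_2016.csv"))
```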



Now, as you can see, I do not have a perfect data set to work with. I only have rankings from multiple sources for the last 5 years, and you also must trust that my algorithm is a reasonable approximation of the relative strength of teams.  In any event, there are several interesting observations from this table:

First, for the limited data that I have, Phil Steele's publication appears to give the smallest error between the preseason rankings and my simulated post-season rankings most consistently.  I have his data as the best in 4 of the 5 years where I have rankings from multiple sources. He always advertises that his rankings are the most accurate, and I cannot dispute that with this analysis.  Second, that said, there is not a huge difference between the different publications.  So, there is no strong reason to rush out and buy any one of these publications over the others based on the rankings alone (I will comment a little more on this later). Third, none of the publications gets particularly close to the final rankings.  The average deviations are all in the range of 15-20 slots, which is an average error of roughly 15% for a 128-team field.  That does not seem great to me.

I wanted to dive a little deeper into the third point.  As the table indicates, I have the most historical data on Phil Steele's rankings, so I decided to go back ten years and compare all of his preseason rankings to all of my post-season rankings.  There are several ways to look at this data, but I find the most informative to be a histogram of the deviations, a scatter plot, and a plot of the average post-season ranking as a function of the initial Phil Steele ranking (basically the scatter plot data, but with the y-axis showing the average and standard deviation/error bars for each preseason rank instead of each individual data point). Once again, there are several conclusions we can draw from this data.

First, the histogram gives us an idea of the distribution of the deviations. It is fairly bell shaped, with 24% of the picks falling within +/- 5 slots of the final ranking and 41% falling within +/- 10 slots.  But the tails of the distribution are also fairly long: 23% of all of Steele's picks are not within 30 slots of the final ranking.

The scatter plot tells a very similar story. In this case, we can see that the correlation (R-squared = 0.66) is OK, but not that great. The scatter plot also tends to highlight the real misses, like when Steele ranks a team in his top 20 (like Illinois in 2009) only to see that team wind up 3-9 with a ranking in the 80s by my algorithm, or when teams like Utah St. and San Jose St. in 2012 are ranked around 100 by Steele but wind up in the top 25 of both my algorithm and the national polls.

The plot of the average post-season ranking vs. the initial ranking shows the Phil Steele data in perhaps the best light.  It shows that for any given preseason ranking, Steele is pretty close on average, but the spread around that average is still quite large.  Notably, the deviation is much smaller for teams in Phil Steele's ~Top 5.  Historically, those teams do usually wind up having great seasons, but there are exceptions (like the 2007 Louisville team, which started ranked #4 but ended 6-6).  That said, the same trend is also found at the bottom end of the chart, so it may have more to do with the fact that teams ranked very high (or very low) only really have one direction to go: down (or up).  That fact is best illustrated by a plot of the standard deviation of the post-season ranking as a function of the preseason ranking (basically, a plot of the error bars from the previous figure), which is shown here and displays a clear parabolic trend.
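For anyone who wants to reproduce this kind of breakdown with their own numbers, here is a minimal sketch of the three views described above. The DataFrame name `merged` and its columns `rank_pre` and `rank_post` are assumptions for illustration, not my actual data set.

```python
# Sketch of the three views above, assuming `merged` has columns
# "rank_pre" (preseason rank) and "rank_post" (post-season rank).
import numpy as np
import matplotlib.pyplot as plt

def deviation_plots(merged):
    dev = merged["rank_post"] - merged["rank_pre"]
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # 1. Histogram of the deviations
    axes[0].hist(dev, bins=range(-60, 65, 5))
    axes[0].set_xlabel("post-season rank minus preseason rank")

    # 2. Scatter plot with R-squared
    axes[1].scatter(merged["rank_pre"], merged["rank_post"], s=10)
    r = np.corrcoef(merged["rank_pre"], merged["rank_post"])[0, 1]
    axes[1].set_title(f"R-squared = {r**2:.2f}")
    axes[1].set_xlabel("preseason rank")
    axes[1].set_ylabel("post-season rank")

    # 3. Average (and standard deviation of) post-season rank per preseason rank
    grouped = merged.groupby("rank_pre")["rank_post"].agg(["mean", "std"])
    axes[2].errorbar(grouped.index, grouped["mean"],
                     yerr=grouped["std"].fillna(0), fmt="o", markersize=3)
    axes[2].set_xlabel("preseason rank")
    axes[2].set_ylabel("average post-season rank")

    plt.tight_layout()
    plt.show()
```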





What is perhaps the most interesting aspect of all of this to me harkens back to my second observation from the first table above: the fact that the deviations from the different publications are all basically the same for a given year.  To visualize this, I took the predictions from the two publications for which I have the most data tabulated (Phil Steele and Athlon's) and plotted them against each other in a scatter plot, which is shown here:



Not surprisingly, the correlation between the two sets of predictions is rather high (R-squared = 0.91), and much higher than the correlation to reality, so to speak.  So, as my first conclusion, I think we can say that preseason predictions are OK, but not great (they are certainly not destiny), and that they agree with each other far more than they agree with the actual results on the field.

This analysis led me to think about another interesting topic, which is related to the first.  Now that we have looked at the robustness of preseason rankings, what about in-season predictions?  More specifically, what about metrics such as ESPN's vaunted FPI?  In the 2016 season, I decided to put the FPI to the test alongside my own algorithm to see how they performed. As it turns out, this is a tricky question, because defining "performance" in this context is not as easy as you might think.  A big part of the reason why is that there is generally a very poor correlation between any predicted margin of victory and the actual result.  The best predictor, I suppose not surprisingly, is the Vegas spread, and a scatter plot of the actual game margins vs. the opening Vegas spreads for the entire 2016 season is shown here.  As you can see, the R-squared is a pathetic 0.214.  But this is better than the FPI, which only mustered an R-squared of 0.196, and, sadly, my algorithm, which only mustered an R-squared of 0.167.  I won't bother to show you those plots, as they both look like shotgun blasts.
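For what it is worth, an R-squared value like the ones quoted above can be computed for each predictor with a few lines of code. The `games` DataFrame and its column names below are hypothetical, with all margins stated from the favorite's point of view.

```python
# Minimal sketch: R-squared between a predicted margin and the actual margin.
# The `games` DataFrame and its columns (actual_margin, vegas_spread,
# fpi_margin, my_margin) are hypothetical names for illustration.
import numpy as np

def r_squared(predicted, actual):
    """Square of the Pearson correlation between predicted and actual values."""
    r = np.corrcoef(predicted, actual)[0, 1]
    return r ** 2

# Hypothetical usage:
# for col in ["vegas_spread", "fpi_margin", "my_margin"]:
#     print(col, round(r_squared(games[col], games["actual_margin"]), 3))
```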



Last year, as I pored through the FPI data, I noticed something odd: it was quite rare for the FPI to predict a Vegas upset.  I only counted 37 predicted upsets out of over 750 games (5%), which is interesting because historically about 25% of all college games wind up being upsets per Vegas.  2016 saw over 200 upsets in total.  My algorithm picked over 80 upsets for the season.  Granted, it was only right on those upset picks 37% of the time (which is below my algorithm's historical average of 40%), but the FPI only got 46% of its far fewer upset picks correct.  When I plotted the full-year projected margins from the FPI versus the Vegas spread (see below), the correlation is quite good (R-squared = 0.86).  By comparison, my algorithm did not do quite as well, but it is still fairly highly correlated (R-squared = 0.72).
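For reference, the upset bookkeeping can be done with a short helper like the sketch below. The column names are hypothetical, and the margins are assumed to be stated from the Vegas favorite's perspective, so a negative predicted margin means the model is picking the Vegas underdog.

```python
# Minimal sketch of the upset bookkeeping. Column names are hypothetical:
# model_col holds the model's predicted margin for the Vegas favorite, and
# actual_margin is the Vegas favorite's actual margin of victory.
import pandas as pd

def upset_pick_record(games: pd.DataFrame, model_col: str):
    """Count games where a model picks the Vegas underdog, and how many hit."""
    picked_upset = games[model_col] < 0        # model says the Vegas favorite loses
    actual_upset = games["actual_margin"] < 0  # the Vegas favorite actually lost
    picks = int(picked_upset.sum())
    correct = int((picked_upset & actual_upset).sum())
    hit_rate = correct / picks if picks else float("nan")
    return picks, correct, hit_rate
```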




From all of this, I come to my second main conclusion:  in-season algorithms don't do a good job of predicting the outcomes of actual games, but they can do a good job of predicting the Vegas spread.  In this regard, the FPI (and, to a lesser extent, my algorithm) does have value for things such as projecting point spreads 2-3 weeks in advance.  That type of analysis appears to be fairly robust.  I also must concede that the FPI does a better job of predicting these spreads than my algorithm does (which I would expect, considering they most likely have more than one dude working on it in his spare time).  But you could argue that the FPI is so good at predicting the spread that it doesn't add much to the discussion.  It is, on some level, too conservative.  At least my algorithm takes some chances and will make more than 1-2 upset picks a week.  But at the end of the day, the gold standard is the Vegas spread, which honestly makes sense.  After all, if there were a computer program out there that could beat Vegas, somebody would be very rich, and they would certainly not tell the rest of us about it.

So, with this knowledge, perhaps the most useful figure that I can leave you with is the following:  the 5-point boxcar-averaged plot of the probability of the favored team winning as a function of the opening Vegas spread for all college games back to 2009.  As you can see, once the data is smoothed, it forms a nice quadratic curve from a 50-50 toss-up to a virtual sure thing once the spread reaches around 30.  (In reality, there have been a total of two upsets since 2009 in games where the spread exceeded 30, a frequency of less than 1%.)  The fit is not perfect, but the equation on the chart is very simple and easy to remember.  I would imagine the line should asymptotically approach 100% but never actually reach it, because in college football, I believe the underdog always has a chance.
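For anyone who wants to build a similar chart, here is a minimal sketch of the smoothing and the quadratic fit. The `games` DataFrame, its columns, and the choice to group games by the rounded spread before applying the 5-point boxcar are all assumptions for illustration, not necessarily the exact procedure used for the figure.

```python
# Minimal sketch of the smoothed win-probability curve. Assumed columns:
# vegas_spread = the favorite's opening spread as a positive number,
# favorite_won = 1 if the favorite won, else 0.
import numpy as np
import pandas as pd

def smoothed_win_prob(games: pd.DataFrame):
    # Raw probability of the favorite winning at each (rounded) spread value
    by_spread = games.groupby(games["vegas_spread"].round())["favorite_won"].mean()
    # 5-point boxcar (moving) average across neighboring spread values
    smoothed = by_spread.rolling(window=5, center=True, min_periods=1).mean()
    # Simple quadratic fit of win probability vs. spread
    coeffs = np.polyfit(smoothed.index.to_numpy(), smoothed.to_numpy(), deg=2)
    return smoothed, coeffs
```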



This brings us to my final conclusion for this piece:  college football is unpredictable, and that is why we love it. 

