The calendar says March, and for sports fans, that means that the Madness has arrived. I have never been that interested in gambling (despite my fascination with the predictive power of the Vegas spread) but every year at this time, I make sure to enter as many on-line March Madness and office pools as I can.
Over the years, I never had that much luck. I did well a few times, but most years my picks flamed out early. But recently, as my interest in sports analytics increased, I developed a certain strategy to make my picks. In 2019, this new methodology worked very well.
It correctly predicted that Virginia would win the National Title. It predicted that No. 5 seed Auburn was a dark horse Final Four team, and it suggested that the winners of the regional finals between No. 1 seed Duke and No. 2 seed Michigan State, and between No. 1 seed Gonzaga and No. 3 seed Texas Tech, would likely join Virginia and Auburn in Minneapolis.
What I have learned from years of studying the tournament is that while the NCAA Tournament is chaotic, the chaos is predictable, on average.
The reason that tournament games are predictable is that the odds of an upset follow the same "rules" as any other college basketball game. That is, the odds of an upset can be predicted based on the point spread. Furthermore, since the structure of the tournament tends to pair teams with a similar historical relative strength (i.e. similar spreads), the "chaos" tends to follow a pattern of sorts.
While Vegas spreads are only available for first round games, predictive tools such as Kenpom efficiencies allow us to project spreads for any arbitrary NCAA Tournament match-up. With these tools in hand, it is possible to both simulate the full tournament and to understand where the upsets are more (or less) likely to occur.
Overall Upset Probabilities
Before we start to break down the 2021 bracket, it is important to understand the way that a typical NCAA bracket progresses. As an introduction, let's first take a bird's-eye view of the total number of upsets to expect in each round. Figure 1 below presents this data. Note that in all cases, an "upset" refers only to relative seeds of each team and not the Vegas line in any particular game.
Figure 1: Average number of seed upsets per round projected from a simulation of the 2021 Tournament, the averages from the last 18 Tournaments, and the actual number of upsets
There are three different sets of data here that all give similar information. In blue is the number of upsets per round predicted by my most recent simulation of the 2021 NCAA Tournament. In red is the average number of upsets per round from the set of simulations of all past tournaments back to 2002 (the earliest year for which Kenpom data is easily available). Finally, the green bar shows the average number of actual upsets in that same set of Tournaments.
This Figure already tells us a lot. First, I think that it clearly shows the power and accuracy of my Monte Carlo simulations of the Tournament. The historical simulation results agree very closely with the actual number of upsets observed. Second, it provides a clear guide to how many upsets to expect.
Specifically, the first round usually has between six and ten upsets per year. The second round typically has between three and seven. The Sweet 16 has one to three, and the regional final round usually has between zero and two. By the time we reach the Final Four, the higher seeds win most of the time.
One subtle point from Figure 1 is that the 2021 Tournament may have slightly fewer upsets than the historical average in every round except the second round, which may be slightly above average.
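For readers who want to see what a simulation like this looks like, here is a minimal sketch, not my actual model. It assumes that the final margin is roughly normal around the point spread with a standard deviation of 10 points, and it uses a made-up table of ratings per seed (`RATING`), chosen only so that the 1/16, 2/15, and 5/12 spreads land near the historical averages of 24, 16.5, and 5 points. A real simulation would use Kenpom efficiencies for the actual teams in the bracket.

```python
import math
import random

SIGMA = 10.0  # assumption: std. dev. of the final margin around the spread, in points

# Hypothetical ratings (points) by seed. Illustrative placeholders, not Kenpom values.
RATING = {
    1: 24.0, 2: 19.5, 3: 17.0, 4: 15.0, 5: 13.0, 6: 12.0, 7: 11.0, 8: 10.5,
    9: 10.0, 10: 9.5, 11: 9.0, 12: 8.0, 13: 6.0, 14: 4.5, 15: 3.0, 16: 0.0,
}

# First-round pairings for one region, listed in bracket order.
FIRST_ROUND = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]


def win_prob(seed_a, seed_b):
    """Probability that seed_a beats seed_b under a normal model of the margin."""
    spread = RATING[seed_a] - RATING[seed_b]
    return 0.5 * (1.0 + math.erf(spread / (SIGMA * math.sqrt(2.0))))


def simulate_region(rng):
    """Play one region once; return the number of seed upsets in each of its 4 rounds."""
    field = list(FIRST_ROUND)
    upsets = []
    while len(field) > 1:
        winners, count = [], 0
        for a, b in zip(field[0::2], field[1::2]):
            winner = a if rng.random() < win_prob(a, b) else b
            if winner == max(a, b):  # the numerically larger (worse) seed won
                count += 1
            winners.append(winner)
        upsets.append(count)
        field = winners
    return upsets


def average_upsets(n_sims=10000, rng_seed=2021):
    """Average seed upsets per round across four regions (regional rounds only)."""
    rng = random.Random(rng_seed)
    totals = [0, 0, 0, 0]
    for _ in range(n_sims):
        for i, count in enumerate(simulate_region(rng)):
            totals[i] += count
    # Four regions per tournament, so scale the per-region average by 4.
    return [4.0 * t / n_sims for t in totals]


if __name__ == "__main__":
    for rnd, avg in enumerate(average_upsets(), start=1):
        print(f"Round {rnd}: {avg:.2f} seed upsets on average")
```

The sketch stops at the regional finals because seed upsets in the Final Four depend on how the regions are paired. Swapping the `RATING` table for team-level ratings is the main change needed to turn it into something useful.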
Upset Rates Based on Seed Combinations
While knowing the total number of upsets is useful, in order to start making our picks, it is necessary to understand the odds of an upset in any given match-up. Figure 2 is my ultimate guide to understanding these odds.
Figure 2: Actual upset frequency for selected seed combinations relative to the odds predicted based on average spreads.
I mentioned above that the upset frequency is completely predictable based on the historical odds derived from the Vegas lines. This Figure demonstrates this fact. For example, everyone's favorite upset pick is for the No. 12 seed to beat a No. 5 seed. This specific upset has occurred in all but five of the past 35 Tournaments. History shows that roughly one-third of all No. 5 seeds lose in the first round.
Based on the Vegas spread, this makes complete sense. The average point spread for a No. 5 / No. 12 match-up is right around five points, and teams favored by five points in college basketball win 69 percent of the time. The upset rate in the Tournament is exactly where it should be.
This logic extends perfectly even to the most rare and exciting of all upsets (at least when they happen to somebody else's team). The upset of a No. 2 seed by a No. 15 seed has only happened eight times in history out of 140 games back to 1985 (when the Tournament expanded to 64 teams). That is an upset rate of a little over five percent. The average point spread in a No. 2 / No. 15 game is around 16.5 points which corresponds to an upset rate of... five percent.
Even the rarest upset of all, No. 16 seed UMBC's epic upset of No. 1 seed Virginia in 2018, was somewhat predictable. With only one occurrence in 140 games, the odds of this type of upset must be around one percent. This also happens to be exactly the odds predicted in games where the spread is about 24 points, which is the historical average in games between No. 1 and No. 16 seeds.
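The link between spread and upset odds can be approximated with a simple formula. The sketch below assumes that the final margin is normally distributed around the spread with a standard deviation of about 10 points. That value is my assumption here, picked because it reproduces the three rates quoted above; the function name `upset_probability` is just for illustration.

```python
import math

SIGMA = 10.0  # assumption: std. dev. of the final margin around the spread, in points


def upset_probability(spread, sigma=SIGMA):
    """Chance the underdog wins when the favorite is favored by `spread` points."""
    favorite_wins = 0.5 * (1.0 + math.erf(spread / (sigma * math.sqrt(2.0))))
    return 1.0 - favorite_wins


matchups = [
    ("No. 5 vs No. 12", 5.0),
    ("No. 2 vs No. 15", 16.5),
    ("No. 1 vs No. 16", 24.0),
]
for label, spread in matchups:
    print(f"{label}: spread {spread:4.1f} -> upset odds {upset_probability(spread):.1%}")
```

Under that assumption, the three spreads give upset odds of about 31 percent, 5 percent, and just under 1 percent, which lines up with the historical rates for each seed pairing.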
Upset Rules of Thumb
Without any knowledge at all of the reason behind these upset rates, it is possible to develop a set of good rules of thumb in order to generate a bracket with a historically accurate number and distribution of upsets. Here are some rules that I like to use:
- The No. 8 versus No. 9 games are basically toss-ups. I usually consult the Vegas line for these games and then go with my gut on each one.
- The odds of a first round upset of a No. 7, No. 6 or No. 5 seed are all similar at between 35 and 40 percent. This means that between four and five upsets total in this group are expected in any given year.
- For the teams seeded No. 4 or better, the first round upset rate drops to 20 percent or less. That said, one or two "big" upsets a year is normal, usually involving the No. 3 or No. 4 seeds.
- For second round games, a No. 1 seed gets upset prior to the Sweet 16 almost exactly once every other year, on average. The rate has been a little higher than that recently. Exactly one No. 1 seed has been knocked out in the second round in seven of the last 10 tournaments. However, this came right after a stretch where all the No. 1 seeds advanced to the Sweet 16 in five straight tournaments.
- Roughly one-third of all No. 2 seeds do not make it to the Sweet 16. Chances are, at least one will lose to a No. 7 or No. 10 seed in the second round. In 2019, all four No. 2 seeds advanced for the first time since 2009.
- As for the No. 3 and No. 4 seeds, basically only half of them survive the first weekend, on average.
- In the Sweet 16 round, the upset rate for No. 1 seeds is about one-in-six and it is about one-in-five for No. 2 seeds. Basically, only two-thirds of all No. 1 seeds make it to the regional final, and a little under half of the No. 2 seeds usually make it.
- As for the regional finals, the surviving No. 1 seeds get eliminated in a quarter of these games. Only 40 percent of all No. 1 seeds make it to a Final Four.
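The arithmetic behind a couple of these rules is easy to check. The sketch below treats each game as an independent coin flip with a fixed upset rate, which is a simplifying assumption on my part, and the helper `prob_at_least` is a name made up for this example.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of k or more upsets in n independent games, each with upset odds p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Twelve first-round games involve a No. 5, No. 6, or No. 7 seed (three per region).
games = 12
for p in (0.35, 0.40):
    print(f"upset rate {p:.0%}: expect {games * p:.1f} upsets among the 5/6/7 seeds")

# Four No. 5 vs No. 12 games per year, each with roughly a one-in-three upset chance.
p_any = prob_at_least(1, 4, 1 / 3)
print(f"P(at least one 12-over-5 upset) = {p_any:.0%}")
```

With a 35 to 40 percent upset rate, 12 games produce 4.2 to 4.8 expected upsets, which is where the "four to five" figure comes from. The chance of at least one No. 12 over No. 5 upset comes out to about 80 percent, in the same neighborhood as the 30 of 35 Tournaments in which it has actually happened.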
The Final Four and Champion
The rules of thumb above work well when applied to each individual region, but the crown jewel of every office pool bracket is the Final Four and eventual champion. Fortunately, there are several pieces of historical data that can guide this decision making process as well.
For the teams that make up the Final Four, I find the figure below to be the most helpful.
Figure 3: Distribution of seeds in the Final Four from 1979 to 2019
In this case I have grouped the seeds based on the highest seed, second highest seed, third highest seed, and lowest seed appearing in the Final Four. For example, 93 percent of the time (in all but three years: 1980, 2006, and 2011) at least one No. 1 seed makes it to the Final Four.
However, the second highest seed in the Final Four is also a No. 1 seed (i.e. at least two No. 1 seeds survive to the last weekend) only slightly more than half of the time (54 percent). Three or more No. 1 seeds have reached the same Final Four only six times since the era of Magic Johnson.
As for the third highest seed, this distribution peaks at the No. 2 seed, but the No. 3 and No. 4 seeds also have fairly high odds. As for the lowest seed to appear in any given Final Four, that is most often a No. 3 seed, but almost all seeds down to a No. 11 seed have a reasonable probability. Only three times in history has there been a Final Four without a team seeded No. 3 or lower.
In other words, a typical Final Four consists of at least one No. 1 seed, another No. 1 seed or a No. 2 seed, another No. 2 seed or a No. 3 seed, and then some other lower seed.
In selecting the eventual National Champion, there is a very good rule of thumb based on Kenpom efficiency data. In 15 of the past 18 Tournaments, the eventual champion entered the Tournament ranked in the top six of Kenpom overall. In addition, 17 of the past 18 champions have ranked in the top 21 in adjusted offensive efficiency, and 16 have ranked in the top 31 in adjusted defensive efficiency. However, only three entered the tournament ranked No. 1 overall in Kenpom.
For reference, the current top six teams in Kenpom are Gonzaga, Baylor, Michigan, Iowa, Illinois, and Houston. All six teams are in the top 10 in offense, but Baylor and especially Iowa are outside of the top 30 in defense. This leaves Gonzaga, Michigan, Illinois, and Houston as the most likely national title contenders in 2021.
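The champion filter is simple enough to write down as a few lines of code. The sketch below encodes only what is stated above about the 2021 top six: every team clears the offensive cutoff, and Baylor and Iowa miss the defensive one. The field names (`offense_ok`, `defense_ok`) are invented for this example, and with real data these flags would be computed from the actual Kenpom ranks.

```python
# Flags encode only the statements in the text, not actual Kenpom rank values.
TOP_SIX = [
    {"team": "Gonzaga", "offense_ok": True, "defense_ok": True},
    {"team": "Baylor", "offense_ok": True, "defense_ok": False},
    {"team": "Michigan", "offense_ok": True, "defense_ok": True},
    {"team": "Iowa", "offense_ok": True, "defense_ok": False},
    {"team": "Illinois", "offense_ok": True, "defense_ok": True},
    {"team": "Houston", "offense_ok": True, "defense_ok": True},
]


def title_contenders(teams):
    """Keep the teams that pass both the offensive and the defensive cutoff."""
    return [t["team"] for t in teams if t["offense_ok"] and t["defense_ok"]]


print(title_contenders(TOP_SIX))
```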
The analysis above is great for putting together a bracket that looks feasible based on historical trends. I have given you the blueprint to pick the correct distribution of upsets. But the trick to winning the office pool is to pick the correct upsets, period. Knowing that one or two No. 12 seeds are likely to upset a No. 5 seed doesn't help us if we don't know which ones to pick.
Fortunately, I have developed a method that might give you an edge. By carefully applying Kenpom efficiency data to any given bracket, it is possible to spot which upsets are more likely than others. It is possible to identify which regions are more likely to proceed according to seed, and which regions are more likely to blow up. It is possible to predict which No. 1 seed is likely to get upset first and which dark horse team is likely to reach the Final Four instead.
In part two of this analysis, I will walk you through this analysis for the 2021 bracket. Stay tuned.