
Why 1-seeds (almost) always win, but 2-seeds don't

It has been almost three weeks since Villanova cut down the nets in San Antonio, and with that, three weeks since the beginning of the great annual five-month gap in meaningful college sports. While we patiently pass the time this summer until the real fun begins again this fall, I thought that I would reflect back on the NCAA basketball tournament with a series of my math-infused musings. Today's topic: why did it take 34 years and 136 attempts before a 1-seed finally lost to a 16-seed, while a 2-seed has fallen on average about once every 4 years? The answer, as it turns out, is the same answer that I often find when I decide to dig deep into some of these topics: because, math.

My journey to solve this mystery began a few years ago when I made the observation that the probability of an upset in the NCAA tournament scales roughly linearly with the difference between the two seeds. The correlation is quite strong for the most common seed pairs (i.e., those found in first round games), but it also holds loosely for all seed combinations. This correlation is shown here (where the size of each marker is scaled to the relative frequency of that pairing in past tournaments):


Although this plot is a pretty good rule of thumb, it did slightly trouble me that there really is no clear reason for the correlation to be as linear as it is. A hint toward understanding all this came earlier this year, when a quite different sports math question was rattling around in my brain: what is the relationship between the Vegas spread of a given basketball game and the probability of an upset? Fortunately, I was able to use properties of the normal distribution to answer this question, which I summarized in a bit more detail here.

From there, it occurred to me to compare the probability of an upset based on the spread to the probability of an upset based on the seed differential. It was a bit tricky to find good historical data on NCAA tournament lines, but I did find one website that has enough data to look at the numbers. When I plotted the seed differential against the spread, the data looked like this:


Once again, the data is a bit scattered, but there is a decent linear correlation. Where the correlation gets even stronger is when you compare the probability of an upset derived from the average spread for a particular seed combination to the actual upset rate for that combination in tournament play. That correlation is shown here (where the probability of the favored team winning is plotted as opposed to the upset rate):

Similar to the other plots shown above, most of the data fall on or near the central line, with a few notable exceptions. Namely, 1-seeds have an oddly good record against 9-seeds, while 2-seeds have a surprisingly hard time with 10-seeds.

But I was still a bit troubled by the fairly thin nature of the spread data that I could find. The website I found listed only the 30 most recent instances of a given seed pair. That works fine for the 2 vs. 6 match-up, but is a bit sparse for the 5 vs. 12 match-up. However, my previous study of the Vegas spread also suggested that the average margin of victory in a given match-up is actually very highly correlated to the final Vegas spread. That data for over 45,000 college games is shown here:


With this in mind, I was able to predict the average spread of each seed combination from the average margin of victory, using the data for all games back to 1979, when seeding began. Using this "spread" to calculate the upset rate (as opposed to the actual spread) results in this plot, which does have a better R² than the plot based on the sparse spread data.


With all this in mind, it is time to come back to the original question: why do 2-seeds get upset so much more often than 1-seeds do? The answer is: because games in the NCAA tournament behave in exactly the same way that they do in the regular season when considering upsets as a function of the Vegas line.

When it comes to the 2 vs. 15 match-up, both the spread data that I could find and the average margin of victory data suggest 2-seeds are, on average, 16.5-point favorites over 15-seeds. This corresponds to a 93.8% chance that the 2-seed advances. Based on this probability, one would expect a total of 8.5 upsets in the 136 times this match-up has occurred. In reality, there have been 8 since the 64-team tournament expansion in 1985. Score one for math.
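The arithmetic in that paragraph can be sketched in a few lines of Python. This is only a rough sketch and not necessarily the calculation used for this post: it assumes the final margin of a game is normally distributed around the spread, and it backs out the standard deviation from the 16.5-point and 93.8% figures quoted above. The function names `implied_sigma` and `upset_probability` are made up for illustration.

```python
from statistics import NormalDist

def implied_sigma(spread, favorite_win_prob):
    """Standard deviation of the final margin implied by a spread and a
    favorite win probability, ASSUMING the margin is normally distributed
    around the spread."""
    z = NormalDist().inv_cdf(favorite_win_prob)
    return spread / z

def upset_probability(spread, sigma):
    """Chance that the favorite's final margin falls below zero."""
    return NormalDist(mu=spread, sigma=sigma).cdf(0.0)

# Figures quoted in the post for the 2 vs. 15 match-up
SPREAD_2_VS_15 = 16.5
FAVORITE_WIN_PROB = 0.938
GAMES_PLAYED = 136

sigma = implied_sigma(SPREAD_2_VS_15, FAVORITE_WIN_PROB)
p_upset = upset_probability(SPREAD_2_VS_15, sigma)
expected_upsets = GAMES_PLAYED * p_upset

print(f"implied standard deviation: {sigma:.1f} points")
print(f"upset probability: {p_upset:.1%}")
print(f"expected 15-over-2 upsets in {GAMES_PLAYED} games: {expected_upsets:.1f}")
```

Because the rounded 93.8% is fed back in, the expected count comes out a shade under the 8.5 quoted above, which presumably used an unrounded probability.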

As for 1-seeds, it is slightly less clear, as the sparse actual spread data that I found suggests 1-seeds are favored by an average of 22.3 points, while the margin of victory data suggests the spread should be a bit higher at 24.5 points. Taken together, this suggests that a 16-seed has between a 1.1% and 1.8% chance of an upset. Over 34 years and 136 attempts, we should have observed between 1.5 and 2.5 16-seed over 1-seed upsets. We, of course, have only observed one: this year's UMBC upset over the University of Virginia. Those probabilities suggest that a 16 over 1 upset should be observed somewhere between once every 14 years and once every 23 years. In other words, we were a bit overdue before this year. That said, it is likely that it will be at least another decade (or 2 or maybe even 3) until we can experience a UMBC-sized upset again in March. But the next 15-seed over 2-seed upset is basically due any year now. After all, it's just math.
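For anyone who wants to check the 1 vs. 16 numbers, here is a small Python sketch of the arithmetic. It takes the 1.1% and 1.8% upset probabilities from the paragraph above as given, and assumes four 1 vs. 16 games per tournament (one per region), which matches 136 games over 34 years. The variable names are just for illustration.

```python
GAMES_PER_YEAR = 4   # assumption: one 1 vs. 16 game in each of four regions
YEARS = 34           # 1985 through 2018
games = GAMES_PER_YEAR * YEARS   # 136 attempts

# Upset probabilities quoted in the post (24.5-point and 22.3-point spreads)
upset_probs = (0.011, 0.018)

results = {}
for p_upset in upset_probs:
    expected_upsets = games * p_upset                 # expected count so far
    years_between = 1 / (GAMES_PER_YEAR * p_upset)    # average wait per upset
    results[p_upset] = (expected_upsets, years_between)
    print(f"p = {p_upset:.1%}: expect {expected_upsets:.1f} upsets in "
          f"{games} games, or one every {years_between:.0f} years")
```

The same two lines of arithmetic work for any seed pairing once an upset probability is in hand.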






