
Why 1-seeds (almost) always win, but 2-seeds don't

It has been almost three weeks since Villanova cut down the nets in San Antonio, and with that, three weeks since the beginning of the great annual five-month gap in meaningful college sports. While we patiently pass the time this summer until the real fun begins again this fall, I thought that I would reflect back on the NCAA basketball tournament with a series of my math-infused musings. Today's topic: why did it take 34 years and 136 attempts before a 1-seed finally lost to a 16-seed, while a 2-seed has fallen on average about once every 4 years? The answer, as it turns out, is the same answer that I often find when I decide to dig deep into some of these topics: because, math.

My journey to solve this mystery began a few years ago when I made the observation that the probability of an upset in the NCAA tournament scales roughly linearly with the difference between the two seeds. The correlation is quite strong for the most common seed pairs (i.e., those found in first round games), but it also holds loosely for all seed combinations. This correlation is shown here (where the size of each marker is scaled to the relative frequency of that pairing in past tournaments):


Although this plot is a pretty good rule of thumb, it did slightly trouble me that there really is no clear reason for the correlation to be as linear as it is. A hint toward understanding all of this came earlier this year, when a quite different sports math question was rattling around in my brain: what is the correlation between the Vegas spread of a given basketball game and the probability of an upset? Fortunately, I was able to use properties of the normal distribution to answer this question, which I summarized in a bit more detail here.
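For readers who want to see what "properties of the normal distribution" can look like in practice, here is a minimal sketch. It assumes the final margin of victory is normally distributed around the Vegas spread. The standard deviation is not stated in this post; the 10.7 points used below is a hypothetical value, chosen only because it maps a 16.5-point spread to roughly the 93.8% win probability quoted later on. The function names are likewise illustrative.

```python
from statistics import NormalDist

# ASSUMPTION: the post does not give a standard deviation. 10.7 points is a
# hypothetical value picked so that a 16.5-point spread gives roughly 93.8%.
ASSUMED_SIGMA = 10.7


def favorite_win_prob(spread, sigma=ASSUMED_SIGMA):
    """Chance the favorite wins if the final margin ~ Normal(spread, sigma)."""
    # The favorite wins whenever the final margin lands above zero.
    return 1.0 - NormalDist(mu=spread, sigma=sigma).cdf(0.0)


def upset_prob(spread, sigma=ASSUMED_SIGMA):
    """Chance the underdog wins under the same assumed model."""
    return 1.0 - favorite_win_prob(spread, sigma)


# A few example spreads, including the ones discussed in this post.
for spread in (3.0, 7.0, 16.5, 22.3, 24.5):
    print(f"{spread:5.1f}-point favorite: upset chance {upset_prob(spread):.1%}")
```

Under this model a pick'em game (spread of zero) is a coin flip, and the upset chance shrinks quickly as the spread grows, which is why very large favorites almost never lose.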

It then occurred to me to compare the probability of an upset based on the spread to the probability of an upset based on the seed differential. It was a bit tricky to find good historical data on NCAA tournament lines, but I did find one website that has enough data to look at the numbers. When I plotted the seed differential against the spread, the data looked like this:


Once again, the data is a bit scattered, but there is a decent linear correlation. The correlation gets even stronger when you compare the probability of an upset derived from the average spread for a particular seed combination to the actual upset rate for that combination in tournament play. That correlation is shown here (where the probability of the favored team winning is plotted, as opposed to the upset rate):

Similar to the other plots shown above, most of the data fall on or near the central line, with a few notable exceptions. Namely, 1-seeds have an oddly good record against 9-seeds, while 2-seeds have a surprisingly hard time with 10-seeds.

But I was still a bit troubled by the fairly thin nature of the spread data that I could find. The website I found listed only the 30 most recent instances of a given seed pair. That works fine for the 2 vs. 6 match-up, but is a bit sparse for the 5 vs. 12 match-up. However, my previous study of the Vegas spread also suggested that the average margin of victory in a given match-up is very highly correlated to the final Vegas spread. That data for over 45,000 college games is shown here:


With this in mind, I was able to estimate the average spread of each seed combination from the average margin of victory in all games back to 1979, when seeding began. Using this "spread" to calculate the upset rate (as opposed to the actual spread) results in this plot, which does have a better R² than the sparse spread data.


With all of this in mind, it is time to come back to the original question: why do 2-seeds get upset so much more often than 1-seeds do? The answer is: because games in the NCAA tournament behave in exactly the same way that they do in the regular season when considering upsets as a function of the Vegas line.

When it comes to the 2 vs. 15 match-up, both the spread data that I could find and the average margin of victory data suggest that 2-seeds are, on average, 16.5-point favorites over 15-seeds. This corresponds to a 93.8% chance that the 2-seed advances. Based on this probability, one would expect a total of 8.5 upsets in the 136 times this match-up has occurred. In reality, there have been 8 since the 64-team tournament expansion in 1985. Score one for math.
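The expected-upset arithmetic for the 2 vs. 15 match-up can be checked in a few lines. This sketch uses only the figures quoted above (136 games, a 93.8% win probability). The standard deviation line is an addition that assumes every game is independent with the same odds, which is a simplification. With the rounded 93.8% figure the expected count comes out to about 8.4, in line with the roughly 8.5 quoted above, give or take about 3.

```python
from math import sqrt

games = 136           # 2 vs. 15 games quoted in the post
p_favorite = 0.938    # 2-seed win probability quoted in the post
p_upset = 1.0 - p_favorite

# Expected number of upsets is simply games times the per-game upset chance.
expected_upsets = games * p_upset

# ASSUMPTION: independent games with identical odds (a binomial count),
# which gives a rough sense of how far the observed total could drift.
std_dev = sqrt(games * p_upset * p_favorite)

print(f"expected upsets: {expected_upsets:.1f} +/- {std_dev:.1f}")
```

Seen this way, the 8 upsets actually observed sit comfortably inside the range the model allows.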

As for 1-seeds, it is slightly less clear, as the sparse actual spread data that I found suggests 1-seeds are favored by an average of 22.3 points, while the margin of victory data suggests the spread should be a bit higher at 24.5 points. Taken together, this suggests that a 16-seed has between a 1.1% and 1.8% chance of an upset. Over 34 years and 136 attempts, we should have observed between 1.5 and 2.5 upsets of a 1-seed by a 16-seed. We, of course, have only observed one: this year's UMBC upset over the University of Virginia. Those probabilities suggest that a 16 over 1 upset should be observed somewhere between once every 14 years and once every 23 years. In other words, we were a bit overdue before this year. That said, it is likely that it will be at least another decade (or 2 or maybe even 3) until we can experience a UMBC-sized upset again in March. But the next upset of a 2-seed by a 15-seed is basically due any year now. After all, it's just math.
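For anyone who wants to retrace the 1 vs. 16 numbers, here is a short sketch of the same arithmetic. It takes the two upset probabilities quoted above (1.1% and 1.8%) as given, and assumes four 1 vs. 16 games per tournament, one in each region. The helper names are illustrative only.

```python
GAMES_PER_YEAR = 4    # one 1 vs. 16 game in each of the four regions
GAMES_PLAYED = 136    # 34 tournaments, as quoted in the post


def expected_upsets(p_upset, games=GAMES_PLAYED):
    """Expected number of upsets over a given number of games."""
    return games * p_upset


def years_between_upsets(p_upset, games_per_year=GAMES_PER_YEAR):
    """Average number of tournaments between upsets."""
    return 1.0 / (games_per_year * p_upset)


# Upset probabilities quoted in the post for the two spread estimates.
estimates = (("margin-of-victory estimate", 0.011), ("spread estimate", 0.018))
for label, p in estimates:
    print(
        f"{label}: {expected_upsets(p):.1f} expected upsets, "
        f"about one every {years_between_upsets(p):.0f} years"
    )
```

The wait time is just the inverse of the yearly upset rate, so halving the per-game upset chance doubles the expected gap between UMBC-style upsets.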






