NCAA Tournament Analysis: The Sweet Sixteen

Basketball season may be over for the Michigan State Spartans, but the NCAA Tournament will be continuing with the Sweet 16 this coming Saturday. After the bracket was released, I presented my detailed analysis of the bracket and made some math-based predictions about how the first weekend and entire tournament might play out.

While it would be more fun to write about a potential MSU-Alabama match-up in the Sweet 16 (which might have actually come to fruition had the Spartans simply boxed out properly on a rebound in the final seconds of the First Four contest against UCLA), it is still fun to reflect on the results of the first weekend and to take another math-based look at the remaining tournament field. If nothing else, in the great words of Coach Mark Dantonio, it is time to complete this circle.

Let's start with a review of the wild action of the first two rounds.

Results of Rounds One and Two

In my analysis of the bracket, I presented data showing that the average number of upsets to expect in the first round of the NCAA Tournament is eight, and in the second round, that number is five. When I looked at the projected odds for each of the first round and projected second round games, I identified a few match-ups with better than average odds for an upset.

Based on this analysis, I made a historically average number of upset picks and then carried this analysis through to the Final Four and eventual champion. In the real tournament, there were an above average number of seed upsets in both rounds (ten and six to be exact). Table 1 below summarizes my upset picks and the actual upsets through two rounds.

Table 1: Summary of NCAA Tournament upset picks and upset results through two rounds

Of the 13 total upset predictions that I made this year, six were correct, two earn partial credit (the yellow 'O's), and five were wrong. There were eight additional upsets that I did not pick. On balance, I think that my method did OK.

The biggest success was that I correctly predicted one of the biggest upsets of the weekend: No. 14 Abilene Christian's upset win over No. 3 Texas. I also picked No. 13 Ohio to upset No. 4 Virginia. I am also giving myself credit for taking UCLA/MSU to beat BYU, even though I was clearly thinking that it would be the Spartans and not the Bruins who would win both of those games. Most office pools take the First Four winner as either/or, so it still counts in my book. Hey, there has to be some benefit of the First Four, right?

The other partial credit comes from the fact that I correctly bounced No. 3 West Virginia and No. 4 Oklahoma State in the second round. I just had the triumphant opponent wrong. From an office pool point of view, this also has some value.

As for a more visual view of the upsets, I repeat below one of the main figures that I used to make picks last week, with the actual upsets highlighted in bold, red text.

Figure 1: 2021 odds for the first round games compared to the average historical odds for each seed pair

From a certain point of view, in retrospect, my analysis perhaps did a better job than I originally thought. Of the 32 total first round games, only eight contests clearly fell below the average line, which denotes a more likely upset. Five of those games ended in an upset, and Liberty and Colgate were both very competitive in their games. It was only the LSU - Saint Bonaventure game that bucked this trend, and I didn't even make that pick.

I probably should have more seriously considered the possible Purdue upset by North Texas, but I decided to ignore the warnings of my own analysis. As for the other five upsets, three of them (Maryland, Syracuse, and UCLA) all lie close to the average line.

Only two of the 10 first round upsets were truly surprising: Oregon State and Oral Roberts. As for Oregon State, their upset of Tennessee perhaps could have been predicted had I simply remembered that Coach Rick Barnes was on the bench and that he is absolutely notorious for losing to lower seeds. As for Ohio State, my math suggests that upsets on the No. 1 and No. 2 lines are simply random bad luck. It's happened to the best of us...

Figure 2 gives a similar retrospective analysis of the second round games.

Figure 2: 2021 odds for second round games compared to the average historical odds for each seed pair

As for the predictability of the six second round upsets, the results are less clear. In total, seven of the 16 games had above average upset odds and only three of those games ended in upsets. In this case, I did correctly pick USC's upset of Kansas and West Virginia's loss, but Texas Tech and Maryland (actually UCONN) let me down on the upset front.

It is also clear that I let my belief in the strength of the Big Ten cloud my analysis a bit. The data did suggest that Wisconsin had a shot to beat Baylor, and that was the pick that I made. BUT, the data suggested that Loyola beating Illinois was actually more likely. If I couple that with the in-state rivalry aspect (similar to my analysis of Texas and Abilene Christian) then perhaps I should have seen that one coming. My faith in the Big Ten also caused two of my Final Four picks, Illinois and Ohio State, to be knocked out very early.

As for the other three upsets, the Oregon State - Oklahoma State game was right on the average line, but Iowa and Florida both had better than expected odds to avoid an upset. Once again, you win some and you lose some.

How Mad Was It?

Based on a few different measures, such as the number of double digit upsets, the 2021 Tournament looks to be one of the most chaotic Tournaments on record. That said, measures like just counting double digit seed underdogs are not very mathematically precise. Fortunately, there is a better way to compare the relative madness of different months of March.

In order to quantify the relative likelihood of a specific set of first and second round outcomes in any given year, one just needs to know the odds of each individual game outcome. You can then multiply those probabilities together to get the overall odds.

Fortunately, I happen to have just these odds, as derived from Kenpom efficiency margin data. In fact, these are exactly the numbers that I use to run my Monte Carlo simulations. I also happen to have performed the same calculation on each Tournament back to the beginning of the Kenpom era (2002).

The results tell me that the odds for the specific first round outcome in 2021 were:

1 in 81.5 million.

That is on the high side. The first round odds in 2013, 2016, and 2018 were similar in magnitude, but a little lower. However, one other year, 2012, still holds the record for the least likely first round outcome at:

1 in 800 million.

This was the year when both Duke and Missouri were upset as No. 2 seeds by No. 15 seeds Lehigh and Norfolk State, respectively. The year 2012 also had 10 total first round upsets, including a No. 4 seed and two No. 5 seeds. However, the second round in 2012 recorded only two additional upsets, and the odds of the specific outcome after two rounds were "only"

1 in 1.3 trillion.

Those are actually slightly shorter odds than the odds after two rounds in 2018, when No. 1 Virginia was upset in the first round by No. 16 UMBC, and then the second round saw the upset of a second No. 1 seed (Xavier) and half of the No. 2 seeds (Cincinnati and North Carolina). The odds of seeing the exact scenario in 2018 were:

1 in 1.8 trillion.

But, that pales in comparison to the tally from 2021. The qualitative estimates are, in fact, correct. The odds that I calculate for the current tournament results after two rounds are:

1 in 6.5 trillion, 

which are the longest odds of the Kenpom Era by more than a factor of three.
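
The multiplication described in this section can be sketched in a few lines of Python. This is only an illustration: the function name and the four sample probabilities are hypothetical placeholders, not the actual Kenpom-derived odds used for the numbers above.

```python
import math

def outcome_odds(result_probs):
    """Probability of one exact set of results.

    result_probs holds, for each game, the pre-game probability of the
    result that actually happened (favorite or underdog).
    """
    return math.prod(result_probs)

# Hypothetical four-game example (NOT real tournament odds):
# three favorites win and one 20 percent underdog pulls the upset.
sample = [0.90, 0.75, 0.20, 0.60]
p = outcome_odds(sample)
print(f"1 in {1 / p:.1f}")
```

For scale, if all 32 first round games were pure coin flips, any one exact outcome would come in at 1 in 2^32, or about 1 in 4.3 billion.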

Analyzing the Sweet 16

So, what's next? With 16 teams remaining it is time to wipe the slate clean and try to make some new predictions about how the rest of the tournament will play out. I will start with the results of a new Monte Carlo simulation of the remainder of the tournament.

Table 2: Monte Carlo Simulation results starting from the Sweet 16

I decided to keep the pre-tournament Kenpom efficiency values in this case, so I don't want to get too hung up on the details. What this tells me is that Gonzaga is still a heavy favorite to win it all (43 percent) and that the Zags have about a 75 percent chance to reach the Final Four.

Then, there are three teams next in line with similar odds to cut down the nets: Michigan, Houston, and Baylor (around 13 percent odds each). Each of those teams is 50-50 to advance to the Final Four. Then, there is a group of dark horse teams (Loyola, Alabama, Arkansas, USC, and Villanova) with between two and five percent odds to win the Title.
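
For readers curious about what a simulation like this looks like, here is a minimal sketch. It is not the actual code behind Table 2. It assumes that the final margin is normally distributed around the difference in efficiency margins with a standard deviation of 11 points, and it ignores tempo. The team names and ratings are hypothetical placeholders.

```python
import random
from collections import Counter
from statistics import NormalDist

# ASSUMPTION: game margins are normally distributed around the rating
# difference with this standard deviation (in points). Tempo is ignored.
SPREAD_SIGMA = 11.0

def win_prob(rating_a, rating_b, sigma=SPREAD_SIGMA):
    """Probability that team A beats team B under the normal-margin assumption."""
    return NormalDist(0.0, sigma).cdf(rating_a - rating_b)

def simulate_bracket(teams, ratings, rng):
    """Play out one single-elimination bracket and return the champion.

    teams must be listed in bracket order (adjacent pairs play each other)
    and its length must be a power of two.
    """
    alive = list(teams)
    while len(alive) > 1:
        winners = []
        for a, b in zip(alive[0::2], alive[1::2]):
            p = win_prob(ratings[a], ratings[b])
            winners.append(a if rng.random() < p else b)
        alive = winners
    return alive[0]

def title_odds(teams, ratings, n_sims=10_000, seed=0):
    """Fraction of simulated brackets won by each team."""
    rng = random.Random(seed)
    titles = Counter(simulate_bracket(teams, ratings, rng) for _ in range(n_sims))
    return {team: titles[team] / n_sims for team in teams}

# Hypothetical four-team bracket with made-up efficiency margins
example_teams = ["Team A", "Team B", "Team C", "Team D"]
example_ratings = {"Team A": 30.0, "Team B": 20.0, "Team C": 15.0, "Team D": 10.0}
example_odds = title_odds(example_teams, example_ratings)
for team, odds in example_odds.items():
    print(f"{team}: {odds:.1%}")
```

The same loop works for 16 teams as long as they are listed in bracket order. Counting how often each team survives two rounds, instead of how often it wins the title, would give Final Four odds.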

I also included a column in this table labeled "normalized final four odds." This is my attempt to estimate the relative ease or difficulty of each team's path to the Final Four. The calculation involves estimating the odds of each team to advance to the Final Four if they were only as good as a benchmark team with an efficiency margin of +19.00 (an average high-major team).

Higher percentages mean an easier path, which is the case for Arkansas and Houston, as they will face double-digit seeds Oral Roberts and Syracuse, respectively, in round three. On the opposite end of the spectrum is Creighton (who will face Gonzaga). The Bluejays grade out to have the most difficult remaining path.
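
One plausible way to compute a number like the normalized odds is sketched below. It is a reading of the description above, not the exact calculation behind Table 2. The +19.00 benchmark comes from the text, while the 11-point standard deviation, the function names, and all opponent ratings are assumptions made for illustration.

```python
from statistics import NormalDist

BENCHMARK = 19.00  # from the text: an average high-major team
SIGMA = 11.0       # ASSUMPTION: standard deviation of game margins, in points

def win_prob(rating_a, rating_b, sigma=SIGMA):
    """Probability that team A beats team B under the normal-margin assumption."""
    return NormalDist(0.0, sigma).cdf(rating_a - rating_b)

def normalized_final_four_odds(sweet16_opp, other_semifinal, benchmark=BENCHMARK):
    """Odds that a benchmark-rated team wins its next two games.

    sweet16_opp:     rating of the Sweet 16 opponent
    other_semifinal: ratings (x, y) of the two teams in the other
                     regional semifinal, one of which awaits in the next round
    """
    x, y = other_semifinal
    p_sweet16 = win_prob(benchmark, sweet16_opp)
    p_x_advances = win_prob(x, y)
    # Weight each possible regional final opponent by its chance to advance
    p_regional_final = (p_x_advances * win_prob(benchmark, x)
                        + (1.0 - p_x_advances) * win_prob(benchmark, y))
    return p_sweet16 * p_regional_final

# Hypothetical ratings: a soft path versus a path through a heavy favorite
easy_path = normalized_final_four_odds(8.0, (18.0, 12.0))
hard_path = normalized_final_four_odds(36.0, (22.0, 20.0))
print(f"easy path: {easy_path:.1%}, hard path: {hard_path:.1%}")
```

Because every team is swapped out for the same benchmark, any difference between two results reflects only the opponents in the way, which is the point of normalizing.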

As for potential upsets to look out for in the next few rounds, Figure 3 below compares the odds in each contest relative to the historical average for each given seed combination. This is essentially the same analysis shown above in Figures 1 and 2.

Figure 3: 2021 odds for the Sweet 16 games (left) and potential regional final games (right) compared to the average historical odds for each seed pair

In this case, for the Sweet 16 games, I am using the odds from the actual opening Vegas lines, as opposed to the Kenpom projected odds. For the regional final round (Figure 3, right) I revert to the odds from Kenpom.

Based on both the original simulation results and the expected value calculations, two upsets are expected in the Sweet 16 round. Based on the left panel of Figure 3, the most likely upsets are for No. 1 Michigan to lose to Florida State and No. 6 USC to lose to No. 7 Oregon.

That said, USC actually has better odds than an average No. 6 versus No. 7 seed match-up, which makes me balk at that pick a little. The next most likely upset would be for No. 11 Syracuse to beat No. 2 Houston, which just feels annoyingly correct. If this were to come to pass, Jim Boeheim would surpass Tom Izzo for the most upset wins in Tournament history at 16. Dislike.

As for the regional final round, the odds suggest one out of the four games will end in an upset. On the right panel of Figure 3, I compare the teams under the assumption that the higher seeds all advance. In this scenario, the most likely upset is for No. 8 Loyola to beat No. 2 Houston (if the Cougars can solve the Syracuse zone). After the beat-down that the Ramblers gave to the Illini last weekend, I would totally buy that.
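
The expected number of upsets in a round is simply the sum of the underdogs' win probabilities. The sketch below shows that calculation, plus the full distribution of the upset count, assuming the games are independent. The eight probabilities are hypothetical placeholders, not the odds implied by the actual Vegas lines.

```python
def expected_upsets(upset_probs):
    """Expected number of upsets: the sum of each underdog's win probability."""
    return sum(upset_probs)

def upset_count_distribution(upset_probs):
    """Return a list where entry k is the probability of exactly k upsets.

    ASSUMPTION: the games are independent of one another.
    """
    dist = [1.0]  # before any game is played, zero upsets is certain
    for p in upset_probs:
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1.0 - p)  # favorite wins
            updated[k + 1] += mass * p      # underdog wins
        dist = updated
    return dist

# Hypothetical underdog win probabilities for eight Sweet 16 games
sweet16 = [0.10, 0.45, 0.30, 0.20, 0.40, 0.15, 0.25, 0.35]
print(f"expected upsets: {expected_upsets(sweet16):.1f}")
for k, prob in enumerate(upset_count_distribution(sweet16)):
    print(f"{k} upsets: {prob:.1%}")
```

The distribution is a useful companion to the single expected value, since it shows how often a round lands one or two upsets away from the average.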

If I were to start again from the Sweet 16 round, I believe that I would take Florida State and Syracuse to win, and then just the top seeds in the next round, which would give me a Final Four of:
  • No. 1 Gonzaga
  • No. 1 Baylor
  • No. 8 Loyola-Chicago
  • No. 2 Alabama
Alternatively, I could see Oregon beating USC, but Houston beating Syracuse. I would take the same Final Four in both scenarios.

This Final Four is a reasonable distribution of seeds and I think that it is totally reasonable based on the eyeball test from last weekend. I would take Gonzaga over Alabama, and then I will take a flyer on Loyola to upset Baylor before succumbing to the machine that is Gonzaga.

That is all for today. Enjoy what is left of March Madness and as always, Go Green.
