The 2021 NCAA Basketball Tournament and season have come to a close, but a new season always provides new data and new stories to tell about that data. In the 2021 Tournament, one storyline was the apparent ease with which No. 2 seed Houston was able to reach the Final Four. The Cougars' path to the final weekend went through a No. 15 seed (Cleveland State), a No. 10 seed (Rutgers), a No. 11 seed (Syracuse), and finally a No. 12 seed (Oregon State).
This marked the first time in history that a team had reached the Final Four without facing a single-digit seed. By some measure, this implies that Houston had the easiest path in history to the Final Four. But, for me, this type of discussion always raises the question of how to quantify something like the difficulty of a given NCAA Tournament path.
My approach to this question is to define a benchmark, or reference, team and then calculate the odds that this hypothetical team would reach the Final Four along any given tournament path. Fortunately, tempo-free metrics such as Kenpom efficiency margins make it possible to quantify these odds.
Historical data suggests that an average Final Four team since 2002 has a Kenpom adjusted efficiency margin of around +25.4. (This means that an average Final Four team would be expected to beat an average Division I team by about 25 points in a game in which each team has 100 possessions.) This value is very similar to the pre-tournament efficiency of Michigan State's (MSU's) 2005 Final Four team. So, this is effectively the reference team.
Using efficiency data, it is possible to project a point spread, and therefore a victory probability, for any arbitrary team versus this reference team, as long as that team's efficiency margin is available. This is generally the case for all teams back to 2002 on Kenpom.com.
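As a rough illustration of that conversion, here is a minimal Python sketch. It is not necessarily the method behind the numbers in this article: the 68-possession tempo, the 11-point standard deviation, and the normal-distribution model are all assumptions made for the example, and the function names are hypothetical.

```python
import math

# Assumed values for illustration only; neither comes from the article.
ASSUMED_POSSESSIONS = 68.0  # possessions per team in a typical game
ASSUMED_SIGMA = 11.0        # std. dev. of the final margin around the spread, in points


def projected_spread(margin_a, margin_b, possessions=ASSUMED_POSSESSIONS):
    """Projected point spread (team A minus team B) from efficiency margins.

    Efficiency margins are in points per 100 possessions, so the difference
    is scaled down to the assumed number of possessions in one game.
    """
    return (margin_a - margin_b) * possessions / 100.0


def win_probability(margin_a, margin_b, sigma=ASSUMED_SIGMA):
    """Probability that team A beats team B under a normal model of the margin."""
    spread = projected_spread(margin_a, margin_b)
    return 0.5 * (1.0 + math.erf(spread / (sigma * math.sqrt(2.0))))
```

Under these assumptions, `win_probability(25.4, 0.0)` would estimate the reference team's chance of beating an exactly average Division I team, and two teams with equal margins come out at 50 percent.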
Calibrating the effect of bracket position
As a first step, I wanted to understand the general benefit that teams get from earning a higher seed. To achieve this, I set up a simulation of sorts involving a theoretical bracket made up entirely of teams with the historically average efficiency margin for teams of that seed.
For example, the average efficiency margin of all No. 1 seeds back to 2002 is +28.9. This corresponds to a team such as Michigan State's No. 1-seeded team in 2012. An average No. 2 seed historically has an efficiency margin of +23.5, which is similar to MSU's 2009 team, and so on. These teams make up the theoretical bracket.
I then calculated the odds that the reference team (MSU's 2005 team) would make the Final Four if it were inserted into this bracket of average teams as any seed, No. 1 all the way to No. 16. I also assumed that in every round the reference team faces the highest-seeded available opponent (i.e., that there are no upsets). The result of this set of calculations is shown below in Figure 1.
Figure 1: Odds of a reference, average Final Four team making the Final Four in a bracket of historically average teams if the reference team were to be inserted as any seed and no upsets occur.
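The no-upset assumption pins down which seeds the reference team would meet from each bracket slot. Here is a short Python sketch of that lookup, assuming the standard 16-team regional layout. The names `REGION_ORDER` and `no_upset_opponents` are made up for this illustration.

```python
# Seeds in bracket order for one region: adjacent pairs meet in the first
# round, adjacent groups of four in the second round, and so on.
REGION_ORDER = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]


def no_upset_opponents(seed):
    """Seeds faced in the four regional rounds if the better seed always advances."""
    slot = REGION_ORDER.index(seed)
    opponents = []
    for round_number in range(1, 5):
        half = 2 ** (round_number - 1)          # size of each side of the matchup
        block_start = (slot // (2 * half)) * (2 * half)
        if slot - block_start < half:
            # Our team sits in the first half, so the opponent comes from the second.
            other_side = REGION_ORDER[block_start + half:block_start + 2 * half]
        else:
            other_side = REGION_ORDER[block_start:block_start + half]
        opponents.append(min(other_side))       # no upsets: best seed survives
    return opponents
```

As a No. 2 seed, `no_upset_opponents(2)` returns `[15, 7, 3, 1]`; as a No. 11 seed, the list is `[6, 3, 2, 1]`. Multiplying the reference team's win probability against the average team at each of those seeds would give one point of the kind plotted in Figure 1.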
Figure 1 shows the true benefit of being a top seed. In this scenario, the reference team as a No. 1 seed has a shade over a one-in-five chance to win the four games needed to make the Final Four. As a No. 2 seed, those odds fall by about five percentage points to 15 percent. The odds continue to drop to 12 percent for a No. 3 seed, 11 percent for a No. 4 seed, and 10 percent for a No. 5 seed.
Interestingly, once a team drops to a No. 6 seed, the odds for the reference team to reach the Final Four are essentially equal (eight to nine percent) for all seeds from the six-line down to the 16-line. As a reminder, this calculation assumes that the efficiency of the reference team is fixed. So, whether it is a No. 6 seed or a No. 11 seed, it is exactly as good.
This analysis already gives valuable insight. Basically, there is a clear advantage to being a No. 1 or a No. 2 seed. There is a slight advantage to being a No. 3, No. 4, or No. 5 seed, but after that the seed really doesn't matter with regard to the odds of making a Final Four.
The history of the NCAA Tournament is filled with examples of teams that hit their stride right before the tournament, either due to injuries that heal or simply after a stretch of inconsistent play. These teams are likely better than both the seed that they have been given and the average efficiency that the metrics assign to them. The good news for teams in this position is that whether they are given a No. 3 seed or a No. 11 seed, their odds to make the Final Four are roughly the same, all paths being equal.
Easy Paths and Hard Paths
The problem is, not all paths are equal. Using a similar method, it is possible to estimate the relative ease or difficulty of any of the paths that previous Final Four teams have traveled on their way to the final weekend. In this case, the reference team (as good as MSU in 2005) is used, but instead of calculating that team's Final Four odds against a theoretical average bracket, the efficiencies of the teams from real, historical NCAA Tournament paths are used.
For example, to compare the paths of both Baylor and Houston in the 2021 Tournament, I first looked up the pre-tournament efficiency margins for the four opponents of each of those teams en route to their meeting in the 2021 Final Four. As mentioned above, for Houston, these teams were Cleveland State, Rutgers, Syracuse, and Oregon State. For Baylor, these teams were Hartford, Wisconsin, Villanova, and Arkansas.
Then, I estimated the odds that the reference team (Michigan State in 2005) would win a game against each of the four teams on each path. The product of the four single-game odds gives the odds to make it to the Final Four on that path.
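The multiplication step can be sketched in a few lines of Python. This treats the four games as independent events, and the single-game probabilities below are placeholders chosen for the example, not values from this analysis. The name `path_odds` is hypothetical.

```python
import math


def path_odds(game_probabilities):
    """Odds of winning every game on a path, treating the games as independent."""
    return math.prod(game_probabilities)


# Placeholder single-game win probabilities, NOT values from the article.
example_path = [0.90, 0.80, 0.70, 0.60]
example_odds = path_odds(example_path)  # 0.9 * 0.8 * 0.7 * 0.6
```

Even when the reference team is favored in every game, as in this made-up path, the chance of winning all four is only about 30 percent, which is why Final Four odds look small compared to single-game odds.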
I made the same calculation for each Final Four team's path to the Final Four back to 2002. I also pulled the numbers for MSU's Final Four teams in 1999, 2000, and 2001 for reference. For comparison, I also calculated the Final Four odds for each path assuming that each opponent was an average team for that seed and not the actual opponent.
For example, in the case of Houston, I calculated the odds for the reference team to beat an average No. 15, No. 10, No. 11, and No. 12 seed instead of Houston's actual opponents. The results of this calculation are shown below in Figure 2.
Figure 2: Comparison of the difficulty of different paths to the Final Four, based on the odds that a reference team would reach the Final Four using the path of each team.
All 76 teams to play in a Final Four since 2002 (plus the three additional MSU Final Four teams) are shown in this figure, and there is a lot to observe.
The x-axis shows the actual odds, or true normalized difficulty, of each team's path. Based on this analysis, it is true that the 2021 Houston team did, in fact, have the easiest Final Four path of any team in history. The reference team would have had a 39 percent chance to win those four games.
The most difficult tournament path in recent history belongs to the 2019 Texas Tech squad. In this case, the reference team only had a seven percent chance to reach the final weekend. Houston's path was five-and-a-half times easier than Texas Tech's path two years prior.
Note that the dotted orange line represents the median of the data set. So, the teams to the right of this line had a path that was easier than average, while the teams on the left side of the graph had a harder-than-average path.
As for MSU, Coach Izzo has clearly experienced both some of the easiest and some of the most difficult tournament paths in history. Half of Izzo's Final Four teams fall to the right of the orange line, while the other half are on the left.
MSU's most difficult path in the Izzo era was in 2015, when the Spartans faced No. 10 Georgia, No. 2 Virginia, No. 3 Oklahoma, and No. 4 Louisville. The Spartans' softest Final Four path was in 2001, when they faced No. 16 Alabama State, No. 9 Fresno State, No. 12 Gonzaga, and No. 11 Temple. Coach Izzo's other six paths are closer to the median.
The y-axis on Figure 2, which gives the odds for the reference team if it were to face an average version of each seed, gives us some additional insight. If a data point falls above the diagonal line, this implies that the path that team took in reality was actually harder than it appears based simply on the seeds. The opposite is also true. Data points that fall below the diagonal line represent teams whose Final Four path was easier than expected, based on the seeds of the opponents.
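The diagonal comparison amounts to checking which of the two odds is larger. Here is a small Python sketch of that reading of the chart. The function name and the optional `tolerance` parameter are hypothetical conveniences, not part of the original analysis.

```python
def compare_to_seed_expectation(actual_odds, seed_based_odds, tolerance=0.0):
    """Classify a path by which side of the diagonal its point falls on.

    actual_odds:     x-axis value, odds against the real opponents
    seed_based_odds: y-axis value, odds against average teams of the same seeds
    tolerance:       assumed dead band around the diagonal (0.0 means none)
    """
    gap = seed_based_odds - actual_odds
    if gap > tolerance:
        return "harder than the seeds suggest"   # point lies above the diagonal
    if gap < -tolerance:
        return "easier than the seeds suggest"   # point lies below the diagonal
    return "about as hard as the seeds suggest"
```

For instance, a path with made-up odds of 10 percent against the real opponents but 15 percent against average teams of the same seeds would be labeled harder than the seeds suggest.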
These differences can be more easily understood by looking at a selection of the Final Four paths in more detail. The tables below give the opponent details for the teams that took the 20 easiest and 20 hardest paths to the Final Four.
Table 2: Detailed opponent data for the 20 easiest paths to the Final Four
- North Carolina in 2016 (22.8 percent)
- UCLA in 2006 (21.7 percent)
- Texas in 2003 (21.7 percent)
- Michigan in 2018 (20.2 percent)
- Illinois in 2005 (21.1 percent)
- UCLA in 2006 (11.7 percent)
- North Carolina in 2016 (10.6 percent)
- Florida in 2006 (8.9 percent)
- Villanova in 2018 (8.0 percent)
- Louisville in 2013 (7.8 percent)