
NCAA Tourney Over/Under Seeded

OK, I really thought that I was done with basketball analysis for the year. But then I received a comment that really got me thinking. The comment asked a fairly straightforward question: how would my PAD and/or PARIS metrics (explained in detail in this article) compare to each coach's performance relative to their Ken Pomeroy (KenPom) ranking? My gut feeling was that KenPom's rankings correlate to the Vegas spread, and the Vegas spread correlates to seed differential (as explained in this article), so KenPom's ratings should also correlate to my Performance Against exact seed Differential (PAD) metric. I decided to test this theory.

As luck would have it, this year I downloaded the set of pre-tournament KenPom data from 2002 to 2018, which is all of the data currently available on KenPom's site. Then I just had to figure out how to make the mathematical comparison. The first step was to relate the KenPom data to the probability of the favored team winning. In general, the best way to do this seems to be to use KenPom's adjusted efficiency margin. The difference in adjusted efficiency margin between the two teams in a given contest should correlate to a point spread, and therefore to a probability of victory. So I took all the games from the 2002-2018 tournaments, grouped them into bins by that differential, and plotted the probability that the team with the higher adjusted efficiency margin actually won. That data is shown here:


In general, the shape of the scattered data points strongly resembles my plot of victory probability versus the Vegas spread (shown in detail here), so I decided to use the same mathematical framework (namely, the normal distribution) to generate the best-fit line for this data set (the solid line). This essentially gave me a simple formula for the expected probability that the favored team wins any given NCAA tournament matchup in the KenPom data era (2002-2018).
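A minimal sketch of this kind of fit, assuming the curve has the form P(win) = Φ(d/σ). Here d is the adjusted efficiency margin differential (favorite minus underdog), Φ is the standard normal CDF, and σ is the single fitted parameter. The post does not state the exact parameterization or fitting procedure, so the zero-centered form, the bin width, the grid-search fit, and the function names (`bin_games`, `fit_sigma`) are all illustrative assumptions.

```python
import math
from collections import defaultdict

def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_probability(em_diff: float, sigma: float) -> float:
    """Assumed model: P(higher-AdjEM team wins) = Phi(em_diff / sigma)."""
    return normal_cdf(em_diff / sigma)

def bin_games(games, bin_width=2.5):
    """Group (em_diff, favorite_won) pairs into bins of em_diff (em_diff >= 0).

    Returns a list of (bin_center, win_fraction, game_count), sorted by bin.
    """
    tallies = defaultdict(lambda: [0, 0])  # bin index -> [wins, games]
    for em_diff, favorite_won in games:
        index = int(em_diff // bin_width)
        tallies[index][0] += int(favorite_won)
        tallies[index][1] += 1
    return [
        ((index + 0.5) * bin_width, wins / count, count)
        for index, (wins, count) in sorted(tallies.items())
    ]

def fit_sigma(binned, lo=1.0, hi=40.0, steps=3901):
    """Grid-search sigma to minimize game-weighted squared error vs. the bins."""
    best_sigma, best_error = lo, float("inf")
    for i in range(steps):
        sigma = lo + (hi - lo) * i / (steps - 1)
        error = sum(
            count * (fraction - win_probability(center, sigma)) ** 2
            for center, fraction, count in binned
        )
        if error < best_error:
            best_sigma, best_error = sigma, error
    return best_sigma
```

Under this assumed form, the post's example of a 10-point differential giving roughly a 75% win rate implies σ ≈ 10 / 0.674 ≈ 14.8, because Φ(0.674) ≈ 0.75. A maximum-likelihood fit on the individual games (a probit regression with no intercept) would be a reasonable alternative to fitting the binned points.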

With this data in hand, it was then straightforward to calculate a "PAD-like metric." With PAD, a coach's performance against any team is compared to the average performance of all teams in a given seed pairing. In this new metric, the coach's performance in any game is instead compared to the trend line above. For example, based on the curve above, the favored team wins 75% of the time when the adjusted efficiency margin differential is 10. In a game with this differential, if the favored team wins, that coach gets a "score" of 0.25 (a win, worth 1, minus the expected 0.75), basically for doing 25% better than average. If the favored team loses, that coach gets a score of -0.75 (a loss, worth 0, minus the expected 0.75).
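The scoring rule in this paragraph amounts to "actual result minus expected win probability," and the per-coach total is the sum of those per-game scores. The minimal sketch below takes the favorite's win probability as an input, for example from a fitted curve. It also assumes something the post does not describe: the underdog's coach receives the mirror-image score, so each game sums to zero. The tuple layout of `games` and the function names are hypothetical.

```python
from collections import defaultdict

def game_scores(p_favorite: float, favorite_won: bool) -> tuple[float, float]:
    """Return (favorite_coach_score, underdog_coach_score) for one game.

    Score = actual result (1 for a win, 0 for a loss) - expected win probability.
    The underdog's score is assumed to be the mirror image of the favorite's.
    """
    favorite_score = (1.0 if favorite_won else 0.0) - p_favorite
    return favorite_score, -favorite_score

def cumulative_scores(games):
    """Sum per-game scores by coach.

    games: iterable of (favorite_coach, underdog_coach, p_favorite, favorite_won).
    """
    totals = defaultdict(float)
    for favorite_coach, underdog_coach, p_favorite, favorite_won in games:
        favorite_score, underdog_score = game_scores(p_favorite, favorite_won)
        totals[favorite_coach] += favorite_score
        totals[underdog_coach] += underdog_score
    return dict(totals)

print(game_scores(0.75, True))   # (0.25, -0.25)
print(game_scores(0.75, False))  # (-0.75, 0.75)
```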

When this methodology is applied to all tournament games since 2002, a single cumulative score can be generated for each coach, which I will now refer to as their Performance Relative to the Average KenPom adjusted Efficiency Margin differential, or "PRAKEM" (because every metric needs a silly acronym, obviously). Below, I have plotted my PAD metric as a function of this new PRAKEM metric.


As I expected, the correlation between PAD and PRAKEM is very high (R² ≈ 0.92). There are also a couple of other interesting things to observe in this plot. On the positive side, if the data is limited to 2002-2018, Tom Izzo is still the coach with the highest PAD score, checking in at just over 6.0, even though this window excludes his 4 straight Sweet 16s, 3 Final Fours, and National Title from 1998 to 2001. But based on the PRAKEM metric, Izzo drops to 5th place behind Roy Williams, Boeheim, Beilein, and Calipari. Also notable is the fact that Bo Ryan and Bill Self are flat average relative to KenPom (a PRAKEM of zero), while Coach K has one of the worst PAD and PRAKEM metrics of any active coach since 2002.
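For readers who want to reproduce this kind of comparison, the R² of a simple linear fit is the square of the Pearson correlation between the two metrics. The minimal sketch below assumes each coach contributes one (PRAKEM, PAD) pair and also returns an ordinary least-squares best-fit line. The function name is illustrative, and no actual coach data is included.

```python
def fit_line_and_r_squared(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R^2 (squared Pearson r).

    xs: PRAKEM values, ys: PAD values (one pair per coach).
    Assumes neither metric is constant across coaches.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```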

As we consider the difference between the PAD and PRAKEM metrics, I believe we can draw one more conclusion from this data set. In my opinion, KenPom's adjusted efficiency margin is about as close as we can get to a true measure of a given team's actual prowess. PAD, however, depends on each team's seed, which itself depends on the Committee's judgment of each team's prowess. So the difference between a coach's PAD and PRAKEM values really reflects the difference between their teams' real strength and their seeds. A team that is genuinely stronger than its seed should tend to beat seed-based expectations (raising PAD) while performing closer to KenPom-based expectations (leaving PRAKEM lower by comparison). So, I would propose that if a coach's PAD is greater than their PRAKEM (i.e., they are above the best-fit line above), this suggests that the committee seeded their teams, on average, too low. Conversely, if their PRAKEM value is higher than their PAD value, the committee probably seeded them, on average, too high.
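The proposed rule reduces to the sign of PAD minus PRAKEM. In the minimal sketch below, the input dictionary layout and the function name are hypothetical, and a gap of exactly zero is treated as neither under- nor over-seeded:

```python
def seeding_gaps(metrics):
    """Classify coaches by PAD - PRAKEM.

    metrics: dict mapping coach -> (pad, prakem).
    Returns (coach, gap, label) tuples sorted from most under-seeded
    (largest positive gap) to most over-seeded (most negative gap).
    """
    rows = []
    for coach, (pad, prakem) in metrics.items():
        gap = pad - prakem
        if gap > 0:
            label = "under-seeded"  # beat seed expectations more than KenPom's
        elif gap < 0:
            label = "over-seeded"   # beat KenPom expectations more than seed's
        else:
            label = "neutral"
        rows.append((coach, gap, label))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```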

If we accept this to be true, we can then take the difference between the PAD and PRAKEM values as a quantitative measure of how much a coach has been under-seeded or over-seeded over their career. A histogram showing this data for all coaches since 2002 is shown below:


As one might expect, this distribution is roughly Gaussian (normal) in shape, and once again it is the outliers that are interesting. Based on this measure, Tom Izzo is the most under-seeded coach in the past 16 years, followed closely by Gregg Marshall, Mark Few, and Bo Ryan, with Rick Pitino and Billy Donovan also in the under-seeded category. The over-seeded coaches include John Beilein, Sean Miller, Lute Olson, Jim Calhoun, Roy Williams, Jim Boeheim, and, in a distant last place in that category, Bill Self.
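A histogram like this one, and a simple way to flag outliers in a roughly normal distribution, can be sketched as follows. The bin width and the |z| > 1.5 outlier cutoff are arbitrary assumptions, not values taken from this analysis. Positive gaps (PAD minus PRAKEM) correspond to under-seeded coaches and negative gaps to over-seeded ones.

```python
import math
import statistics
from collections import Counter

def histogram(gaps, bin_width=0.5):
    """Count PAD - PRAKEM gaps per bin; key k covers [k*width, (k+1)*width)."""
    return Counter(math.floor(gap / bin_width) for gap in gaps)

def outliers(gaps_by_coach, z_cutoff=1.5):
    """Return {coach: z_score} for gaps more than z_cutoff std devs from the mean."""
    values = list(gaps_by_coach.values())
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return {}
    return {
        coach: (gap - mean) / spread
        for coach, gap in gaps_by_coach.items()
        if abs(gap - mean) / spread > z_cutoff
    }
```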

That is all that I have for now, so unless someone else asks a super interesting question that results in me diving back down into the math rabbit hole, it is time to start thinking about football.  Until then, enjoy!
