NCAA Tourney Over/Under Seeded

OK, I really thought that I was done with basketball analysis for the year. But then I received a comment that really got me thinking. The comment asked a fairly straightforward question: how would my PAD and/or PARIS metrics (explained in detail in this article) compare to the performance of each coach relative to their Ken Pomeroy (KenPom) ranking? My gut feeling was that KenPom's ranking correlates to the Vegas spread, and the Vegas spread correlates to seed differential (as explained in this article), and therefore KenPom's ratings would correlate to my Performance Against exact seed Differential (PAD) metric. I decided to test this theory.

As luck would have it, just this year I downloaded the set of pre-tournament KenPom data from 2002 to 2018, which is as much data as is currently available on KenPom's site. Then, I just had to figure out how to make the mathematical comparison. The first step was to correlate the KenPom data to the probability of the favored team winning. In general, the best way to do this seems to be to use KenPom's adjusted efficiency margin. The differential in adjusted efficiency margin between the two teams in a given contest should correlate to a point spread / probability of victory. So, I took all the games from the 2002-2018 tournaments, grouped them into bins by adjusted efficiency margin differential, and plotted each bin against the fraction of games in which the team with the higher adjusted efficiency margin actually won. That data is shown here:

In general, the shape of the scattered data points strongly resembles my plot of victory probability versus the Vegas spread (shown in detail here), so I decided to use the same mathematical framework (namely the normal distribution) to generate the best fit line for this data set (shown as the solid line). This essentially gave me a simple formula to calculate the expected probability of victory for the favored team in any given NCAA tournament match-up in the KenPom data era (2002-2018).
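For readers who want to try this themselves, here is a minimal sketch of that kind of fit. It assumes the curve is a zero-mean normal CDF with a single spread parameter (called `sigma` here) and that the fit is an unweighted least-squares match to the binned win rates; neither the fitted value nor the exact fitting method is stated above, so treat both as assumptions. The function names are made up for illustration, and the bins are synthetic points generated from the model itself, not real tournament data.

```python
from statistics import NormalDist


def win_probability(margin_diff, sigma):
    """P(favorite wins) under a zero-mean normal model with spread sigma."""
    return NormalDist(mu=0.0, sigma=sigma).cdf(margin_diff)


def fit_sigma(binned, lo=1.0, hi=40.0, step=0.1):
    """Grid-search the sigma that minimizes squared error against binned win rates.

    binned: list of (mean margin differential in the bin, observed favorite win rate).
    """
    best_sigma, best_err = lo, float("inf")
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        sigma = lo + i * step
        err = sum((win_probability(d, sigma) - rate) ** 2 for d, rate in binned)
        if err < best_err:
            best_sigma, best_err = sigma, err
    return best_sigma


# Synthetic bins generated from the model with sigma = 15 (NOT real KenPom data).
synthetic_bins = [(d, win_probability(d, 15.0)) for d in (2, 6, 10, 14, 18, 22)]
print(round(fit_sigma(synthetic_bins), 1))
```

A grid search is used only to keep the sketch dependency-free; with real binned data, weighting each bin by its number of games would likely be a sensible refinement.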

With this data in hand, it was then straightforward to calculate a "PAD-like metric." With PAD, a coach's performance against any team could be compared to the average performance of all teams in a given seed pairing. In this new metric, the coach's performance against any team can be compared to the above trend line. For example, for a game in which the adjusted efficiency margin differential is 10, the favored team wins 75% of the time, based on the curve above. In a given game with this differential, if the favored team wins, that coach gets a "score" of 0.25 for doing basically 25% better than average. If the favored team loses, that coach gets a score of -0.75.
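The scoring rule for the favored team's coach can be sketched in a few lines. The spread parameter `SIGMA` below is an assumption: it is back-calculated so that a differential of 10 gives exactly 75%, matching the example, and is not the actual fitted value from the curve. The function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: sigma chosen so that a differential of 10 maps to a 75% win
# probability, as in the worked example; the real fitted value is not given.
SIGMA = 10.0 / NormalDist().inv_cdf(0.75)


def favorite_win_probability(margin_diff):
    """Expected win probability for the team with the higher efficiency margin."""
    return NormalDist(mu=0.0, sigma=SIGMA).cdf(abs(margin_diff))


def game_score(margin_diff, favorite_won):
    """Score credited to the favored team's coach for a single game."""
    p = favorite_win_probability(margin_diff)
    # A win earns the "unexpected" share (1 - p); a loss costs the expected share p.
    return (1.0 - p) if favorite_won else -p


print(round(game_score(10, True), 2), round(game_score(10, False), 2))
```

Note that the two possible scores always differ by exactly 1, so a coach whose teams win exactly as often as the curve predicts will average out to zero.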

When this methodology is applied to all tournament games since 2002, a single cumulative score can be generated for each coach that I will now refer to as their Performance Relative to the Average KenPom adjusted Efficiency Margin differential, or "PRAKEM" (because every metric needs a silly acronym, obviously). Below, I have plotted my PAD metric as a function of this new PRAKEM metric.
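As a rough sketch of the bookkeeping, the cumulative score is just a running sum of per-game scores for each coach. Two assumptions are baked in here. First, `SIGMA` is back-calculated from the 75%-at-10 example rather than taken from the real fit. Second, the underdog's coach is scored with the mirror-image rule (result minus expected win probability, using a signed differential), which is a natural extension but is not spelled out above. The coaches and games are placeholders, not real results.

```python
from collections import defaultdict
from statistics import NormalDist

# Assumption: sigma back-calculated from the "differential of 10 -> 75%" example.
SIGMA = 10.0 / NormalDist().inv_cdf(0.75)


def expected_win_probability(signed_diff):
    """Win probability for a team given (its margin minus the opponent's margin)."""
    return NormalDist(mu=0.0, sigma=SIGMA).cdf(signed_diff)


def prakem_totals(games):
    """Sum (actual result - expected win probability) over each coach's games.

    games: iterable of (coach, signed efficiency margin differential, won).
    """
    totals = defaultdict(float)
    for coach, signed_diff, won in games:
        result = 1.0 if won else 0.0
        totals[coach] += result - expected_win_probability(signed_diff)
    return dict(totals)


# Placeholder games for illustration only (not real tournament results).
games = [
    ("Coach A", 10.0, True),    # favorite wins:  +0.25
    ("Coach A", 10.0, False),   # favorite loses: -0.75
    ("Coach B", -10.0, True),   # underdog wins:  +0.75 (assumed mirror rule)
]
totals = prakem_totals(games)
print({coach: round(score, 2) for coach, score in totals.items()})
```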

As I expected, the correlation between PAD and PRAKEM is very high (R² ~ 0.92). In addition, there are a couple of other interesting things to observe from this plot. On the positive side, if the data is limited to 2002-2018, Tom Izzo is notably still the coach with the highest PAD score, checking in at just over 6.0, despite the fact that this does not include his 4 straight Sweet 16s, 3 Final Fours, and National Title win from 1998 to 2001. But, based on the PRAKEM metric, Izzo drops to 5th place behind Roy Williams, Boeheim, Beilein, and Calipari. Also notable is the fact that Bo Ryan and Bill Self are flat average relative to KenPom (a PRAKEM of zero), while Coach K has one of the worst PAD and PRAKEM metrics of any active coach since 2002.

As we consider the difference between the PAD and PRAKEM metrics, I believe that we can draw one more conclusion from this data set. KenPom's adjusted efficiency margin, in my opinion, gives about as close as possible to a true measure of the actual prowess of a given team. The PAD, however, is dependent on each team's seed, which itself is dependent on the Committee's judgment of the actual prowess of each team. So, the difference between the PAD and PRAKEM values is really just the difference between each team's real strength and their seed. So, I would propose that if a coach's PAD is greater than their PRAKEM (i.e. they sit above the best fit line in the PAD vs. PRAKEM plot), this suggests that the Committee seeded their teams, on average, too low. Conversely, if their PRAKEM value is higher than their PAD value, the Committee probably seeded them, on average, too high.

If we accept this to be true, we can then take the difference between the PAD and PRAKEM as a quantitative measure of how much a coach has been under-seeded or over-seeded in their career. A histogram showing this data for all coaches since 2002 is shown below:
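In code, this measure is nothing more than a per-coach subtraction followed by a sort. The sketch below uses hypothetical function names and placeholder numbers (not the real PAD or PRAKEM values) just to show the sign convention: a positive gap means under-seeded, a negative gap means over-seeded.

```python
def seeding_gap(pad, prakem):
    """Return {coach: PAD - PRAKEM} for coaches that appear in both mappings."""
    return {coach: pad[coach] - prakem[coach] for coach in pad if coach in prakem}


def rank_by_gap(gaps):
    """Sort coaches from most under-seeded (largest gap) to most over-seeded."""
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Placeholder values for illustration only (not actual coach scores).
pad = {"Coach A": 3.0, "Coach B": -1.0, "Coach C": 0.5}
prakem = {"Coach A": 1.5, "Coach B": 0.5, "Coach C": 0.5}

gaps = seeding_gap(pad, prakem)
for coach, gap in rank_by_gap(gaps):
    label = "under-seeded" if gap > 0 else "over-seeded" if gap < 0 else "seeded about right"
    print(f"{coach}: {gap:+.2f} ({label})")
```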

As one might expect, this distribution is rather Gaussian / Normal in shape and once again it is the outliers that are interesting. Based on this measure, Tom Izzo is the most under-seeded coach in the past 16 years, followed closely by Gregg Marshall, Mark Few, and Bo Ryan, with Rick Pitino and Billy Donovan also in the under-seeded category. As for the over-seeded coaches, that list would include John Beilein, Sean Miller, Lute Olson, Jim Calhoun, Roy Williams, and Jim Boeheim, with Bill Self a distant last in that category.

That is all that I have for now, so unless someone else asks a super interesting question that results in me diving back down into the math rabbit hole, it is time to start thinking about football. Until then, enjoy!

Dr. Green and White Sports Authority
