
NCAA Tourney Over/Under Seeded

OK, I really thought that I was done with basketball analysis for the year. But then I received a comment that really got me thinking. The comment asked a fairly straightforward question: how would my PAD and/or PARIS metrics (explained in detail in this article) compare to the performance of each coach relative to their Ken Pomeroy (KenPom) ranking? My gut feeling was that KenPom's ranking correlates with the Vegas spread, the Vegas spread correlates with seed differential (as explained in this article), and therefore KenPom's ratings should correlate with my Performance Against exact seed Differential (PAD) metric. I decided to test this theory.

As luck would have it, earlier this year I downloaded the set of pre-tournament KenPom data from 2002 to 2018, which is as much data as is currently available on KenPom's site. Then I just had to figure out how to make the mathematical comparison. The first step was to relate the KenPom data to the probability of the favored team winning. In general, the best way to do this seems to be to use KenPom's adjusted efficiency margin. The difference in adjusted efficiency margin between the two teams in a given contest should correlate with a point spread, and therefore with a probability of victory. So I took all the data from the 2002-2018 tournaments, grouped the games into bins by efficiency margin differential, and plotted, for each bin, the fraction of games that the team with the higher adjusted efficiency margin actually won. That data is shown here:
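A minimal sketch of the binning step is shown below. The input format, the helper name `bin_win_rates`, and the default bin width of 2 points are all assumptions for illustration; the post does not say which bin width was actually used.

```python
from collections import defaultdict

def bin_win_rates(games, bin_width=2.0):
    """Group games by efficiency-margin differential and return win rates.

    games: iterable of (margin_diff, favorite_won), where margin_diff >= 0 is the
    favorite's adjusted efficiency margin minus the underdog's, and favorite_won
    is True if the team with the higher margin won. (Hypothetical format.)
    Returns a sorted list of (bin_center, win_fraction, n_games).
    """
    wins = defaultdict(int)
    counts = defaultdict(int)
    for diff, won in games:
        b = int(diff // bin_width)  # index of the bin this differential falls in
        counts[b] += 1
        wins[b] += int(won)
    return [
        ((b + 0.5) * bin_width, wins[b] / counts[b], counts[b])
        for b in sorted(counts)
    ]
```

Each tuple gives one scatter point: the bin center on the x-axis and the favorite's observed win fraction on the y-axis, with the game count kept so that sparse bins can be down-weighted later.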


In general, the shape of the scattered data points strongly resembles my plot of victory probability versus the Vegas spread (shown in detail here), so I decided to use the same mathematical framework (namely the normal distribution) to generate the best-fit line for this data set (shown as the solid line). This essentially gave me a simple formula to calculate the expected probability of victory for the favored team in any NCAA tournament match-up in the KenPom data era (2002-2018).
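One way to sketch this fit is below. It assumes the curve is a zero-mean normal CDF with a single free spread parameter, sigma, fitted by least squares weighted by the number of games in each bin. The post does not state the exact parameterization or fitting method, so treat both as assumptions.

```python
import math

def normal_cdf(x, sigma):
    """P(X <= x) for a normal distribution with mean 0 and std dev sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def fit_sigma(bins, lo=1.0, hi=50.0, iters=100):
    """Fit sigma in P(win) = normal_cdf(diff, sigma) to binned win rates.

    bins: list of (diff, win_fraction, n_games). Minimizes the game-weighted
    sum of squared errors with a golden-section search over [lo, hi], which
    assumes the error has a single minimum in that interval.
    """
    def sse(sigma):
        return sum(n * (p - normal_cdf(d, sigma)) ** 2 for d, p, n in bins)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0  # golden-ratio conjugate, ~0.618
    a, b = lo, hi
    for _ in range(iters):
        x1 = b - ratio * (b - a)
        x2 = a + ratio * (b - a)
        if sse(x1) < sse(x2):
            b = x2  # minimum lies in [a, x2]
        else:
            a = x1  # minimum lies in [x1, b]
    return (a + b) / 2.0
```

Under the same zero-mean assumption, a 75% win probability at a differential of 10 (the example used below) would correspond to sigma of roughly 10 / 0.674 ≈ 14.8.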

With this data in hand, it was then straightforward to calculate a "PAD-like" metric. With PAD, a coach's performance in any game is compared to the average performance of all teams in that seed pairing. In this new metric, the coach's performance in any game is instead compared to the trend line above. For example, for a game in which the adjusted efficiency margin differential is 10, the favored team wins 75% of the time, based on the curve above. In a given game with this differential, if the favored team wins, that coach gets a "score" of +0.25 for winning 0.25 games more than expected. If the favored team loses, that coach gets a score of -0.75.
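The per-game scoring rule fits in a few lines. The function name is hypothetical, and crediting the underdog's coach with the negative of the favorite's score is my assumption (the underdog's expected win probability is 1 - p, so actual minus expected works out to exactly the negative), not something spelled out above.

```python
def game_score(p_favorite, favorite_won):
    """Return (favorite_score, underdog_score) for one game.

    p_favorite: expected win probability of the favored team from the fitted curve.
    The favorite earns (actual wins - expected wins). The underdog is assumed to
    earn the negative of that, since its expected win probability is 1 - p_favorite.
    """
    actual = 1.0 if favorite_won else 0.0
    fav = actual - p_favorite
    return fav, -fav
```

For the example in the text, `game_score(0.75, True)` gives the favorite +0.25 and `game_score(0.75, False)` gives it -0.75.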

When this methodology is applied to all tournament games since 2002, the per-game scores can be summed into a single cumulative score for each coach, which I will now refer to as their Performance Relative to the Average KenPom adjusted Efficiency Margin differential, or "PRAKEM" (because every metric needs a silly acronym, obviously). Below, I have plotted my PAD metric as a function of this new PRAKEM metric.
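A sketch of the cumulative tally is below. It assumes the zero-mean normal curve with a fitted spread `sigma` and the scoring rule above, with the underdog's coach assumed to receive the negative of the favorite's score. The tuple layout and function names are hypothetical.

```python
import math
from collections import defaultdict

def win_prob(margin_diff, sigma):
    """Expected favorite win probability: zero-mean normal CDF (assumed form)."""
    return 0.5 * (1.0 + math.erf(margin_diff / (sigma * math.sqrt(2.0))))

def prakem_totals(games, sigma):
    """Sum (actual - expected) wins per coach over all games.

    games: iterable of (fav_coach, dog_coach, margin_diff, favorite_won).
    Each game is zero-sum: the favorite's coach gains what the underdog's loses.
    """
    totals = defaultdict(float)
    for fav, dog, diff, fav_won in games:
        score = (1.0 if fav_won else 0.0) - win_prob(diff, sigma)
        totals[fav] += score
        totals[dog] -= score
    return dict(totals)
```

Because each game is treated as zero-sum in this sketch, the PRAKEM values of all coaches add up to zero, so a positive total means a coach won more games than the efficiency margins predicted.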


As I expected, the correlation between PAD and PRAKEM is very high (R² ≈ 0.92). There are also a couple of other interesting things to observe from this plot. On the positive side, if the data is limited to 2002-2018, Tom Izzo is still the coach with the highest PAD score, checking in at just over 6.0, despite the fact that this does not include his 4 straight Sweet 16s, 3 Final Fours, and National Title from 1998 to 2001. But based on the PRAKEM metric, Izzo drops to 5th place behind Roy Williams, Boeheim, Beilein, and Calipari. Also notable is the fact that Bo Ryan and Bill Self are right at average relative to KenPom (a PRAKEM of zero), while Coach K has one of the worst PAD and PRAKEM metrics of any active coach since 2002.
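For reference, the R² of a straight-line fit of PAD against PRAKEM equals the squared Pearson correlation of the two lists. Here is a minimal standard-library sketch; the function name is hypothetical, and the post's underlying data is not reproduced here.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line y = a + b*x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

Calling `r_squared(prakem_values, pad_values)` on per-coach lists in matching order would give the kind of figure quoted above.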

As we consider the difference between the PAD and PRAKEM metrics, I believe that we can draw one more conclusion from this data set. KenPom's adjusted efficiency margin, in my opinion, is about as close as we can get to a true measure of the actual prowess of a given team. PAD, however, depends on each team's seed, which itself depends on the Committee's judgment of the actual prowess of each team. So the difference between a coach's PAD and PRAKEM values really reflects the difference between their teams' real strength and their seeds. I would therefore propose that if a coach's PAD is greater than their PRAKEM (i.e., they are above the best-fit line above), the Committee seeded their teams, on average, too low: they beat their seed expectations by more than their actual strength would predict. Conversely, if their PRAKEM value is higher than their PAD value, the Committee probably seeded them, on average, too high.

If we accept this to be true, we can then take the difference between PAD and PRAKEM as a quantitative measure of how much a coach has been under-seeded or over-seeded in their career. A histogram showing this data for all coaches since 2002 is shown below:
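A sketch of this calculation follows. It assumes PAD and PRAKEM are stored as dictionaries keyed by coach name. The helper names and the 0.5-point histogram bin width are hypothetical choices for illustration.

```python
import math

def seeding_bias(pad, prakem):
    """PAD minus PRAKEM for every coach present in both dicts.

    Positive values are read as under-seeded, negative values as over-seeded.
    """
    return {c: pad[c] - prakem[c] for c in pad.keys() & prakem.keys()}

def histogram(values, bin_width=0.5):
    """Count values per bin, keyed by each bin's left edge, in sorted order."""
    counts = {}
    for v in values:
        left = math.floor(v / bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))
```

Passing `seeding_bias(pad, prakem).values()` to `histogram` gives the bar heights for a chart like the one described here. Coaches who appear in only one of the two dicts are skipped.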


As one might expect, this distribution is roughly Gaussian (normal) in shape, and once again it is the outliers that are interesting. Based on this measure, Tom Izzo is the most under-seeded coach in the past 16 years, followed closely by Gregg Marshall, Mark Few, and Bo Ryan, with Rick Pitino and Billy Donovan also in the under-seeded category. As for the over-seeded coaches, that list includes John Beilein, Sean Miller, Lute Olson, Jim Calhoun, Roy Williams, Jim Boeheim, and, in a distant last, Bill Self.

That is all that I have for now, so unless someone else asks a super interesting question that results in me diving back down into the math rabbit hole, it is time to start thinking about football.  Until then, enjoy!
