Stats-Based Bracketology (2019 Edition)

It is officially the most wonderful (sports) time of the year!  As we all think about our Final Four and upset picks for our office pools, I thought that I would provide some unsolicited, stat-based advice on how to fill out your bracket.  Full disclosure right off the bat:  I am actually not that good at winning my office pool.  I think that the last time I actually finished first was 2002.  But, I certainly have a lot of data that offers some insight into how likely certain events are.  I personally think that data, combined with the good old-fashioned eye test, is the best combination.  I will do my best to provide the first part.

General Methodology

Over the past few decades I have slowly accumulated data and developed more and more advanced tools to understand the role of variance and probability as it applies to college basketball.  I could go on about this for days, so I will try to highlight the key points here that underpin my entire philosophy:

1) Overall, the best predictor of the probability that a team will win or lose is the Vegas spread.

I have developed a formula based on the normal distribution that fits the data spanning decades and thousands of games.  I use that for pretty much all of my calculations.

2) The results of games have a lot of variance.

The standard deviation of an actual result from the spread is just over 10 points.  Actually, I can even tell you why it is 10 points, but that is another story for another day.

3) NCAA tournament games behave exactly as regular season games do, statistically speaking.  In other words, the two "rules" above also apply.

4) Advanced metrics, such as Kenpom efficiencies, are a good predictor of spreads, and therefore they can be used to project win probabilities.
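
To make rules 1 and 2 concrete, here is a minimal sketch of how a spread could be turned into a win probability.  My exact formula is not reproduced here, so treat this as the simplest version of the idea: it assumes the final margin is normally distributed around the spread with a standard deviation of 10 points.  The function name `win_probability` and the constant `SIGMA` are just illustrative.

```python
from statistics import NormalDist

# Assumption: standard deviation of the final margin around the spread,
# rounded to 10 points (the text says "just over 10").
SIGMA = 10.0


def win_probability(spread: float, sigma: float = SIGMA) -> float:
    """Probability that a team favored by `spread` points wins the game.

    Model (assumed): margin ~ Normal(spread, sigma), so
    P(margin > 0) = Phi(spread / sigma).
    A negative spread means the team is the underdog.
    """
    return NormalDist(mu=0.0, sigma=sigma).cdf(spread)


if __name__ == "__main__":
    for pts in (0.5, 3, 7, 10, 20):
        print(f"favored by {pts:>4} -> {win_probability(pts):.3f}")
```

Under these assumptions a pick'em game is a 50% proposition and a 10-point favorite wins roughly 84% of the time, which is why even big favorites lose more often than people expect.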

With all this said, one of my favorite charts that I have developed is the following, which compares the observed rate of the higher seed winning in a given seed pairing to the probability predicted based on the average spread of that same pairing.


As you can see, the correlation is very good, with a few exceptions.  For example, 1-seeds do better than expected vs. 5-seeds and 9-seeds, and 2-seeds fare surprisingly poorly vs. 10-seeds.  But, I think that this is simply normal statistical variance due to a smallish sample size, and nothing more.  If nothing else, this chart is a handy reference on how often upsets occur for a wide variety of seed pairings.

A "Normal" Tournament Bracket

So, as you prepare to fill out your bracket, one obvious question is how many upsets can you expect in a "normal" year and what does a "typical" Final Four look like?

As for the first question, here are a few rules of thumb:

1) In the first round, one typically sees 8 ± 2.5 upsets, with over half of those happening to 6-, 7-, or 8-seeds.

On average, the 8-9 games are a coin flip (so about 2 upsets there).  Beyond that, you would expect 3 upsets among the 6- and 7-seeds, 2 more upsets among the 4- and 5-seeds, and maybe 1 "major" upset of a 1-, 2-, or 3-seed.

2) In the 2nd round, one would expect to see 4.5 ± 2 upsets.  

The rate at which a 1-seed loses in the second round is almost exactly once every 2 years.  For the 2-, 3-, and 4-seeds, one would expect to see 4 total upsets in that group in a year, and they are oddly pretty evenly distributed between those 3 seeds.

Also note that if there are an above average number of upsets in the first round, there tend to be fewer second round upsets and vice versa.  This is really just a simple result of the fact that the weaker teams tend to get picked off in the second round if they survive the first round.  Because I am a nerd, I like to call this the "Law of the Conservation of Upsets."  You're welcome.

3) As for the Sweet 16 games, things get a bit messy.

A general rule of thumb is that a little over half (55%) of 1-seeds make the Regional Final, while a little less than half (42%) of 2-seeds make it, and usually only one 3-seed per year gets that far.

4) As for the Final Four, while it is tempting to take something like three 1-seeds, that rarely happens (only 13% of the time).

History shows that one 1-seed almost always makes it.  Two 1-seeds make it only half the time.  The average seed of the 3rd best Final Four team is 3, while the average seed of the lowest-seeded team is an astounding 5.7, and that team is no better than a 4-seed in over half of all tournaments.

So an "average" Final Four is a 1-seed, a 1- or 2-seed, a 3-seed, and something lower than a 4-seed.  That exact scenario happened in each of the two previous years.
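
As a sanity check on the first-round rule of thumb, the short sketch below adds up the expected upsets by seed group and estimates the spread around that total.  The per-game upset rates are simply backed out of the numbers above (for example, 3 upsets in the eight games involving 6- and 7-seeds), and the calculation assumes every game is independent, which is my simplification for illustration.

```python
from math import sqrt

# (number of first-round games, assumed per-game upset probability),
# backed out of the rules of thumb in the text.
first_round = [
    (4, 0.5),      # 8-9 games: coin flips, about 2 "upsets"
    (8, 3 / 8),    # 6- and 7-seed games: about 3 upsets
    (8, 2 / 8),    # 4- and 5-seed games: about 2 upsets
    (12, 1 / 12),  # 1-, 2-, and 3-seed games: about 1 upset
]


def upset_summary(groups):
    """Return (expected upsets, standard deviation), assuming independent games."""
    mean = sum(n * p for n, p in groups)
    variance = sum(n * p * (1 - p) for n, p in groups)
    return mean, sqrt(variance)


mean, sd = upset_summary(first_round)
print(f"{mean:.1f} +/- {sd:.1f} first-round upsets")
```

With these inputs the total comes out to 8 upsets with a standard deviation of about 2.3, which is in the same neighborhood as the 8 ± 2.5 quoted above.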

With that (already too long) intro, let's now dive into the 2019 bracket, starting with the East.

East Region Breakdown

This year, I came up with two new "tools" to help visualize the strengths and weaknesses of each bracket.  The first set of data compares the current Kenpom rating (shorthand for the adjusted efficiency margin) for each seed to the average and standard deviation of the rating historically (back to 2002, the beginning of Kenpom's data set).  For the East, these data are shown here:


This plot alone gives us quite a bit of information about how this region might play out.  For example, LSU is a pretty below average 3-seed and in the first round, they play an above average 14-seed in Yale.  Uh oh...
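
One way to put a number on "above average" or "below average" for a seed is a simple z-score against the historical ratings for that seed line.  The sketch below shows the idea.  The ratings in it are made-up placeholders, not actual Kenpom values, and `seed_strength` is just an illustrative name.

```python
from statistics import mean, stdev


def seed_strength(rating, historical_ratings):
    """Z-score of a team's rating relative to past teams on the same seed line.

    Positive means stronger than a typical team at that seed,
    negative means weaker.
    """
    mu = mean(historical_ratings)
    sd = stdev(historical_ratings)  # sample standard deviation
    return (rating - mu) / sd


# Hypothetical efficiency margins for past 3-seeds (placeholder numbers only).
past_3_seeds = [20.0, 22.0, 24.0, 26.0, 28.0]

# A hypothetical current 3-seed rated at 19.0.
z = seed_strength(19.0, past_3_seeds)
print(f"z-score: {z:+.2f}")
```

A team more than about one standard deviation below the historical average for its seed is the kind of team I would flag as a potential upset victim.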

In order to take this analysis a bit further, I conducted a series of simulations of the East Region, using Kenpom data to project point spreads and therefore win probabilities.  I tabulated the probability that each team would survive each round up to advancing to the Final Four.  I then compared this probability to the probability that I got when I simulated a "generic" bracket made up of teams with a statistically average Kenpom rating for their seed.  So, this gives us an idea of whether a team has an above or below average chance to advance.  My thought is that this should provide hints as to where those exciting upsets might happen.  Here is the table for the East:



At this point, I want to say that I also tried this analysis using the actual spreads for the first round games that are currently available.  Based on my own methodology, this should be more accurate.  In general the trends were the same, with a few exceptions that I will note along the way.  But, the spreads this year seem oddly conservative, and since there will be no spreads for any games after the first round until it is too late, I decided to stick to a purely Kenpom-based analysis.
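
For anyone curious what a simulation like this might look like, here is a minimal Monte Carlo sketch.  It is not my actual code.  It assumes the gap between two ratings can be used directly as the point spread (a real conversion from Kenpom efficiency margins would presumably also account for tempo), and it reuses the normal model with a 10-point standard deviation.  The team names and ratings are hypothetical placeholders.

```python
import random
from statistics import NormalDist

SIGMA = 10.0  # assumed standard deviation of the result around the spread
_MARGIN = NormalDist(mu=0.0, sigma=SIGMA)


def win_prob(rating_a, rating_b):
    """P(A beats B), treating the rating gap as the spread (an assumption)."""
    return _MARGIN.cdf(rating_a - rating_b)


def simulate_region(ratings, n_sims=20000, seed=0):
    """Estimate each team's probability of winning each round.

    `ratings` maps team -> rating and must be listed in bracket order
    (adjacent teams meet first); the number of teams must be a power of two.
    Returns team -> [P(win round 1), P(win round 2), ...].
    """
    rng = random.Random(seed)
    teams = list(ratings)
    n_rounds = len(teams).bit_length() - 1
    wins = {t: [0] * n_rounds for t in teams}
    for _ in range(n_sims):
        alive = teams
        for rnd in range(n_rounds):
            survivors = []
            for a, b in zip(alive[::2], alive[1::2]):
                p = win_prob(ratings[a], ratings[b])
                winner = a if rng.random() < p else b
                wins[winner][rnd] += 1
                survivors.append(winner)
            alive = survivors
    return {t: [w / n_sims for w in wins[t]] for t in teams}


# Hypothetical four-team pod in bracket order: A plays D, B plays C.
pod = {"A": 25.0, "D": 5.0, "B": 18.0, "C": 15.0}
result = simulate_region(pod)
for team, probs in result.items():
    print(team, [f"{p:.3f}" for p in probs])
```

Running the same function on a "generic" set of ratings (the historical average for each seed) and comparing the two sets of results gives the above or below average comparison described above.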

Back to the table.  There is a lot going on here.  

The first thing to note about the East is that with the exception of one team (LSU) the Top 8 seeds are all "above average" and therefore first round upsets are less likely in this Region.  The opening Vegas lines agree with this assessment.  But, as we will see, LSU is the only Top 3 seed projected to have less than an 80% chance to win in the first round.  So, if you were forced to pick one "big" 1st round upset (and history suggests we will see one), Yale over LSU is the most likely.

As we move on to the 2nd round, again only LSU (if they make it) would be expected to be a possible upset victim.  In the Sweet 16, MSU's odds look good (mostly due to LSU), while Duke's odds are surprisingly depressed, due to the fact that both Virginia Tech and (to a lesser extent) Mississippi State are above average 4- and 5-seeds.  Might Duke be one of the 1-seeds to fail to make the Regional Final?  (One can hope!)

Finally, in the event that Duke and MSU meet in the Regional Final, Kenpom projects that Duke would only be a 0.5-point favorite over MSU.  (You can choose to believe that or not, but that is what the math says.)  As a result, Duke's odds to make a Final Four are a bit low for a 1-seed, while MSU's odds are quite high, and just a shade behind Duke.

I hate to say this, but based on all the probability and historical data I have shown so far, MSU is a reasonable bet to win this Region, and that is not (just) a homer pick.

South Region Breakdown

Following the same path as above, here is the historical seed comparison chart followed by the simulation results table for the South Region.



Already, the South seems like it might be more interesting than the East, and perhaps also very kind to the Big Ten.  Purdue, Wisconsin, and Iowa all show better than average odds to advance, while K-State, Cincinnati, and especially Villanova all might be on upset alert.  Kenpom also likes the 9-seed (Oklahoma) here.  

If K-State were to survive the first round, Kenpom suggests the Badgers might get them in the next round anyway.  Purdue's odds to make the Regional Final are surprisingly high, which suggests a possible upset of Tennessee here.  But, at the end of the day, the math really likes Virginia to roll through this region.

That all said, I should mention that while the Vegas spread does put K-State on upset alert, those numbers are much less favorable to Wisconsin and Iowa.  Actually, based on the Vegas line, Wisconsin is the most likely 5-seed to get upset by the dreaded 12-seed (Oregon), and Villanova's chances look much better based on the Vegas line.  This will be an interesting test of which method (Kenpom vs. Vegas) is a better predictor.

Midwest Region Breakdown

The historical seed comparison chart and simulation results table for the Midwest Region are shown below.



As the chart shows, the 2-, 5-, 6-, and 7-seeds are quite strong, while the 1-, 3-, and 4-seeds look pretty average.  In the first round, this does not look like it is going to manifest itself (although Kansas is on a very mild upset alert), but the 2nd round is where things might get interesting.

The simulation suggests that Auburn has a much better than average shot at getting to the Sweet 16, while Iowa St and Wofford would be tougher than expected challenges for Houston and Kentucky.  Utah State also projects to be a slightly scarier than usual 2nd round challenge for UNC.

If UNC and Kentucky both survive to the Regional Final, Kentucky would have a great shot at a mild upset.  However, this region looks like the one with the most potential to produce the wildcard.  I might be tempted to take Auburn to win the Region.

West Region Breakdown

Finally, here is the seed comparison chart and simulation results table for the West Region.



Based on the chart, the Top 3 seeds in the West are all quite strong, while the rest of the top 10 seeds are pretty average with the exception of Marquette, who is a "weak" 5-seed, and Florida, who is a "strong" 10-seed.  For this reason, both Marquette and especially Nevada are on upset alert in Round 1.  Incidentally, Vegas agrees on Marquette, but is not as optimistic about Florida.

The second round does not give any strong indications of an upset, but Texas Tech is projected to be a tougher match-up than usual for 2-seed Michigan in the Sweet 16.  The winner of this potential Texas Tech - Michigan match-up would put Gonzaga on a mild upset alert, but I think the odds would favor the Zags pulling this one out.

On a side note, the Vegas odds suggest that if a 2-seed were to get upset this year, the most likely (yet still very unlikely) victim would be the Wolverines at the hands of the Montana Grizzlies.

Add it Up

So, in summary, if I use Kenpom data, math, and history as a guide, I get the following results:

1st Round upsets:

14-Yale over 3-LSU
13-UC Irvine over 4-K State
12-Murray State over 5-Marquette
11-St. Mary's over 6-Villanova
10-Iowa over 7-Cincinnati
10-Florida over 7-Nevada
9-Oklahoma over 8-Ole Miss

2nd Round Upsets:

7-Wofford over 2-Kentucky
6-Iowa State over 3-Houston
5-Auburn over 4-Kansas

3rd Round Upsets:

5-Auburn over 1-UNC
4-VA Tech over 1-Duke
3-Purdue over 2-Tennessee

Final Four

1-Virginia
1-Gonzaga
2-MSU
5-Auburn

And, if I really go all the way with Kenpom, I would have to take Virginia over Gonzaga in the Title Game.

So, that is that.  I will probably use this bracket in one of my online brackets (but not my "real" one).  But, ultimately, the eye test matters.  It will be fun to see how this pans out.

Enjoy!
