There are a lot of things that make for a great college basketball game: the players, the coaches, the fans, the mascots, the venues... each one brings something different and exciting to the table. But there is one part of the game that seems to bring out the ire of fans on all sides: the officials.
Obviously, officials are a necessary part of the game. But every fan can likely think of a game where the officials seemed to "take over" and effectively become the story. Every fan can likely think of a single call that they believe single-handedly cost their team a game. (Most MSU fans would likely point to the non-call on Draymond Green in the 2010 Final Four.)
At the end of the day, officials are human beings trying to do a job to the best of their ability. If they do their job perfectly, they likely won't get noticed at all. It is only when they are at the center of a controversy that we learn their names. Every generation of Spartan fans likely knows the names of a few officials that they feel might be either biased against their team or simply bad at their craft.
In years past, names like Ed Hightower and Ted Valentine used to raise my blood pressure slightly. These days, the name Bo Boroski is one that most Spartans know. But were (or are) these officials actually biased against the Spartans? If they were, is there a way that we could tell?
As for the first question, it simply is not reasonable to suggest that any official out there has it out for MSU, Tom Izzo, or any other coach or program. Becoming an NCAA official at the Division I level takes a great deal of effort and dedication, just like any other profession. There is a set of checks and balances to evaluate the performance of each official. If an official were overtly biased, he would not be in the job for very long.
That said, no person is truly unbiased. As hard as any single person (including an official) tries to be fair in all situations, it is literally impossible to set aside a lifetime of experiences that weigh on one's soul when one is trying to decide in a split second if the play was a charge or a blocking foul.
Maybe the kid taking the charge reminds him of the son of one of his friends. Maybe the player called for the block was mouthing off in the last huddle. Maybe, the ref once had a bully in middle school that wore a University of Michigan hat. Maybe, the ref's wife cheated on him with a Kentucky grad. Who knows? Unconscious bias will creep into the brains of all of us imperfect human beings. Officials are no different, no matter how they try.
But if at least some level of bias (unconscious or not) exists in the officiating of college basketball, how could it be detected? This is a problem that I have thought about for a while, and I think that I found a way to potentially measure it.
How to grade a ref
I decided to approach this analysis just like I do any other sports-related analysis that I conduct (and write about here): I treated it like a scientific experiment. One thing about science is that individual observations are almost always pretty much useless. So, looking at individual plays, fouls, or games would be meaningless. What I needed was as large a set of consistent data as possible.
But, what kind of data is actually useful for this purpose? I thought about looking at number of fouls called or even free throw rate of MSU and MSU's opponents. However, I decided against this metric because fouls and free throws can pile up at the end of close games or in garbage time.
Instead, I settled on two related metrics: performance against the spread (ATS) and wins or losses relative to expectation. Performance ATS is more straightforward. The folks in Vegas and the betting community are really good, on average, at getting the line to a place where the favored team will cover the spread almost exactly 50 percent of the time.
Good teams tend to beat the spread a little more often than not (because they are good), but deviations of more than a few percentage points are extremely rare. In the database that I assembled for this project, which extends back to the 2005-06 season, MSU is 267-245-9 (51.2 percent) against the final Vegas line. I have an existing database of college basketball lines from the Prediction Tracker website, so I already had this data in hand.
Performance relative to expectation is a little more complicated. The idea here is to measure actual wins and losses, but in a weighted fashion. I have written extensively about the way to correlate the Vegas line to the odds that a favored team will win or lose. I once again leaned on this data here.
To give an example, let's say that MSU plays a total of 20 games. In the preseason, MSU is favored by 13 points in the first 10 games. Then, Big Ten play begins, and the Spartans are only favored by 2.5 in each of the next 10 games.
According to the historical correlations, in this example, MSU would have a 90 percent chance of winning each of their first 10 games, but only a 60 percent chance in each of the second group of games. So, the most likely outcome would be for MSU to win nine of the first 10 games (90 percent) and go 6-4 (60 percent) in the second batch of games.
If I use the actual spread data for all of the MSU games in the database back to 2006, I can calculate the expected number of wins and compare that to MSU's actual performance in these contests. Over that span, I calculate that MSU was expected to have a record of 364.5 wins and 156.5 losses (70.0 percent). In reality, MSU went 371-150 (71.2 percent), or just slightly better than expected, based on the final spreads. MSU won about six-and-a-half more games (out of 521) than the spread predicted.
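The arithmetic behind "wins relative to expectation" can be sketched in a few lines of Python. This is only an illustration, not the actual code behind the analysis: the function names are made up, and the spread-to-probability conversion uses a simple normal model with a standard deviation of about 10 points, which is an assumption (the historical correlation mentioned above is not reproduced here). That assumption happens to land close to the 90 percent and 60 percent figures in the example.

```python
import math

def win_prob_from_spread(spread, sigma=10.0):
    """Approximate win probability for a team favored by `spread` points.

    ASSUMPTION: the final margin is normally distributed around the spread
    with standard deviation `sigma` (about 10 points). This is a stand-in
    for the historical spread-to-odds correlation, not the real thing.
    """
    return 0.5 * (1.0 + math.erf(spread / (sigma * math.sqrt(2.0))))

def expected_wins(win_probs):
    """Expected win total is the sum of the per-game win probabilities."""
    return sum(win_probs)

# The 20-game example: ten games at 90 percent, ten games at 60 percent.
example_probs = [0.90] * 10 + [0.60] * 10
example_expected = expected_wins(example_probs)  # about 15 wins

# Full-database numbers quoted in the text.
actual_wins = 371
expected_total = 364.5
wins_vs_expectation = actual_wins - expected_total

print(f"Example schedule: {example_expected:.1f} expected wins")
print(f"13-point favorite:  {win_prob_from_spread(13):.3f}")
print(f"2.5-point favorite: {win_prob_from_spread(2.5):.3f}")
print(f"Wins relative to expectation: {wins_vs_expectation:+.1f}")
```

The key design point is that no single game is scored as a "should win" or "should lose." Each game simply contributes its win probability to the running expected total.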
Note also that there are a handful of games where the spread data was not available, so the totals are a bit short of the totals in the record books. That said, most of these games were preseason blowouts that had huge spreads anyway.
Just as MSU's overall performance can be measured ATS and against expectation, MSU's performance involving any individual referee or combination of referees can also be measured ATS and against expectation.
As an example, consider the case of the official Kelly Pfeifer. Kelly has officiated a total of 22 MSU games since 2013. In those 22 games, MSU was expected to have a record of 16.6 - 5.4. In reality, MSU went 17-5 in those games and MSU was 11-11 ATS in those games.
In other words, in games where Kelly was an official, MSU won 0.4 more games than expected and was -0.3 wins ATS (this number is slightly negative since MSU overall is slightly over .500 against the spread over the entire timeframe). This is about as close to "unbiased" as possible based on this analysis. I applied this same methodology to every official who has worked an MSU game since 2006.
While Pfeifer seems perfectly neutral when it comes to MSU, not all officials are. As we will see, MSU tends to underperform in the presence of some officials and overperform in the presence of others. But how can we tell if this performance is "suspicious" or not?
One way to think about this test is to treat it like a coin flip experiment. If someone were to flip a coin 20 times, the most likely result is 10 heads and 10 tails. But something like eight heads and 12 tails would not be unusual. In order to determine when a person should suspect that the coin is not fair, one can apply a statistical principle called the binomial test.
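For a single official, the two numbers can be computed as follows. This is a rough sketch with made-up function names. The baseline used here, MSU's overall 267 ATS wins in 521 games, is an assumption about how "relative to the average" is defined, although it does reproduce the Pfeifer numbers above.

```python
def wins_vs_expectation(actual_wins, expected_wins):
    """Actual wins minus the spread-implied expected wins."""
    return actual_wins - expected_wins

def ats_wins_vs_average(ats_wins, games, baseline_rate):
    """ATS wins minus what the team's usual cover rate would give in `games`."""
    return ats_wins - games * baseline_rate

# ASSUMPTION: the baseline is MSU's overall ATS record (267-245-9),
# with pushes left in the denominator, as in the 51.2 percent figure.
MSU_ATS_RATE = 267 / (267 + 245 + 9)

# Kelly Pfeifer: 22 games, 17-5 straight up (16.6 expected), 11-11 ATS.
pfeifer_r2e = wins_vs_expectation(17, 16.6)
pfeifer_ats = ats_wins_vs_average(11, 22, MSU_ATS_RATE)

print(f"Wins relative to expectation: {pfeifer_r2e:+.1f}")
print(f"ATS wins relative to average: {pfeifer_ats:+.1f}")
```

Run as written, this prints +0.4 and -0.3, matching the values quoted for Pfeifer.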
The binomial test essentially is a calculation of the odds of a specific result in something like a coin flip experiment. As a rule of thumb, once the odds drop to below five percent, the deviation from expectation is considered to be "statistically significant." In the case of 20 coin flips, if we observe only five heads (or tails) or fewer, something is likely up.
If the number of coin flips increases, it is easier to spot an unfair coin. If we flip a coin 100 times, the threshold that raises suspicion, based on the binomial test, is 41 heads (or tails) or fewer. In my analysis of college basketball officiating, I used the same method to evaluate the statistical significance of any deviations from the expected outcomes.
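The coin flip thresholds above can be checked with a short Python sketch. The function names are hypothetical, and the choice of a one-sided (lower tail) version of the binomial test is an assumption. Under that assumption, the calculation gives five or fewer for 20 flips and 41 or fewer for 100 flips.

```python
from math import comb

def lower_tail(k, n, p=0.5):
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def suspicion_threshold(n, alpha=0.05, p=0.5):
    """Largest count k whose lower-tail probability is still below alpha.

    Returns -1 if even zero successes would not be below alpha.
    """
    threshold = -1
    for k in range(n + 1):
        if lower_tail(k, n, p) < alpha:
            threshold = k
        else:
            break
    return threshold

for flips in (20, 100):
    k = suspicion_threshold(flips)
    print(f"{flips} flips: suspicious at {k} heads or fewer "
          f"(odds {lower_tail(k, flips):.3f})")
```

A two-sided test, which asks about a deviation in either direction, would be stricter and would push the 100-flip threshold lower, so the one-sided reading matters when comparing numbers.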
OK, Let's Look at the Data
With that rather long introduction, it is time to look at the results. For context, a total of 159 different referees have officiated an MSU game since 2006. However, over 100 of those officials have worked fewer than seven games total. As mentioned above, larger sample sizes are needed to identify any sort of pattern. While I made the relevant calculations for all 159 refs, I will mainly focus on the officials that have worked at least 20 games in the past 15 years, which is only 23 total.
Also, I will mention here that it is pretty common for fans to refer to specific "officiating crews." While each game has three total officials, those officials do not work in consistent groups. In fact, the maximum number of games worked by the same trio of officials is three total games. If I only consider pairs of officials, the most frequent pair has worked a total of 14 MSU games. So, while some analysis can be done on a limited number of pairs of officials, there is no such thing as a three-man crew.
Without further ado, Figure 1 below summarizes MSU's performance, both against the spread (on the y-axis) and relative to expectation (on the x-axis), when each of the 23 most frequent officials is working. In all cases, the values are wins and losses relative to the average. The size of the data point scales with the total number of games.
Figure 1: Michigan State basketball's performance against expectation and the spread when various officials are working (2006-21).
In addition, I have color-coded the data points to reflect the statistical significance of the data points that stray from the average. Yellow and light green data points have odds of occurring by chance of between five percent and 20 percent. If a measurement falls below five percent into the truly statistically significant zone, I shade that data point red (if it is negative for MSU) or dark green (if it is positive).
As Figure 1 shows, there are definitely officials who are either a net negative or a net positive with respect to actual MSU wins and losses and relative to the spread. However, none of these results rises to the level of statistical significance (less than a five percent chance of happening by chance alone).
In total, there are five officials who historically seem to be trouble for MSU, i.e. the yellow data points. When it comes to wins and losses, Bo Boroski has been a net negative (-2.5 wins relative to expectation out of 60 games), but this is not very significant.
In the past 16 years, there are three officials who have been worse for MSU than Boroski: Ed Hightower (-3.5 wins over 40 games, and who retired in 2014), Pat Driscoll (-3.6 wins over 35 games), and the most significant, Terry Wymer (-4.8 wins over 65 games, which also makes him the official who has worked the most MSU games over this span). The odds that MSU's poor performance on Wymer's watch is due only to chance are just 13 percent.
The two other officials whose presence is correlated with below-average performance on the court are Jim Burr and D.J. Carstensen, but for a different reason. In these two cases, it is not the wins and losses that are notable, it is MSU's performance against the spread, which is -3.9 wins ATS for Burr (out of 29 games, ending in 2014) and a shocking -8.2 wins ATS (out of 57 games) for Carstensen.
MSU is 21-33-3 against the closing spread (37 percent) in games officiated by Carstensen. The odds of that percentage being so low by chance are only 11 percent. While this is also not low enough to be statistically significant, it is notable.
That said, there is also another side to this coin, as Figure 1 shows. While there are five officials whose presence correlates with poor MSU performance, there are also three officials who have the opposite effect: Robert Riley, Bill Epp, and especially Terry Oglesby. Unlike the yellow data points, the light green ones tend to fall closer to a diagonal line from the bottom left to the upper right. In other words, these officials seem to have a positive effect on MSU's ability both to win and to beat the spread.
As for Riley, MSU is 18-11 ATS in the games that he has officiated, which is a bit suspicious. As for Epp, MSU is 14-7 ATS, which is also high, but more notably, MSU is 21-0 straight up in the games that Epp has officiated. While that sounds pretty bad, MSU has been heavily favored in most of those games. The spread was in double digits in 14 of the 21 games, it was below seven points only three times, and he has never officiated a game where MSU was the underdog.
As for Oglesby, MSU is 26-13-1 ATS in the games that he has worked, which is just barely outside of the threshold for statistical significance. Also notable is that MSU has won three of the five total games in which the Spartans were the underdogs, including the win over Kansas in the 2015 Champions Classic and the road win over Michigan in 2019. Perhaps the title of this piece should actually be, does Terry Oglesby love MSU?
In order to provide a little more context, I wanted to see if there was any noticeable change in the data if home games and road games were separated. Figure 2 below shows the wins relative to expectation data for the same group of officials. This time, however, I have plotted the data for home games only on the x-axis and road games only on the y-axis. Similarly, Figure 3 shows the data for wins against the spread, also separated based on the venue.
Figure 2: Actual wins relative to expectation, based on the closing Vegas spread, and separated based on venue (home games on the x-axis and road games on the y-axis).
Figure 3: Wins against the closing spread, relative to the MSU average, and separated based on venue (home games on the x-axis and road games on the y-axis).
Other Odds and Ends
- When it comes to pairs of officials, the Wymer / Valentine combination was not great for MSU. The Spartans were 5-8 (-3.2 relative to expectation) and 3-10 ATS all time.
- The Boroski / Scirotto combination is notably bad for MSU on the road. The Spartans are 1-4 straight up and 0-5 ATS.
- Wymer teamed up with Lamont Simpson has not been good for MSU straight up. The Spartans are 2-5 (-2.9 relative to expectation) with this pair.
- Officials with fewer games and a notably negative impact on MSU wins, relative to expectation (R2E): Donnee Gray (7-10 straight up, -3.3 R2E, from 2006-09), Reggie Greenwood (2-5 straight up, -2.3 R2E, from 2006-09), and Antonio Petty (3-3, -1.9 R2E, from 2008-12). MSU also did poorly ATS with these officials.
- Officials with fewer games and a notably positive impact on MSU wins, relative to expectation: Keith Kimble (12-2 straight up, +2.6 R2E, active), Chris Beaver (14-0, +1.7 R2E, active), and Earl Walton (11-0, +2.1 R2E, active).
- Finally, there are two additional officials who post notably positive results for MSU ATS: Tom Eades (12-4 ATS) and Mark Whitehead (10-3 ATS).