MSU Football Recruiting Retrospective, Part 1: Counting the Stars

In a normal year, the month of May marks a bit of a transition in my mind.  The NCAA basketball tournament has faded in the rear-view mirror, the NBA and NHL playoffs are usually in full swing, MLB has just gotten started, and the NFL draft has just wrapped up.  In addition, college football recruiting is usually heating up.

While 2020 is anything but a normal year, there are still a few vestiges of sports remaining. Although essentially all sports are on hold, the NFL draft still happened, and college football recruiting is still taking place.  MSU picked up a total of ten commitments in the month of April, despite the lockdown and a new coaching staff.  Some things are still proceeding.

For me, data cannot be locked down. Over the past year or two, I have been building a database of high school and college football recruits and digging through the data in search of interesting stories.  With the completion of the 2019 football season and the 2020 NFL draft, it is time to start sharing these stories with all of you.

The first part of my study focuses on "stars" and will lay the groundwork for what is to come. During the summer, as college football teams gain commitments, fanatics will obsess over how many "stars" a player has next to his name and how this star rating will impact the "ranking" of that team's next class of football players.  While it makes for interesting water-cooler (now Zoom?) discussions, there is very little information or analysis on what these different numbers of "stars" really mean.  How much better is a five-star compared to a four-star?  What do team rankings mean? Can this all be quantified in a way that actually makes sense? Naturally, there are several recruiting services and websites out there that try to answer these questions.

The first service of note is 247, which claims to have a "consensus" ranking of high school players by combining the rankings of several other services.  Most journalists will use the 247 star rating as a default. In the case of 247, they assign "points" to each player out to at least four decimal places, with the "consensus" number one player in the country getting 1.000 "points."  As for stars, there is usually a roughly fixed number of players in each star category, from two-stars up to five-stars.  As for team rankings, they use a complicated formula involving a Gaussian distribution which looks pretty, but which doesn't seem to have a lot of actual logic behind it (no offense...).

My preference is to use the rankings and ratings developed by Rivals.  I have been a member of that community for many years and it is the system that I am most familiar with.  Rivals also uses a star system (two to five) and breaks each category into subgroups with a numerical "rating" that spans a range from 4.9 to 6.1 (although ratings below 5.2 seem to have been phased out over the past few years).  Rivals also uses a complex formula to rank teams, based on the numerical ratings and bonus points for players in the Top 250.

Both systems (and other systems that I have seen) essentially try to do the same thing.  They rank players and place them into bins, with some attempt to numerically quantify their level of skill in order to project how successful each player will be in college and eventually (perhaps) in the pros. My concern with both systems is that they present something that appears quantitative, but which is actually still just qualitative.

For example, what does it tell us that MSU tight end Matt Dotson had a score of 0.8985 coming out of high school on 247 while Cody White had a score of 0.8847?  Does this mean Dotson is a 1.56% better player?  On Rivals, Dotson was a 5.9-rated player (mid four-star) while Cody White was a 5.7-rated player (high three-star).  Does this suggest that Dotson is/was actually 3.5% better?

Furthermore, when it comes to team rankings, this issue of quantifying the value of each player is simply propagated, no matter what formula is used.  What is the "best" way to quantify the quality of a class? By adding up stars? By averaging the 247 points or Rivals ratings, either straight or using some (arbitrary) weighting?  What does it mean that MSU's 2019 class had 1522 points on Rivals while Michigan's class had 2268 points?  Is UofM's class 49% better?  Can we expect UofM's class to produce 49% more NFL players or Big Ten All Conference players?  What does any of this actually mean?
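For anyone who wants to check the percentages quoted above, they are nothing more than simple ratios of the published scores. Here is a minimal Python sketch of that arithmetic; the helper name `percent_higher` is just an illustrative choice, and the inputs are the scores quoted in the text.

```python
def percent_higher(a, b):
    """Return how much larger a is than b, expressed in percent."""
    return (a / b - 1) * 100


# 247 points: Matt Dotson vs. Cody White
dotson_vs_white_247 = percent_higher(0.8985, 0.8847)
# Rivals ratings: Matt Dotson vs. Cody White
dotson_vs_white_rivals = percent_higher(5.9, 5.7)
# Rivals team points, 2019 classes: Michigan vs. MSU
um_vs_msu_2019 = percent_higher(2268, 1522)

print(f"247 points:      {dotson_vs_white_247:.2f}%")
print(f"Rivals ratings:  {dotson_vs_white_rivals:.1f}%")
print(f"Rivals team pts: {um_vs_msu_2019:.0f}%")
```

The ratios are easy to compute, which is exactly the problem: nothing in either rating scale says that a ratio of two scores corresponds to anything measurable on the field.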

Clearly, this is a tricky problem, and there is honestly no "correct" answer. My approach to solving it is to find a metric that does tell us something quantitative about the quality and potential of high school football players. What I decided on was to use the NFL draft as a simple measure of "output."  This metric is compelling, as it is easily tabulated, readily available, and simple.  A player either gets drafted or he doesn't.  It certainly does not tell us everything about a player and how productive he was in college, but I believe that it does serve as a proxy for overall success.  As for the "input" to this correlation, I needed a large and consistent data set of high school player rankings.  In principle, I could have used stars, Rivals ratings, or 247 points.  In the end, I made the decision to use the 4.8 to 6.1 Rivals ratings system, as it contains a sufficiently large number of categories, and the data was fairly easy to mine.

I then set out to build a database of all players in the Rivals database back to 2004 and cross-reference it to every NFL draft back to 2007.  I quickly found out that the Rivals data starts to get a bit inconsistent and/or incomplete before 2007, so I generally cut off my analysis there. Also, while I have players all the way up to the 2020 high school senior class in my database, from the viewpoint of the correlation, there are still a significant number of players from the 2016 class who have not yet used up their eligibility.  So, the range of high school classes that I use as a baseline is 2007 to 2015.  This represents nine total classes and roughly 34,000 high school football players, not all of whom eventually signed with any team at all.  With this introduction, it is now time to start digging into the data.

Do Stars Matter?

The first important fact about the Rivals ratings and/or stars is that the total number of players in each bin or category is not the same.  As a general rule, the more stars, the fewer the players.  The graph below shows the average number of players in each Rivals bin over the past nine classes.

Figure 1: Distribution of Players in the nine Rivals Rating bins

As a general rule, the five-star (6.1) bin contains roughly 32 players, and is meant to project players who are likely to be future first-round draft picks. The three four-star bins (5.8, 5.9, and 6.0) usually consist of about 350 high school recruits, which more or less corresponds to the other players who are likely to get drafted.  The three-star bins (5.5, 5.6, and 5.7) contain about 1,300 total players, while the two-star bins (5.2, 5.3, and 5.4) are made up of around 2,000 or more players.  In any given year, the Rivals database contains 3,500 to 4,000 rated players.  I believe that the 247 rating system essentially mirrors this distribution.

When I ran through all the numbers from 2007 to 2015, comparing the NFL draft rate to the original high school recruiting rating, I got the following correlation:

Figure 2: NFL draft rate as a function of high school recruiting rating

I separated the data for players who committed to Power Five teams from the data for players who committed to Group of Five teams, as there was a significant difference.  In general, I will focus mostly on the Power Five data set. To be clear, the Power Five data set shows the fraction of players drafted compared to the total number of players in that bin who signed with a Power Five team on Signing Day. This analysis does not consider walk-ons. From this one chart, we can learn a lot about the relationship between recruiting rankings and eventual college success, as measured by NFL-"draftability."
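To make the definition of the draft rate concrete, here is a minimal Python sketch of the tabulation described above: for each rating bin, the number of drafted players divided by the number of players in that bin who signed with a team in a given tier. The records below are invented purely for illustration, and the tuple layout, the "P5"/"G5" labels, and the function name are assumptions rather than the structure of the actual database.

```python
from collections import Counter

# Hypothetical records: (Rivals rating, tier of signing team, was the player drafted?)
PLAYERS = [
    ("6.1", "P5", True), ("6.1", "P5", False),
    ("5.8", "P5", True), ("5.8", "P5", False),
    ("5.8", "P5", False), ("5.8", "P5", False),
    ("5.8", "G5", False),
    ("5.6", "G5", True), ("5.6", "G5", False),
]


def draft_rate_by_bin(players, tier):
    """Fraction of signees drafted in each rating bin, for one tier of teams."""
    signed = Counter()
    drafted = Counter()
    for rating, player_tier, was_drafted in players:
        if player_tier != tier:
            continue  # keep the Power Five and Group of Five samples separate
        signed[rating] += 1
        drafted[rating] += int(was_drafted)
    return {rating: drafted[rating] / signed[rating] for rating in signed}


power_five_rates = draft_rate_by_bin(PLAYERS, "P5")
group_of_five_rates = draft_rate_by_bin(PLAYERS, "G5")
print(power_five_rates)
print(group_of_five_rates)
```

Because walk-ons never appear as signees, they fall out of both the numerator and the denominator in this kind of tally.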

There is a little bit of something for everyone in this chart. First, it is clear that "stars matter."  The correlation between recruiting rating and the draft rate is very clear.  A little over half of all five-star recruits eventually hear their names called on draft weekend.  For the next bin (6.0, high four-stars) the rate drops to about a third.  For the next bin (5.9, mid four-stars) the rate is down to a quarter, and for the rest of the four-stars (5.8) it is slightly less than a fifth.  For the three-star bins, the draft rate is down to between six and eleven percent.  For the two-star bins, the rate is about five percent.  Clearly, players with more stars are more likely to be NFL prospects.

But there is also another side to that coin.  While five-star recruits more often than not wind up in the NFL, almost half of them do not.  If we slide down just one more bin to the high four-stars (who are still Top 100 national recruits), a full two-thirds of those players don't get drafted.  These are far from can't-miss prospects.  Furthermore, the draft rate does not fall to zero as fast as many "star gazers" seem to believe. For a group of roughly ten random three-star players, on average one of them is going to eventually get drafted.  While it is not a great strategy to completely stock a Power Five team with two-star recruits, one in twenty will eventually get drafted.  Talent is certainly there in the ranks of three-star and two-star recruits.  It just takes skill to identify and develop it.

As a final comment on this data, I should note that I make no distinction among the rounds in which players get drafted.  As stated above, the five-star category is supposed to project eventual first-round draft picks.  The chart below shows the actual distribution of players per round.

Figure 3: Distribution of players in each NFL draft round based on high school star rating

As Figure 3 shows, the early rounds do tend to have a bias towards the most highly rated high school recruits, and the vast majority of five-star players who are drafted go in the first three rounds.  A full 20 percent of all first-round picks are five-stars, and over 50 percent are four- or five-stars.  A closer look at the data suggests that this bias comes almost exclusively from the five-star and high four-star (6.0-rated) players who were generally ranked in the Top 75 or so coming out of high school.

As for the idea that five-star recruits will become first-round draft picks, that is true for 18 percent of all five-star players from 2007 to 2015, with an additional 13 percent going in the second round and eight percent going in the third. As for three-star players and mid-to-low four-star players, they are distributed pretty evenly across the seven rounds, with the two-star players and lower appearing more frequently in the later rounds.

A New Metric: "NFL Draft Potential"

While the above data, especially that found in Figure 2, gives valuable information, perhaps its greatest potential value lies in its ability to answer the question that I explored in my introduction: what is the best way to quantify the potential "value" of a given recruit or recruiting class? This Figure provides just such an opportunity in the form of probability and expected value.

It is obviously impossible to say on signing day if a given player is going to be a future NFL Pro Bowler, just a college starter, or a complete bust.  Whether a player is a star or a bust may depend on totally unpredictable factors such as injuries, mental and physical development, run-ins with the law, etc.  But what we can do is use historical data to assign a probability that a player with a certain rating is going to wind up in the NFL draft.  Figure 2 gives exactly these probabilities.  To illustrate what I mean, let's say that a given recruiting class is made up of the following 24 players:
  • Two five-star (6.1) recruits
  • Three high four-star (6.0) recruits
  • Four mid four-star (5.9) recruits
  • Six low four-star (5.8) recruits and
  • Nine high three-star recruits (5.7)
How many players from this group would we expect to eventually get drafted?  The answer is about five, or roughly one from each of the five rating bins.  More precisely, based on the data in Figure 2:
  • 2 x 54.6% = 1.09 players
  • 3 x 33.5% = 1.00 player
  • 4 x 24.2% = 0.97 players
  • 6 x 17.5% = 1.05 players
  • 9 x 11.3% = 1.02 players
for a grand total of 5.13, which is the expected value of future NFL draft picks from this "class."  While this new metric is not perfect, it has a lot of things going for it.  First, it actually has a unit (number of players) that has a physical meaning.  Thus, when we compare players, teams, states, or classes, we have some idea of the magnitude of these differences in a real sense.  A "score" of six actually is twice as good as a "score" of three.  
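The arithmetic above can be written as a short Python sketch. The draft rates are the Power Five values quoted in the list above; the function name `nfl_draft_potential` and the dictionary layout are assumptions made for illustration, not a description of how the actual database is organized.

```python
# Power Five draft rates by Rivals rating, as quoted in the worked example above.
DRAFT_RATE = {"6.1": 0.546, "6.0": 0.335, "5.9": 0.242, "5.8": 0.175, "5.7": 0.113}


def nfl_draft_potential(class_counts, draft_rate=DRAFT_RATE):
    """Expected number of draft picks: sum over bins of (players) x (draft rate)."""
    return sum(count * draft_rate[rating] for rating, count in class_counts.items())


# The example 24-player class: number of recruits in each rating bin.
example_class = {"6.1": 2, "6.0": 3, "5.9": 4, "5.8": 6, "5.7": 9}

class_size = sum(example_class.values())
potential = nfl_draft_potential(example_class)
print(f"{class_size} players, NFL Draft Potential = {potential:.2f}")
# prints: 24 players, NFL Draft Potential = 5.13
```

Since the result is just a sum of per-player probabilities, the same function works for a single player, a position group, a state, or a full class.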

Second, because the metric is based on data averaged over all Power Five teams, it is possible to also examine the deviation from this mean in an equally quantitative sense.  For example, if over the time frame that the data was collected, a team had a total NFL Draft Potential score of 20, but actually put 25 players into the draft, then that team has overachieved by five players.  Mathematically, that means that all the other teams combined put five fewer players into the draft than their collective NFL Draft Potential predicted.  Everything sums to zero.  This also has very powerful implications that I will explore over the next several pieces in this series.
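The "everything sums to zero" property can be illustrated with a small Python sketch. The three teams and their tallies below are invented for illustration, and the function names are assumptions. The key condition is that the draft rates are computed from the same pool of teams and players that they are later applied to.

```python
from collections import defaultdict

# Hypothetical tallies: (team, Rivals rating, players signed, players drafted)
TALLIES = [
    ("Team A", "5.8", 10, 3), ("Team A", "5.7", 20, 2),
    ("Team B", "5.8", 5, 0),  ("Team B", "5.7", 30, 4),
    ("Team C", "5.8", 5, 1),  ("Team C", "5.7", 10, 0),
]


def pooled_rates(tallies):
    """Draft rate per rating bin, pooled over every team in the sample."""
    signed = defaultdict(int)
    drafted = defaultdict(int)
    for _team, rating, n_signed, n_drafted in tallies:
        signed[rating] += n_signed
        drafted[rating] += n_drafted
    return {rating: drafted[rating] / signed[rating] for rating in signed}


def over_under(tallies):
    """Actual draft picks minus expected draft picks, for each team."""
    rates = pooled_rates(tallies)
    expected = defaultdict(float)
    actual = defaultdict(int)
    for team, rating, n_signed, n_drafted in tallies:
        expected[team] += n_signed * rates[rating]
        actual[team] += n_drafted
    return {team: actual[team] - expected[team] for team in actual}


deviations = over_under(TALLIES)
for team, delta in deviations.items():
    print(f"{team}: {delta:+.1f} players versus expectation")
print(f"Sum over all teams: {sum(deviations.values()):.1f}")
```

In this made-up sample, Team A finishes one player above expectation, Team C one below, and Team B exactly on it, so the deviations cancel.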

This metric is the cornerstone of how I plan to analyze recruiting data going forward. As you might imagine, there is a lot to cover and a lot of new data to dig through.  Next time, I will take a closer look at the past MSU classes and how they compare to the rest of the Big Ten and the nation. After that, I will dig a bit deeper into the actual on-field performance of MSU's classes over the years, looking for areas of under- and over-achievement. Finally, I will look at the recruiting and on-field performance of MSU and some of MSU's competition with respect to different position groups and geographical locations. It is going to be a fun ride.

That is all for now.  As always, stay home, wash your hands, and Go Green.
