Last week we briefly covered how correlation and causation relate to the game of football. Today I want to take a closer look at statistics. Now Mile High Reports has a love-hate relationship with statistics, and I know I haven't always been the most helpful when it comes to finding a balance, especially before I joined the staff, but since then I've been trying to look at how statistics both represent and predict football.
Now the biggest issue many people have with sports statistics is that they feel the numbers don't tell the whole story, and most of the time that's true. Looking only at total yards for running backs or quarterbacks is a good example of this: while important, total yards are hardly a good picture of what really happened. But this article isn't meant to talk about how statistics represent football. Instead we'll focus on my other mission: which statistics, if any, can predict future success?
Predictions, a Beginning:
There are countless statistics out there for football, but which ones really gauge how good a team IS (not was or could be)? That's what predictive statistics want to know: if two teams are about to face each other, how can we best tell which one will win? This was my goal for the past two years. I laid out a few guidelines and goals for the study:
- This obviously can't take place at week 1 since there has been no real data on the current roster, so what week do I begin the study?
- What statistics should I use?
After creating these questions I began to look into this.
Now I don't want to delve too deeply into the methodology since it's not as important for our understanding, but for those with questions or concerns, feel free to email me and I'll gladly go into more detail.
The NFL by nature is tough to predict because there are only 16 games, a very small sample size to really judge a team. But given how long each game is and how many plays each unit of a team is on the field for, the NFL does allow us to overcome this. So when it came to my study, how many weeks should I allow to pass before comparing teams?
After two seasons I found the sweet spot, the point where the statistics correlated best with themselves, to be week 9.
Choosing which statistics to use was trickier since there are so many metrics to choose from. There has been surprisingly little work done with predictive statistics, so in 2010 I just kind of shot in the dark with which ones to use. I was met with mixed results, and during the 2011 off-season I removed the statistics that predicted under 60% of games. I also kept home field despite it being under 60% because that is part of a larger study and I wanted to continue tracking it.
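For anyone who wants to try the 60% cut at home, here is a rough sketch of how it could be scored. The game records, the `hit_rates` and `keep_stats` names, and the rule that a tie on a statistic means no pick are all assumptions made for illustration, not the actual study data or method. It also assumes a higher value is better, which would need to be flipped for something like Int %.

```python
from collections import defaultdict

# Made-up games for illustration only; none of these numbers come from the study.
# "home" and "away" hold each team's season-to-date value going into the game.
games = [
    {"home": {"ANY/A": 6.8, "Rating": 92.1}, "away": {"ANY/A": 5.9, "Rating": 95.0}, "home_won": True},
    {"home": {"ANY/A": 5.1, "Rating": 78.4}, "away": {"ANY/A": 6.2, "Rating": 88.3}, "home_won": False},
    {"home": {"ANY/A": 6.0, "Rating": 83.0}, "away": {"ANY/A": 6.5, "Rating": 84.0}, "home_won": True},
    {"home": {"ANY/A": 5.5, "Rating": 90.0}, "away": {"ANY/A": 6.4, "Rating": 80.0}, "home_won": False},
]

def hit_rates(games):
    """Share of games won by the team that led in each statistic."""
    correct = defaultdict(int)
    counted = defaultdict(int)
    for game in games:
        for stat, home_val in game["home"].items():
            away_val = game["away"][stat]
            if home_val == away_val:
                continue  # assumption: no pick when the teams are tied
            picked_home = home_val > away_val  # assumption: higher is better
            counted[stat] += 1
            correct[stat] += int(picked_home == game["home_won"])
    rates = {stat: correct[stat] / counted[stat] for stat in counted}
    # Home field simply "picks" the home team every time.
    rates["Home"] = sum(g["home_won"] for g in games) / len(games)
    return rates

def keep_stats(rates, threshold=0.60, always_keep=("Home",)):
    """Drop statistics under the threshold, except those tracked regardless."""
    return [s for s, r in rates.items() if r >= threshold or s in always_keep]

rates = hit_rates(games)
kept = keep_stats(rates)
print(rates)
print(kept)
```

In this toy data, passer rating falls under the cut and is dropped, while home field survives only because it is on the always-keep list.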
So here was my list last season:
*How they are listed on the table will be in ()*
- Home field (Home)
- Yards per attempt (Y/A)
- Net yards per attempt (includes sack yards) (NY/A)
- Adjusted net yards per attempt (ANY/A)
- Passer rating (Rating)
- Completion percentage (Comp %)
- Offensive touchdown percentage (TD %)
- Offensive interception percentage (Int %)
- Pro Football Focus' offensive team score (PFF O)
- Pro Football Focus' defensive team score (PFF D)
- Pro Football Focus' special teams score (PFF ST)
- Football Outsiders DVOA (DVOA)
- Simple Rating System (SRS)
- Fan voting (pretty much an eye test prediction) (Eye)
I also included another statistic, which is the combination of the group: whichever team won the majority of categories was the team I would personally predict to win (Prediction). With this group I was able to get an even better picture of predicting success. Let's take a look and see which ones did well and which ones didn't.
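The combined pick can be sketched as a simple category count. This is only a minimal illustration: the `majority_pick` name, the sample numbers, counting home field as one category for the home team, and returning "tie" when the categories split evenly are all assumptions, since the handling of ties is not spelled out above.

```python
def majority_pick(home_stats, away_stats, lower_is_better=("Int %",),
                  count_home_field=True):
    """Return 'home', 'away', or 'tie' depending on who leads more categories."""
    # Assumption: home field counts as one category won by the home team.
    home_wins = 1 if count_home_field else 0
    away_wins = 0
    for stat, home_val in home_stats.items():
        away_val = away_stats[stat]
        if home_val == away_val:
            continue  # assumption: a tied category counts for nobody
        if stat in lower_is_better:
            home_better = home_val < away_val
        else:
            home_better = home_val > away_val
        if home_better:
            home_wins += 1
        else:
            away_wins += 1
    if home_wins == away_wins:
        return "tie"
    return "home" if home_wins > away_wins else "away"

# Made-up numbers for illustration only.
home = {"Y/A": 7.1, "Int %": 3.2, "Rating": 88.0}
away = {"Y/A": 6.8, "Int %": 2.1, "Rating": 91.5}

with_home_field = majority_pick(home, away)
without_home_field = majority_pick(home, away, count_home_field=False)
print(with_home_field, without_home_field)
```

Here the away team leads two of the three passing categories, so the pick only swings back to even once home field is counted.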
Here is a small sample of my compilation of predictive statistics for the past two seasons:
| Home | Y/A | NY/A | ANY/A | Rating | Comp % | TD % | Int % | PFF O | PFF D | PFF ST | DVOA | SRS | Eye | Prediction |
There are clear winners and losers. Note that there is also a clear separation between the regular season and the playoffs. Now this might be tied to smaller sample sizes overall, but it's hard to tell. Also home field is massively more important in the playoffs than during the regular season.
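To get a feel for how much the smaller playoff sample could matter, here is a rough sketch using the standard normal approximation for a proportion. The game counts below are invented for illustration and are not the study's totals, and the `hit_rate_interval` name is hypothetical. The approximation itself is crude for samples this small, so treat the output as a ballpark only.

```python
import math

def hit_rate_interval(correct, total, z=1.96):
    """Approximate 95% range for a hit rate (normal approximation)."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Invented counts: a large regular-season sample and a small playoff sample.
regular = hit_rate_interval(180, 256)
playoff = hit_rate_interval(15, 22)

regular_width = regular[1] - regular[0]
playoff_width = playoff[1] - playoff[0]
print(f"regular season: {regular[0]:.3f} to {regular[1]:.3f}")
print(f"playoffs:       {playoff[0]:.3f} to {playoff[1]:.3f}")
```

With similar hit rates, the playoff range comes out several times wider than the regular season one, which is why a gap between the two is hard to read much into.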
Now many may just look at this and say "so what?" Well, what is important here is that when we are talking about a player during the season, take a quarterback for instance, we can say:
Our quarterback has the better adjusted net yards per attempt compared to our opponent's quarterback next week, so historically speaking, we have a 75.2% chance of winning this game.
Or you could say:
Well, we are at home in this wild card game and we have the better defense according to Pro Football Focus, so we should win this game.
You can also apply this to the importance of a statistic. We've heard some negative notes about the passer rating. Well, the team with the better passer rating up to that game wins 69.7% of the time, whereas home field advantage is largely overrated outside of the playoffs.
While hardly perfect, the predictive nature of statistics can help us understand who has the better chance of winning a game. This is along the same lines as the position-by-position comparison we usually see prior to most games. It's not meant to predict perfectly; it's merely another tool to help us.
This is merely one use of predictive statistics. There is also the study of how statistics deviate across multiple seasons, which we see a lot in fantasy football. There is also the study of how players' statistics change based on their team, and predicting how they will change if they go to a new team. There are a lot of ways to apply this, and this was just a small example.
Next time we'll continue talking about predictive versus explanatory statistics.