One of the main things football analysts look for when they study football statistics is the correlation between two aspects of the game. We've heard countless arguments around the media and the internet about a variety of topics:
"A team that rushes over 30 times wins 76% of their games"
"They really controlled the clock and won the time of possession battle"
"You need to be efficient in the red zone to win"
We've all heard these things, but how do they actually relate to the reality of the NFL? Today we'll look at correlation and causation within the NFL, and whether some of the things we've come to believe are true. I won't be covering every statistic or misconception in the NFL; I'll just look at a few examples, which I hope will provide a guideline for future situations that come up.
This article will be the first of two that look at two statistical parts of the NFL. Today will obviously look at correlation and causation in the NFL and the next article will look at the predictive nature of statistics. Then as time goes on I will likely share more data as I research it under this title. Now I know this may not interest most of you, which is fine, but I figured since I did this research I might as well share it, and for those of you who do enjoy it, awesome.
Correlation and Causation:
Before we get started, I suppose it's important to briefly explain what these words mean. I'll do my best, and hopefully the statisticians in the crowd don't get too upset. While Wikipedia shouldn't be used as a reference in most cases, it does sum up these words simply and effectively:
In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence... Familiar examples of dependent phenomena include the correlation between the demand for a product and its price.
Now there are varying types of correlation, but this isn't a statistics site, so we can simply sum it up by saying two items are dependent (or correlated) if, when one changes, the other tends to change with it. It's not completely that simple, but that's a good summary. We also won't get into the statistical mathematics, since there are college classes for such things.
There are varying levels of correlation that we need to discuss, ranging from 1 to -1. A correlation of 1 is a perfectly positive, linear relationship: the two items move together in lockstep. Say bananas cost 1 dollar each. The number of bananas you order and the total cost have a correlation of 1, because 2 bananas cost 2 dollars, 3 bananas cost 3 dollars, and so on. (The same would be true at 3 dollars a banana; correlation measures how consistently the two move together, not how steep the rise is.) A -1 is the opposite: whenever one item increases, the other drops by a perfectly predictable amount. A correlation of 0 means there is no linear relationship at all, and most real-world numbers fall somewhere in between. An example of a negative correlation is interceptions and winning: as interceptions go up, the chances of winning go down, though not in perfect lockstep.
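For those who want to see the banana example in numbers, here is a minimal sketch of the usual (Pearson) correlation coefficient. The banana prices and counts are made up for illustration, and the `pearson` helper is my own small function, not something from a stats package.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two lists move together around their averages.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each list moves on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative numbers only.
bananas = [1, 2, 3, 4, 5]
cost_at_1_dollar = [1 * b for b in bananas]    # cost rises with every banana
cost_at_3_dollars = [3 * b for b in bananas]   # steeper rise, same lockstep
bananas_left = [10 - b for b in bananas]       # shelf stock drops as orders rise

print(pearson(bananas, cost_at_1_dollar))
print(pearson(bananas, cost_at_3_dollars))
print(pearson(bananas, bananas_left))
```

The first two results sit at the positive end of the scale and the third at the negative end, regardless of the price per banana.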
In common usage, causality is also the relationship between a set of factors (causes) and a phenomenon (the effect). Anything that affects an effect is a factor of that effect. A direct factor is a factor that affects an effect directly, that is, without any intervening factors. (Intervening factors are sometimes called "intermediate factors".) The connection between a cause(s) and an effect in this way can also be referred to as a causal nexus.
While correlation means two items are connected, it doesn't mean one causes the other. The example I gave of a negative correlation (interceptions to wins) does involve causation, but not every football statistic does. While most football statistics have some form of correlation to winning, not all are the cause of winning, but we'll get into that in the next section. One thing to note again: correlation does not automatically equal causation, but it can.
How Does Football Tie Into All of This?
Running to Victory
Oftentimes, as the examples at the beginning show, we tie statistics to results. But how often are we right in our conclusions? Let's start by looking at the run game in the NFL. The connection (or lack of one) between rushing yards and attempts and winning has been coming to light more and more in recent years. Let's look at it this way: yes, it's true that teams that win more have more rushing yards on average. There is a correlation, but is there causation? This is where the stats leave off and plain football analysis takes over. Let's lay this out:
- We know that winning teams have more rushing yards and attempts
- We know that when a team is ahead they run more
- We know that when a team is behind they pass more
Knowing these three things, we can build the foundation of a logical argument. I'll let an analytical site do that for me, since they just have a way with words:
Once teams are already winning the game is when the largest differential in running yardage occurs. This makes sense because running the ball makes the game shorter. The clock doesn’t stop after a running play like an incomplete pass. The winning team wants the game to be over quicker, they’re the ones winning after all!
To build on this, if you look at history, you see the same trend: rushing yards in the first three quarters really don't correlate with winning, but rushing yards and attempts in the 4th quarter do. "Why?" you may ask. It's because when a team is ahead, it runs the ball more than in any other situation. A team that is behind will run less, because running burns time off the clock and usually earns fewer yards per play than passing.
So the idea that rushing the ball and winning are connected is true, but saying that the more you rush the more you win is false. Now there are more layers to this: more effective running increases your chances of winning, but more attempts don't.
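To see how a stat can correlate with winning without causing it, here is a toy simulation. Every number in it is invented (the size of leads, how many carries a team gets, how much a lead changes play calling), so treat it as a sketch of the logic rather than a model of real NFL data. In this made-up world, rushing attempts have no effect on who wins; the only thing that happens is that teams with a lead after three quarters run more in the 4th.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(42)  # fixed seed so the sketch gives the same result every run

early_rushes, q4_rushes, wins = [], [], []
for _ in range(5000):
    # Hypothetical point margin after three quarters.
    lead = random.gauss(0, 10)
    # The winner depends only on that lead plus 4th-quarter luck.
    won = 1 if lead + random.gauss(0, 7) > 0 else 0
    # Carries in quarters 1-3 ignore the score entirely.
    early = random.gauss(20, 4)
    # 4th-quarter carries go up when ahead and down when behind.
    late = max(0.0, 7 + 0.3 * lead + random.gauss(0, 2))
    early_rushes.append(early)
    q4_rushes.append(late)
    wins.append(won)

total_rushes = [e + l for e, l in zip(early_rushes, q4_rushes)]

r_early = pearson(early_rushes, wins)
r_q4 = pearson(q4_rushes, wins)
r_total = pearson(total_rushes, wins)

print(f"Quarters 1-3 attempts vs. winning: {r_early:+.2f}")
print(f"4th quarter attempts vs. winning:  {r_q4:+.2f}")
print(f"Total attempts vs. winning:        {r_total:+.2f}")
```

Early-game attempts show essentially no correlation with winning, while 4th-quarter and total attempts show a clearly positive one, even though nothing in the simulation lets a carry help a team win. The lead drives both the running and the winning.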
Time of Possession:
Another thing we often hear from commentators is that a team needs to better control the clock. The thing is, can you really control the clock? From statistical studies, we know time of possession does correlate with winning, though not very strongly, but is there causation? Let's try another logical breakdown:
- We know running the ball takes up more time of possession than passing
- We know that teams that tend to run the ball more are likely ahead
- We know that teams that are behind tend to pass the ball more
This leads me, and others, to say that time of possession is an
"intermediate outcome" in a football game. In other words, it is a natural byproduct of being good at something else. You can’t be good at "time of possession." Just like total running yards, lopsided advantages in TOP tend to appear late in games when offenses that are already ahead let the clock run down while offenses that are behind try to move the ball quickly. So it’s winning that often leads to TOP and not necessarily TOP that leads to winning.
When we look at the nature of football, we see that time of possession doesn't lead to winning, and when we sit down and think about it, that makes sense. It's not how slow or fast you score, it's that you score. The best example of this is the Colts-Dolphins game, where the 'Phins held the ball for 45:07 and still lost. What matters is that you score as often as possible, not the speed at which you score.
Now, like running the ball and winning, there are other factors. A team that controls the clock for 55 minutes is more likely to win, since it has far more chances to score, but what matters there is points per drive. If you hold the ball for 55 minutes but never reach the end zone, you are likely to lose. This is the case for teams with top-ranked defenses but bad offenses.
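Here is a quick worked example of points per drive next to time of possession. The two teams and all of their numbers are hypothetical, picked only to show that the two measures can point in opposite directions.

```python
from dataclasses import dataclass

@dataclass
class TeamGame:
    """One team's line for a single game (all values hypothetical)."""
    name: str
    possession_minutes: float
    drives: int
    points: int

    def points_per_drive(self) -> float:
        # Scoring efficiency: points earned per possession.
        return self.points / self.drives

# Same number of drives, very different clock usage and scoring.
ball_control = TeamGame("Ball-control team", possession_minutes=38.0, drives=10, points=16)
quick_strike = TeamGame("Quick-strike team", possession_minutes=22.0, drives=10, points=27)

for team in (ball_control, quick_strike):
    print(f"{team.name}: {team.possession_minutes:.0f} minutes of possession, "
          f"{team.points_per_drive():.1f} points per drive")
```

The ball-control team wins the time of possession battle by 16 minutes and still loses the game, because the quick-strike team gets more out of each drive.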
Red Zone Efficiency:
This stat has been gaining popularity over the past few years. On a basic level, it makes sense: if you are efficient in the red zone, you should score more, which should lead to more winning. But does it? Let's think about this: do you even need to reach the red zone to score? No, of course not. Recently I used a brain teaser created by Brian Burke at Advanced NFL Stats that helps put this in perspective:
Imagine you are a coach and the night before the big game the Ghost of Football Future visits you and says: "Tomorrow you will have two 3rd-and-1s, one at the other team’s one-yard line and one at mid-field. One will result in a TD and one in a lost fumble — but you can choose which will be which".
That choice is a no-brainer: you want to score from the 50 and fumble on the one leaving the other team backed up against its goal line — not score from the one and give the other team the ball on the 50. But what does this say about any special importance of scoring in the red zone compared to scoring from elsewhere?
The choice is obvious, and it becomes even more so if you apply it to the whole game. Everyone will agree it's good to be efficient in the red zone; there's no denying that, and I don't think any coach would be against it. But it's not as vital as I think it's believed to be. I want a team that is good in the red zone, but there are better measures of scoring ability, like points per drive. There is a correlation between winning and red zone performance, but it is very small. If we look at the red zone efficiency rankings of 2011, it's not too hard to see it really doesn't matter a whole lot. Playoff teams ranked 14.6th on average, with the distribution across the board: San Francisco ranked 29th, the Giants ranked 9th, and the Lions ranked 1st. It totally varies. The average record of the top 16 teams is nearly identical to that of the bottom 16.
There just isn't any causation between red zone efficiency and winning. Now, is it okay to suck in the red zone? Not particularly, because like all of the topics we've covered, it's more complex. A team should maximize its chances when given them, and it's not bad to be efficient in the red zone, but it's also not a good way to judge an offense.
I hope this wasn't too confusing. I tried to avoid the math of each situation, though there was plenty of it to back up our conclusions. This was mostly a mental foundation, so that when you hear a stat or explanation you now have the tools to think about the idea logically and decide if that stat is actually related to the topic. Once you have that logical foundation, you are able to do research to see if it supports your idea. I'm sure there are some of you with the math background to go out and find the actual correlation between two items, and we'll get into that later, but for today, I hope we got your mind thinking.