# Correlations in direct marketing II: The Wrath of Khan Academy

Yesterday, we saw how to run and interpret correlations.  Today, we’re going to look at what the way correlations are set up means for direct marketers.

First and foremost, I must stipulate that correlation does not equal causation.  I did a good job of discussing this in a previous post talking about how attractive Matt Damon is in his movies.  Rather than go into a lot of detail on this, I’ll link over to that post here.  Looking back at that post, I forgot to put in a picture of Matt Damon, which I will rectify here:

Intelligence and attractiveness correlate;
I wish I could have explained this to people in high school.

This is fairly intuitive, given our discussion of height and weight earlier.  With exceptions for malnutrition and the like, it really doesn’t make sense to say someone’s weight causes them to be taller or height causes them to be heavier.

There’s a great Khan Academy video that covers a lot of this here. The Khan Academy video also gives me an excuse for the name of the blog post that I really couldn’t pass up.

Back to correlations: they only capture linear (straight-line) relationships.  Given the renewal rates above, a correlation is not the ideal tool for describing this relationship, as it will give you a rule that says “for every decrease in decile number, you will have an X% increase in renewal rate.”  We can see from the data that this isn’t the case: moving from 2 to 1 has a huge impact, whereas moving from 9 to 6 has an impact that is muddled at best.
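To make that concrete, here is a minimal sketch in Python.  The renewal rates are numbers I made up for illustration (they are not the figures from the chart), and `linear_slope` is just a hypothetical helper name.  It fits the single straight-line slope that a correlation implies and compares it with the actual jumps between deciles.

```python
def linear_slope(xs, ys):
    """Least-squares slope of ys on xs: the one 'per step' change a straight line assumes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x

# HYPOTHETICAL data: decile 1 is the best donors, decile 10 the worst.
deciles = list(range(1, 11))
renewal_rates = [0.62, 0.38, 0.30, 0.27, 0.25, 0.24, 0.24, 0.23, 0.23, 0.22]

slope = linear_slope(deciles, renewal_rates)

# Actual changes in renewal rate for two specific moves.
step_2_to_1 = renewal_rates[0] - renewal_rates[1]  # moving from decile 2 to decile 1
step_9_to_6 = renewal_rates[5] - renewal_rates[8]  # moving from decile 9 to decile 6

print(f"straight-line change per decile: {slope:.3f}")
print(f"actual change from decile 2 to 1: {step_2_to_1:.3f}")
print(f"actual change from decile 9 to 6: {step_9_to_6:.3f}")
```

With made-up numbers shaped like this, the straight line promises the same modest bump for every decile, which badly understates the top of the file and overstates the bottom.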

Another example is the study on ask strings we covered here.  When looking at one-time donors, asking for less performed better than asking for the same as their previous gift.  Asking for more also performed better than asking for the same as the person’s previous gift.  However, if you were to run a correlation, it would come out close to zero and suggest there is no relationship, because the data aren’t in a line (graphically, you are looking at a U shape).  We know there is a relationship, but not one that can be described with a correlation.
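If you want to see the U-shape problem for yourself, here is a small Python sketch.  The ask multipliers and response rates are invented for illustration (they are not the study’s numbers), and `pearson` is a hand-rolled helper so nothing beyond the standard library is needed.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# HYPOTHETICAL data: ask amount as a multiple of the previous gift,
# with response rates that dip in the middle (a U shape).
ask_multiplier = [0.50, 0.75, 1.00, 1.25, 1.50]
response_rate = [0.060, 0.045, 0.040, 0.045, 0.060]

r_u_shape = pearson(ask_multiplier, response_rate)
print(f"correlation for the U-shaped data: {r_u_shape:.3f}")
```

The correlation lands at essentially zero even though the ask amount clearly matters here, because the left half of the U cancels out the right half.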

You’ll also note that correlations only work between two variables.  Most systems of your acquaintance will be more complex than that, and we’ll have to use other tools for them.  That said…

Correlations are a good way of creating a simple heuristic.  SERPIQ just did an analysis of content length and search engine listings that I learned about here. They found a nice positive correlation:

Hat tip to CoSchedule for the graph and SERPIQ for the data.

As you read further in the blog post, you’ll see that there is messiness there.  It’s highly dependent on the nature of the search terms, the data are not necessarily linear, and non-written media like video complicate the picture.  However, the data lend themselves to a simple rule of “longer form content generally works a little bit better for search engine listings” or, in a lot of cases, “your ban on longer-form content may not be a good idea.”  While these come with some hemming and hawing, simple rules are a good thing because they are easier to follow.

But refer back to the original point: correlation isn’t causation.  Even in the example above of word count being related to search engine listings, more work is required to find out what type of causal relationship, if any, there is between word count and search engine listings.

Hope this helps.  Tomorrow, we’ll talk about regression analysis, which will take you all the way back to your childhood to look for memories that will…

Um, actually, it will be about statistical regression analysis.  Never mind.