# Correlations in direct marketing: an intro

This week, I’d like to take a look at some of the formulae and algorithms that run our lives in direct marketing.

The algorithm was invented by Al Gore and named after his dance moves
(hence, Al Gore rhythm).
Here is a video of him dancing.

Before you run in fear, my goal is not to make you capable of running these algorithms — some of the ones we’ll talk about this week I haven’t yet run myself.  Rather, my goal is to create some understanding of what these do so you can interpret results and see implications.

And the first big one is correlation.

But, Nick, you say, you covered correlation in your Bloggie-Award-winning post Semi-Advanced Direct Marketing Excel Statistics; will this really be new?

My answers:

1. Thank you for reading the back catalog so intensely.
2. The Bloggies, like 99.999998% of the Internet, do not actually know this blog exists.
3. In that post, I talked about how to run them, but not what they mean.  I’m looking to rectify this.

So, correlation simply means how much two variables move together (it is covariance rescaled so that it always lands between -1 and 1).  This is measured by a correlation coefficient that statisticians call r.  It ranges from 1 (perfect positive relationship) through 0 (no linear relationship whatsoever) to -1 (perfect negative relationship).  The farther from zero the number is, the stronger the relationship.

A classic example of this is height and weight.  Let’s say that everyone on earth weighed 3 pounds for every inch of height.  So if you were 70 inches tall (5’10”), you would weigh 210 pounds; at 60 inches (5’0”), you would weigh 180 pounds.  This is a perfect correlation with no variation, for an r of 1.
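If you'd like to see that imaginary 3-pounds-per-inch world in action, here is a minimal Python sketch. The `pearson_r` helper and the list of heights are my own illustrative assumptions, not something from a particular stats package; the point is only that a perfectly proportional relationship comes out at 1, and flipping its direction comes out at -1.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: how much x and y move together, scaled by their spreads.

    The 1/n factors in the covariance and the standard deviations cancel,
    so plain sums of deviations are enough here.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Hypothetical heights in inches; weights follow the made-up 3 lb/inch rule.
heights = [60, 64, 66, 70, 74]
weights = [3 * h for h in heights]

r_perfect = pearson_r(heights, weights)                 # perfect positive
r_inverse = pearson_r(heights, [-w for w in weights])   # perfect negative

print(round(r_perfect, 6), round(r_inverse, 6))
```

Any straight-line rule would do the same thing; the 3 in "3 pounds per inch" changes the slope, not the correlation.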

Clearly, this isn’t the case.  If you are like me, after the holidays, your weight has increased but you haven’t grown in height.  Babies aren’t born 9 inches long and 27 pounds (thank goodness).  And the Miss America pageant/scholarship competition isn’t nearly this body positive.  So we know this isn’t a correlation of one.

That said, we also know that the relationship isn’t zero.  If you hear that someone is a jockey at 5’2”, you naturally assume they do not moonlight as a 300-pound sumo wrestler on the weekend.  Likewise, you can assume that most NBA players have a weight that would be unhealthy on me or (I’m making assumptions based on the base rate heights of the world with this statement) you.

So the correlation between height and weight is probably closer to .6.

There’s a neat trick with r: you can square it and get something called the coefficient of determination.  This number tells you the share of the variation in one variable that is predicted by the other.  So, in our height-weight example, 36 percent (.6 squared) of the variation in height is explained by its relationship with weight and vice versa.  It also means that there’s 64 percent of other in there (which we’ll get to tomorrow when we talk about regression analysis).
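The arithmetic is small enough to do in your head, but here it is as a quick Python sketch. The .6 is the ballpark guess from this post, not a measured figure, and the variable names are just mine.

```python
r = 0.6  # ballpark height/weight correlation guessed at in this post

r_squared = r ** 2                        # coefficient of determination
explained_pct = round(r_squared * 100)    # share of variation the two have in common
other_pct = 100 - explained_pct           # everything else

print(f"explained: {explained_pct}%, other: {other_pct}%")
```

One thing this makes visible: squaring shrinks middling correlations quickly. An r of .6 sounds like "more than half," but it accounts for only about a third of the variation.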

You can get some of this intuitively without the math.  Here’s a direct marketing example I was working on a couple of weeks ago.  An external modeling vendor had separated our file into deciles in terms of what they felt was their likelihood of renewing their support.  Here’s what the data looked like by decile:

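| Decile | Retention rate over six months |
|--------|--------------------------------|
| 1      | 50%                            |
| 2      | 40%                            |
| 3      | 35%                            |
| 4      | 32%                            |
| 5      | 30%                            |
| 6      | 25%                            |
| 7      | 27%                            |
| 8      | 24%                            |
| 9      | 28%                            |
| 10     | 21%                            |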

No, this isn’t the real data; it’s been anonymized to protect the innocent and guilty.

You need only look at this data to see that there is a negative correlation between decile and retention rate: the higher (worse) the decile, the lower the retention rate.  It also illustrates that it’s not a perfect linear relationship; clearly, this model does a better job of skimming the cream off the top of the file than predicting among the bottom half of the file.
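To put a rough number on that eyeball test, here is a sketch that runs the anonymized decile figures above through a hand-rolled Pearson calculation. The `pearson_r` helper and the top-half/bottom-half split are my own illustrative choices, and with only five points in each half the split numbers are a gut check at best.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from sums of deviations (the 1/n factors cancel out)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Anonymized figures from the post: decile number and six-month retention (%).
deciles = list(range(1, 11))
retention = [50, 40, 35, 32, 30, 25, 27, 24, 28, 21]

r_all = pearson_r(deciles, retention)
r_top = pearson_r(deciles[:5], retention[:5])        # deciles 1-5
r_bottom = pearson_r(deciles[5:], retention[5:])     # deciles 6-10

print(f"all deciles:  r = {r_all:.2f}, r squared = {r_all ** 2:.2f}")
print(f"deciles 1-5:  r = {r_top:.2f}")
print(f"deciles 6-10: r = {r_bottom:.2f}")
```

On these numbers, the overall correlation comes out at about -0.89, the top five deciles at about -0.95, and the bottom five at about -0.40. That matches the picture: the model sorts the top of the file cleanly and gets much fuzzier in the bottom half.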

Tomorrow, we’ll talk about the implications of these correlations for direct marketing.