regression

In my DMA Leadership Conference talk, I said that people who listen to what donors say they want in donor surveys deserve to be lied to. That was obviously too harsh – what I should have said is that they deserve to be misled.

Because people (not just donors, but all human beings*) aren’t meaning to lie to you; they just don’t know what their true motivation is. As we’ve seen, emotional reaction happens 6000 times faster than rational thought. So unless someone is doing System Two thought, where they are rationally considering all alternatives, the role reason plays in this process is coming up with the best possible justification of a decision already made.

Consider a study that asked people to rank their top 16 motivations. Sex was rated #14; wealth was dead last. Then they looked at actual subconscious motivators of decisions. Sex was rated #1 and wealth was rated #5.

This should be no surprise to people who have met, well, ya know, people. But it was a surprise to people themselves, who think themselves chaste and incorruptible, but in reality dream of having very special moments in Scrooge McDuck’s vault.


But that doesn’t mean all donor surveys are bad – far from it. It just means that, in a statement that may get me arrested by the Tautology Police**:

Bad donor surveys are bad. Good donor surveys are good.

Common traps in donor surveys:

Talking only to current donors. You want to talk to people who stopped giving as well, to the extent that they will talk to you. After all, you are looking for the difference between these two groups. Trying to define who your good donors are without talking to former donors is like saying the reason that Fortune 500 companies are successful is because they have employees and offices.
Asking donors to analyze why they did what they did. They don’t know. So they are going to try to figure out what answer someone like them would generally say or what they think you want to hear. Neither is helpful to you.
Asking donors what is most important to them. Clearly, from the above, the answer is sex. Looking only at your limited options, however, they will probably make mistakes in determining what is important to them, similar to the poor people who thought that sex and wealth (aka Genie Wishes #1 and #2) didn’t impact them.

So how do you construct a survey that gets to these important points? You are going to set up your survey so that you can run a regression analysis.**** If you need help with how to do this, check out our post on basic regression.

You will need a dependent variable. Ideally, this will be donation behavior because it is a clear expression of the behavior you are trying to impact. If not, an overall satisfaction score with the organization will be generally OK, as it should correlate strongly with donation behavior.

For your independent variables, ask about aspects of your organization. So, for example, “have you ever called X Organization about your donations?”, “did you receive a thank you note for each donation you made?”, “have you been to X Organization’s Web site?”, “how many days did it take for you to get your thank you note on your last gift?”, etc.

The powerful thing about regression analysis is that it will help you figure out both how people feel about their experience and how important that experience is to them. For example, my guess is that for most organizations, the number of days it took to get a thank you will be a good predictor of retention. Since the analysis tells you the strength of that association, you can invest the right amount of resources into that area versus new donor welcome packages or donor relations staff or database infrastructure and the like.
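To make the setup above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the eight survey rows are invented, the variable names (`days_to_thank_you`, `called_org`, `visited_site`, `gifts_next_year`) are hypothetical, and the tiny least-squares solver stands in for whatever tool you actually use (Excel will do the same job).

```python
# Hypothetical survey rows, invented for illustration:
# (days_to_thank_you, called_org, visited_site, gifts_next_year)
rows = [
    (3, 0, 1, 4),
    (5, 1, 0, 4),
    (7, 0, 0, 3),
    (10, 1, 1, 3),
    (14, 0, 1, 2),
    (21, 1, 0, 2),
    (28, 0, 0, 1),
    (35, 1, 1, 0),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(xs, ys):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    design = [[1.0] + list(map(float, x)) for x in xs]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


xs = [r[:3] for r in rows]  # independent variables: the survey answers
ys = [r[3] for r in rows]   # dependent variable: donation behavior
names = ["intercept", "days_to_thank_you", "called_org", "visited_site"]
coefs = dict(zip(names, fit_ols(xs, ys)))

for name, value in coefs.items():
    print(f"{name:>18}: {value:+.3f}")
```

Each coefficient is the estimated change in the dependent variable for a one-unit change in that answer, holding the other answers fixed. Keep in mind that the size of a coefficient depends on the units of the question (days versus yes/no), so compare them with that in mind.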

* Yes, non-donors are also considered human beings – just slightly lesser ones.

** Motto: Enforcing through enforcement since Socrates.***

*** Former motto: Our motto is our motto.

**** Or other modeling if you are feeling fancy.

If you don’t know what a linear regression analysis is or how it is measured, I recommend you start with my post on running regressions in Excel here.

OK, now that you’re back, you’ll notice I did an OK job of saying what a linear regression analysis is and what it means, but I didn’t mention why these would be valuable. Today, we rectify this error.

In yesterday’s post on correlations, I mentioned that they only work for two variables at a time. This is extremely limiting, in that most of your systems are more complex than this. Additionally, because of interactions between multiple variables, it’s difficult to determine what is causing what. I’ve discussed before how the failure of the US housing market was related to people assuming that variables that were actually correlated with each other were independent.

Linear regression analysis allows you to look at the intercorrelations between and among various variables. As a result, regression analysis is the primary basic modeling algorithm. In fact, it’s often used as a baseline for other approaches — if you can’t beat the regression analysis, it’s back to the drawing board.

Two side notes here:

First, if you are interested in learning to do this yourself, I strongly recommend Kaggle competitions. Kaggle is where people compete for money to produce the best models for various things — right now, for example, they are running a $200,000 competition on diagnosing heart disease, a $50,000 competition for stock market modeling, and a $10,000 competition to identify endangered whales from photography.

It’s some pretty cool data stuff and the best part is that they have tutorial competitions for people like me (and perhaps you; I would hate to assume). One sample is to model which passengers would survive the sinking of the Titanic from variables like age, sex, ticket class, fare, etc. They walk you through correlation, regression, and some more advanced modeling techniques we’ll discuss later in the week. Here, as ever, they look for improvement on regression as the goal of more advanced models.

Second, it’s tempting to view regression as a Mendoza line* of modeling: a lowered hurdle that shouldn’t be bothered with. But regression can give you fairly powerful results and, unlike many of the more advanced modeling techniques we’re going to discuss, you can do it and interpret it yourself.

That said, like correlation, it doesn’t know what to do with non-linear relationships. For example, you have probably noticed that your response rate falls off significantly after a donor hasn’t donated in 12 months (plus or minus). A regression model that looks at number of months since last gift will ignore this and assume that the difference between 10 and 11 months is the same as the difference between 12 and 13 months. And it isn’t. It also will choke on our ask string test in the same way as correlations will.
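Here is a small sketch of that 12-month problem in plain Python. The response rates are invented and the cutoff of 12 months is an assumption taken from the example above; your own file will have its own cliff. It fits one straight line through all the months, then fits a separate line on each side of the cutoff, and compares how far each approach misses the data.

```python
# Invented response rates (%) by months since last gift, with a drop after month 12.
months = list(range(1, 19))
response = [8.0, 7.8, 7.5, 7.4, 7.1, 7.0, 6.8, 6.5, 6.4, 6.1, 6.0, 5.8,
            3.0, 2.9, 2.7, 2.6, 2.5, 2.3]
CUTOFF = 12  # assumed lapse point; pick yours from your own data


def fit_line(xs, ys):
    """Simple least-squares line: returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def sse(xs, ys, slope, intercept):
    """Sum of squared errors between the line and the observed values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# One straight line through everything: every extra month costs the same amount.
slope, intercept = fit_line(months, response)
sse_line = sse(months, response, slope, intercept)

# Separate lines before and after the cutoff, so the drop is allowed to exist.
early = [(m, r) for m, r in zip(months, response) if m <= CUTOFF]
late = [(m, r) for m, r in zip(months, response) if m > CUTOFF]
sse_split = 0.0
for group in (early, late):
    gx = [m for m, _ in group]
    gy = [r for _, r in group]
    g_slope, g_intercept = fit_line(gx, gy)
    sse_split += sse(gx, gy, g_slope, g_intercept)

observed_step_10_11 = response[10] - response[9]   # month 11 minus month 10
observed_step_12_13 = response[12] - response[11]  # month 13 minus month 12

print(f"single line predicts the same step every month: {slope:+.2f}")
print(f"observed step, month 10 to 11: {observed_step_10_11:+.2f}")
print(f"observed step, month 12 to 13: {observed_step_12_13:+.2f}")
print(f"squared error, single line: {sse_line:.2f}")
print(f"squared error, split at {CUTOFF} months: {sse_split:.2f}")
```

Splitting the file at the cutoff is only one way to handle this. Adding a yes/no "lapsed" variable to a single regression is another, and which one suits you depends on how much data you have on each side of the line.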

So here are some things worth testing with regression analyses:

Demographic variables: you may know the composition of your donor file (and if you are like most non-profits, it’s probably female skewed). But have you looked at which sex ends up becoming the better donor over time? It may be with a regression analysis that the men on your file donate more or more often (or not), which could change your list selects (I know I have been known to put a gender select on an outside file rental to improve its performance).

Modeling your lapsed file: Using RFM analysis, you know what segments perform best for you and which go into your lapsed program (if not, use RFM analysis to figure out what segments perform best for you). However, there may be hidden gems in your lapsed file: donors who missed a gift (according to you) and would react well if approached again. Combining appended data like wealth, demographics, and other variables with your standard RFM analysis can help find some of these folks to reach out to.

Content analysis: In the earlier regression article, I show a (bad) example of using regression analysis to find out what blog posts work best. This can be applied to Facebook or other content as well.

What I didn’t mention is that once you have this data, it probably applies across media. What works on Facebook and in your blog are probably good topics for your enewsletters, email appeals, and possibly paper newsletters as well. Through this type of topic analysis, you will figure out what your constituents react to, then give them more of it.

This, however, looks at your audience monolithically. In future posts, I’ll talk about both some ways to cluster/segment your file, like k-means clustering, and some ways of improving on regression analysis with techniques like Bayesian analysis. For now, though, it’s time to look at some formulae that rule our worlds even beyond direct marketing: what do Google and Facebook use?

* A baseball term coming from Mario Mendoza, a weak-hitting shortstop whose batting average usually sat around .200. Anyone below Mendoza in batting average was considered to be hitting below the Mendoza line, or very poorly. (He made up for this for several years with strong fielding.) And now you know the rest of the story.


Direct to Donor

Direct marketing tips for the modern nonprofit


Creating useful donor surveys

Regression analysis in direct marketing