# Semi-advanced direct marketing Excel statistics

In addition to not being a database, Excel is also not a statistics package.  If you are going to do anything advanced, I highly recommend R.  The programming language, not the John-Cleese-played Q replacement in The World is not Enough and Die Another Day.

Cheer up! Yes, it’s a lesser part in lesser Bond movies.
You, John Cleese, are still an international treasure.

Anyway, stats in Excel.  We’ll start with correlations, as they can give us some insight into blog and Facebook traffic and interactions.

I’m going to demonstrate this on this blog’s stats because I have the data available. However, I’ve also done this with Facebook posts very successful.  The prep work you will have to do for either blog posts or Facebook posts is to record your outcomes (view, likes, shares, and/or comments), to code the subject matter of each post, and to put in any other variables like day of the week that may be relevant.  Here’s my version of this:

I have my blog posts on the left and the various factors on the right (and there are more tags, but I need to cut it off at some point to display it.  Yes, I have a blog post that has zero views. If you would like to break the seal on a blog post about why we do segmentation, it’s here; I’m sure it would appreciate a visit.

I then went through this and deleted tags that only applied to a single post. Then I ran a correlation for each individual variable to page views. The correlation function looks like

=CORREL(\$B2:\$B20,I2:I20)

And is expressed from 1 to -1.

Here’s what that looks like.  Let’s look at the days of the week first, because there appears to be an effect here — Monday content has been king, with a strong correlation to page views. It will be interesting going forward to see in the long term whether that is the nature of the content (I’ve been trying to put introduction content on Mondays and get progressively more involved throughout the week) or that people are more interested in reading blog posts on Mondays.  Nevertheless, I can probably do a better job of setting up the rest of the week as must-see content, since Tuesday, Wednesday, and Thursday are all a bit negative.

Images tend to correlate well to views as well, probably because they show better in social media.  I’d been noticing this from just a glance as well, so you will probably be seeing more images here in the weeks to come. They will also be less boring images, since some of the lower performers were images of equations and Excel sheets. It is not coincidence that the tallest Python led this blog post.

And it looks like cultivation and multichannel efforts are winning while conversion, lifetime value, and personalization are not as strong, with negative correlations to page views. I won’t be acting on this immediately, but keeping an eye on it. And I do have a multichannel week planned in the near future, so we’ll be able to test whether that’s an artifact in the data.

However, you might notice that the reason cultivation is ranked so highly is that it is in the two top performing posts, which are Monday posts. Is it the topic or the day that made those strong?  For this, you need regression.

Normally, you wouldn’t do this after only 20 blog posts.  We are not going to be able to draw any statistically significant conclusions, but I do want to show you how it’s done.

1. Go to Data > Data Analysis > Regression
2. Select the range from your outcome variable as your Y range and the range of the independent variables you want to test in the X range like so:
3. Hit OK. You’ll get something that looks like this.  In my case, it’s a really, really bad regression:

Yuck. The things you would normally be looking for are:

• in R-squared, you are looking for as close to 1 as possible.  One would mean your model is totally predictive. Zero means it predicts nothing at all.
• In P-value per variable, you are looking for less than .05.  That would show if there is a statistically significant relationship between any of the variables and your output. In this case, there isn’t and we can pretty much throw out the whole thing.
• If there is a relationship, you want to look at the coefficient for two things:
• Is it positive or negative? Positive is good things; negative is bad things.
• How big is the relationship? In this one, if these were significant, it would be a bad idea to post about personalization again, as posting about it reducing views by 7.  But it isn’t significant, so I’m not yet worried.

Hope this helps you with the stats side of Excel.  Tune in next week, when we look at some of the things that Excel is actually good at.

# Appending data in Excel

One of the primary reasons to have a database is to have data in various tables.  A simple example is a database with a people table – a list of your donors – and a donations table.  This allows for there to be a one to many relationship between the donors and donations.  After all, a good donor will generally donate multiple times.  By using a key in the people table, each donation is marked with that same key – e.g., all donations from donor 12398 have 12398 in their donor field and are associated with that record in the people table.

Excel can’t do this.  As a flat file, all you can represent are one-to-one relationships.  If you had to have multiple gifts in Excel, you would have duplicate donor records for each instance of the gift.

But you can append data to a record in Excel.

A while back, I wanted to do some homemade modeling on a file to add sex in as a variable (male/female in this case, to simplify the model).  Unfortunately, I didn’t have a list of who was male or female in this particular file, so I used US census records to determine how many males and females there were with a given first name.  This led to the odd conclusion that someone who had a name that 60 females and 40 males have in the US is a 60% woman.  Thank goodness I was using this for modeling and not for assigning Mr. or Ms.

By the way, if you were wondering if there were any perfect half-female, half-male names, there were three: Ariel, Hong, and Kris.  Sydney, Robbie, Frankie, and Kerry were almost androgynous, just missing the cut.  The Excel sheet is here if you are curious.

To complete the model, I had to put the percentage of femaleness with each record.  I used a somewhat fiddly function called VLOOKUP.

VLOOKUP tells Excel to look through a bunch of rows to find a particular value in the left column of an array, find it, and return another column in that array.  To wit:

=VLOOKUP(A1, ‘[Male v Female first names.xlsx]Sheet1’!\$A:\$B,2,FALSE)

This assumes that A1 is the first name you want to look up.  This then goes over to the document called Male v Female first names and looks at the first two columns, A & B.  It searches for the name in column A.  If it finds it, it returns back the value in B next to that value.  So, if A1 is Hong, it finds Hong in the array, sees the value next to that is 50%, and puts it in the cell in question.  The FALSE at the end just means you want an exact match, rather than the closest one.

It is by no means as pretty or efficient as a SQL JOIN statement.  In fact, if you have a lot of these to do, as I did, copy and paste it for the full column, then get yourself some tea and a crossword puzzle while Excel labors like a hamster on a wheel.  But, on the plus side, you don’t have to learn SQL.  You still should, and we’ll cover SQL for direct marketers at some point, but you can’t learn everything immediately.