It’s time to stop… vanity metrics

I’m writing this during the South Carolina Republican primary.  The votes haven’t started being counted yet, but I know who is going to win.  Because I know that Ben Carson has 35% of the Facebook likes among GOP contenders in the state; Trump is second at 25%.  Thus, Carson will get approximately 35% of the vote.

What?  Doesn’t it work that way?  Facebook likes aren’t a reliable indicator of support, donations, interest, or almost anything else?

The bitter truth: Facebook likes are a vanity metric.  They have little to do with your ultimate goal of constituent acquisition, donor conversion, and world domination, yet people will still ask what that number is.  And when they hear it, they will nod, say that that’s a good number, and ask what we can do to increase it.

That’s when a tiny little part of you dies.

So, in our Things To Stop Doing, we have vanity metrics.  These metrics may make you feel good.  They may be easy to measure.  And some of them may feel like a victory.  But they bring you little closer to your goals.  We are creatures of finite capacity and time, so the act of measuring them, talking about them, or (worst of all) striving for them drains time and attention from things that actually matter.

Facebook likes and Twitter followers are probably some of the better-known vanity metrics.  But they are far from the only ones.  And while some of these are partly useful (e.g., Facebook likes are an indicator of a warm lead pool for marketing on the platform), there’s almost always a better measure.

Because it always comes back to what your goals are.  Usually, that goal is to get people to take an action. Your metrics should be close to that action or the action itself.

Without further ado, some metrics to stop measuring.

Web site visits.  Yes, really.  This is for a couple of reasons:

  1. Not all visitors are quality visitors.  If you’ve been using Web site visits as a metric and wish to depress yourself, go to Google Analytics (or your comparable platform) and see how long visitors spend on your site.  Generally, you’ll find that half or more of your users are on your site for fewer than 30 seconds.  Is 30 seconds long enough for people to take the action you want them to take on your site?  Not usually (except for email subscribes).

  2. Not all visitors are created equal.  Let’s say you find that people coming to your site looking for a particular advocacy action sign up for emails 10% of the time; those who come looking for information about a disease sign up 5% of the time; those who look for top-line statistics sign up 1% of the time.  Which of these is the most valuable visitor?

    This isn’t a trick question.  You would rather have one person looking for advocacy actions than nine people looking for stats.  Except that the metric of Web site visits lumps them all into one big, not-very-useful bucket.

These are both symptoms of the larger problem, which is that if you had to choose between two million visitors, of whom 1% convert, and one million visitors, of whom 3% convert, you’d choose the latter.  Thus, potential replacements for this metric are visits to particular pages on the Web site where you have a good idea of the conversion rates, weighted Web traffic, and (most simply) conversions.
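
To make “weighted Web traffic” concrete, here’s a quick sketch.  The page names and conversion rates are invented for illustration; plug in your own from your analytics platform.

```python
# Sketch: replacing raw visit counts with conversion-weighted traffic.
# Page names and conversion rates are hypothetical.
conversion_rates = {
    "advocacy_action": 0.10,  # advocacy visitors sign up 10% of the time
    "disease_info": 0.05,
    "top_line_stats": 0.01,
}

def weighted_traffic(visits_by_page):
    """Expected conversions implied by a given mix of visits."""
    return sum(visits_by_page[page] * rate
               for page, rate in conversion_rates.items())

# Two million so-so visits vs. one million better-targeted ones:
broad = {"advocacy_action": 100_000, "disease_info": 400_000,
         "top_line_stats": 1_500_000}
targeted = {"advocacy_action": 300_000, "disease_info": 500_000,
            "top_line_stats": 200_000}
print(weighted_traffic(broad), weighted_traffic(targeted))
```

The broad mix has twice the visits but implies about 45,000 conversions; the targeted mix implies about 57,000.  Same point as above, now with arithmetic.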

Mail acquisition volume.  You get the question a lot – how many pieces are we sending in acquisition?  Is it more or less than last year?  And it’s not a bad estimate as to a few different things about a mail program: are they committed to investing in mail donors?  Is the program growing or shrinking?  What are their acquisition costs?

But from a practical perspective, all of these things could be better answered by the number of donors acquired (and even better by a weighted average of newly acquired donors’ projected lifetime values, estimated from initiation amount and historical second gift and longer-term amounts, but that’s tougher).  A good rule of thumb is:

Never measure a metric that someone could easily game with a counterproductive action.

And you can do that with mail acquisition volume by going on a spending spree.  Of course, you can also do that with donors acquired, but it will spike your cost per donor acquired, which you are hopefully pairing with the number of donors acquired like we recommend in our pairing metrics post.

Time on site.  You notice that people are only spending an average of 1:30 on your Web site, so you do a redesign to make your site and content stickier.  Congratulations – you got your time on site up to 2:00!

Someone else notices that people are spending 2:00 on your Web site.  They work to streamline content, make it faster loading, and give people bite-sized information rather than downloading PDFs and such.  Congratulations – you got your time on site down to 1:30!

Therein lies the problem with time on site – whatever movement it makes is framed as positive when it could be random noise.  Or worse.  Your sticky site may just be slower loading and your bite-sized content may just be decreasing conversion.

So another rule of good metrics:

Only measure metrics where movement in a given direction can clearly be read as good or bad, not either/both.

Here again, conversions are the thing to measure.  You want people to spend the right amount of time on your site, able to get what they want and get on with their lives.  That Goldilocks zone is probably different for different people.

Email list size.  While you totally want to promote this as social proof (like we talked about with McDonald’s trying to get cows to surrender), you actually likely want to be measuring a better metric: active email subscribers, along the lines of people who have opened an email from you in the past six months.  These are the people you are really trying to reach with your messaging.
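
If you want to see what that filter looks like in practice, here’s a minimal sketch.  The cutoff, dates, and record layout are all made up; your email platform’s export will look different.

```python
from datetime import date, timedelta

# Sketch: "active subscribers" = anyone who opened an email in the
# past six months. All records and dates below are hypothetical.
today = date(2016, 2, 20)
six_months_ago = today - timedelta(days=182)

subscribers = [
    {"email": "a@example.org", "last_open": date(2016, 1, 15)},
    {"email": "b@example.org", "last_open": date(2015, 3, 2)},  # long lapsed
    {"email": "c@example.org", "last_open": None},              # never opened
]

active = [s for s in subscribers
          if s["last_open"] and s["last_open"] >= six_months_ago]
print(len(subscribers), "total vs.", len(active), "active")
```

Three names on the list; one person you can actually reach.  That gap is the difference between the vanity metric and the real one.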

When you remove metrics like these from your reporting or, at least, downplay them, you will have fewer conversations with your bosses asking you to focus on things that don’t matter.  That’s a win for them and a win for you.

I should mention that I am trying to build my active weekly newsletter subscribers.  Right now, we have an open rate of 70% and click-through rates of 20%+, so it seems (so far) to be content that people are enjoying (or morbidly curious about).  So I’m hoping you will join here and let me know what you think.

 


Regression analysis in direct marketing

If you don’t know what a linear regression analysis is or how it is measured, I recommend you start with my post on running regressions in Excel here.

OK, now that you’re back, you’ll notice I did an OK job of saying what a linear regression analysis is and what it means, but I didn’t mention why it would be valuable.  Today, we rectify this error.

In yesterday’s post on correlations, I mentioned that they only work for two variables at a time.  This is extremely limiting, in that most of your systems are more complex than this.  Additionally, because of interactions between multiple variables, it’s difficult to determine what is causing what.  I’ve discussed before how the failure of the US housing market was related to people assuming variables were independent when they were actually correlated with each other.

Linear regression analysis allows you to look at the intercorrelations between and among various variables.  As a result, regression analysis is the primary basic modeling algorithm.  In fact, it’s often used as a baseline for other approaches — if you can’t beat the regression analysis, it’s back to the drawing board.
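
If you’ve never seen what that baseline actually is, here’s an ordinary least-squares fit in a few lines of pure Python.  The donor data are invented; the point is just how little machinery the baseline needs.

```python
# Sketch: the least-squares line that fancier models have to beat.
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical: gifts in the past year vs. total dollars given.
gifts = [1, 2, 3, 4, 5]
dollars = [40, 55, 75, 90, 110]
slope, intercept = fit_line(gifts, dollars)
print(f"predicted dollars = {slope:.1f} * gifts + {intercept:.1f}")
```

If a neural network or random forest can’t out-predict those two numbers on held-out data, it’s back to the drawing board.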

Two side notes here:

First, if you are interested in learning to do this yourself, I strongly recommend Kaggle competitions.  Kaggle is where people compete for money to produce the best models for various things — right now, for example, they are running a $200,000 competition on diagnosing heart disease, a $50,000 competition for stock market modeling, and a $10,000 competition to identify endangered whales from photography.

It’s some pretty cool data stuff and the best part is that they have tutorial competitions for people like me (and perhaps you; I would hate to assume).  One sample is to model which passengers would survive the sinking of the Titanic from variables like age, sex, ticket class, fare, etc.  They walk you through correlation, regression, and some more advanced modeling techniques we’ll discuss later in the week.  Here, as ever, they look for improvement on regression as the goal of more advanced models.

Second, it’s tempting to view regression as a Mendoza line* of modeling: a lowered hurdle that shouldn’t be bothered with.  But regression can give you fairly powerful results and, unlike many of the other more advanced modeling techniques we’re going to discuss, you can do it and interpret it yourself.

That said, like correlation, it doesn’t know what to do with non-linear variables.  For example, you have probably noticed that your response rate falls off significantly after a donor hasn’t donated in 12 months (plus or minus).  A regression model that looks at number of months since last gift will ignore this and assume that the difference between 10 and 11 months is the same as the difference between 12 and 13 months.  And it isn’t.  It also will choke on our ask string test in the same way as correlations will.
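
One common workaround is to hand the regression a binary “past the cliff” variable alongside months-since-last-gift, so the fitted line can break at month 12.  This is a sketch with invented coefficients, not a claim about your file:

```python
# Sketch: handling the 12-month response cliff with a dummy variable.
# A straight line in months alone can't bend at month 12; the binary
# flag lets the model drop the whole line past the cliff.
# All coefficients below are hypothetical.
def features(months_since_gift):
    return [1.0,                        # intercept
            float(months_since_gift),   # gradual per-month decay
            1.0 if months_since_gift > 12 else 0.0]  # the cliff

# Coefficients a real regression might find: a base response rate,
# a slow per-month decline, and a large one-time drop after month 12.
coef = [0.12, -0.002, -0.04]

def predicted_response(months):
    return sum(c * f for c, f in zip(coef, features(months)))

print(predicted_response(11), predicted_response(13))
```

With the flag, the predicted drop from month 12 to 13 is much bigger than from 11 to 12, which is what your file actually does.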

So here are some things worth testing with regression analyses:

Demographic variables: you may know the composition of your donor file (and if you are like most non-profits, it’s probably female-skewed).  But have you looked at which sex ends up becoming the better donor over time?  A regression analysis may show that the men on your file donate more or more often (or not), which could change your list selects (I know I have been known to put a gender select on an outside file rental to improve its performance).

Lapsed modeling your file: Using RFM analysis, you know which segments perform best for you and which go into your lapsed program (if not, use RFM analysis to figure out which segments perform best for you).  However, there may be hidden gems in your lapsed file: people who missed a gift (according to you) and would react well if approached again.  Taking your appended data like wealth, demographics, and other variables alongside your standard RFM analysis can help find some of these folks to reach out to.
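
As a toy illustration of that scoring idea — the records and weights are wholly invented, and a real regression would fit the weights for you rather than have you guess them:

```python
# Sketch: surfacing "hidden gems" in a lapsed file by pairing RFM
# fields with appended wealth data. Everything here is hypothetical.
lapsed = [
    {"id": 1, "gifts_lifetime": 12, "last_gift": 40, "wealth_score": 2},
    {"id": 2, "gifts_lifetime": 1,  "last_gift": 10, "wealth_score": 1},
    {"id": 3, "gifts_lifetime": 8,  "last_gift": 25, "wealth_score": 5},
]

def gem_score(donor):
    # Stand-in for what a fitted model would weight properly:
    # frequent past givers with high appended wealth rise to the top.
    return donor["gifts_lifetime"] * 1.0 + donor["wealth_score"] * 2.0

best_first = sorted(lapsed, key=gem_score, reverse=True)
print([d["id"] for d in best_first])
```

The one-gift donor sinks to the bottom; the frequent giver and the high-wealth donor lead the reactivation list.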

Content analysis: In the early regression article, I show a (bad) example of using regression analysis to find out what blog posts work best.  This can be applied to Facebook or other content as well.

What I didn’t mention is that once you have this data, it probably applies across media.  What works on Facebook and in your blog are probably good topics for your enewsletters, email appeals, and possibly paper newsletters as well.  Through this type of topic analysis, you will figure out what your constituents react to, then give them more of it.

This, however, looks at your audience monolithically.  In future posts, I’ll talk about both some ways to cluster/segment your file like k-means clustering and some ways of improving on regression analysis with techniques like Bayesian analysis.  For now, though, it’s time to look at some formulae that rule our worlds even beyond direct marketing: what do Google and Facebook use?

 

* A baseball term coming from Mario Mendoza, a weak-hitting shortstop whose batting average usually hovered around .200.  Anyone below Mendoza in the batting average category was considered to be hitting below the Mendoza line, or very poorly.  (He made up for this for several years with strong fielding.)  And now you know the rest of the story.

mario_mendoza_autograph


Correlations in direct marketing II: The Wrath of Khan Academy

Yesterday, we saw how to run and interpret correlations.  Today, we’re going to look at the implications of the way correlations are set up for direct marketers.

First and foremost, I must stipulate that correlation does not equal causation.  I did a good job of discussing this in a previous post talking about how attractive Matt Damon is in his movies.  Rather than go into a lot of detail on this, I’ll link over to that post here.  Looking back at that post, I forgot to put in a picture of Matt Damon, which I will rectify here:

 

damon_cropped

Intelligence and attractiveness correlate;
I wish I could have explained this to people in high school.

This is fairly intuitive, given our discussion of height and weight earlier.  With exceptions for malnutrition and the like, it really doesn’t make sense to say someone’s weight causes them to be taller or height causes them to be heavier.

There’s a great Khan Academy video that covers a lot of this here. The Khan Academy video also gives me an excuse for the name of the blog post that I really couldn’t pass up.

Back to correlations: they only predict linear (straight-line) relationships.  Given the decile renewal rates from yesterday’s post, a correlation is not the ideal tool for describing that relationship, as it will give you a rubric that says “for every decrease in decile number, you will have an X% increase in renewal rate.”  We can see looking at the data that this isn’t the case — moving from 2 to 1 has a huge impact, whereas moving from 9 to 6 has an impact that is muddled at best.

Another example is the study on ask strings we covered here.  When looking at one-time donors, asking for less performed better than asking for the same as their previous gift.  Asking for more also performed better than asking for the same as the person’s previous gift.  However, if you were to run a correlation, it would say there is no relationship because the data isn’t in a line (graphically, you are looking at a U shape).  We know there is a relationship, but not one that can be described with a correlation.
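
You can verify this yourself.  Here’s a quick sketch with hypothetical ask deltas and response rates shaped like that U — the pattern is plain to the eye, and Pearson’s r comes out at exactly zero:

```python
# Sketch: a symmetric U-shaped relationship has a Pearson r of zero
# even though the pattern is obvious. The ask amounts (relative to
# the previous gift) and response rates are hypothetical.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

ask_delta = [-2, -1, 0, 1, 2]        # ask relative to previous gift
response_pct = [14, 11, 10, 11, 14]  # better at both extremes

print(pearson_r(ask_delta, response_pct))  # → 0.0
```

Zero, despite a relationship you could describe in one sentence.  That’s the limitation in a nutshell.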

You’ll also note that they only work between two variables.  Most systems of your acquaintance will be more complex than that, and we’ll have to use other tools for them.  That said…

Correlations are a good way of creating a simple heuristic.  SERPIQ just did an analysis of content length and search engine listings that I learned about here. They found a nice positive correlation:

wordcountcontent

Hat tip to CoSchedule for the graph and SERPIQ for the data.

As you read further in the blog post, you’ll see that there is messiness there.  It’s highly dependent on the nature of the search terms, the data are not necessarily linear, and non-written media like video are complicating factors.  However, the data lend themselves to a simple rule of “longer-form content generally works a little bit better for search engine listings” or, in a lot of cases, “your ban on longer-form content may not be a good idea.”  While these come with some hemming and hawing, being able to have simplicity in your rules is a good thing, making them easier to follow.

But refer back to the original point: correlation isn’t causation.  Even in the example above of word count being related to search engine listings, more work is required to find out what type of causal relationship, if any, there is between word count and search engine listings.

Hope this helps.  Tomorrow, we’ll talk about regression analysis, which will take you all the way back to your childhood to look for memories that will…

Um, actually, it will be about statistical regression analysis.  Never mind.


Correlations in direct marketing: an intro

This week, I’d like to take a look at some of the formulae and algorithms that run our lives in direct marketing.

Al Gore giving his global warming talk in Mountain View, CA

The algorithm was invented by Al Gore and named after his dance moves
(hence, Al Gore rhythm).
Here is a video of him dancing.

Before you run in fear, my goal is not to make you capable of running these algorithms — some of the ones we’ll talk about this week I haven’t yet run myself.  Rather, my goal is to create some understanding of what these do so you can interpret results and see implications.

And the first big one is correlation.

But, Nick, you say, you covered correlation in your Bloggie-Award-winning post Semi-Advanced Direct Marketing Excel Statistics; will this really be new?

My answers:

  1. Thank you for reading the back catalog so intensely.
  2. The Bloggies, like 99.999998% of the Internet, do not actually know this blog exists.
  3. In that post, I talked about how to run them, but not what they mean.  I’m looking to rectify this.

So, correlation simply means how much two variables move together (its unstandardized cousin is covariance).  This is measured by a correlation coefficient that statisticians call r.  R ranges from 1 (perfect positive relationship) through 0 (no relationship whatsoever) to -1 (perfect negative relationship).  The farther from zero the number is, the stronger the relationship.

A classic example of this is height and weight.  Let’s say that everyone on earth weighed 3 pounds for every inch of height.  So if you were 70 inches tall (5’10”), you would weigh 210 pounds; at 60 inches (5’0”), you would weigh 180 pounds.  This is a perfect correlation with no variation, for an R of 1.

Clearly, this isn’t the case.  If you are like me, after the holidays, your weight has increased but you haven’t grown in height.  Babies aren’t born 9 inches long and 27 pounds (thank goodness).  And the Miss America pageant/scholarship competition isn’t nearly this body positive.  So we know this isn’t a correlation of one.

That said, we also know that the relationship isn’t zero.  If you hear that someone is a jockey at 5’2”, you naturally assume they do not moonlight as a 300-pound sumo wrestler on the weekend.  Likewise, you can assume that most NBA players have a weight that would be unhealthy on me or (I’m making assumptions based on the base rate heights of the world with this statement) you.

So the correlation between height and weight is probably closer to .6.

There’s a neat trick with r: you can square it and get something called the coefficient of determination.  This number tells you the proportion of one variable’s variance that is predicted by the other.  So, in our height-weight example, 36 percent (.6 squared) of the variation in height is explained by its relationship with weight and vice versa.  It also means that there’s 64 percent of other in there (which we’ll get to tomorrow when we talk about regression analysis).

You can get some of this intuitively without the math.  Here’s a direct marketing example I was working on a couple of weeks ago.  An external modeling vendor had separated our file into deciles in terms of what they felt was their likelihood of renewing their support.  Here’s what the data looked like by decile:

Decile Retention rates over six months
1 50%
2 40%
3 35%
4 32%
5 30%
6 25%
7 27%
8 24%
9 28%
10 21%

No, this isn’t the real data; it’s been anonymized to protect the innocent and guilty.

You need only look at this data to see that there is a negative correlation between decile and retention rate — the higher (worse) the decile, the lower the retention rate.  It also illustrates that it’s not a perfect linear relationship — clearly, this model does a better job of skimming the cream off the top of the file than predicting among the bottom half of the file.

Tomorrow, we’ll talk about the implications of these correlations for direct marketing.


7 direct marketing charts your boss must see today

Yay!  It’s my first clickbait-y headline!

I preach, or at least will be preaching, the gospel of testing everything.  There have been times that it has been a rough year for the mail schedule, but then we get to a part of the year we tested into last year, so I know that the projections are going to be pretty good and our tweaks are going to work.  It is those times that there are but one set of footprints on the beach, for it is the testing that is carrying me. So I eventually had to test out one of these headlines — my apologies in advance if it works.

The truth is that there are no such charts that run across all organizations.  There are general topics that you need to cover with your boss – file health in gross numbers, file health by lifecycle segment, in-year performance, long-term projections, how your investments are performing.

But what you need to do is tell your story.  You need to analyze all of the data, make your call, and present all of the evidence that makes your case and all of the evidence that opposes it.

This sounds simple, but how often do you see presentations that feature slides that educate to no end – slides that repeat and repeat but come to no point.  Also, they are repetitive and recapitulate what has already been said.

On Monday, I brought up the war between art and science marketers.  The secret to how the artists win is:

Stories with pictures

Yes, really. The human brain craves narrative and will put a story to about anything that comes in front of it.  It also retains images better than anything else.  There’s a semi-famous experiment where they gave noted oenologists (French for “wine snobs”)* white wine with red food coloring. The experts used all of the words that one uses to describe red wine, without ever noting that it was actually a white wine. When confronted with this, the so-called wine experts all resigned their posts and took up the study of nonprofit direct marketing to do something useful with their lives.

winesmeller

OK, I’m lying about that last part.

My point is that we privilege our sight over all other senses – in essence, we are all visual learners.  When we see words on a slide, our brain, which is still trying to figure out why it isn’t hunting mastodons, sees the letters and has to pause to think “what’s with all of those defective pictures.”

So, as I’ve been writing a lot of defective pictures and I promised the seven direct marketing charts your boss must see today, let’s discuss a story that you would want to tell and how you would present it.

1.

Graph1

The idiot that I replaced cut acquisition mailings in 2012.

2.

Graph2

It spiked net revenue for a time, enough for him to find another job.

3.

Graph3

But that has really screwed us out of multiyear donors coming into 2015.  You can see the big drop in multiyear donors in 2014 because they weren’t acquired two years earlier.

4.

graph4

And multiyear donors are our best donors.  You’ll also note that our lapsed reacquired donors have greater yearly value than newly acquired donors with about the same retention rate.  Thus, my first strategic priority is to focus more on reacquiring lapsed donors.  Not as good as the multiyear donors that idiot made sure we didn’t have coming into the file this year, but pretty darn good.

5.

graph5

Lapsed donors have actually decreased as a portion of our average acquisition mailing…

6.

graph6

…yet they have been cheaper to acquire.  In summary, they are better donors than newly acquired donors and they are cheaper to acquire, yet we’ve been reaching out to them less.  Thus, we have an opportunity here.

7.

graph7

Because of this insight and because my salary significantly lags the national average for a direct marketing manager of $67,675, I believe I deserve a raise.  I’m now open for questions.

I swear that in many presentations, this would be over 30 slides and over an hour long.  I’ve actually given some of those presentations and if someone was in one of those and is still reading this, I apologize.

Some key notes from this:

  • Note the use of color to draw attention to the areas that are important to you. Other data are there to provide background, but if you are giving the presentation, it is incumbent upon you to guide the mind of your audience.  In fact, if you are giving the presentation, you may wish to present the chart/graph/data normally, then have the important colors jump out (or the less important ones fade away), arrows fly in, and text appear.
  • As mentioned, this is a different structure of presentation than would normally occur.  Normally, there would be a section on file health, then one on revenues, one on strategic priorities, and so on.  However, when you structure like that, the slide that makes the point of why you are doing the strategic priorities you are doing may be 50 slides earlier.  You can say, “remember the slide that said X?” but regardless of what people answer, the real answer is no.  You are smarter than that.  You are going to use data to support narrative, not mangle your story to fit an artificial order of data.
  • There is one point per image (with the exception of #4, which had a nice segue opportunity) and no bullet points. Bullet points help in Web reading (hence my using them here), but they actually hurt memory and retention in presentations.

With this persuasive power, though, comes persuasive responsibility.  Not in the sense that your PowerPoint will soon have you enough dedicated followers to form your own doomsday cult, although if that opportunity arises, please take the high road.

What I mean is that as you get better and better at distilling your point, there will be a temptation to take shortcuts and to tilt the presentation so it favors your viewpoint beyond what is warranted.  Part of this is ethical, to be sure – don’t be that type of person – but a larger part is that no one person is smarter than everyone else summed together.  Even readers of this blog.  If you omit or gloss over important data points, you aren’t allowing the honest disagreement and insights among your audience that can lead to greater understanding.  By creating an army of ill-informed meat puppets, you are going it alone, trusting your own knowledge and skill to get you through.  There will be a day, and that day may be soon, when the insight you need will be in someone else’s head.

You do have to prioritize for your audience.  You may have noticed some other points you would have covered in these graphs – retention in this program is falling and cost to acquire donors is increasing.  This person chose to focus on lapsed but didn’t hide the other metrics, which is sound policy.

So we will cap off the week tomorrow with tricks that other people use to shade their data.  I debated doing this section because it could be equally used as a guide to shade your data.  But you are trusting me and I’m trusting you.  Knowledge is not good or bad in and of itself, but let’s all try to use it for good.

* Oenology is actually from the Greek words for “wine” and “study of,” but that isn’t funny…


Metric pairing for fun and nonprofit

There is no one metric you should measure anywhere in direct marketing.  Like Newton would have said if he were a direct marketer, each metric must have an equal and opposite metric.

The problem with any single metric is, as either or both of Karl Pearson and Peter Drucker said, that which is measured improves.  My corollary to this is what isn’t measured is sacrificed to improve that which is measured.

So what metric dyads should you be measuring?

Response rate versus average gift: This one is the obvious one.  If you measured only response rate, someone could lower the heck out of the ask string amounts to spike response rates.  If you focused solely on gift amount, you could cherry-pick leads and change the ask string to favor higher gifts.  Put together, however, they give a good picture of the response to a piece.
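
Put as arithmetic (with invented numbers): revenue per piece is response rate times average gift, which is why gaming either half alone can still lose to the control.

```python
# Sketch: why the pair matters. Revenue per piece = response rate
# times average gift. All figures are hypothetical.
def revenue_per_piece(response_rate, average_gift):
    return response_rate * average_gift

lowball = revenue_per_piece(0.08, 15.0)  # spiked response, tiny gifts
cherry = revenue_per_piece(0.02, 55.0)   # big gifts, few responders
control = revenue_per_piece(0.05, 30.0)  # the balanced package
print(lowball, cherry, control)
```

The lowball and cherry-picked versions each “win” on one metric and still net less per piece than the control.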

Net income versus file health: Anyone could hit their net income goals by not acquiring new donors.  More on this another time, but suffice it to say this is a bad idea, possibly one of the worst ideas.  Likewise, an acquisition binge can increase the size of a donor base very quickly but spend money even more quickly.

Cost per donor acquired versus number of new donors acquired: If you had to design a campaign to bring in one person, you could do it very inexpensively – probably at a profit.  Each successive donor becomes harder and harder to acquire, requiring more and more money.  That’s why if only cost is analyzed, few donors will be acquired, and vice versa.

Web traffic (sessions or unique visitors) versus bounce rate: Measuring only one could mean many very poor visitors or only a few very good visitors.  Neither extreme is desirable.

Click-through rate versus conversion rate: If only your best prospective donors click on something, most of them will convert.  More click-throughs mean a lower conversion rate, but no one should be punished for effectiveness in generating interest.

List growth versus engagement rates: Similar to Web site metrics, you want neither too many low-quality constituents nor too few high-quality ones. Picture what would happen if someone put 1,000, 10,000, or 100,000 fake email addresses on your email list.  Your list would grow, but you would have significantly lower open rates and click-throughs.  Same with mail – as your list increases, response rate will go down – you need to find if the response rate is down disproportionately.

Gross and net revenue: Probably don’t even need to mention this one, but if you measure gross revenue only, you will definitely get it.  You will not, however, like what happens to your costs.

Net revenue versus ROI: Usually, these two move in concert.  However, sometimes additional marginal costs will decrease ROI but increase net revenue per piece, as in the example yesterday.  In fact, most examples of this are more dramatic, involving high-dollar programs where high-touch treatments increase costs significantly, but increase net revenue per piece more.  A smart direct marketer will make judgment calls balancing these two metrics.

Net revenue versus testing: This is clearly a cheat, as testing is not really a metric, but a way to increase your revenue is not to take risks: mailing all control packages, using the same phone script you always have, and running the same matching gift campaign online that you did last year.  Testing carries costs, but they are costs that must be borne to preserve innovation and prevent fatigue in the long run.

These are just a few of the metrics to look out for, but the most important part of this is that any single metric can be gamed (whether intentionally or un-).  One of the easiest ways to avoid this is thinking in the extreme – how would you logically spike the metric?  From there, you can find the opposing metric to make sure you maintain a balanced program.


Semi-advanced direct marketing Excel statistics

In addition to not being a database, Excel is also not a statistics package.  If you are going to do anything advanced, I highly recommend R.  The programming language, not the John-Cleese-played Q replacement in The World is not Enough and Die Another Day.

Cleese

Cheer up! Yes, it’s a lesser part in lesser Bond movies.
You, John Cleese, are still an international treasure.

Anyway, stats in Excel.  We’ll start with correlations, as they can give us some insight into blog and Facebook traffic and interactions.

Wait, you argue – Facebook is not direct marketing.  First, yes, you are correct.  Second, no, you are wrong; there is a way that you can use Facebook as a direct marketer.  I’ll talk about this more when I do a whole social media week (don’t worry, folks, I promise to spend time deflating the hype and hopefully producing things you can print out and get to board members and say “See? Let’s put money in places where it will make money!”).  Third, because Facebook has limited value (but not zero) as a direct marketing vehicle, you can test things on there to see how they resonate with your audience.  Granted, your Facebook audience and direct mail audience will probably be fairly dissimilar, but your online audience is probably similar to your Facebook audience.  And what you are looking for is what makes compelling online content for you.  So this is a way to make Facebook your testing ground before you put it on to a real platform (i.e., your Web site, your emails).

I’m going to demonstrate this on this blog’s stats because I have the data available. However, I’ve also done this with Facebook posts very successfully.  The prep work you will have to do for either blog posts or Facebook posts is to record your outcomes (views, likes, shares, and/or comments), to code the subject matter of each post, and to add any other variables, like day of the week, that may be relevant.  Here’s my version of this:

[Image: spreadsheet with blog posts in rows, page views and tag columns to the right]

I have my blog posts on the left and the various factors on the right (and there are more tags, but I needed to cut it off at some point to display it).  Yes, I have a blog post that has zero views. If you would like to break the seal on a blog post about why we do segmentation, it’s here; I’m sure it would appreciate a visit.

I then went through this and deleted tags that applied to only a single post. Then I ran a correlation of each individual variable against page views. The correlation function looks like

=CORREL($B2:$B20,I2:I20)

and returns a value from -1 to 1.
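If you would rather script this than drag a formula across columns, here is a minimal sketch of the same calculation in Python with numpy; the page-view counts and 0/1 tag columns below are invented for illustration.

```python
# Pearson correlation between page views and each 0/1 tag column,
# equivalent to Excel's =CORREL(range1, range2).
# All numbers here are made up for illustration.
import numpy as np

page_views = np.array([120, 80, 95, 60, 150, 40], dtype=float)
has_image = np.array([1, 0, 1, 0, 1, 0], dtype=float)  # 1 = post had an image
is_monday = np.array([1, 0, 0, 0, 1, 0], dtype=float)  # 1 = posted on a Monday

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r_image = np.corrcoef(page_views, has_image)[0, 1]
r_monday = np.corrcoef(page_views, is_monday)[0, 1]

print(f"images: {r_image:+.2f}")
print(f"monday: {r_monday:+.2f}")
```

The advantage over Excel is that you can loop the same two lines over every tag column at once instead of copying a formula sideways.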

[Image: table of correlations between each variable and page views]

Here’s what that looks like.  Let’s look at the days of the week first, because there appears to be an effect here: Monday content has been king, with a strong correlation to page views. It will be interesting to see over the long term whether that reflects the nature of the content (I’ve been trying to put introductory content on Mondays and get progressively more involved throughout the week) or whether people are simply more interested in reading blog posts on Mondays.  Either way, I can probably do a better job of setting up the rest of the week as must-see content, since Tuesday, Wednesday, and Thursday all correlate slightly negatively.

Images also tend to correlate well with views, probably because they show better in social media.  I’d been noticing this from just a glance as well, so you will probably be seeing more images here in the weeks to come. They will also be less boring images, since some of the lower performers were images of equations and Excel sheets. It is no coincidence that the tallest Python led off this blog post.

And it looks like cultivation and multichannel efforts are winning, while conversion, lifetime value, and personalization are not as strong, with negative correlations to page views. I won’t be acting on this immediately, but I will be keeping an eye on it. And I do have a multichannel week planned in the near future, so we’ll be able to test whether that’s an artifact in the data.

However, you might notice that the reason cultivation ranks so highly is that it appears in the two top-performing posts, which are both Monday posts. Is it the topic or the day that made them strong?  For that, you need regression.

Normally, you wouldn’t do this after only 20 blog posts.  We are not going to be able to draw any statistically significant conclusions, but I do want to show you how it’s done.

  1. Go to Data > Data Analysis > Regression
  2. Select the range of your outcome variable as your Y range and the range of the independent variables you want to test as the X range, like so:
    [Image: the Regression dialog with the Y and X ranges selected]
  3. Hit OK. You’ll get something that looks like this.  In my case, it’s a really, really bad regression:
    [Image: regression summary output showing a very low R-squared]

Yuck. The things you would normally be looking for are:

  • In R-squared, you are looking for a value as close to 1 as possible.  One would mean your model is totally predictive; zero means it predicts nothing at all.
  • In the P-value for each variable, you are looking for less than .05.  That would show a statistically significant relationship between that variable and your output. In this case, there isn’t one, and we can pretty much throw out the whole thing.
  • If there is a relationship, you want to look at the coefficient for two things:
    • Is it positive or negative? A positive coefficient means the variable increases your outcome; a negative one means it decreases it.
    • How big is the relationship? In this one, if the coefficients were significant, it would be a bad idea to post about personalization again, since posting about it reduces predicted views by about seven.  But it isn’t significant, so I’m not yet worried.
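For those without the Analysis ToolPak, here is a sketch of the same multiple regression using numpy’s least squares; the data is invented to match the earlier correlation example, and for proper per-variable p-values you would reach for a dedicated statistics package rather than roll your own.

```python
# Multiple regression of page views on tag columns, mirroring what
# Excel's Data > Data Analysis > Regression does. Invented data.
import numpy as np

y = np.array([120, 80, 95, 60, 150, 40], dtype=float)  # page views
X = np.array([  # columns: has_image, is_monday
    [1, 1],
    [0, 0],
    [1, 0],
    [0, 0],
    [1, 1],
    [0, 0],
], dtype=float)

# Prepend a column of ones for the intercept (Excel fits one by default)
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, b_image, b_monday = coef

# R-squared: share of variance explained (1 = perfect, 0 = nothing)
resid = y - A @ coef
r_squared = 1 - resid @ resid / ((y - y.mean()) ** 2).sum()

print(f"intercept {intercept:.1f}, image {b_image:+.1f}, monday {b_monday:+.1f}")
print(f"R-squared: {r_squared:.2f}")
```

The coefficients answer exactly the topic-or-day question above: each one estimates the effect of its variable with the others held constant, which a one-variable correlation cannot do.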

Hope this helps you with the stats side of Excel.  Tune in next week, when we look at some of the things that Excel is actually good at.
