Testing beyond individual communications

So far, the testing that I’ve discussed has been about how to optimize a single communication or your overall messaging.  The next step is trying to answer fundamental questions about the nature of your program – things like how many times to communicate and through what means.

There is a pretty good chance that you are not communicating enough to many of your constituents.

But wait, you say.  We send out a mail piece a month, have multiple telemarketing cycles per year, and have both a monthly e-newsletter and semi-frequent emails on other topics.  Our board members and staff who are on our seed lists are consistently on me, you say, that we are communicating too much.  And we get donors who complain that they are getting a mail piece before their last one was acknowledged.

However, remember from the discussion of segmentation that more donors say their nonprofits are undercommunicating than overcommunicating.  That means that the average nonprofit needs to be communicating more than it is.

And the concern that you are annoying people by asking for money comes from an oft-quoted and concerning inferiority complex among nonprofits.  To be effective, we have to believe that we are good enough to merit a gift and then make an appropriate ask.  We want to give our donors an opportunity to be a part of something powerful and transformative.  Remember that if we do our jobs well, donating to our organization is a positive experience.

So how would you test whether you are communicating often enough/too often?  The first step is to figure out where you are as a control with a cross-medium communications calendar.  This is easier said than done, but it’s a necessary first step.  It need not be perfect: because you are going to want some communications that are timely and focused on current events, you may need placeholders that simply indicate “we’re going to email something here.”

Then split your file and test, so that part of your file gets X communications and another part gets X plus or minus 1.  I’d suggest plus.  Then measure the total success of the communications.
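A minimal sketch of the split, assuming your file is just a list of donor IDs. The function name `split_file`, the 50/50 share, and the seed are illustrative choices, not part of any particular CRM. The point is only that the split is random, so the two panels look alike before one of them gets the extra communication.

```python
import random

def split_file(donor_ids, test_share=0.5, seed=2015):
    """Randomly assign donors to a test panel (X + 1) or a control panel (X)."""
    rng = random.Random(seed)      # fixed seed so the split can be reproduced
    shuffled = list(donor_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * test_share)
    return {"test": shuffled[:cut], "control": shuffled[cut:]}

# Hypothetical file of 10,000 donors
panels = split_file(range(1, 10001))
print(len(panels["test"]), len(panels["control"]))
```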

I once helped lead a test where we took mail pieces out of our schedule during membership recruitment.  We would send a piece or two, then wait to see if those donors would donate before sending to them again, to make sure that we were addressing them properly as either a renewed donor or as someone who had not yet renewed.  Each individual piece in the resting membership series had a significantly better ROI and better net than the more consistent appeal series.

Yet the appeal series brought in more money for the organization and the mission overall.  I would argue, as I did at the time, this is the actual important metric.  If you want to look at metrics like ROI or response rate, your best opportunity is to send one letter to your single best donor – you’ll get a 100% response rate and ROI percentages in the tens of thousands or more.

But for real life, the goal is more money for more mission.  So overall net is the metric of choice.
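To make the difference between the two metrics concrete, here is a small sketch. All of the dollar figures are made up for illustration, and ROI is taken to mean net divided by cost, which is one common definition but an assumption here.

```python
def summarize(series, revenue, cost):
    """Return net revenue and ROI (net / cost) for a whole series of pieces."""
    net = revenue - cost
    return {"series": series, "net": net, "roi": net / cost}

# Hypothetical totals: the resting series mails fewer pieces
resting = summarize("resting", revenue=60_000, cost=20_000)
consistent = summarize("consistent", revenue=90_000, cost=40_000)

# Pick the winner on overall net, not on ROI
winner = max([resting, consistent], key=lambda s: s["net"])
print(winner["series"], winner["net"])
```

In this made-up case the resting series has the better ROI, but the consistent series leaves more money for the mission.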

The easiest campaigns to add to are the ones that already have a multistage component.  Let’s say you have a matching gift campaign that goes mail piece 1, email 1, mail piece 2, email 2 (with two weeks between each).  A way of testing up would be to look at doing mail piece 1, email 1 + mail piece 1.5, mail piece 2 + email 1.5, email 2 (so there’s still two weeks between each set of communications, but they double up in the middle).  That would be adding a mail piece and an email and if you test both of these with net as your goal, you will have a better framework for the campaign in the following year as well as for additional testing throughout the year.

With email only campaigns, there’s another way of checking whether you are over-emailing your file – looking to see if your total opens and clicks fall.  There is a point at which open rates and click rates will begin to fall; however, you shouldn’t worry too much until adding another email not only lowers your open and click rates but lowers your total number of opens and clicks (similar to a focus on total net, rather than net per piece).
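The check described above can be sketched like this. The list size, cadences, and open counts are hypothetical, and `first_harmful_cadence` is a name invented for the example.

```python
def first_harmful_cadence(results):
    """results: (emails_per_month, total_opens) pairs sorted by cadence.

    Returns the first cadence at which TOTAL opens fell compared with the
    cadence before it, or None if totals never fell.
    """
    for (_, prev_opens), (cadence, opens) in zip(results, results[1:]):
        if opens < prev_opens:
            return cadence
    return None

LIST_SIZE = 10_000  # hypothetical file
results = [(4, 8_000), (5, 9_000), (6, 8_400)]

for cadence, opens in results:
    rate = opens / (cadence * LIST_SIZE)
    print(cadence, opens, round(rate, 3))

print(first_harmful_cadence(results))
```

Here the open rate already slips at five emails a month, but total opens only fall at six, so six is the point to start worrying.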

This tipping point in email is probably well past where you think it is.  HubSpot did a study of how emails per month affect both open and click-through rates.  The sweet spot with the highest open and click rates was between 15 and 30 emails per month.

That’s right – opens and clicks went up until you got in the range of daily emails.  Things went downhill after 30 emails per month.  So if you are sending more than daily emails (on any day but December 31 or the last day of a matching opportunity), you might be emailing too much – so take that as a cautionary tale for the .0001% of you who are doing this.  For the other 99.9999%, hopefully this will give support for the business case for testing up on your emails.

There are three tricks to cross-platform testing:

  1. There is a whole science of attribution testing. If you have the ability to look at this literature and your data systems will support this, go for it.  However, most organizations in my experience don’t have all of their data in the same place initially, making this exceedingly hard.  Thus, this sort of testing up/down for cadence should look at sources of revenue by audience test panel rather than by the medium through which the donation is made.  You may be surprised how much adding a mail piece increases your online revenue or adding a telemarketing cycle boosts the mail piece.
  2. Unlike with strictly piece-based attributes, I’d argue you have to test every cell here because there are interactions among the means of communication. It may be that mail + mail is better than mail and mail + phone is better than mail, but that when you have mail + phone + mail, you have diminishing returns that don’t compensate for doing both mail pieces.
  3. You will have to be vigilant about the creation of your testing cells. As much as you would like to call everyone who has a phone number or email everyone who has an email address, and use those who don’t have a phone number or email on file as a control audience, those are different types of donors.  Pew has a great summary of the non-Internet users of the US at right.  Even if you looked just at the age and income variables, you can see how this would make your control audience look very different from your non-control.  In reverse, 66% of 25-29 year olds live in houses where there is no landline, compared with 14% of 65+ year olds, according to the National Center for Health Statistics.

    So, if you think of the average person for whom you have a phone number, but not an email address, that person looks very different from the one for whom you have an email address, but not a phone number.  Thus, you have to either control for all demographic variables in your assessment (hard) or split test people within the means of communication that you have available (marginally easier).
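Tricks 1 and 3 can be sketched together. This is an illustration under stated assumptions: donors are plain dictionaries with made-up `has_email` and `has_phone` flags, gifts are `(donor_id, channel, amount)` tuples, and the function names are invented for the example. Donors are randomized within each group that has the same channels on file, and revenue is then totaled by panel while the channel of the gift is deliberately ignored.

```python
import random
from collections import defaultdict

def assign_panels(donors, seed=7):
    """Split donors into test/control inside each channel-availability group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for donor in donors:
        strata[(donor["has_email"], donor["has_phone"])].append(donor["id"])
    assignment = {}
    for ids in strata.values():
        rng.shuffle(ids)
        for i, donor_id in enumerate(ids):
            # Alternate after shuffling so each group splits evenly
            assignment[donor_id] = "test" if i % 2 == 0 else "control"
    return assignment

def revenue_by_panel(gifts, assignment):
    """Total revenue per panel, regardless of the medium the gift came through."""
    totals = defaultdict(float)
    for donor_id, _channel, amount in gifts:
        totals[assignment[donor_id]] += amount
    return dict(totals)

# Hypothetical file and gifts
donors = [{"id": i, "has_email": i % 2 == 0, "has_phone": i % 4 < 2}
          for i in range(40)]
assignment = assign_panels(donors)
gifts = [(0, "online", 50.0), (1, "mail", 25.0), (2, "phone", 10.0)]
totals = revenue_by_panel(gifts, assignment)
print(totals)
```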

Thanks for reading and be sure to let me know at nick@directtodonor.com what future topics you’d like to see.

Testing for smaller lists

One of my favorite non-Far Side single panel cartoons is:

[Cartoon image: “miracle”]


This is often what it feels like to be a small nonprofit or small division of a nonprofit.  You know exactly what you would do if you were big.  But you aren’t (yet).  And absent that miracle in the middle, you aren’t going to be there soon.  It feels like a Catch-22 – you aren’t big enough to test, but you aren’t going to be big enough to test unless you test.

A lot of people have this problem.  One of my favorite conversion sites, unbounce.com, recommends that you have 1000 conversions per month to do A/B testing.  That takes a large nonprofit to accomplish.  Like the Oakland A’s in Moneyball (both book and movie are recommended), you have fewer resources, so you are going to have to be smarter than your competition (er, other worthy causes).  Here are some tips on how:

Learn what’s important first: Before you do your first test with online traffic, look at your analytics reports (do you have Google Analytics on your site?).  Where are people bouncing from your site?  Where are they dropping out of the donation process?  What forms aren’t converting?  You may be able to do more with one-tenth the traffic or donor list if you are testing the things that will matter to you.

Steal from other people first: There are some things that are almost immutably true.  Requiring more information on a form means lower conversion rates.  Having a unique color for your donate button that stands out from the other colors on your Web site will increase clicks.  Using a person’s name, unless it’s in a subject line, will likely increase response rate.  I commend the site whichtestwon.com to you.  I’ve had the privilege of presenting at their live events, and the information that comes out of them about what others have already tested will save you time and money: these are things you can simply do, rather than test.

Go big: I’ve talked about things like envelopes and teasers and things to test.  If you don’t have a large donor or traffic base, ignore that.  You want to be testing audience and offer – the things that can be global and game changing.

Test across time: If you are testing an audience, an offer, or a theme, that doesn’t have to be accomplished in one piece or email.  Rather, you can test it over a year if you want.  If you want 25,000 people in each testing group but only have 3,000, you can get a similar feel for the response to large-scale changes over nine pieces, rather than testing it all in one.
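The arithmetic behind “nine pieces” is just a round-up division, sketched below with a hypothetical helper name. One caveat worth keeping in mind: mailing the same 3,000 people nine times is not statistically identical to mailing 27,000 different people once, so treat this as a way to get a feel for the response rather than an exact equivalent.

```python
import math

def mailings_needed(target_per_group, available_per_group):
    """How many pieces it takes to accumulate the target number of contacts."""
    return math.ceil(target_per_group / available_per_group)

pieces = mailings_needed(25_000, 3_000)
print(pieces, pieces * 3_000)
```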

Require less proof: Chances are you are used to doing more with less already.  If you are Microsoft, you can run your test until you get 99.9% certain you are correct.  You should be willing to be less certain.  Some nonprofits choose 80% certainty as their threshold.  Even 60% can give you directional results.  Bottom line, this is a restriction you may be willing to relax.
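If you want to put a number on “how certain,” one rough way is a two-proportion z-test. The sketch below uses only Python’s standard library; reading “certainty” as the one-sided confidence that version B beats version A is an assumption made for the example, and the response counts are hypothetical.

```python
import math

def confidence_b_beats_a(resp_a, n_a, resp_b, n_b):
    """One-sided confidence that B's true response rate is higher than A's."""
    p_a, p_b = resp_a / n_a, resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF written with the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical test: 3,000 pieces per panel, 60 vs. 75 responses
confidence = confidence_b_beats_a(60, 3_000, 75, 3_000)
print(round(confidence, 3))
```

With these made-up numbers the result clears an 80% threshold comfortably but falls well short of 99.9%, which is exactly the situation where a smaller nonprofit has to decide how much proof it needs.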

Test cheaply:  Testing direct mail and telemarketing is expensive.  You want to do your learnings on your site with Google Analytics and either Google’s optimization tool or Optimizely, in email, or on social media.  I would go so far as to say that even larger nonprofits don’t want to test an envelope teaser that they haven’t already tested as a subject line to see if it grabs attention.  Survey tools like SurveyMonkey or Zoomerang can also help you pre-test your messaging either with your core audience (free) or with a panel of people who fit your demographic target (cheap, if you can keep your number of questions down).

Get testing subjects cheaply: I know it sounds like I’m in Google’s pocket, but they have many nonprofit solutions at the right price for smaller nonprofits – free.  One of these is Google Grants, which allows you to use their AdWords solution with in-kind donated advertising.  Get this now, if you don’t have it.  We’ll do a whole week on AdWords at some point, but in the meantime, if you have a form you are testing and you don’t have enough traffic, pause all of your campaigns except the ones directed to that form.  You will get your results a lot more quickly.

Test by year: It’s not an ideal solution, but if you test one thing one year and then another tactic the next year at the same time, you can get a gut feeling as to what is more effective.

Avoid word salad: Consider the time on West Wing (which I remember better than many real-life presidencies) when the Majority Leader who was running for president was asked why he wanted to be president:

 

“The reason I would run, were I to run, is I have a great belief in this country as a country and in this people as a people that go into making this country a nation with the greatest natural resources and population of people, educated people … with the greatest technology of any people of any country in the world, along with the greatest, not the greatest, but very serious problems confronting our people, and I want to be President in order to focus on these problems in a way that uses the energy of our people to move us forward, basically.”

Good writing converts.  Good writing mandates active verbs and few adverbs (my personal crutch).

 

“It’s an adverb, Sam. It’s a lazy tool of a weak mind.”
— Kevin Spacey in Outbreak

Good writing ignores the mission statement, discards stats, eschews your jargon, and touches you in a very personal place.  OK, perhaps not that active a verb.  I’m talking about your heart, you sicko.

Don’t test good copy versus bad copy.  Come up with your best before you test, lest you learn what you already should know.

Conspire.  You have coalition partners and people who are in similar positions around you.  Get out into the big blue room and see what they are doing.  And be generous with your own tests – deposits in the karma bank rarely fail to pay interest.

Finally, embrace the advantage of being small.  As a smaller nonprofit, you are going to have to be smarter about testing than bigger ones.  But you will be able to swing for the fences while they are still trying to get their different versions of teaser copy through the Official Teaser Copy Review Subcommittee.  You can be bold and find your voice honed to what works, rather than what your boss’s boss’s boss’s brother-in-law said you should try out over Thanksgiving dinner.

Tomorrow, we’ll go into some testing modalities that allow you to test things beyond a single communication or theme.

How to test more than one variable at a time

Yesterday, I articulated why you would want to test more than one variable at a time.  The trick is testing multiple things and still getting results you can act on.  Here’s how to do this without taxing your math skills and without exceeding the capacity of your list.  We’ll talk about mail testing, as online testing is far easier, given instant results, lower testing costs, the ability to see what specific calls to action (i.e., links) are getting the most action, and the ability to sequence tests (e.g., testing a subject line earlier in the day, then rolling out with a winning subject line and two test copy versions in the afternoon).

Let’s look at a simple testing matrix, where you would want to test two variables (let’s say envelope and copy) with two options each.  It’s easy enough to do the multiplication and say that, for a “test one variable at a time” approach, you would want four test cells; let’s call them A1, A2, B1, and B2.

            Copy 1  Copy 2
Envelope A  A1      A2
Envelope B  B1      B2

By testing every option, you get the best testing results.  Likewise, you can see intuitively that no two testing cells will give you full results: if you tested A1 and B2, for example, if B2 did better than A1, you wouldn’t know whether it was envelope B or copy 2 (or both) that caused the lift.  And any other combination would not test both variables.

But three testing cells will get you the data you need.  Let’s say you did not test cell B2 and get the following results:

            Copy 1  Copy 2
Envelope A  $.50    $.75
Envelope B  $1.00   ?

Here, you can see envelope B did 100% better on a gross per piece basis than envelope A when it had copy 1.  You can also see that copy 2 did 50% better than copy 1 when in envelope A.  From this, you can deduce that B2, which wasn’t tested, would have the best results at about $1.50 gross per piece.  This is because that is both 50% better than B1 (the result of the copy lift) and 100% better than A2 (the result of the envelope lift).

And if you ever do have results this clean, please let me know about them.
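The deduction can be written out as a few lines of arithmetic. This sketch assumes the two lifts multiply and do not interact, which is the simplification the example relies on; the function name is made up for illustration.

```python
def deduce_fourth_cell(a1, a2, b1):
    """Estimate untested cell B2 from the three tested cells of a 2x2 matrix."""
    envelope_lift = b1 / a1   # envelope B vs. envelope A, both with copy 1
    copy_lift = a2 / a1       # copy 2 vs. copy 1, both in envelope A
    return a1 * envelope_lift * copy_lift

# Gross per piece from the table: A1 = $.50, A2 = $.75, B1 = $1.00
b2_estimate = deduce_fourth_cell(0.50, 0.75, 1.00)
print(b2_estimate)
```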

This is just a 2×2 matrix, but it illustrates a point that will apply to greater multivariate testing – any time you have significant results in three out of four of a 2×2 matrix, you can deduce the fourth.  I call this the testing L because of the shape it makes on your testing matrix.  All you need to do is to iterate a traditional A/B test and you can get a robust testing strategy.

Let’s say you want to test three envelopes and three sets of copy.  Here you would only need five intersections instead of the nine you might expect:

            Copy 1  Copy 2  Copy 3
Envelope A  X       X
Envelope B          X       X
Envelope C                  X

You can see how the L technique would help you determine the remaining blank cells one at a time.
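Here is one way to sketch that one-at-a-time fill. It assumes the same non-interacting, multiplicative lifts as the 2×2 case, and the gross-per-piece values for the five tested cells are invented for illustration. The loop keeps looking for any 2×2 block with three known corners and fills in the fourth until nothing more can be deduced.

```python
def fill_matrix(tested, envelopes, copies):
    """Repeatedly apply the testing L to deduce untested cells."""
    known = dict(tested)
    progress = True
    while progress:
        progress = False
        for e1 in envelopes:
            for e2 in envelopes:
                for c1 in copies:
                    for c2 in copies:
                        if e1 == e2 or c1 == c2 or (e2, c2) in known:
                            continue
                        corner, row, col = (e1, c1), (e2, c1), (e1, c2)
                        if corner in known and row in known and col in known:
                            # Missing corner = product of the two arms / shared corner
                            known[(e2, c2)] = known[row] * known[col] / known[corner]
                            progress = True
    return known

# Hypothetical gross per piece for the five tested cells
tested = {("A", 1): 0.50, ("A", 2): 0.75, ("B", 2): 1.50,
          ("B", 3): 2.00, ("C", 3): 3.00}
filled = fill_matrix(tested, envelopes=["A", "B", "C"], copies=[1, 2, 3])
for cell in sorted(filled):
    print(cell, round(filled[cell], 2))
```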

What if you wanted to add another dimension?  In addition to your three envelopes and three sets of copy, you’d like to also test three ask strings, for example.  Here, you’d only need to test each of the two new ask strings as a replacement for the existing ask string in one cell.  So to the above testing matrix, where you are testing:

A1$ (where $ is the existing ask string)
A2$
B2$
B3$
C3$

Adding, say, A2+ and B3% would give you an L over this new dimension (remembering that you were able to solve for the entire testing matrix in two dimensions).  If you were to try to test all 27 possible combinations with a quantity of 25,000 each, that would be a prohibitive 675,000 mailings.  This makes it a much more manageable 175,000 people.
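The two totals can be checked with a short calculation. The formula for the L approach, one starting cell plus one new cell for each additional option in each dimension, is my reading of the pattern in the example (five cells for 3×3, seven once the ask strings are added), so treat it as an assumption rather than a general rule.

```python
import math

def full_factorial_cells(levels):
    """Cells needed to test every combination."""
    return math.prod(levels)

def l_approach_cells(levels):
    """One starting cell plus one new cell per extra option in each dimension."""
    return 1 + sum(n - 1 for n in levels)

levels = [3, 3, 3]        # envelopes, copy versions, ask strings
panel_size = 25_000

full_quantity = full_factorial_cells(levels) * panel_size
l_quantity = l_approach_cells(levels) * panel_size
print(full_quantity, l_quantity)
```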

This assumes, of course, that you want 25,000 people in each test panel.  There are some more advanced statistical techniques that would allow you to mail a larger number of intersecting tests.  These make it so that each cell is no longer independently projectable, but can still give you aggregate results.  I personally like having some experience at the cellular level, especially if there is a strong control already in place (in which case I will weight the testing cells toward the control package’s attributes), but this is a possibility.

But even 25,000 is too large for some.  In fact, one of the more frequent questions I hear is how to do testing when you have a small list.  That’s the topic for tomorrow; then, Friday, I’ll talk about cross-platform and cadence testing – testing that goes beyond the “here’s a communication; here’s another communication” type testing.

“Test one variable at a time” is a lie

It’s not an intentional lie and its heart is in the right place, but it’s wrong nonetheless.

The reason people will tell you to test only one variable at a time is that you want to be able to isolate why what happened happened.  So, for example, if you changed the teaser on an envelope and sent it to an equivalent audience at the same time with the same contents in the envelope, and the response rate increased, you would know the test won because of the envelope.

This is a fine way to test if there’s only one thing you want to learn at a time.  You can refine your program this way, getting better and better.  This is the direct marketing equivalent of kaizen – the practice of continual improvement popularized in manufacturing, but now applied to much strategic thinking.

But there are some significant problems with this:

  • You can’t test synergy between variables. Let’s say you have a subject line you’d like to test.  However, it may work better with a different version of your email; after all, you wrote the original subject line for this email – the new one may not fit as well.  Testing one thing at a time may not allow you to test the most coherent versions of each of your offers.
  • It can lead to small ball, where you only test things at the most granular level. In his book Fundraising When Money is Tight, Mal Warwick talks about testing teaser copy 25 different times with almost as many clients.  Of the tests, 21 (84%) showed no difference (and these were at quantities that would have shown a difference had there been one).  This is an OK learning if you can learn other things from the package as well, but if that’s all you learn, you’ve invested in testing without any return more than four out of five times.
  • It can’t make significant leaps forward. Let’s say you have a control piece in decline.  You know it needs to be replaced because of its response rate.  Or maybe, in a more positive outlook, you’ve accomplished the goal you were striving toward.  Either way, the way to get rid of this piece isn’t to test the envelope one year and the response device the next year – you have to test more than one variable at once.

All in all, this violates a rule you should have for yourself – to learn as much as possible whenever possible.  Think of it as if you were trying to reach the highest elevation on Earth.  If you had the rule of “go up from where you are until you can’t go up any more,” you would reach a peak higher than where you are currently, but by no means the highest point possible.  Similarly, if you had the rule “climb to the highest point you can see, even if it means going down a bit,” you would be doing better and getting higher than you were, but this iterative process would not lead to you having to don an oxygen tank anytime soon.

So it is with testing.  Testing one variable at a time will get you closer and closer to your local maximum, but not the global maximum.

But the basis of the argument for variable isolation is not untrue.  You still need to be able to figure out what works and what doesn’t.  The trick is sussing out what did what in your test.  That’s what we’ll cover tomorrow: how to layer multiple single tests to get results you can act on.

The basics of direct marketing testing

It’s testing week here at Direct to Donor and we’re going to start with some simple principles, as is our Monday pattern.  This is the first of many testing weeks, given the importance of the topic.

Unless it doesn’t work, in which case we might never speak of this again.

This is a great segue to the first rule of testing: that which works, works.  That which doesn’t work, doesn’t work.

I know, it sounds like I’m stating the obvious, but there’s an oft-forgotten conclusion from that, which is that if you aren’t willing either to roll out with it or to scrap it, you shouldn’t test it.

Two different much-beloved CEOs have expressed to me over the years that the single best predictor of whether a mail piece would succeed or not is whether they liked the piece.  If they liked it, it wouldn’t work; if they didn’t, it would.

Part of why they were and are much beloved is that it doesn’t matter whether they liked it or not – it was all about whether a tactic worked successfully.  The mantra that goes with this is:

Or, as I would put it, it doesn’t matter if the source of the quote is a damn dirty Commie if it’s a good quote.

That said, there are some things that are beyond what your organization will accept.  For example, an environmental organization shouldn’t use non-recycled paper from old-growth forests in its mailings.

If that’s the case, then, don’t test it.  A goal of testing is to find something that you will be able to use in some larger capacity in the future.  If that isn’t possible, it eliminates the need for the test.

That said, the list of sacred cows should be as small as possible.  You’ll hear me say test everything and I mean it – other than things that are untenable, experimentation is the best and truest way of learning.  Donor surveys are great, but they show what people think they would do and how they would react versus what they do do; it’s important not to confuse words and deeds.

There are things that are far more important to test than others.  That’s why it’s important to start with a hypothesis and to test the fundamentals first.  I love tests that nibble around the edges as much as the next person – if you can show me how to improve a piece with a different teaser and pick up 1.4% additional in response rate, I’m game.  But the most important tests that you run will be about fundamentals – who are the people that I should be talking to and what offer am I giving them.  Having a hypothesis will help with this, and the broader it goes, the better.  A hypothesis like “I believe our lapsed donors will respond best to the means and message that brought them into the organization initially” works well because it lends itself to testing on various platforms with a variety of tactics, and a success could mean large things for the organization.  One like “I believe a larger envelope will work better” is restricted to one piece at one time and thus has limited ripple effect.

Finally, please learn from my mistakes and don’t test something by rolling out with it.  I did this in my first year of nonprofit marketing.  We had three underperforming mail pieces and I decided to replace them with new packages I had dreamt up.  Thankfully, I was lucky – the success of one of my pieces paid for the abject failure of the other two.  If I hadn’t been lucky, I might be blogging on effective panhandling tips right now.  You don’t want to put your nonprofit in a position where hitting goals and achieving mission is based on your hunch.

This may be a bit of conventional wisdom.  However, tomorrow, there is one piece of conventional testing wisdom that needs to be taken out back and shot for the benefit of your testing program.
