# How to test more than one variable at a time

Yesterday, I articulated why you would want to test more than one variable at a time.  The trick is testing multiple things and still getting results you can act on.  Here’s how to do this without taxing your math skills and without exceeding the capacity of your list.  We’ll talk about mail testing, as online testing is far easier: you get instant results, lower testing costs, the ability to see which specific calls to action (i.e., links) are getting the most action, and the ability to sequence tests (e.g., testing a subject line earlier in the day, then rolling out with the winning subject line and two test copy versions in the afternoon).

Let’s look at a simple testing matrix, where you would want to test two variables (let’s say envelope and copy) with two options each.  It’s easy enough to do the multiplication and say that, for a “test one variable at a time” approach, you would want four test cells; let’s call them A1, A2, B1, and B2.

|            | Copy 1 | Copy 2 |
|------------|--------|--------|
| Envelope A | A1     | A2     |
| Envelope B | B1     | B2     |

By testing every option, you get the best testing results.  Likewise, you can see intuitively that no two testing cells will give you full results: if you tested A1 and B2, for example, and B2 did better than A1, you wouldn’t know whether it was envelope B or copy 2 (or both) that caused the lift.  And any other pair of cells would not test both variables.

But three testing cells will get you the data you need.  Let’s say you did not test cell B2 and get the following results:

|            | Copy 1 | Copy 2 |
|------------|--------|--------|
| Envelope A | \$.50  | \$.75  |
| Envelope B | \$1.00 | ?      |

Here, you can see envelope B did 100% better on a gross per piece basis than envelope A when it had copy 1.  You can also see that copy 2 did 50% better than copy 1 when in envelope A.  From this, you can deduce that B2, which wasn’t tested, would have the best results at about \$1.50 gross per piece.  This is because that is both 50% better than B1 (the result of the copy lift) and 100% better than A2 (the result of the envelope lift).

And if you ever do have results this clean, please let me know about them.
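For those who like to see the arithmetic spelled out, here is a minimal Python sketch of the deduction for the table above.  It assumes the envelope lift and the copy lift are independent and simply multiply (no synergy between them), and the function name `deduce_fourth_cell` is made up for illustration rather than taken from any standard tool.

```python
def deduce_fourth_cell(base, row_variant, col_variant):
    """Estimate the untested corner of a 2x2 test, assuming lifts multiply.

    base        -- result for the shared cell (here A1)
    row_variant -- result when only the envelope changes (here B1)
    col_variant -- result when only the copy changes (here A2)
    """
    row_lift = row_variant / base  # envelope lift: B1 / A1
    col_lift = col_variant / base  # copy lift: A2 / A1
    return base * row_lift * col_lift


# Gross per piece from the table: A1 = $.50, A2 = $.75, B1 = $1.00
estimate_b2 = deduce_fourth_cell(base=0.50, row_variant=1.00, col_variant=0.75)
print(f"Estimated B2: ${estimate_b2:.2f}")
```

If the two variables do interact (a teaser that only works with one version of the copy, say), the real B2 will drift from this estimate, which is a good reason to confirm the projected winner before rolling out.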

This is just a 2×2 matrix, but it illustrates a point that will apply to greater multivariate testing – any time you have significant results in three out of four of a 2×2 matrix, you can deduce the fourth.  I call this the testing L because of the shape it makes on your testing matrix.  All you need to do is to iterate a traditional A/B test and you can get a robust testing strategy.

Let’s say you want to test three envelopes and three sets of copy.  Here you would only need five intersections instead of the nine you might expect:

|            | Copy 1 | Copy 2 | Copy 3 |
|------------|--------|--------|--------|
| Envelope A | X      | X      |        |
| Envelope B |        | X      | X      |
| Envelope C |        |        | X      |

You can see how the L technique would help you determine the remaining blank cells one at a time.
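The same one-L-at-a-time logic can be sketched in Python.  Treat this as an illustration under assumptions, not a finished tool: the dollar figures are hypothetical (they are not real test results), `fill_by_testing_l` is a made-up name, and lifts are assumed to multiply with no synergy between envelope and copy.

```python
from itertools import combinations


def fill_by_testing_l(known, rows, cols):
    """Fill untested cells by completing 2x2 blocks that have three known cells.

    Assumes lifts multiply, so in any 2x2 block the missing corner equals
    (product of its two neighbours) / (the diagonally opposite corner).
    """
    grid = dict(known)
    progress = True
    while progress:
        progress = False
        for r1, r2 in combinations(rows, 2):
            for c1, c2 in combinations(cols, 2):
                block = [(r1, c1), (r1, c2), (r2, c1), (r2, c2)]
                missing = [cell for cell in block if cell not in grid]
                if len(missing) != 1:
                    continue  # not an L: need exactly three known corners
                mr, mc = missing[0]
                other_r = r2 if mr == r1 else r1
                other_c = c2 if mc == c1 else c1
                grid[(mr, mc)] = (
                    grid[(mr, other_c)] * grid[(other_r, mc)]
                    / grid[(other_r, other_c)]
                )
                progress = True
    return grid


# Hypothetical gross-per-piece results for the five tested cells.
tested = {
    ("A", 1): 0.50, ("A", 2): 0.75,
    ("B", 2): 1.50, ("B", 3): 1.20,
    ("C", 3): 0.72,
}
filled = fill_by_testing_l(tested, rows="ABC", cols=(1, 2, 3))
for (envelope, copy), value in sorted(filled.items()):
    print(f"{envelope}{copy}: ${value:.2f}")
```

Each pass looks for any block of four cells where three are known and projects the fourth, so every blank cell ends up with an estimate once the chain of Ls reaches it.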

What if you wanted to add another dimension?  In addition to your three envelopes and three sets of copy, you’d like to also test three ask strings, for example.  Here, you’d only need to test each of the two new ask strings as a replacement for the existing ask string in one cell you are already mailing.  So to the above testing matrix, where you are testing:

A1\$ (where \$ is the existing ask string)
A2\$
B2\$
B3\$
C3\$

Adding, say, A2+ and B3% (where + and % are the two new ask strings) would give you an L over this new dimension (remembering that you were able to solve for the entire testing matrix in two dimensions).  If you were to try to test all 27 possible combinations with a quantity of 25,000 each, that would be a prohibitive 675,000 mailings.  With only seven test cells, the L approach makes it a much more manageable 175,000 people.

This assumes, of course, that you want 25,000 people in each test panel.  There are some more advanced statistical techniques that would allow you to mail a larger number of intersecting tests.  These make it so that each cell is no longer independently projectable, but can still give you aggregate results.  I personally like having some experience at the cellular level, especially if there is a strong control already in place (in which case I will weight the testing cells toward the control package’s attributes), but this is a possibility.

But even 25,000 is too large for some.  In fact, one of the more frequent questions I hear is how to do testing when you have a small list.  That’s the topic for tomorrow; then, Friday, I’ll talk about cross-platform and cadence testing, which goes beyond the “here’s a communication; here’s another communication” type of testing.

# “Test one variable at a time” is a lie

It’s not an intentional lie and its heart is in the right place, but it’s wrong nonetheless.

The reason people will tell you to test only one variable at a time is that you want to be able to isolate why what happened happened.  So, for example, if you changed the teaser on an envelope and sent it to an equivalent audience at the same time with the same contents in the envelope, an increased response rate would be a win you could credit to the envelope.

This is a fine way to test if there’s only one thing you want to learn at a time.  You can refine your program this way, getting better and better.  This is the direct marketing equivalent of kaizen, the practice of continual improvement popularized in manufacturing but now applied to much strategic thinking.

But there are some significant problems with this:

• You can’t test synergy between variables. Let’s say you have a subject line you’d like to test.  However, it may work better with a different version of your email; after all, you wrote the original subject line for this email, so the new one may not fit as well.  Testing one thing at a time may not allow you to test the most coherent versions of each of your offers.
• It can lead to small ball, where you only test things at the most granular level. In his book Fundraising When Money is Tight, Mal Warwick talks about testing teaser copy 25 different times with almost as many clients.  Of the tests, 21 (84%) showed no difference (and these were at quantities that would have shown a difference had there been one).  This is an OK learning if you can learn other things from the package as well, but if that’s all you learn, you’ve invested in testing without any return more than four out of five times.
• It can’t make significant leaps forward. Let’s say you have a control piece in decline.  You know it needs to be replaced because of its response rate.  Or maybe, in a more positive outlook, you’ve accomplished the goal you were striving toward.  Either way, the way to get rid of this piece isn’t to test the envelope one year and the response device the next year – you have to test more than one variable at once.

All in all, this violates a rule you should have for yourself – to learn as much as possible whenever possible.  Think of it as if you were trying to reach the highest elevation on Earth.  If you had the rule of “go up from where you are until you can’t go up any more,” you will reach a peak higher than you are currently, but by no means the highest point possible.  Similarly, if you had the rule “climb to the highest point you can see, even if it means going down a bit,” you will be doing better and getting higher than you were, but this iterative process will not lead to you having to don an oxygen tank anytime soon.

So it is with testing.  Testing one variable at a time will get you closer and closer to your local maximum, but not the global maximum.

But the basis of the argument for variable isolation is not untrue.  You still need to be able to figure out what works and what doesn’t.  The trick is sussing out what did what in your test.  That’s what we’ll cover tomorrow: how to layer multiple single tests to get results you can act on.

# The basics of direct marketing testing

It’s testing week here at Direct to Donor and we’re going to start with some simple principles, as is our Monday pattern.  This is the first of many testing weeks, given the importance of the topic.

Unless it doesn’t work, in which case we might never speak of this again.

This is a great segue to the first rule of testing: that which works, works.  That which doesn’t work, doesn’t work.

I know, it sounds like I’m stating the obvious, but there’s an oft-forgotten conclusion from that, which is that if you aren’t willing either to roll out with it or to scrap it, you shouldn’t test it.

Two different much-beloved CEOs have expressed to me over the years that the single best predictor of whether a mail piece would succeed or not is whether they liked the piece.  If they liked it, it wouldn’t work; if they didn’t, it would.

Part of why they were and are much beloved is that it doesn’t matter whether they liked it or not – it was all about whether a tactic worked successfully.  The mantra that goes with this is:

Or, as I would put it, it doesn’t matter if the source of the quote is a damn dirty Commie if it’s a good quote.

That said, there are some things that are beyond what your organization will accept.  For example, an environmental organization shouldn’t mail on non-recycled paper from old-growth forests.

If that’s the case, then, don’t test it.  A goal of testing is to find something that you will be able to use in some larger capacity in the future.  If that isn’t possible, it eliminates the need for the test.

That said, the list of sacred cows should be as small as possible.  You’ll hear me say test everything and I mean it – other than things that are untenable, experimentation is the best and truest way of learning.  Donor surveys are great, but they show what people think they would do and how they would react versus what they do do; it’s important not to confuse words and deeds.

There are things that are far more important to test than others.  That’s why it’s important to start with a hypothesis and to test the fundamentals first.  I love tests that nibble around the edges as much as the next person: if you can show me how to improve a piece with a different teaser and pick up 1.4% additional in response rate, I’m game.  But the most important tests that you run will be about fundamentals: who are the people that I should be talking to and what offer am I giving them.  Having a hypothesis will help with this, and the broader it is, the better.  A hypothesis like “I believe our lapsed donors will respond best to the means and message that brought them into the organization initially” works well because it lends itself to testing on various platforms with a variety of tactics and a success could mean large things for the organization.  One like “I believe a larger envelope will work better” is restricted to one piece at one time and thus has limited ripple effect.

Finally, please learn from my mistakes and don’t test something by rolling out with it.  I did this in my first year of nonprofit marketing.  We had three underperforming mail pieces and I decided to replace them with new packages I had dreamt up.  Thankfully, I was lucky – the success of one of my pieces paid for the abject failure of the other two.  If I hadn’t been lucky, I might be blogging on effective panhandling tips right now.  You don’t want to put your nonprofit in a position where hitting goals and achieving mission is based on your hunch.

This may be a bit of conventional wisdom.  However, tomorrow, there is one piece of conventional testing wisdom that needs to be taken out back and shot for the benefit of your testing program.