How to test more than one variable at a time

Yesterday, I articulated why you would want to test more than one variable at a time.  The trick is testing multiple things and still getting results you can act on.  Here’s how to do this without taxing your math skills and without exceeding the capacity of your list.  We’ll talk about mail testing, as online testing is far easier, given instant results, lower testing costs, the ability to see which specific calls to action (i.e., links) are getting the most clicks, and the ability to sequence tests (e.g., testing a subject line earlier in the day, then rolling out with the winning subject line and two test copy versions in the afternoon).

Let’s look at a simple testing matrix, where you would want to test two variables (let’s say envelope and copy) with two options each.  It’s easy enough to do the multiplication and say that, for a “test one variable at a time” approach, you would want four test cells; let’s call them A1, A2, B1, and B2.

             Copy 1   Copy 2
Envelope A   A1       A2
Envelope B   B1       B2

By testing every option, you get the best testing results.  Likewise, you can see intuitively that no two testing cells will give you full results: if you tested only A1 and B2, for example, and B2 did better than A1, you wouldn’t know whether it was envelope B or copy 2 (or both) that caused the lift.  Any other pair of cells either has the same problem or varies only one of the two variables.

But three testing cells will get you the data you need.  Let’s say you did not test cell B2 and got the following results:

             Copy 1   Copy 2
Envelope A   $.50     $.75
Envelope B   $1.00    ?

Here, you can see envelope B did 100% better on a gross per piece basis than envelope A when it had copy 1.  You can also see that copy 2 did 50% better than copy 1 when in envelope A.  From this, assuming the envelope lift and the copy lift work independently of each other, you can deduce that B2, which wasn’t tested, would have the best results at about $1.50 gross per piece.  This is because that is both 50% better than B1 (the result of the copy lift) and 100% better than A2 (the result of the envelope lift).
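For those who would rather let a script do the arithmetic, here is a minimal sketch of that deduction in Python.  It computes the two lifts directly from the three tested cells in the table and assumes the lifts simply multiply, with no synergy between envelope and copy.  The function name and parameter names are mine, chosen for illustration.

```python
def deduce_fourth_cell(a1: float, a2: float, b1: float) -> float:
    """Estimate untested cell B2 from tested cells A1, A2, and B1.

    Assumption: the envelope lift and the copy lift are independent,
    so they multiply. Real results will rarely be this tidy.
    """
    envelope_lift = b1 / a1  # envelope B vs. envelope A, both with copy 1
    copy_lift = a2 / a1      # copy 2 vs. copy 1, both in envelope A
    return a1 * envelope_lift * copy_lift


# Gross per piece from the table above
estimate = deduce_fourth_cell(a1=0.50, a2=0.75, b1=1.00)
print(f"Estimated B2 gross per piece: ${estimate:.2f}")
# prints: Estimated B2 gross per piece: $1.50
```

If the three tested cells are not significantly different from one another, the estimate is only as good as that noise, so treat it as a projection rather than a result.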

And if you ever do have results this clean, please let me know about them.

This is just a 2 x 2 matrix, but it illustrates a point that will apply to greater multivariate testing: any time you have significant results in three out of four cells of a 2 x 2 matrix, you can deduce the fourth.  I call this the testing L because of the shape it makes on your testing matrix.  All you need to do is iterate a traditional A/B test and you can get a robust testing strategy.

Let’s say you want to test three envelopes and three sets of copy.  Here you would only need five intersections instead of the nine you might expect:

             Copy 1   Copy 2   Copy 3
Envelope A   X        X
Envelope B            X        X
Envelope C                     X

You can see how the L technique would help you determine the remaining blank cells one at a time.
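Here is a rough Python sketch of filling in those blanks one L at a time.  It assumes the same no-synergy, multiplicative model as the 2 x 2 example, and the five gross-per-piece figures are made up purely for illustration (A1 and A2 reuse the numbers from the earlier table).  The function name fill_with_l is hypothetical.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def fill_with_l(grid: Grid) -> Grid:
    """Fill blank cells by repeatedly completing any 2x2 'L'.

    For any two rows and two columns where three of the four cells
    are known, the fourth is estimated as
    (same-row cell * same-column cell) / opposite-corner cell.
    Assumes independent lifts and nonzero results in tested cells.
    """
    filled = [row[:] for row in grid]
    n_rows, n_cols = len(filled), len(filled[0])
    progress = True
    while progress:
        progress = False
        for r1 in range(n_rows):
            for r2 in range(n_rows):
                for c1 in range(n_cols):
                    for c2 in range(n_cols):
                        if r1 == r2 or c1 == c2:
                            continue
                        if filled[r2][c2] is not None:
                            continue  # target already known
                        corner = filled[r1][c1]
                        same_row = filled[r2][c1]
                        same_col = filled[r1][c2]
                        if corner is None or same_row is None or same_col is None:
                            continue  # not a complete L yet
                        filled[r2][c2] = same_row * same_col / corner
                        progress = True
    return filled


# Rows are envelopes A, B, C; columns are copy 1, 2, 3.
# Hypothetical gross per piece for the five tested cells.
tested: Grid = [
    [0.50, 0.75, None],  # A1, A2
    [None, 1.50, 0.80],  # B2, B3
    [None, None, 0.60],  # C3
]

for label, row in zip("ABC", fill_with_l(tested)):
    print("Envelope", label, [round(value, 3) for value in row])
```

The blanks get filled in whatever order an L becomes available; each deduced cell can then serve as a corner for the next one.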

What if you wanted to add another dimension?  In addition to your three envelopes and three sets of copy, you’d like to also test three ask strings (the existing one plus two new ones), for example.  Here, you’d only need to test each of the two new ask strings once, as a replacement for the existing ask string.  So to the above testing matrix, where you are testing:

A1$ (where $ is the existing ask string)
A2$
B2$
B3$
C3$

Adding, say, A2+ and B3% (where + and % are the two new ask strings) would give you an L over this new dimension (remembering that you were able to solve for the entire testing matrix in two dimensions).  If you were to try to test all 27 possible combinations with a quantity of 25,000 each, that would be a prohibitive 675,000 mailings.  Testing just these seven cells makes it a much more manageable 175,000.
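The quantity math is easy to check, and to rerun for your own panel size.  This is a small Python sketch; the helper minimum_l_cells is my own shorthand for the pattern in the examples (one starting cell, plus one more cell for every additional option on every variable), and it only holds under the same no-synergy assumption as the L itself.

```python
from math import prod
from typing import Sequence

PANEL_SIZE = 25_000  # pieces per test cell, as in the example above


def full_factorial_cells(options: Sequence[int]) -> int:
    """Cells needed to test every combination of every option."""
    return prod(options)


def minimum_l_cells(options: Sequence[int]) -> int:
    """Cells needed with the L approach (hypothetical helper).

    One starting cell, plus one extra cell per additional option
    on each variable. Assumes the variables do not interact.
    """
    return 1 + sum(n - 1 for n in options)


options = (3, 3, 3)  # three envelopes, three copy versions, three ask strings

full_factorial_pieces = full_factorial_cells(options) * PANEL_SIZE
l_pieces = minimum_l_cells(options) * PANEL_SIZE

print(f"Every combination: {full_factorial_pieces:,} pieces")  # 675,000
print(f"Testing L:         {l_pieces:,} pieces")               # 175,000
```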

This assumes, of course, that you want 25,000 people in each test panel.  There are some more advanced statistical techniques that would allow you to mail a larger number of intersecting tests.  These make it so that each cell is no longer independently projectable, but can still give you aggregate results.  I personally like having some experience at the cellular level, especially if there is a strong control already in place (in which case I will weight the testing cells toward the control package’s attributes), but this is a possibility.

But even 25,000 is too large for some.  In fact, one of the more frequent questions I hear is how to do testing when you have a small list.  That’s the topic for tomorrow; then, Friday, I’ll talk about cross-platform and cadence testing: testing that goes beyond the “here’s a communication; here’s another communication” type of testing.


“Test one variable at a time” is a lie

It’s not an intentional lie and its heart is in the right place, but it’s wrong nonetheless.

The reason people will tell you to test only one variable at a time is that you want to be able to isolate why what happened happened.  So, for example, if you changed the teaser on an envelope and sent it to an equivalent audience at the same time with the same contents in the envelope, and the response rate increased, you would know the test won because of the envelope.

This is a fine way to test if there’s only one thing you want to learn at a time.  You can refine your program this way, getting better and better.  This is the direct marketing equivalent of kaizen, the practice of continual improvement popularized in manufacturing but now applied to much strategic thinking.

But there are some significant problems with this:

  • You can’t test synergy between variables. Let’s say you have a subject line you’d like to test.  However, it may work better with a different version of your email; after all, you wrote the original subject line for this email, so the new one may not fit as well.  Testing one thing at a time may not allow you to test the most coherent versions of each of your offers.
  • It can lead to small ball, where you only test things at the most granular level. In his book Fundraising When Money is Tight, Mal Warwick talks about testing teaser copy 25 different times with almost as many clients.  Of the tests, 21 (84%) showed no difference (and these were at quantities that would have shown a difference had there been one).  This is an OK learning if you can learn other things from the package as well, but if that’s all you learn, you’ve invested in testing without any return more than four out of five times.
  • It can’t make significant leaps forward. Let’s say you have a control piece in decline.  You know it needs to be replaced because of its response rate.  Or maybe, in a more positive outlook, you’ve accomplished the goal you were striving toward.  Either way, the way to get rid of this piece isn’t to test the envelope one year and the response device the next year – you have to test more than one variable at once.

All in all, this violates a rule you should have for yourself: to learn as much as possible whenever possible.  Think of it as if you were trying to reach the highest elevation on Earth.  If you had the rule of “go up from where you are until you can’t go up any more,” you will reach a peak higher than you are currently, but by no means the highest point possible.  Similarly, if you had the rule “climb to the highest point you can see, even if it means going down a bit,” you will be doing better and getting higher than you were, but this iterative process will not lead to you having to don an oxygen tank anytime soon.

So it is with testing.  Testing one variable at a time will get you closer and closer to your local maximum, but not the global maximum.
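To make the local-versus-global point concrete, here is a toy Python sketch.  The gross-per-piece numbers are entirely hypothetical, invented so that envelope B and copy 2 only pay off together (the synergy problem from the list above); the function name is mine as well.

```python
# Hypothetical gross per piece for each (envelope, copy) package.
# Envelope B and copy 2 each lose on their own but win together.
gross = {
    ("A", 1): 1.00,  # current control
    ("A", 2): 0.90,
    ("B", 1): 0.95,
    ("B", 2): 1.40,
}

ENVELOPES = ("A", "B")
COPIES = (1, 2)


def one_variable_at_a_time(start):
    """Change a single variable per test and keep any winner.

    Stops when no single-variable change beats the current package,
    which is a local maximum and not necessarily the best package.
    """
    current = start
    while True:
        envelope, copy = current
        candidates = [(e, copy) for e in ENVELOPES if e != envelope]
        candidates += [(envelope, c) for c in COPIES if c != copy]
        best = max(candidates, key=gross.get)
        if gross[best] <= gross[current]:
            return current
        current = best


stuck_at = one_variable_at_a_time(("A", 1))
best_overall = max(gross, key=gross.get)
print("One variable at a time ends at:", stuck_at)    # ('A', 1)
print("Best package overall:         ", best_overall)  # ('B', 2)
```

With these made-up numbers the control never loses a single-variable test, so the better package is never mailed.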

But the basis of the argument for variable isolation is not untrue.  You still need to be able to figure out what works and what doesn’t.  The trick is sussing out what did what in your test.  That’s what we’ll cover tomorrow: how to layer multiple single tests to get results you can act on.
