Yesterday, I articulated why you would want to test more than one variable at a time. The trick is testing multiple things and still getting results you can act on. Here’s how to do this without taxing your math skills and without exceeding the capacity of your list. We’ll talk about mail testing, as online testing is far easier, given instant results, lower testing costs, the ability to see what specific calls to action (i.e., links) are getting the most action, and the ability to sequence tests (e.g., testing a subject line earlier in the day, then rolling out with a winning subject line and two test copy versions in the afternoon).
Let’s look at a simple testing matrix, where you would want to test two variables (let’s say envelope and copy) with two options each. It’s easy enough to do the multiplication and say that, to cover every combination, you would want four test cells; let’s call them A1, A2, B1, and B2.
| | Copy 1 | Copy 2 |
| Envelope A | A1 | A2 |
| Envelope B | B1 | B2 |
By testing every option, you get the best testing results. Likewise, you can see intuitively that no two testing cells will give you full results: if you tested A1 and B2, for example, and B2 did better than A1, you wouldn’t know whether it was envelope B or copy 2 (or both) that caused the lift. And any other combination would not test both variables.
But three testing cells will get you the data you need. Let’s say you did not test cell B2 and got the following results:
| | Copy 1 | Copy 2 |
| Envelope A | $.50 | $.75 |
| Envelope B | $1.00 | ? |
Here, you can see envelope B did 100% better on a gross per piece basis than envelope A when it had copy 1. You can also see that copy 2 did 50% better than copy 1 when in envelope A. From this, you can deduce that B2, which wasn’t tested, would have the best results at about $1.50 gross per piece. This is because that is both 50% better than B1 (the result of the copy lift) and 100% better than A2 (the result of the envelope lift).
And if you ever do have results this clean, please let me know about them.
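For anyone who would rather see the arithmetic spelled out, here is a minimal Python sketch of the deduction on the table above. It assumes the envelope and copy lifts multiply independently of each other (no interaction between the two), and the function name `project_missing_cell` is my own invention for illustration.

```python
def project_missing_cell(a1, a2, b1):
    """Project the untested B2 cell from the three tested cells of a 2x2 matrix.

    Assumption (mine, for this sketch): the envelope lift and the copy lift
    multiply independently, with no interaction between them.
    """
    envelope_lift = b1 / a1  # envelope B vs. envelope A, both with copy 1
    copy_lift = a2 / a1      # copy 2 vs. copy 1, both in envelope A
    return a1 * envelope_lift * copy_lift


# Gross per piece from the example table: A1 = $.50, A2 = $.75, B1 = $1.00
b2 = project_missing_cell(a1=0.50, a2=0.75, b1=1.00)
print(f"Projected B2: ${b2:.2f}")
```

Real results will be noisier than this, so treat the projection as an estimate to confirm on rollout rather than a guarantee.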
This is just a 2×2 matrix, but it illustrates a point that will apply to greater multivariate testing: any time you have significant results in three out of four of a 2×2 matrix, you can deduce the fourth. I call this the testing L because of the shape it makes on your testing matrix. All you need to do is to iterate a traditional A/B test and you can get a robust testing strategy.
Let’s say you want to test three envelopes and three sets of copy. Here you would only need five intersections instead of the nine you might expect:
| | Copy 1 | Copy 2 | Copy 3 |
| Envelope A | X | X | |
| Envelope B | | X | X |
| Envelope C | | | X |
You can see how the L technique would help you determine the remaining blank cells one at a time.
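Here is a rough Python sketch of that one-cell-at-a-time fill, using the five tested intersections A1, A2, B2, B3, and C3. The dollar figures are made up purely for illustration, the function name `fill_grid` is hypothetical, and the whole thing again assumes that lifts multiply independently.

```python
def fill_grid(grid):
    """Fill untested cells (None) by applying the 2x2 'L' rule repeatedly.

    grid is a list of rows (envelopes), each a list of columns (copy versions).
    Assumption for this sketch: lifts multiply with no interaction, and every
    tested result is greater than zero.
    """
    rows, cols = len(grid), len(grid[0])
    changed = True
    while changed:
        changed = False
        for r_known in range(rows):
            for r_miss in range(rows):
                for c_known in range(cols):
                    for c_miss in range(cols):
                        if r_known == r_miss or c_known == c_miss:
                            continue
                        if grid[r_miss][c_miss] is not None:
                            continue
                        corner = grid[r_known][c_known]    # diagonal from the gap
                        same_col = grid[r_known][c_miss]   # shares the gap's copy
                        same_row = grid[r_miss][c_known]   # shares the gap's envelope
                        if None in (corner, same_col, same_row):
                            continue
                        grid[r_miss][c_miss] = same_col * same_row / corner
                        changed = True
    return grid


# Made-up gross-per-piece results for the tested cells A1, A2, B2, B3, C3
results = [
    [0.50, 0.75, None],  # Envelope A
    [None, 1.50, 1.20],  # Envelope B
    [None, None, 0.60],  # Envelope C
]
filled = fill_grid(results)
for name, row in zip("ABC", filled):
    print(f"Envelope {name}:", [round(value, 2) for value in row])
```

Each pass looks for any 2×2 block with exactly one gap and three known corners, fills the gap, and repeats until nothing is left to fill.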
What if you wanted to add another dimension? In addition to your three envelopes and three sets of copy, you’d like to also test three ask strings, for example. Here, you’d only need to test each of the two new ask strings as a replacement for the existing ask string in one of your cells. So to the above testing matrix, where you are testing:
A1$ (where $ is the existing ask string)
A2$
B2$
B3$
C3$
Adding, say, A2+ and B3% (where + and % are the two new ask strings) would give you an L over this new dimension (remembering that you were able to solve for the entire testing matrix in two dimensions). If you were to try to test all 27 possible combinations with a quantity of 25,000 each, that would be a prohibitive 675,000 pieces. Testing only these seven cells makes it a much more manageable 175,000.
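The cell counts follow a simple pattern, sketched below in Python. The "one starting cell plus one extra cell for each additional option" rule is my own reading of the counts in this post (three cells for 2×2, five for 3×3, seven for 3×3×3), and the function names are hypothetical.

```python
import math

PANEL_SIZE = 25_000  # pieces per test cell, as in the example


def full_cells(options_per_variable):
    """Cells needed to test every combination."""
    return math.prod(options_per_variable)


def l_cells(options_per_variable):
    """Cells needed with the L approach: one starting cell, plus one
    extra cell for each additional option of each variable (an assumption
    generalized from the counts used in this post)."""
    return 1 + sum(n - 1 for n in options_per_variable)


options = [3, 3, 3]  # envelopes, copy versions, ask strings
full = full_cells(options) * PANEL_SIZE
reduced = l_cells(options) * PANEL_SIZE
print(f"Every combination: {full:,} pieces")
print(f"L approach: {reduced:,} pieces")
```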
This assumes, of course, that you want 25,000 people in each test panel. There are some more advanced statistical techniques that would allow you to mail a larger number of intersecting tests. These make it so that each cell is no longer independently projectable, but can still give you aggregate results. I personally like having some experience at the cellular level, especially if there is a strong control already in place (in which case I will weight the testing cells toward the control package’s attributes), but this is a possibility.
But even 25,000 is too large for some. In fact, one of the more frequent questions I hear is how to do testing when you have a small list. That’s the topic for tomorrow; then, Friday, I’ll talk about cross-platform and cadence testing: testing that goes beyond the “here’s a communication; here’s another communication” type testing.