testing matrix

It sounds like a nonsensical question. And it highlights another major difference between offline and online direct marketing: trackability.

Those who live in the digital marketing space are used to being able to track what happens with their emails and campaigns down to the user level. They complain when tracking pixels don’t work quite the way they are supposed to on every device and aim for ever better attribution models to understand where their investments are going.

Those in the offline space are used to sending something out and waiting for results. And waiting. And waiting.

Further, they are used to looking at packages as a whole. They get one result: did someone donate (OK, two: and how much)? Because of this, it’s tempting to think of mail testing as a thumbs up or thumbs down, as in the Roman Colosseum.

But you can find out things like your offline open rates and tweak them to your heart’s content. Take a simple 2X2 testing matrix.

While you won’t be able to tell what your actual open rate was, you can content yourself with relative open rates. With online, you have an intuitive feel for whether a 20% open rate is good or bad compared with the emails around it (and whether they generally are opened at 10% or 30%). This same relative weighing works well in mail. If 20% more people donate with envelope A than with envelope B, all other things being equal, then you have a 20% better open rate with envelope A.

Similarly, if letter C does better than letter D by 30% with the other parts of the mail piece staying constant, you have a 30% better “click-through” rate.
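The relative comparison above is just a ratio of response rates. Here is a minimal sketch of the arithmetic; the function name `relative_lift` and the donor counts are made up for illustration and are not real results.

```python
def relative_lift(test_rate: float, baseline_rate: float) -> float:
    """Fractional lift of test over baseline; 0.20 means 20% better."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return test_rate / baseline_rate - 1.0


# Hypothetical response rates (donors / pieces mailed), not real results.
envelope_a_rate = 120 / 10_000
envelope_b_rate = 100 / 10_000

# Same letter and offer in both panels, so the difference is attributed
# to the envelope: the mail equivalent of a relative open rate.
envelope_lift = relative_lift(envelope_a_rate, envelope_b_rate)
print(f"Envelope A vs. B: {envelope_lift:.0%} relative lift")
```

The same function works for the letter comparison: hold the envelope constant and pass in the two letter panels' response rates.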

And you probably already know the trick that you only have to test three of the four quadrants here. If envelope A beats B when they both use letter D and letter C beats D when they both use envelope B, chances are pretty good that the winning test is envelope A with letter C, even though that wasn’t a tested combination.

But what you may not know is that the right algorithm can do this writ large with a wide variety of variables. Ask your vendor(s) if they can run permutations that will allow you to figure out what happens when you have five envelopes, four offers, three letter permutations, six different ask strings, and so on. They should be able to create a variablized stew that helps you run a number of tests at once.

The other thing that I’d recommend is not just taking a page from the online playbook, but using online tools to test your efforts first. Don’t know if your teaser copy will work well? Try it as an email subject line or a CPC ad headline first. While the audiences are a bit different online and offline, catchy is generally catchy and boring is boring. Working out details like this online can save your testing for things that can actually help you get to know your donor better, leading to more valuable communications and donors.

(Or, better yet, scrap your teaser copy and test a plain white envelope — it may have the best open rate of all.)

Yesterday, I articulated why you would want to test more than one variable at a time. The trick is testing multiple things and still getting results you can act on. Here’s how to do this without taxing your math skills and without exceeding the capacity of your list. We’ll talk about mail testing, as online testing is far easier, given instant results, lower testing costs, the ability to see what specific calls to action (i.e., links) are getting the most action, and the ability to sequence tests (e.g., testing a subject line earlier in the day, then rolling out with a winning subject line and two test copy versions in the afternoon).

Let’s look at a simple testing matrix, where you would want to test two variables (let’s say envelope and copy) with two options each. It’s easy enough to do the multiplication and say that, for a “test one variable at a time” approach, you would want four test cells; let’s call them A1, A2, B1, and B2.

	Copy 1	Copy 2
Envelope A	A1	A2
Envelope B	B1	B2

By testing every option, you get the best testing results. Likewise, you can see intuitively that no two testing cells will give you full results: if you tested only A1 and B2, for example, and B2 did better than A1, you wouldn’t know whether it was envelope B or copy 2 (or both) that caused the lift. And any other combination would not test both variables.

But three testing cells will get you the data you need. Let’s say you did not test cell B2 and get the following results:

	Copy 1	Copy 2
Envelope A	$.50	$.75
Envelope B	$1.00	?

Here, you can see envelope B did 100% better on a gross per piece basis than envelope A when it had copy 1 ($1.00 versus $.50). You can also see that copy 2 did 50% better than copy 1 when in envelope A ($.75 versus $.50). From this, you can deduce that B2, which wasn’t tested, would have the best results at about $1.50 gross per piece. This is because that is both 50% better than B1 (the result of the copy lift) and 100% better than A2 (the result of the envelope lift).

And if you ever do have results this clean, please let me know about them.
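The deduction from the three tested cells can be sketched in a few lines. This assumes the envelope and copy lifts multiply independently (that is, the envelope does not change how well the copy works); the function name `predict_untested_cell` is mine, and the dollar figures are the ones from the table above.

```python
def predict_untested_cell(a1: float, a2: float, b1: float) -> float:
    """Estimate B2 from A1, A2, and B1, assuming the two lifts are independent."""
    if a1 <= 0:
        raise ValueError("a1 must be positive")
    # B2 = B1 scaled by the copy lift (A2 / A1),
    # which equals A2 scaled by the envelope lift (B1 / A1).
    return b1 * a2 / a1


# Gross per piece from the three tested cells.
a1, a2, b1 = 0.50, 0.75, 1.00

envelope_lift = b1 / a1 - 1  # envelope B vs. envelope A, both with copy 1
copy_lift = a2 / a1 - 1      # copy 2 vs. copy 1, both in envelope A
b2 = predict_untested_cell(a1, a2, b1)

print(f"Envelope lift: {envelope_lift:.0%}, copy lift: {copy_lift:.0%}")
print(f"Predicted B2: ${b2:.2f} gross per piece")
```

If the two variables interact (say, copy 2 only works in envelope A), this estimate will be off, which is one reason to confirm a predicted winner in a later mailing.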

This is just a 2×2 matrix, but it illustrates a point that will apply to greater multivariate testing: any time you have significant results in three out of four cells of a 2×2 matrix, you can deduce the fourth. I call this the testing L because of the shape it makes on your testing matrix. All you need to do is to iterate a traditional A/B test and you can get a robust testing strategy.

Let’s say you want to test three envelopes and three sets of copy. Here you would only need five intersections instead of the nine you might expect:

	Copy 1	Copy 2	Copy 3
Envelope A	X	X
Envelope B		X	X
Envelope C			X

You can see how the L technique would help you determine the remaining blank cells one at a time.
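Here is a rough sketch of filling in the blanks one L at a time. It assumes, as before, that lifts multiply independently. The function name `fill_matrix` and all of the dollar figures are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from itertools import product


def fill_matrix(known, rows, cols):
    """Fill untested cells by repeatedly applying the 2x2 'L' deduction."""
    cells = dict(known)
    progress = True
    while progress:
        progress = False
        for r, c in product(rows, cols):
            if (r, c) in cells:
                continue
            # Look for another row and column that form a 2x2 block
            # with the other three cells already known.
            for r2, c2 in product(rows, cols):
                if r2 == r or c2 == c:
                    continue
                needed = [(r, c2), (r2, c), (r2, c2)]
                if all(k in cells for k in needed):
                    cells[(r, c)] = cells[(r, c2)] * cells[(r2, c)] / cells[(r2, c2)]
                    progress = True
                    break
    return cells


# Hypothetical gross per piece for the five tested cells (the X's above).
tested = {
    ("A", 1): 0.50, ("A", 2): 0.75,
    ("B", 2): 1.50, ("B", 3): 2.00,
    ("C", 3): 4.00,
}
full = fill_matrix(tested, rows=["A", "B", "C"], cols=[1, 2, 3])
for (envelope, copy), value in sorted(full.items()):
    print(f"{envelope}{copy}: ${value:.2f}")
```

Any cell that cannot be reached through a chain of L's stays blank, which is a quick way to check whether a proposed set of test cells is enough.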

What if you wanted to add another dimension? In addition to your three envelopes and three sets of copy, you’d like to also test three ask strings, for example. Here, you’d only need to test the two new ask strings, each as a replacement for the existing ask string in a cell you are already mailing. So to the above testing matrix, where you are testing:

A1$ (where $ is the existing ask string)
A2$
B2$
B3$
C3$

Adding, say, A2+ and B3% (where + and % are the two new ask strings) would give you an L over this new dimension (remembering that you were able to solve for the entire testing matrix in two dimensions). If you were to try to test all 27 possible combinations with a quantity of 25,000 each, that would be a prohibitive 675,000 mailings. Testing only these seven cells makes it a much more manageable 175,000 pieces.
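The mail quantities are simple to check. In the sketch below, the formula in `l_design_cells` (one starting cell plus one extra cell for each additional option) is my generalization of the counts in this post, not an established rule, and it only holds if the variables do not interact.

```python
from math import prod


def full_factorial_cells(levels):
    """Cells needed to mail every combination of every option."""
    return prod(levels)


def l_design_cells(levels):
    """Cells needed for the L approach: one starting cell, then one
    more cell per additional option of each variable (an assumption)."""
    return 1 + sum(k - 1 for k in levels)


levels = [3, 3, 3]   # three envelopes, three sets of copy, three ask strings
panel_size = 25_000  # pieces mailed per test cell

full_quantity = full_factorial_cells(levels) * panel_size
l_quantity = l_design_cells(levels) * panel_size

print(f"Every combination: {full_factorial_cells(levels)} cells, {full_quantity:,} pieces")
print(f"L approach: {l_design_cells(levels)} cells, {l_quantity:,} pieces")
```

With only the envelopes and copy (`[3, 3]`), the same function gives the five intersections from the earlier matrix.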

This assumes, of course, that you want 25,000 people in each test panel. There are some more advanced statistical techniques that would allow you to mail a larger number of intersecting tests. These make it so that each cell is no longer independently projectable, but can still give you aggregate results. I personally like having some experience at the cellular level, especially if there is a strong control already in place (in which case I will weigh the testing cells toward the control package’s attributes), but this is a possibility.

But even 25,000 is too large for some. In fact, one of the more frequent questions I hear is how to do testing when you have a small list. That’s the topic for tomorrow; then, Friday, I’ll talk about cross-platform and cadence testing: testing that goes beyond the “here’s a communication; here’s another communication” type of testing.

Direct to Donor

Direct marketing tips for the modern nonprofit

What are the open rates and click-throughs of your mail pieces?

How to test more than one variable at a time