Why trapezoid zero? Because square one is too dignified.
If you are like most nonprofit direct marketers, you have one true donor database that filters in all of the data you need from various applications, gives you a view into what every constituent needs and wants, and allows you to segment and communicate with them instantly.
Then you wake up from your dream. You perform requisite amounts of personal grooming, cover your dirty shameful bits with clothing sufficiently uncomfortable to convey gravitas, go to work, and stare at the latest list you have pulled in Excel.
Excel is not a bad program per se and has gotten even better since Clippy was taken out for some cannoli in downtown Redmond and never heard from again. The trick is that it is not intended to do what we do with it for address lists and the like. For that, Microsoft created Access, which is actually a bad program per se.
I promise to talk about using Excel for its intended purpose, but first, here’s how to manage your lists in Excel, starting with basic deduplication.
Deduplication could effectively be called Sisyphean except some is pushing the blinkered rock down as you are trying to push up. You are never, never done deduplicating, especially given that the most common records for deduplication are your best supporters (given that they will be engaging the organization from all angles).
So let’s assume you have a very simple list. You want to remove duplicates from an email list. Specifically, you are looking for exact duplicates – you don’t mind emailing the same email twice as long as it is for different names and you don’t mind emailing the same person twice as long as it is at two separate addresses. Here’s your list:
As you can see, there are a few Blofelds in there to get rid of. Your list will be longer and less menacing.
- Select all by hitting Ctrl + A
- Go to Data > Sort
- Sort by all three fields in any order
- Now, you are going to compare each row to the one below it. Go to the top row and the first empty column and put in =A2=A3. This creates a TRUE/FALSE statement where anything that comes back TRUE is identical.
- Drag the lower right corner of the cell you just created down so it covers all rows and over to the right so it covers as many columns as you have. It will look something like this:
- Highlight the TRUE/FALSE columns, hit Ctrl + C to copy the columns, then right-click on the same columns to bring up the special menu. Select Paste Values. What you are doing here is replacing your TRUE/FALSE statements with the results of those statements. Because you are going to be sorting by these, you don’t want Excel recalculating.
- Select all by hitting Ctrl + A
- Go to Data > Sort
- Sort by the three TRUE/FALSE columns. The output will look something like this.
As you can see, there are two records in rows 14 and 15 that were exact matches for other records in the database, since they have three columns of TRUE. Those are your duplicates and you can delete them. One might say you have a license to delete them.
That works fine for simple data. But how often does that happen? Tomorrow will be some advanced tips for deduplication in Excel.