This week’s trivia…

Ever wonder why it’s called a spreadsheet? According to Wikipedia, the word "spreadsheet" referred to the format of paper-based accounting ledgers, where two long sheets opened, or "spread" across the center binding, much like a newspaper, with categories in columns along the top and invoices on one side and payment amounts on the other. If you’d like to know more, check out this blog.

Now on to this week’s tip…

Remove Duplicates

Have you ever been presented with two or more lists that appear to be almost identical? Maybe you only want to see which items are shared across both lists and you need to remove the items that only appear on one list, or, conversely, you only want to see which items appear on only one list (flashback to Venn diagrams, anyone?) There must be an easier way to accomplish this than manually scanning through the two lists, right? Right! And it’s called Remove Duplicates.

This button lives in the Data tab of Excel 2010. Want to see how it works?

I, like the characters in Best in Show, love watching dog shows. Recently, it got me thinking… I know only so many dog breeds are "officially" recognized by the American Kennel Club, so what poor dogs aren’t? I found the official AKC list and also a list of "all" breeds on Wikipedia and copied these into separate worksheets in Excel. Now, I can play with this information in a variety of ways. Let’s say I simply wanted to combine the lists and remove the duplicates (forgetting that I have a list of "all" breeds and have no other way of figuring this out for the sake of this example).

  1. Copy and paste the AKC list into the Wikipedia list. (Note: You must select only the cells you’re copying for this exercise to work. Do not select the entire Column A, for when you try to paste it below the other list in the Wikipedia sheet, you’ll receive an error)


  2. Next, click on the cell below the end of the list in the Wikipedia sheet, and paste the AKC list.


  3. Now select Column A, and click Remove Duplicates.


  4. Press OK in the window that pops up. In this example we only have 1 column, but notice that you could choose multiple columns from which to remove duplicates (a very useful feature if you’re dealing with a mailing list or something similar with multiple columns of data).


  5. Then you may want to sort this alphabetically. Click on the AàZ button in the Data tab.


  6. Now that it’s in alpha order, you’ll probably want to do a quick manual scan to ensure that all duplicates were removed. This feature will only remove "true" duplicates – i.e. items where the text is exactly the same. So if there is a typo or slightly different spelling, it will sneak by. For instance, I found these:

    Technically not "duplicates" by Excel standards, but certainly duplicates for our purposes, so I’ll just delete one of the rows.

Conversely… maybe I wanted to see which ones are only on one list. To do this, I would make one of the lists look different by formatting, perhaps by highlighting or bolding the text. Then, when I copy and paste and remove duplicates, I can easily see which item is from which list. In the below photo, the bolded words were on the AKC list but not on the Wikipedia list.