“A challenging problem in data management is that the same entity may be represented in multiple ways throughout the dataset. For instance, customer ‘Andy Hill’ might also be present as ‘Mr. Andrew Hill’ or ‘Hill, Andrew R.’. Variations can result from merging independent data sources, spelling mistakes, inconsistent naming conventions and abbreviations, or records with additional/missing information.
Fuzzy Lookup technology, developed by Microsoft Research, allows you to quickly identify data records which are textually similar. You can identify fuzzy duplicates within a single table or perform a fuzzy join between two different tables. The default configuration works well for a wide variety of data, but the matching may also be customized for specific domains.”
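To make "textually similar" concrete, here is a minimal sketch of finding fuzzy duplicates within a single table. It scores every pair of records with Jaccard similarity over character trigrams. This is a common textbook measure that I am assuming for illustration, not a description of Microsoft's algorithm. The function names, the 0.4 threshold and the sample names are all hypothetical.

```python
from itertools import combinations


def qgrams(text: str, q: int = 3) -> set[str]:
    """Character q-grams of the lowercased text, padded with '#' at both ends."""
    pad = "#" * (q - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fuzzy_duplicates(records: list[str], threshold: float = 0.4) -> list[tuple[str, str, float]]:
    """Compare every pair of records and keep those scoring at least `threshold`."""
    grams = [qgrams(r) for r in records]
    pairs = []
    for i, j in combinations(range(len(records)), 2):
        score = jaccard(grams[i], grams[j])
        if score >= threshold:
            pairs.append((records[i], records[j], round(score, 4)))
    return pairs


# Hypothetical single-column table.
names = ["Andy Hill", "Andrew Hill", "Bob Smith"]
print(fuzzy_duplicates(names))
```

This prints `[('Andy Hill', 'Andrew Hill', 0.5)]`: the two Hill variants share 8 of their 16 distinct trigrams, while "Bob Smith" shares none with either. Comparing every pair grows quadratically with the table size, so a tool meant for real datasets would need some form of indexing rather than this brute-force loop.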
Download the tool here: http://www.microsoft.com/download/en/details.aspx?id=15011
Here is a very simple example. I created two tables as shown:
Left table:

| Column1 |
|---|
| Upen |
| Upendra |
| Upendra Rao |
| Upen Rao |

Right table:

| Column1 |
|---|
| Upend |
| Upendra |
| Rao, Upendra |
| Upendra.Rao |
| Upendra\|Rao |
| Rao, Upen |
| Upen, Rao |
The join I defined matches Column1 of the left table against Column1 of the right table.

Here is the output:

| Column1 (left) | Column1 (right) | Similarity |
|---|---|---|
| Upen | Upend | 0.9435 |
| Upendra | Upendra | 1.0000 |
| Upendra Rao | Rao, Upendra | 1.0000 |
| Upen Rao | Rao, Upen | 1.0000 |
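Here is another example, this time with numbers:

Left table:

| Column1 |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |

Right table:

| Column1 |
|---|
| 10 |
| 3 |
| 4 |
| 5 |
| 1 |
| 2 |
| 7 |
| 9 |
| 0 |
| 4 |

Output:

| Column1 (left) | Column1 (right) | Similarity |
|---|---|---|
| 1 | 1 | 1.0000 |
| 2 | 2 | 1.0000 |
| 3 | 3 | 1.0000 |
| 4 | 4 | 1.0000 |
| 5 | 5 | 1.0000 |
| 6 | | |
| 7 | 7 | 1.0000 |
| 8 | | |
| 9 | 9 | 1.0000 |
| 10 | 10 | 1.0000 |

Every value present in both tables is matched with a similarity of 1.0000. The right table contains no 6 or 8, so those rows come back with blank match and similarity cells, while the right table's 0 (and one of its two 4s) never appears in the output.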
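The blank rows for 6 and 8 behave like a left outer join: every left row is kept, and a row without a good enough match gets empty cells. Below is a rough sketch of that behavior, assuming a minimum-similarity threshold. It uses Python's standard `difflib.SequenceMatcher.ratio()` as a stand-in similarity measure (not the add-in's own), and `fuzzy_left_join` and `threshold=0.8` are hypothetical names and values chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional


def similarity(a: str, b: str) -> float:
    """Stand-in similarity: difflib's ratio, 2*M/T over the two strings."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_left_join(left: list[str], right: list[str], threshold: float = 0.8
                    ) -> list[tuple[str, Optional[str], Optional[float]]]:
    """For each left value, return its best right match, or None if nothing reaches the threshold."""
    rows = []
    for value in left:
        best, best_score = None, 0.0
        for candidate in right:
            score = similarity(value, candidate)
            if score > best_score:  # strict '>' keeps the first of equally good candidates
                best, best_score = candidate, score
        if best is not None and best_score >= threshold:
            rows.append((value, best, round(best_score, 4)))
        else:
            rows.append((value, None, None))  # unmatched: blank cells in the output
    return rows


left = [str(n) for n in range(1, 11)]
right = ["10", "3", "4", "5", "1", "2", "7", "9", "0", "4"]
for row in fuzzy_left_join(left, right):
    print(row)
```

With the data above, 6 and 8 come back as `(value, None, None)` and every other left value is paired with an identical right value at 1.0. Note that "1" is briefly matched to "10" (ratio 0.6667) before the exact "1" replaces it, which is why the sketch keeps the best candidate rather than the first one above the threshold.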
For people who work with data a lot, I think this is a good tool to have in the toolbox.