Microsoft Fuzzy Lookup Add In Mac

  1. On the Ribbon, click the Fuzzy Lookup command to open the Fuzzy Lookup pane. Select the left and right tables to compare; to compare data within a single table, select that table as both left and right. The lookup automatically joins columns that share a name; you can delete that join and add one of your own.
  2. The Fuzzy Lookup Add-In for Excel performs fuzzy matching of textual data in Excel.

When I run the fuzzy lookup tool it does its best to find a match with at least an X% similarity (default 50%). However it's all over the place. Sometimes it's spot on, other times it's not even close.

The Excel Fuzzy Lookup Add-In matches data that is similar but not identical. It is often used instead of VLOOKUP to compare two columns whose values are close but not exactly the same. As output, Fuzzy Lookup returns a table of the matched rows for the chosen columns. In everyday data work, there is a common need to compare two versions of the same data set where one comes from an external source and may contain misspellings or typing errors. In this tutorial, you will see how to install the Fuzzy Lookup Add-In, prepare the data and create a Fuzzy Lookup, which can save a lot of time in data consolidation.

  1. Installation of Fuzzy Lookup in Excel
  2. Explanation of the function
  3. Preparation of data for Fuzzy Lookup
  4. Create Fuzzy Lookup

Fuzzy Lookup is not a standard Excel function, so you won’t find it among the standard tabs and buttons. To enable it, Microsoft provides an Add-In which can be downloaded from the following link:

After you have downloaded the installation file, open it and follow the installation instructions. Once the Add-In is installed, Excel loads it automatically the next time it starts. As a result, you get a new tab at the end of the Ribbon called “Fuzzy Lookup”, with a button of the same name:

As mentioned in the introduction, Fuzzy Lookup is used when we want to match two sets of data (two tables) whose matching fields do not contain exactly the same values. For example, suppose we want to match two tables on the column “Name”: the first table has the value “Michael Jackson”, while the second has the similar but misspelled name “Michal Jackson”. A standard VLOOKUP in exact-match mode will not match these two values because it looks for an exact match. Fuzzy Lookup solves this problem by matching columns based on their similarity.
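To make the difference between exact and similarity-based matching concrete, here is a minimal Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; this is an assumption for illustration only and is not the algorithm the Excel Add-In uses. The extra reference names are made up.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical reference column; only "Michael Jackson" comes from the text.
reference = ["Michael Jackson", "Janet Jackson", "Michael Jordan"]
query = "Michal Jackson"  # the misspelled value

# Exact lookup (what an exact-match VLOOKUP does): the misspelling is not found.
exact_hit = query in reference

# Fuzzy lookup: pick the reference value with the highest similarity.
best = max(reference, key=lambda name: similarity(query, name))

print(exact_hit)
print(best)
```

The exact test fails on the single missing letter, while the similarity ranking still selects the intended name.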

This is often needed when a table is imported into Excel from another source, or copied in manually, and has to be matched against another table that holds the same data in clean form. In most cases, the first table contains typing mistakes and misspelled words and would need to be cleaned manually before it could be looked up against the prepared table. This can be very time consuming, and that is where Fuzzy Lookup saves precious time. Please note that the matching columns must be formatted as text.

Before we can do a Fuzzy Lookup, we need to format our data as tables. To do this, select the cell range, click the Insert tab and choose Table. After we have created two tables, they need to be named so that they can be used in Fuzzy Lookup. This is done by selecting the whole table and entering a name in the Name Box:

Now we have data ready for Fuzzy Lookup. As you can see in the previous pictures, we have two tables: the first one (named Sales_Actual) contains actual sales per salesperson (columns “Sales Person” and “Sales Actual”) and the second one (named Sales_Target) contains target sales per salesperson (columns “Sales Person” and “Sales Target”).

Once the data is formatted in the spreadsheet, we can start creating a Fuzzy Lookup from the two tables. We can imagine that the first table is imported from some other data source and is a report of sales per salesperson, while the second table is a clean table maintained in our Excel file that contains the targeted sales for every person. In our example, we want to match these two tables on the column “Sales Person” and create a new table that combines all the data (“Sales Person”, “Sales Actual”, “Sales Target”). The first table has some misspelled names, and we want to match them with the correct names in the second table based on their similarity.

Let’s now create an example of Fuzzy Lookup and explain how it works. First, we select the cell that will be the first cell of the newly created table, then go to the Fuzzy Lookup tab and click the Fuzzy Lookup button. The following window opens on the right side:

To create our table, we have several steps to do:

In the first part of the Fuzzy Lookup window, we choose the two tables to be matched. In our case the left table is “Sales_Actual” and the right table is “Sales_Target”. After that, we choose the columns we want to match and click the button between them. In our example, we want to match the tables on the similarity of the “Sales Person” columns, so we choose this column in both Left Columns and Right Columns. Once we do that, the table below shows one new row with these matching columns. In Output Columns, we check the columns that we want in the newly generated table: “Sales Person”, “Sales Actual” and “Sales Target”.

There is also an option to include the field “FuzzyLookup.Similarity”, which gives the similarity between the two matched values as a percentage. Finally, we can set the Similarity Threshold (0-100%), which tells the function the minimum level of similarity required for a match. After everything is set up, we can click Go and get a table based on the entered parameters:

As you can see in the picture, the new table is created from the two chosen tables. It consists of the 3 columns that we chose plus the similarity column, which gives the similarity of the “Sales Person” values in the two tables as a percentage. For example, “John Bryant” from the first table is matched with “John T. Bryant” from the second table because their similarity is 92%. The names “Rachel Williams” and “Harry White” have no sufficiently similar value in the second table (at the 50% similarity threshold), so no value is filled in the “Sales Target” column for these two entries.

If we wanted these two values to be matched as well, we would set a smaller Similarity Threshold in the Fuzzy Lookup window. In the following example, we set it to 0.2 (20%), which means that we want to match all names that have a similarity of 20% or greater. Here is the result:

The output is now the same table, but “Rachel Williams” and “Harry White” are also matched, because their similarity to “Jason J. Williams” and “Harrison L. White” is greater than 20%.
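The effect of the Similarity Threshold can be sketched in Python as a fuzzy left join. This is an illustration under stated assumptions: `difflib.SequenceMatcher` stands in for the Add-In's own similarity measure, so the scores differ from the percentages in the tutorial, and the thresholds used here (0.8 and 0.5) are chosen only to reproduce the same behavior with this measure. The names come from the example; the sales figures and the function `fuzzy_left_join` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_left_join(left, right, threshold):
    """Join each (name, actual) row in `left` to its closest name in `right`.

    `right` maps name -> target. Rows whose best score is below `threshold`
    keep an empty (None) target, like the blank cells in the output table.
    """
    rows = []
    for name, actual in left:
        best_name = max(right, key=lambda candidate: similarity(name, candidate))
        score = similarity(name, best_name)
        target = right[best_name] if score >= threshold else None
        rows.append((name, actual, target, round(score, 2)))
    return rows


# Names are from the tutorial; the sales figures are made up for illustration.
sales_actual = [("John Bryant", 1200), ("Rachel Williams", 950), ("Harry White", 700)]
sales_target = {"John T. Bryant": 1000, "Jason J. Williams": 900, "Harrison L. White": 800}

strict = fuzzy_left_join(sales_actual, sales_target, threshold=0.8)
loose = fuzzy_left_join(sales_actual, sales_target, threshold=0.5)

print([row[2] for row in strict])  # targets found with the stricter threshold
print([row[2] for row in loose])   # targets found with the looser threshold
```

Lowering the threshold never removes matches; it only admits weaker ones, which is why a low threshold should be paired with a review of the similarity column.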

By: Bhavesh Patel | Updated: 2018-08-28


Problem

As a part of the data cleaning process we can use the data flow transformations Fuzzy Lookup and Fuzzy Grouping. Both can be used to standardize and correct data during the load process. A developer may confuse these two options due to the similarities between the two transformations, so I will demonstrate the differences between these two components.

Solution

The Fuzzy Lookup transformation standardizes data by correcting values and providing missing ones, while the Fuzzy Grouping transformation performs data cleaning by identifying rows of data that are likely to be duplicates and selecting a canonical row of data to use in standardizing the data. We will demonstrate both of these transformations.

Setup SQL Server Test Environment

First, we will set up a test database, Fuzzy_lookup, with a master table, CustomerData, and insert some test data.

Fuzzy Lookup Transformation in SQL Server Integration Services

The Fuzzy Lookup transformation is used for fuzzy matching (not exact but close matching). By contrast, the standard Lookup transformation uses an equi-join to locate matching records in the reference table: it returns records with at least one matching record and also returns records with no matching records. The Fuzzy Lookup transformation requires at least one column match to be configured for fuzzy matching. If you want to use only exact matching, use the Lookup transformation instead.

There are three features for customizing this lookup.

Maximum number of matches to output per lookup

This setting limits how many matches are returned for each lookup. If you set the maximum number of matches to a value greater than 1, the output of the transformation may include more than one row per lookup, and some of the rows may be duplicates.
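As a rough Python analogy for this setting, the standard library function `difflib.get_close_matches` takes an `n` argument that caps the number of candidates returned per lookup. This is only an illustration of the idea, not how SSIS implements it; the reference names and the helper `lookup` are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical reference values standing in for a reference table column.
reference_names = ["Jon Smith", "John Smith", "John Smyth", "Jane Doe"]


def lookup(name, max_matches):
    """Return up to `max_matches` reference names, best match first.

    cutoff=0.0 disables the similarity filter so that only `max_matches`
    limits the output.
    """
    return get_close_matches(name, reference_names, n=max_matches, cutoff=0.0)


one = lookup("John Smith", 1)    # a single best match per lookup
three = lookup("John Smith", 3)  # several rows for the same lookup value

print(one)
print(three)
```

With a limit of 1, each input value yields at most one output row; with a higher limit, one input value fans out into several candidate rows that downstream steps must resolve.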

Token delimiters

The transformation provides a default set of delimiters that it uses to tokenize the data, but you can add custom token delimiters if your data requires them.
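To show what tokenizing with delimiters means, here is a small Python sketch. The delimiter set in `DEFAULT_DELIMITERS` is an assumption chosen for illustration, not a verified copy of the SSIS defaults, and `tokenize` is a hypothetical helper.

```python
import re

# Assumed delimiter set for illustration only.
DEFAULT_DELIMITERS = " \t\r\n,.;:-\"'&/\\@!?()<>[]{}|#*^%"


def tokenize(text, delimiters=DEFAULT_DELIMITERS):
    """Split `text` into lowercase tokens on any run of delimiter characters."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [token for token in re.split(pattern, text.lower()) if token]


print(tokenize("Smith, John-Paul"))
# An underscore is not in the assumed default set, so "a_b" stays one token...
print(tokenize("A_B Corp"))
# ...until it is added as a custom delimiter.
print(tokenize("A_B Corp", DEFAULT_DELIMITERS + "_"))
```

Because fuzzy matching compares tokens, the choice of delimiters decides whether two differently punctuated values end up with the same tokens.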

Similarity thresholds

The similarity threshold is a decimal value between 0 and 1. A value of 1 means an exact match between the values compared by the fuzzy matching criteria. The confidence score, also between 0 and 1, indicates the confidence in the match. If no usable match is found, similarity and confidence scores of 0 are assigned to the row, and the output columns copied from the reference table contain null values.

The transformation adds two output columns:

  • _Similarity, a column that describes the similarity between values in the input and reference columns.
  • _Confidence, a column that describes the quality of the match.
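The threshold behavior described above can be sketched in Python as follows. This is a simplified illustration: `difflib.SequenceMatcher` stands in for the transformation's own similarity measure, `fuzzy_lookup` is a hypothetical helper, and the `_Confidence` score is not emulated because the text does not say how it is computed.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(value, reference, threshold):
    """Return the best reference match for `value` and its similarity.

    When no candidate reaches `threshold`, the matched value is None and the
    similarity is 0, mirroring the null reference columns described above.
    """
    best, best_score = None, 0.0
    for candidate in reference:
        score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best is None or best_score < threshold:
        return {"match": None, "_Similarity": 0.0}
    return {"match": best, "_Similarity": best_score}


reference = ["John Smith"]  # hypothetical reference value
print(fuzzy_lookup("Jon Smith", reference, threshold=0.8))
print(fuzzy_lookup("test", reference, threshold=0.8))
```

A threshold of 0 returns the best candidate for every row, which is useful for inspecting scores before deciding on a cutoff.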

Fuzzy Lookup Example in SQL Server Integration Services

I have a CSV file, “customerData”, shown below. It has two columns: name and customerPoints. We also have the master table CustomerData that we created above. We will match the rows of the CSV file against the CustomerData table.

This is what my finished package will look like:

Flat File Source

As mentioned, I am demonstrating the Fuzzy Lookup transformation for my use case. First, I will use a data flow task, which I named “GettingCustData”, to perform this action.

As part of the data flow task, I will use a flat file source to read the customer file “customerData”.

Now, in the data flow “GettingCustData”, I have configured the above file as a flat file source. Its output has two columns: name and customerPoints.

Now, I am going to add the Fuzzy Lookup transformation and link it to the Flat File Source.

Fuzzy Lookup: Reference Table

In the reference table properties, I used the sample master table “CustomerData” and linked its name column to the flat file's name column.

Fuzzy Lookup: Columns

Here, we can configure the column mapping needed for the fuzzy matching.

Fuzzy Lookup: Advanced

I need one output row per lookup, so I set Maximum number of matches to output per lookup to 1. I kept the similarity threshold at 0 so that every match and its similarity score is visible in the output. I also kept the default token delimiters.

Add a Data Viewer to See Results

After applying the fuzzy transformation, I added a Data Viewer between the Fuzzy Lookup and the Conditional Split so we can see what the data looks like. Below is the configuration for this.

I executed the package, and the output of the data viewer is shown below.

We can see the output columns for Similarity and Confidence. We can also see that there is no match for 'test'. To handle these different kinds of matches, I will add a conditional split as shown below.

Conditional Split

The Conditional Split transformation routes data flow rows to different outputs. In this demonstration, I use it to produce different outputs based on the criteria defined in the transformation editor.

I have divided the results into three parts based on the criteria below.
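The routing logic of a three-way split can be sketched in Python. The cutoff values here are assumptions made for illustration (a similarity of 1 counts as a perfect match, 0.8 or more as a similar match, anything lower as a likely match); they are not the author's actual expressions, and the sample rows are invented.

```python
def conditional_split(rows, similar_cutoff=0.8):
    """Route each row to one of three outputs based on its _Similarity score.

    The cutoffs are hypothetical and stand in for the split conditions.
    """
    outputs = {"perfect match": [], "similar match": [], "likely match": []}
    for row in rows:
        score = row["_Similarity"]
        if score >= 1.0:
            outputs["perfect match"].append(row)
        elif score >= similar_cutoff:
            outputs["similar match"].append(row)
        else:
            outputs["likely match"].append(row)
    return outputs


# Invented sample rows shaped like the Fuzzy Lookup output.
sample = [
    {"name": "John Smith", "_Similarity": 1.0},
    {"name": "Jon Smith", "_Similarity": 0.93},
    {"name": "J Smth", "_Similarity": 0.55},
]
result = conditional_split(sample)
print({output: len(rows) for output, rows in result.items()})
```

As in a conditional split, each row lands in the first output whose condition it satisfies, so the order of the checks matters.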

Finished Package

Here is what the finished package looks like. We can see that 5 rows were a perfect match, 7 rows were a similar match and 3 rows were a likely match.

Fuzzy Grouping Transformation

The Fuzzy Grouping transformation takes a single input from the data flow and compares it with itself to identify duplicate values in the data. This transformation does not require a reference table to correct the data. It uses grouping to detect typing mistakes and correct them.

To configure the transformation, you must select the input columns to use when identifying duplicates, and you must select the type of match, fuzzy or exact, for each column.

How to Control Fuzzy Grouping

In the advanced parameters, Fuzzy Grouping has the same kinds of control parameters: token delimiters and similarity threshold. Here I will mention the additional output columns.

  • _key_in, a column that uniquely identifies each row.
  • _key_out, a column that identifies a group of duplicate rows. The _key_out value for a group is the _key_in value of the canonical data row, so rows with the same value in _key_out are part of the same group.
  • _score, a value between 0 and 1 that indicates the similarity of the input row to the canonical row.
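The relationship between these three columns can be illustrated with a small Python sketch. It is a simplified greedy grouping under stated assumptions: the first row seen in each group is treated as the canonical row, and `difflib.SequenceMatcher` stands in for the similarity measure. SSIS chooses canonical rows and scores with its own algorithm, and the values and the helper `fuzzy_group` below are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_group(values, threshold):
    """Assign _key_in, _key_out and _score to each value.

    Greedy sketch: a value joins the most similar existing canonical row if
    the score reaches `threshold`; otherwise it becomes a new canonical row.
    """
    canonicals = []  # (key_in, value) of each canonical row found so far
    rows = []
    for key_in, value in enumerate(values, start=1):
        best_key, best_value, best_score = None, None, 0.0
        for canon_key, canon_value in canonicals:
            score = SequenceMatcher(None, value.lower(), canon_value.lower()).ratio()
            if score > best_score:
                best_key, best_value, best_score = canon_key, canon_value, score
        if best_key is not None and best_score >= threshold:
            rows.append({"_key_in": key_in, "_key_out": best_key,
                         "_score": best_score, "value": value,
                         "value_clean": best_value})
        else:
            canonicals.append((key_in, value))
            rows.append({"_key_in": key_in, "_key_out": key_in,
                         "_score": 1.0, "value": value, "value_clean": value})
    return rows


# Made-up values with typing mistakes.
data = ["Springfield", "Springfeld", "Riverton", "Rivertown", "Sprngfield"]
grouped = fuzzy_group(data, threshold=0.8)
for row in grouped:
    print(row["_key_in"], row["_key_out"], row["value"], "->", row["value_clean"])
```

Rows that share a `_key_out` value form one group, and the canonical row is the one whose `_key_in` equals its own `_key_out`.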

Fuzzy Grouping Example in SQL Server Integration Services

I have a CSV file with a village column. It contains many duplicate villages, and I need to reduce it to unique villages.

Using Fuzzy Grouping for This Scenario

Here is my CSV file “VillageData”.

In the Data Flow task, I configured the above file as a flat file source.

Now, I am going to add a Fuzzy Grouping transformation and link it to the Flat File Source.

Fuzzy Grouping: Connection Manager

Here you can specify an OLE DB connection, but I am not using a database for this example.

Fuzzy Grouping: Columns

I have selected the column village. There are two options available for the match type: Exact and Fuzzy. With the Fuzzy match type, rows are considered duplicates if they are similar. If you specify Exact, only rows that contain identical values are considered duplicates.

Fuzzy Grouping: Advanced

As mentioned above, the Fuzzy Grouping Advanced tab contains the token delimiters, the similarity threshold and the additional output columns. I set the similarity threshold to 0.35 for this example.

Add a Data Viewer to See Results

I added a data viewer between the Fuzzy Grouping and the Derived Column, and my package looks like this.

I executed the package and we can see the data viewer below.

The corrected village name is in the village_clean column. You can also compare _key_out with _key_in. I marked the groups in the _key_out column where the data would be grouped.

Differences Between Fuzzy Lookup and Fuzzy Grouping in SSIS

Here are the differences between these two transformations.

Fuzzy Lookup | Fuzzy Grouping
Fuzzy Lookup performs data standardization, correcting and providing missing values. | Fuzzy Grouping performs a data cleaning task by identifying rows of data that are likely to be duplicates.
Fuzzy Lookup enables you to match input records with clean, standardized records in a reference table. | Fuzzy Grouping enables you to identify groups of records in a table where each record in the group potentially corresponds to the same real-world entity.
Fuzzy Lookup returns the closest match in order to perform the fuzzy join. | Fuzzy Grouping groups rows together using one of two match types: Fuzzy or Exact.
For Fuzzy Lookup the comparison is made with a reference table. | For Fuzzy Grouping the comparison is done with the input data itself.
This is a blocking transformation. | This is a blocking transformation.
Only input columns with the DT_WSTR and DT_STR data types can be used in fuzzy matching. | Exact matching can be applied to columns of all data types except DT_TEXT, DT_NTEXT, and DT_IMAGE. The method for approximate matching of data is based on a user-specified similarity score. Fuzzy matching applies to columns with the DT_WSTR and DT_STR data types.

Next Steps
  • Check out the SQL Server Integration Services tutorial.
  • Read more about Transformation in SQL Server Integration Services.
  • Read more about Exact differences between Fuzzy Lookup and Fuzzy Grouping.




About the author
Bhavesh Patel is a SQL Server database professional with 10+ years of experience.