Removing Duplicates Rows with SSIS Sort Transformation

There are multiple ways to remove duplicate records in SQL Server. In this post, I’ll use the SSIS Sort Transformation to remove records and show you how easy it can be.

The SSIS Sort Transformation task is useful when you need to sort data into a certain sort order. You can compare it to the ORDER BY clause in a SELECT statement. Books Online explains it as:

“The Sort transformation sorts input data in ascending or descending order and copies the sorted data to the transformation output. You can apply multiple sorts to an input; each sort is identified by a numeral that determines the sort order. The column with the lowest number is sorted first, the sort column with the second lowest number is sorted next, and so on”

SSIS Sort Task in Action

First, open Visual Studio (or Business Intelligence Dev Studio if you’re using pre SQL Server 2012) and create an SSIS project. Next, we can go ahead and make a connection to our database. Right click Connection Managers in Solution Explorer and choose New Connection Manager:

SQL Freelancer SQL Server SSIS Sort Task Transformation Data Flow

Choose your Connection Manager type. In this example, we’ll use OLEDB. Next, configure the Connection Manager to point to your dataset. In this example, I’ll use localhost and my Dev database:

SQL Freelancer SQL Server SSIS Sort Task Transformation Data Flow
Test the connection and click OK. Next, drag a Data Flow task from the SSIS toolbox onto the design screen:

SQL Freelancer SQL Server SSIS Sort Task Transformation Data Flow

Right click the Data Flow task and choose Edit. You are now inside the data flow task. This is where all the action happens. Drag an OLEDB source task from the SSIS toolbox to the design screen:

SQL Freelancer SQL Server SSIS Sort Task Transformation Data Flow

Right click the OLEDB task and choose Edit. This screen is where we will define the connection manager we created earlier. Under OLEDB connection manager choose the connection you created. Leave data access mode as Table or view. Change the name of the table or the view to the table that has duplicate data that needs to be removed. In this example, I’ll use a table named Teams:

SQL Freelancer SQL Server SSIS Sort Task Transformation Data Flow

To preview the data click Preview. In my example, you can see I have duplicates in the Team, City and State columns:

SQL Freelancer SQL Server SSIS Sort Task Transformation Data Flow

Click OK to close the OLEDB Source task. Drag the Sort Transformation task onto the design screen. Connect the OLEDB Source task to the Sort task:

SQL Freelancer SQL Server SSIS Sort Task Transformation Data Flow
Click here to view the rest of this post.

Leave a Comment.