Insert/Update a SQL Server table using Merge Join in SSIS

I was asked a question a while back about the easiest way to do an incremental load from one SQL Server table into another SQL Server table. This is common with data warehouses and reporting tables, where you wouldn’t want to truncate a large table and perform a full insert. Instead, you want to copy only the changes from the source to the destination, and each change is either an insert or an update. The easiest way, IMO, is SSIS. Let’s take a look.

Here’s how my environment is set up. I have two databases, appropriately named Transactional (for transactional data) and Reporting (for static report data), and two tables, Source and Destination.

I also added some dummy data as you see below. I’ve highlighted what is different and will need to be changed in the destination table.

Let’s open SSIS and create a new SSIS project.
Drag and drop a Data Flow task into the design window, right click, Edit:

Next, drag and drop two OLE DB source components into the design window. I’m going to rename mine Source and Destination to match my table names.

Configure the two OLE DB sources to match the source table and the destination table. Below is a screenshot of my source connection manager.

Once your connection managers are configured, drag and drop two Sort components into the window, one below each OLE DB Source, and connect each source to its Sort component. We will be using a Merge Join next, and the Merge Join component needs both of its inputs sorted (ASC or DESC) on the join column. I always choose to sort on the primary key of each table.

In my example, I’ll choose to sort on my primary key, ID.

Next, drag the Merge Join transformation into the Design window and drag the data path from the Sort component to the Merge Join. When you attach the arrow to the transformation, the Input Output Selection dialog box appears, displaying two options: the Output drop-down list and the Input drop-down list. The Output drop-down list defaults to Source Output, which is what we want. From the Input drop-down list, select Merge Join Left Input, as shown below. We’ll use the other option, Merge Join Right Input, for the other connection.

Next, connect the data path from the other Sort component to the Merge Join transformation. This time, the Input Output Selection dialog box does not appear. Instead, the path is attached automatically to the only remaining input: Merge Join Right Input.

Now, let’s configure the Merge Join transformation.

The first setting in the Merge Join Transformation Editor is the Join type drop-down list. From this list, you can select one of the following three join types:

  • Left outer join: Includes all rows from the left table, but only matching rows from the right table. You can use the Swap Inputs option to switch the two inputs, effectively creating a right outer join.
  • Full outer join: Includes all rows from both tables.
  • Inner join: Includes rows only when the data matches between the two tables.

For our example, we want to include all rows from the left table but only rows from the right table if there’s a match, so we’ll use the Left outer join option.
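To see why the Merge Join wants sorted inputs and what the left outer join produces, here is a minimal sketch in Python rather than SSIS. It is only an illustration of the idea: the function name `merge_left_outer`, the `Name` column, and the sample rows are made up for this example, and it assumes the key is unique on the right side (as a primary key would be).

```python
def merge_left_outer(left, right, key="ID"):
    """Left outer join of two row lists that are both sorted ascending by `key`.

    Because both sides are sorted, one forward pass over each list is enough.
    Assumes `key` is unique in `right`.
    """
    joined = []
    j = 0
    for left_row in left:
        # Skip right-side rows whose key is smaller than the current left key.
        while j < len(right) and right[j][key] < left_row[key]:
            j += 1
        if j < len(right) and right[j][key] == left_row[key]:
            joined.append((left_row, right[j]))
        else:
            # No match: keep the left row, the right side stays empty (NULL).
            joined.append((left_row, None))
    return joined


# Hypothetical sample data; the sorted() calls play the role of the Sort components.
source = sorted(
    [{"ID": 3, "Name": "Charlie"}, {"ID": 1, "Name": "Alpha"}, {"ID": 2, "Name": "Bravo"}],
    key=lambda r: r["ID"],
)
destination = sorted(
    [{"ID": 3, "Name": "Charly"}, {"ID": 1, "Name": "Alpha"}],
    key=lambda r: r["ID"],
)

pairs = merge_left_outer(source, destination)
for src, dest in pairs:
    print(src["ID"], "matched" if dest else "no match, destination side is NULL")
```

Every source row comes out exactly once, and the rows with nothing on the destination side are the ones that will later become inserts.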

You now need to select which columns you want to include in the data set that the Merge Join transformation will output. For this exercise, we’ll include all columns. To include a column in the final result set, simply select the check box next to the column name in either data source.

Almost finished, but first let’s add a Conditional Split transformation. This will allow us to insert new records or update previous records.

In the Conditional Split Editor, I created two outputs. If (Destination) ID is NULL then the record doesn’t exist so we’ll perform an INSERT. If the ModifiedDate is different between the two tables then we know something has been updated since the last execution and we need to update the record. See below.
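The routing logic can be sketched outside SSIS as follows. This is a rough Python illustration, not the Conditional Split expressions themselves: the `Dest_ID` and `Dest_ModifiedDate` names are hypothetical aliases for the destination columns coming out of the Merge Join, and the dates are invented.

```python
from datetime import date


def route(row):
    """Decide what to do with one joined row; the first condition that is true wins."""
    if row["Dest_ID"] is None:
        # Nothing matched on the destination side, so the record is new.
        return "INSERT"
    if row["ModifiedDate"] != row["Dest_ModifiedDate"]:
        # The record exists but has changed since the last load.
        return "UPDATE"
    # Identical rows need no work.
    return "UNCHANGED"


# Hypothetical joined rows: source columns plus the aliased destination columns.
joined_rows = [
    {"ID": 1, "ModifiedDate": date(2015, 1, 5), "Dest_ID": 1, "Dest_ModifiedDate": date(2015, 1, 5)},
    {"ID": 2, "ModifiedDate": date(2015, 3, 9), "Dest_ID": 2, "Dest_ModifiedDate": date(2015, 1, 5)},
    {"ID": 3, "ModifiedDate": date(2015, 3, 9), "Dest_ID": None, "Dest_ModifiedDate": None},
]

routes = {row["ID"]: route(row) for row in joined_rows}
print(routes)
```

Checking for the missing destination row first means the date comparison only ever runs on rows that actually have a destination date to compare against.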

Since we can perform an INSERT or an UPDATE, we’ll need two destinations. First, for the INSERT, we’ll simply be inserting into the table, so we can drag the OLE DB Destination component into the window, choose “INSERT” in the Input Output Selection window, and use the Reporting connection manager.

I’m going to check the “Keep Identity” box since the ID column is an identity column.

Next, for the UPDATE statement, we’ll drag the OLE DB Command component into the window and configure it. Select “UPDATE” in the Input Output Selection window.

In the Connection Managers tab, assign the connection manager for Reporting.

In the Column Mappings tab, map the input columns to the parameters of the UPDATE statement.
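The OLE DB Command runs its statement once for every row it receives, filling the `?` parameter markers from the mapped columns. The sketch below imitates that behaviour in Python with an in-memory SQLite database instead of SQL Server. The `Name` column and all values are assumptions made for the example, since the real table layout is only visible in the screenshots.

```python
import sqlite3

# In-memory stand-in for the Reporting database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Destination (ID INTEGER PRIMARY KEY, Name TEXT, ModifiedDate TEXT)"
)
conn.executemany(
    "INSERT INTO Destination (ID, Name, ModifiedDate) VALUES (?, ?, ?)",
    [(1, "Alpha", "2015-01-01"), (2, "Bravo", "2015-01-01")],
)

# One parameterized statement; each ? is filled from a column of the incoming row.
update_sql = "UPDATE Destination SET Name = ?, ModifiedDate = ? WHERE ID = ?"

# Rows that the UPDATE output of the Conditional Split would send here.
rows_to_update = [{"ID": 2, "Name": "Bravo v2", "ModifiedDate": "2015-04-17"}]

for row in rows_to_update:
    # Executed once per row, just like the OLE DB Command.
    conn.execute(update_sql, (row["Name"], row["ModifiedDate"], row["ID"]))
conn.commit()

result = conn.execute(
    "SELECT ID, Name, ModifiedDate FROM Destination ORDER BY ID"
).fetchall()
print(result)
```

Because the statement runs row by row, this approach is comfortable for a modest number of changed rows; for very large update sets a set-based approach may be worth considering.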

Final package should look like the following:

Save and execute. You can see that we updated two records and inserted one record:

Going back to our query you can see that everything matches up now:

Using a DateTime Expression in a SSIS Destination

Recently, I’ve run into a few situations where I needed to export some SQL data to a CSV on a daily basis. I also needed to keep these CSV files in an archive folder for reference in case a problem came up. Moving the files to an archive folder is not a problem using the “File System” task, but each CSV needs a unique name; otherwise it will just overwrite the previous file. There are numerous ways to make the filename unique, but in this post I’ll describe adding a datetime stamp to each file, which also helps determine when the file was created.

I’ll assume you already have a Data Flow task configured with a source and a Flat File Destination. In this example, I’m going to use the AdventureWorksDW2012 database and copy data from the DimProduct table to a CSV. I already have my OLE DB Source configured, and now I’m going to configure the Flat File Destination.

Right click the Flat File Destination and click Edit:

In the Flat File Destination Editor, click New…

Select Delimited and click OK.

I’ve created a blank CSV on my Desktop which we’ll use for now, but this will eventually change:

Click OK and select Mappings and verify all the mappings from the source are correct:

Once this is verified, click OK. (For this example, I only want to copy the first 6 columns)

Next, we need to create an expression on the connection string so that it knows to add the datetime stamp. Click on the Flat File Connection Manager. Scroll down the Properties window until you see Expressions:

Click the ellipsis beside Expressions and the Property Expressions Editor will appear:

Under Property, select ConnectionString and click the ellipsis under Expression:

You should now see the Expression Builder window. Here is where we will define our complete file name, including the file path. For this example, I want to create the CSV in a directory named CSV on the C: drive. The filename that I will use is Products_<datetime stamp>.csv, so the final output should look like C:\CSV\Products_20150417092331.csv.
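To make the target format concrete, here is the same naming rule sketched in Python rather than in the SSIS expression language. The function name `stamped_name` and its defaults are made up for this illustration. The point is the layout: a four-digit year followed by zero-padded month, day, hour, minute and second.

```python
from datetime import datetime


def stamped_name(now, folder="C:\\CSV", prefix="Products_"):
    """Build <folder>\\<prefix>yyyymmddhhmmss.csv for the given moment."""
    # strftime zero-pads every part, so April becomes "04" and 9 a.m. becomes "09".
    stamp = now.strftime("%Y%m%d%H%M%S")
    return folder + "\\" + prefix + stamp + ".csv"


print(stamped_name(datetime(2015, 4, 17, 9, 23, 31)))
# prints C:\CSV\Products_20150417092331.csv
```

The zero padding matters: without it, file names would vary in length and would no longer sort in chronological order.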

The expression I’ll use to achieve this is below:

"C:\\CSV\\Products_" + (DT_STR,4,1252)DATEPART( "yyyy" , getdate() ) + RIGHT("0" + (DT_STR,4,1252)DATEPART( "mm" , getdate() ), 2) + RIGHT("0" + (DT_STR,4,1252)DATEPART( "dd" , getdate() ), 2) + RIGHT("0" + (DT_STR,4,1252)DATEPART( "hh" , getdate() ), 2) + RIGHT("0" + (DT_STR,4,1252)DATEPART( "mi" , getdate() ), 2) + RIGHT("0" + (DT_STR,4,1252)DATEPART( "ss" , getdate() ), 2) + ".csv"

Type your expression in the Expression window of the Expression builder:

Click Evaluate Expression to view the output:

Now that the expression is built, click OK twice. You should now see the expression formula icon beside the Flat File Connection Manager:

If we execute the package and look inside the CSV folder, we should see our new file:

SQL Server Integration Services Connector for SalesForce

Problem:

I have a list of contacts in my SQL Server database that I would like to import into Salesforce, and I want to keep Salesforce up to date as new contacts are entered daily into my database. I know there are a few different ways to accomplish this, but I’d like to use an SSIS package so I can automate it.

Solution:

In my opinion, the best (and easiest) way to accomplish this goal is using SSIS packages and a component from CozyRoc called Salesforce Destination. CozyRoc provides advanced SSIS libraries of custom tasks, components and scripts for handling Zip, GZip, BZip2, Tar archives, enhanced script reuse and secure Internet communications. I always go to CozyRoc when I need a custom task. In this tip I’ll show you how to take a contact list from SQL Server and import it into SalesForce. I’ll also show you how to schedule this on a daily basis.

For this example, let’s say I have a contact table in my SQL Server database like the following:

I need to import these records into my Contacts tab in SalesForce:

Create SSIS package

First thing I’ll need to do is download the CozyRoc SalesForce component. This download can be found here: http://www.cozyroc.com/ssis/salesforce-destination

I’ll download and install this component on the server or workstation where I plan to develop the package. Once it is installed, open SQL Server Data Tools and create an SSIS project:

First, let’s create two connections: one to the SQL Server database that stores the contact information and one to the Salesforce destination.

Right click in Connection Manager and create “New OLE DB Connection”:

Type Server Name, credentials, and select database from drop down that stores the contact information. In this example, I’m selecting my local server and Demo database:

Test Connection and click OK. Right click on the Connection Manager again and click “New Connection”:

If the Salesforce Destination component was installed properly, you should see a connection called “SFORCE”:

Click Add. Type your UserName, then your Password immediately followed by your security token. If you don’t know your security token, you will need to reset it in Salesforce first.

Once you have entered your login credentials, make sure to test connection.

You should now have two connections in Connection Managers:

Now to designing the package:

From the Control Flow we will add our Data Flow Task and right click to configure.

Add the OLE DB source and SalesForce Destination tasks to the designer:

Configure OLE DB Source:

To configure the SalesForce Destination right click and edit. Choose the connection manager you created earlier:

Click the Component Properties tab and you’ll notice a few options under Custom Properties:

Action:

  • Create: Create a new record in the destination object.
  • Update: Update an existing record in the destination object. You must specify the appropriate object record identifier for the update to work.
  • Delete: Delete an existing record from the destination object. You must specify the appropriate object record identifier for the delete to work.
  • Upsert: Update and insert a record in the destination object. Selecting this value displays the dynamic parameter ExternalId.

Batch Size: Specify the number of rows to be sent as a batch. The maximum for regular mode is 200. The maximum for bulk-load is 10,000.

Destination Object: Specify the destination Salesforce object from a dropdown where the data is to be loaded.

ExternalID: Specify the external identifier field for updating the foreign key lookup with an external identifier.

Mode:

  • Regular: Process the data in regular mode.
  • BulkData: Process the data in bulk-load data mode.
  • BulkBinary: Process the data in bulk-load binary mode.
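To picture what the Batch Size setting means for a load, here is a small Python sketch that cuts a row set into batches. It only illustrates the arithmetic and makes no claim about how the component works internally. The `batches` helper and the 450 sample contacts are invented for the example, and the default of 200 is the regular-mode maximum quoted above.

```python
def batches(rows, batch_size=200):
    """Yield consecutive slices of `rows`, each holding at most `batch_size` rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Hypothetical contact rows coming from the OLE DB Source.
contacts = [{"LastName": "Contact" + str(i)} for i in range(450)]

sizes = [len(batch) for batch in batches(contacts)]
print(sizes)  # [200, 200, 50]
```

With 450 rows and a batch size of 200, the data goes out in three calls, which is why a larger batch size (where the mode allows it) can reduce the number of round trips.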

Since we are adding Contacts, we’ll select Create (Action) and Contact from the DestinationObject option.

Click the Column Mappings tab and make sure all the columns are mapped correctly (delete the mapping between the IDs for this example):

Run the package and, if everything is set up correctly, your Salesforce instance should have new contacts in it:

Run this package as a daily job

To run this package on a schedule that populates Salesforce hourly, daily, weekly, etc., we’ll need to create a SQL Server Agent job and set a schedule.

Next Steps:

  • Updating Salesforce is a little more complicated and requires a custom field in Salesforce. The video titled “Upsert with external ID” will help with this.
  • Check out cozyroc.com for your custom SSIS needs.
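As a rough picture of what an upsert keyed on an external ID does, here is a Python sketch using plain dictionaries in place of Salesforce. Nothing here calls Salesforce or the CozyRoc component. The field name `SqlContactId__c` is a hypothetical custom field holding the SQL Server key, and the contacts are invented.

```python
def upsert(existing, incoming, external_id_field):
    """Update records whose external ID already exists, create the rest.

    `existing` maps external ID -> record and is modified in place.
    Returns (created_count, updated_count).
    """
    created = updated = 0
    for row in incoming:
        key = row[external_id_field]
        if key in existing:
            existing[key].update(row)  # match found: overwrite the changed fields
            updated += 1
        else:
            existing[key] = dict(row)  # no match: create a new record
            created += 1
    return created, updated


# Hypothetical state of the Contact object, keyed by the custom external ID field.
salesforce_contacts = {
    101: {"SqlContactId__c": 101, "LastName": "Smith", "Email": "old@example.com"},
}
# Hypothetical rows read from SQL Server.
sql_rows = [
    {"SqlContactId__c": 101, "LastName": "Smith", "Email": "new@example.com"},
    {"SqlContactId__c": 102, "LastName": "Jones", "Email": "jones@example.com"},
]

counts = upsert(salesforce_contacts, sql_rows, "SqlContactId__c")
print(counts)  # (1, 1)
```

This is why the custom field is needed: without a field that carries the SQL Server key, there is nothing to match incoming rows against, and every run would create duplicates.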

SQL Server Integration Services Data Type Conversion Testing

There are two ways of converting data types within SQL Server Integration Services (SSIS).  Here is an overview of those options:

  • One way is using T-SQL code. Using the CAST function in your code can convert one data type into another. A detailed look at the CAST function is outside the scope of this tip, but I will show a quick example.
  • The second way is using the Data Conversion Transformation Task. This task will convert the data in an input column to a different data type and then copy it to the output column. The task itself isn’t too complicated to set up, as I’ll show you in this tip.
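The second option can be pictured with a short Python sketch: the original column is left alone and the converted value lands in a new output column. This is an illustration of the idea rather than anything SSIS runs. The function name and the sample row are made up, and the “Copy of” prefix imitates the default alias the Data Conversion editor suggests for its output column.

```python
from datetime import datetime


def convert_to_date(rows, column):
    """Return new rows with an extra column holding `column` converted to a date."""
    converted = []
    for row in rows:
        new_row = dict(row)  # keep every input column unchanged
        new_row["Copy of " + column] = row[column].date()  # drop the time portion
        converted.append(new_row)
    return converted


# Hypothetical input row with a datetime column.
rows = [{"CurrencyRateID": 1, "ModifiedDate": datetime(2015, 4, 17, 9, 23, 31)}]

result = convert_to_date(rows, "ModifiedDate")
print(result[0]["ModifiedDate"])          # 2015-04-17 09:23:31
print(result[0]["Copy of ModifiedDate"])  # 2015-04-17
```

The practical consequence is that anything downstream has to be mapped to the new column, not the original one, to pick up the converted type.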

SQL Server Integration Services Data Conversion Task to Convert Data Types

Let’s look at the Data Conversion task first…

First, open Visual Studio (or Business Intelligence Development Studio if you’re using a version before SQL Server 2012) and create an SSIS project. Next, we can go ahead and configure a Connection Manager to our database. Right click Connection Managers in Solution Explorer and choose New Connection Manager:

Choose your Connection Manager type. In this example, we’ll use OLE DB. Next, configure the Connection Manager to point to your database. In this example, I’ll use localhost and the AdventureWorks2008R2 database:

Test the connection and click OK. Next, drag a Data Flow task from the SSIS toolbox onto the design screen:

Right click the Data Flow task and choose Edit. You are now inside the Data Flow task. This is where all the action happens. Drag an OLE DB Source from the SSIS toolbox to the design screen:

Right click the OLE DB Source and choose Edit. This screen is where we select the Connection Manager we created earlier. Under OLE DB connection manager, choose the connection you created. Leave the data access mode as Table or view. Under Name of the table or the view, select the table that contains the columns to convert. In this example, I’ll use a table named Sales.CurrencyRate:

To preview the data click Preview. In my example, I want to change the CurrencyRateDate and the ModifiedDate columns from a datetime data type to a date data type:

Click OK to close the OLE DB Source Editor. Drag the Data Conversion transformation onto the design screen. Connect the OLE DB Source to the Data Conversion transformation:

Right click the Data Conversion task and choose Edit. Here is where we will convert our data types. Since I am converting CurrencyRateDate and ModifiedDate I will click on each of them in the Available Input Columns list:
