SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks.

SSIS is a platform for data integration and workflow applications. It features a data warehousing tool used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.

What is data integration and why is it required?

1

Let us consider an example of a company wherethere are different departments as shown in image above. Suppose one day CEO asks a report for best performing department for the current year, then how this requirement can be fulfilled?

It is difficult since each department has its own way of storing data.And to compare data at single point is tedious job. To tackle this problem one option will be to connect databases. But this will increase complexity as well as time. So, the best option is data integration.

Data integration is a processin which heterogeneous data is retrieved and combined as an incorporated form and structure. There are many tools available in market to achieve this and SSIS is one of them.

SSIS is a fast & flexible data warehousing tool used for data extraction, loading and transformation like cleaning, aggregating, merging data, etc. It makes it easy to move data from one database to another database. SSIS can extract data from a wide variety of sources like SQL Server databases, Excel files, Oracle and DB2 databases, etc.

SSIS Architecture:

2

 

There are 4 important components in SSIS ARCHITECTURE.

1.Object Model

2.SSIS runtime

3.Integration Services service

4.Data Flow Task

SSIS object model is a managed set of APIs that allows us to write custom applications, command line utilities etc.

SSIS Runtime is an environment that runs the SSIS package, manages transactions etc. Runtime executables are the smallest part that runs into the environment. For example, packages, containers, tasks, event handlers all of these are runtime executables.

Integration Services service, a Windows service, monitors running SSIS packages and manages the storage of packages. It’s available in SQL Server Management Studio.

Data flow is the most important (and critical too) part of SSIS architecture. It contains a data flow engine that manages the data flow components. A significant number of data flow components ships along with the SSIS installation. And these can be categorized in three categories – source components (which extracts the data from a system), transformation components (performs transformations, modifications onto the extracted data) and load components (which simply performs the data loading tasks into the destination systems). Besides the available data flow components, we can write our own custom data flow components to accomplish any custom requirements.

Practical example to transfer data from flat file to database.

In order to understand the details, it’s always better to start implementing a “pilot” application first.

umang_india

Basic Requirements to start with SSIS

Visual studio with support for Business Intelligence Integration Services

SQL server Management studio

So before starting application create Sample.csv with below data:

3

Also create destination table with required columns. In this case it will be Name, Age, JoinDate, Salary, Retired

Now open Visual studio and create new integration project.

4

Newly created project contains Control flow, Data flow tabs. The data flow is a special task of the control flow. It needs a canvas of its own, so there’s an extra tab for the data flow, right next to the control flow.

To create a data flow, you can drag it from the SSIS toolbox to the control flow canvas.

5

 

Clicking the link will create a new data flow task for you. You end up with an empty canvas, just like in the Control Flow.

6

Then from left panel drag and drop flat file source and OLE DB Destination components. Connect both the components.

7

Now double click the components to configure them. First double click on flat file and set path to Sample.csv

Check the mappings and data types and click ok.

Double click on OLE DB. Create new database connection or use existing one.

This completes the package to transfer data from file to database. Package can be run using start button present visual studio.

Once package is run following and If we don’t encounter any error in the project, the result will be green color. If we have any issue, the result will indicate in red color.

8

9

We can view the execution result by viewing progress tab:

10

Advantages and Disadvantages of using SSIS

Advantages:

  1. Broad documentation and support
  2. Ease and speed of implementation
  3. Tight integration with SQL Server and visual studio
  4. Standardized data integration
  5. Offers real-time, message-based capabilities
  6. Support for distribution model
  7. Helps you to remove network as a bottleneck for insertion of data by SSIS into SQL
  8. SISS allows you to use the SQL Server Destination instead of OLE DB to load the data faster

Disadvantages of SSIS

  1. Few drawbacks of using SIS tools are as follows:
  2. Sometimes create issues in non-windows environments
  3. Unclear vision and strategy
  4. SSIS lacks support for alternative data integration styles
  5. Problematic integration with other products

SSIS Best Practices

SSIS is an in-memory pipeline. That’s why it’s important to make sure that all transformations occur in memory

Try to minimize logged operations

Plan for capacity by understanding resource utilization

Optimize the SQL lookup transformation, data source, and destination

Schedule and distribute it correctly

References: