I have been thinking about the various specialisations and opportunities within the world of data, and two areas really appeal to me: data analysis and ETL. As part of our KPI package, we are each allowed to choose a personal KPI. The personal KPI should be in an area of interest to us, one that gives us the opportunity to develop our skills. For my personal KPI, I have chosen to explore ETL and SSIS.
For the next six months, I get to set my own personal KPI. Our personal KPIs are designed to help us learn, grow and develop in some professional capacity. I am fascinated by the processes required to “wrangle” data: the challenges of extracting, filtering and transforming raw data into a usable structure. Broadly, this is known as ETL — Extract, Transform and Load.
In the SQL Server world, SQL Server Integration Services (SSIS) is the major tool for ETL processes. SSIS combines the control flow and data flow aspects of ETL into a cohesive process. Because SSIS is focused on data integration, it provides convenient APIs for connecting to a range of data sources, including OLAP, OLE DB, ODBC and flat files. A wide range of filtering and transformation components makes these tasks as simple as drag-and-drop, in the familiar environment of Visual Studio. I have decided to explore SSIS and see what makes it so popular.
This morning I built my first SSIS package using SQL Server Data Tools. The package is very simple: it extracts data from a CSV file, splits the rows alphabetically and inserts them into two different database tables. The SSIS data flow diagram is shown below:
I was impressed with how easy it was to jump into SQL Server Data Tools and just get going. Visual Studio is a superb IDE, and it is consistent across all platforms (web, desktop, SSMS, SQL Server Data Tools), so piecing this package together was just a case of ‘following my nose’. I was pleased that the package worked first time, with a minimum of hassle. SQL Server Central has an excellent series of tutorials, Stairway to Integration Services. In particular, Level 2 gives a great overview of the Data Flow Pipeline and how to configure a simple Data Flow pipeline.
That said, I have some niggling doubts. This package is so simple that it would have been quicker to write a short script in Python, or just as easy to bcp the data into a staging table and split it from there. In this case the user-friendly, convenient drag-and-drop did not simplify the task. So I am left wondering whether the full IDE experience will be a help or a hindrance in a significantly more complex project. We shall have to wait and see…
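To make that comparison concrete, here is a minimal sketch of the same split-and-load step in plain Python. This is not the SSIS package itself — the table and column names are hypothetical, and SQLite stands in for SQL Server — but it shows how little code the equivalent script would need:

```python
import csv
import io
import sqlite3

# Hypothetical CSV input; in the real package this would be a file on disk.
CSV_DATA = """name,city
Alice,London
Zoe,Leeds
Bob,Bristol
"""

def load_split(conn, csv_file):
    """Read rows from a CSV and route each to one of two tables,
    depending on whether the name starts with A-M or N-Z."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS names_a_to_m (name TEXT, city TEXT)")
    cur.execute("CREATE TABLE IF NOT EXISTS names_n_to_z (name TEXT, city TEXT)")
    for row in csv.DictReader(csv_file):
        # Alphabetical split on the first letter of the name column.
        table = "names_a_to_m" if row["name"][:1].upper() <= "M" else "names_n_to_z"
        cur.execute(
            f"INSERT INTO {table} (name, city) VALUES (?, ?)",
            (row["name"], row["city"]),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
load_split(conn, io.StringIO(CSV_DATA))
```

For a one-off job like this, the script wins on speed of writing; what SSIS buys you instead is logging, error paths and restartability once the pipeline grows beyond a single split.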