Introduction to Matillion

Welcome to introduction to Matillion. This blog explores the power of Matillion and explains how it can be leveraged to improve ETL/ELT processes to make data business-ready faster.

ETL & ELT

ETL and ELT  are data integration processes used in the data engineering space. Both processes have their strengths but there are distinct differences to take note of when choosing one for a particular use case.

ETL (Extract, Transform, Load): Is a data integration process that involves extracting data from various sources, transforming it into the desired format, before loading it into a target database or data warehouse. It is a traditional approach to data integration that has been widely used for many years. However, traditional ETL can be limited by the processing capabilities of the systems used to perform the transformations and can be time-consuming and resource intensive. 

ELT (Extract, Load, Transform): Is a modern approach to data integration that differs from traditional ETL in that the transformation of data takes place after it has been loaded into the target data warehouse. In ELT, data is first extracted from various sources and loaded into the target data warehouse, often in its raw format. The data is then transformed using the processing power and scalability of the data warehouse itself, rather than relying on a separate system. This approach allows for faster processing times, reduced data latency and increased efficiency.

What is Matillion?

Matillion is a data integration platform that not only integrates with data sources to ingest data into cloud platforms (e.g. AWS, Azure, GCP and Snowflake), but also transforms data into an analytics-ready state which then can be used by analytics and BI tools (e.g. Tableau, PowerBI, and Sigma). It leverages the power of these cloud data platforms to load, transform, sync and orchestrate data, to make data business-ready faster.

As a cloud-native ELT/ETL Software, Matillion can help to leverage cloud data warehouses to their fullest potential. It has the ability to integrate and transform of data from a wide range of sources, including databases, cloud applications and flat files. Matillion can also help with:

  • Data integration: Making it easy to extract data from a variety of sources and load it into your data warehouse. It supports multi-cloud platforms including, Amazon Web Services, Google Cloud Platform, Microsoft Azure and Snowflake making it easy for organisations to integrate and transform data from a variety of sources and targets.
  • Data transformation: With the help of a powerful transformation engine that facilitates a variety of transformations on data including, data cleansing, enrichment and aggregation. This helps get the data into the format needed for analysis and reporting.
  • Scalability: Matillion is designed to scale with businesses by making it easy to add more data sources, perform more complex transformations and process larger data volumes.
  • Ease of use: Matillion provides a user-friendly, drag-and-drop interface that make it easy to get started with ELT. It’s low-code! This allows for focus on data, rather than on the technology.
  • Cost-effectiveness: By leveraging cloud data warehouses, Matillion helps to reduce the costs associated with traditional data warehousing. This includes reducing the costs of hardware, software and maintenance, as well as freeing up resources to focus on other priorities.

With Matillion, organisations can quickly and easily perform complex data transformations, join data from multiple sources and load the transformed data into their data warehouse, all while leveraging the unlimited scalability and processing power of the cloud. This allows them to gain insights from their data faster, make data-driven decisions and support their digital transformation initiatives.

Matillion ETL’s User Interface

  1. Navigation Panel: This is where created jobs can be found.
  2. Job Canvas: This is where the content of the selected job, e.g., its components and how they are connected, can be found.
  3. Components Panel: This is where components that can be dragged to the main canvas to build the jobs can be found. Available components vary by the job type selected. In the example above, a transformation job was selected, hence the transformation components in the components panel. Clicking on the orchestration job will show orchestration components.
  1. Properties Panel: Selecting a component in the job canvas will reveal more information about the component in the Properties Panel where component properties can be configured.
  2. Tasks Panel: Monitor the progress of the current job.
  3. Project Menu: Shows further options about the current Project (located in the upper left corner).
  1. Help Menu: Contains further information about Matillion ETL (located in the upper right corner).

Learn more about UI and Basic Functions of Matillion in this document.

Orchestration and Transformation Jobs

To extract, load, and transform data into the desired cloud data platform, Matillion uses transformation and orchestration jobs. Orchestration jobs bring external data to your chosen warehouse and transformation jobs are used to transform data within your chosen warehouse.

Creating an Orchestration Job

In the Navigation Panel, right click on the relevant location (in this case it’s ‘default’) and select Add Orchestration Job. In this example, we have data from an Azure Storage Account (Blob Storage) and we want to load it to Snowflake. The data is an excel file, so go to the Components Panel and select the Excel Query component , drag it into the Job Canvas, connect it to the Start component and configure its properties.

Creating a Transformation Job

Again, on the Navigation Panel, right click on the  relevant location (in this case it’s ‘default’) but this time select Add Transformation Job. Start by pulling the data  in Snowflake by using a Table Input component.

Looking at the data, there is an ‘Order Date’ column which has format: yy-mm-dd hh:mm:ss. The Transform components will be utilised to split the ‘Order Date’ column into separate columns of year, month, date and time. To do that, add two Split Field components, the first one to split Order Date column into two columns – the date column (yy-mm-dd) and the time column (hh:mm:ss) and the second one to split the new date column into three columns – year, month and date. Preview your data by clicking the Sample tab next to the Properties tab and clicking the Data button. Next, use the Rewrite Table component to write the transformed data to a final table orderlist_final.

To finalise, drag the transformation job to the orchestration job

Finally, it’s time to run the job which runs all components.

Start by  getting the orderlist data from Azure, load it to the Snowflake table orderlist, then transform the data (i.e., splitting the Order Date column to have separate columns of year, month, day and time). Write this transformed data to the orderlist_final table. Checking Snowflake, we will see a new table, orderlist_final, with our desired columns.

Conclusion

When utilising Matillion, in just a few minutes, data can be loaded data from Azure and transformed in Snowflake by Matillion – all without having to write any custom SQL script. Matillion helps businesses save time, resources, money and increase accuracy and as a Global Platinum Partner we have the expertise to help integrate it into existing technology stacks. Interested? Contact us using the form below.

Leave a Comment