Data rarely arrives in the perfect format. In fact, research tells us that data scientists spend 80% of their time on data preparation and 20% of their time building models and analysing data.
Basically, data preparation is an incredibly time-consuming task within any data analytics or data science project. However, Alteryx Transpose and Cross Tab Tools can dramatically decrease the time it takes to prepare data and this blog provides a crash course on how to do it.
Why is data layout important?
Structuring data in the correct format is one of the essential steps in data preparation for analysis or machine learning. How the data is structured is dependent on the use case and desired output. For example, data needs to be prepped a specific way for reporting or visualisation purposes and for the kind of analysis it will undertake. So, if a dataset has months as columns, it would be better to pivot the columns to rows so that you can analyse trends over time using a visualisation tool.
Similarly, when combining two datasets for analysis, both datasets need to have the same layout so that they can blend together seamlessly.
It’s also vital to consider how the data is stored in its database. Row-oriented databases for Online Transaction Processing (OLTP) store data in rows, whereas data warehouses such as Snowflake which adopt column-oriented databases for Online Analytical Processing (OLAP) store data in columns. If your data isn’t set up correctly for its database, then you’re likely to end up with a mess.
Finally, if your data is in a tidy and practical layout then it is significantly easier to pivot the data to view it from different perspectives. This makes identifying patterns, trends and insights that you may have missed otherwise so much easier.
The Concept of Tidy Data
Tidy data is essential for data visualisations as it provides a consistent and structured format, making it easier to analyse and visualise. The three characteristics of tidy data are:
- Each variable forms a column
- Each observation forms a row
- Each value is in its cell
Most visualisation tools, such as Tableau, requires the input data to be in a tidy format. So, knowing how to transform your data to the required format is essential.
Achieving the Desired Data Layout
There are many ways that you can achieve your desired data layout. Firstly, by manually changing the position of each cell value by copying and pasting. This method works if your dataset is small, but is very tedious and time-consuming if your dataset is large.
Secondly, data can be transposed from columns to rows and vice versa by using Excel’s TRANSPOSE() function. If you are a seasoned Excel user, you are probably familiar with this function. The TRANSPOSE() function takes in an array formula and returns a vertical range of cells as a horizontal range or vice versa.
Thirdly, we can use programming languages like Python or R to pivot data into the desired format. For example, we can use the pivot_table function from the Pandas library in Python. This method is useful if you are planning to conduct your entire analysis (data preparation, data modelling, data visualisation) using a single programming language. However, the con is that you must know exactly which parameters to change to get the data layout that you want.
Finally, rather than using syntax and programming methods, an easier and more intuitive approach is using drag-and-drop tools, such as the Transpose and Cross Tab tools in Alteryx Designer to structure data. This method provides a more visual approach making it easy to understand how each step transforms your data structure.
Alteryx is a leading analytics automation platform solution that provides a range of tools to help users to analyse their data. Alteryx Designer is one of its products that automate data workflows, which are especially useful for Extract, Transform and Load (ETL) operations. In this article, we will be introducing these two powerful tools for data preparation and showing a simple case study on how to use them.
Using Alteryx’s Transpose and Cross Tab Tools
The dataset below shows the sales across the different categories and segments of a department store. This dataset will be used for the analysis throughout the rest of this post.
The Transpose Tool pivots data by moving horizontal data onto a vertical axis, i.e., transforming rows to columns. As shown in the picture below, the Transpose Tool groups all the field names in one column and all the corresponding values in another column, making the dataset longer and narrower.
In the Transpose Tool configuration, field names (e.g. “Segment”) that are selected under ‘Key Columns’ will remain unchanged and duplicated in the new rows created by the transposed data columns. The fields that are not selected under ‘Key Columns’ will be automatically selected under ‘Data Columns’. If you deselect a field name in ‘Data Columns’, for example, ‘Furniture’, the ‘Furniture’ column will be dropped from the output. All field names selected under ‘Data Columns’ will be grouped into a column called ‘Name’, while the corresponding values under the selected field names will be grouped into a column called ‘Value’.
After transposing there are three fields with six rows of data. The columns can then be given more meaningful names such as, Name, Value, Category and Sales, respectively. The data is now in a tidy format which can be inputted into Tableau for data visualisation.
The Transpose Tool is incredibly useful when you want to change the way data is organised for analysis or visualisation. For example, if the dataset has several columns of data, the Transpose Tool can be used to pivot the data so that each column becomes a row. This can help to better understand the relationships between different data points and make it easier to spot trends and patterns.
Cross Tab Tool
The Cross Tab Tool performs the opposite of what the Transpose Tool does. The Cross Tab Tool pivots data by moving vertical data onto a horizontal axis, i.e., transforming columns to rows. As shown in the picture below, the Cross Tab tool makes the dataset shorter and wider by transforming a column into a header row.
In the Cross Tab Tool configuration, field names selected under ‘Group data by these values’, for example, ‘Segment’, are used to group the data in the transposed rows and will remain unchanged. For each unique value in the field selected (“Name”) under ‘Change Column Headers’, a new column is created and populated by aggregating the values in the field selected (“Value”) under ‘Values for New Columns’. Our resulting header row will consist of values in ‘Name’, such as ‘Office Supplies’ and ‘Technology’.
Next, as it will be useful to know the total sales by segment and by category for reporting purposes, we can select ‘Sum’, ‘Total Row’ and ‘Total Column’ under ‘Method for Aggregating Values’. This step will output the sum of all the values by row and by column, resulting in an extra row and extra column that are appended to the ends of the table.
After cross-tabulating, there are four fields with four rows of data. It is important to note that the aggregation method, in this case, ‘Sum’, will be appended as a prefix to the pivoted header names. All spaces or special characters will be replaced with an underscore and headers will be automatically sorted in ascending order. Thus, it’s recommended that headers are renamed for clarity.
The Cross Tab Tool is great for aggregating and summarising data to uncover new insights, such as the total number of sales by region or the average salary by department.
Differences Between Both Tools
|Transpose Tool||Cross Tab Tool|
|Moves rows to columns||Moves columns to rows|
|Does not require a pattern to pivot||Requires a pattern (grouping, aggregation method) to pivot|
|Usually for visualisation purposes||Usually for reporting purposes|
Use Case for Transpose and Cross Tab Tool
Here is a simple example of how the Transpose and Cross Tab tools in Alteryx can be used to create a simple data visualisation and report table using the configurations outlined above.
The result is a report in PDF format that is useful for reporting purposes.
In summary, the Transpose and Cross Tab tools in Alteryx are powerful and can help you to manipulate and analyse your data more effectively. Both tools can be used together or separately, depending on specific needs. Whether working with large datasets or simply needing to reorganise data for more clarity, these tools are a valuable addition to any data analytics toolkit.
Enhancing your organisation’s capabilities, efficiency and outcomes with Alteryx can be an easy process. Billigence has the expertise and experience to help you get started. Want to know more? Submit an enquiry form below and we’ll be in contact.