Regex in Alteryx Decorative Banner

Regex in Alteryx | Explained & Use Cases

What is Regex?

A regular expression (shortened as ‘regex’) is a sequence of characters that specify or define a search pattern in text. Regex allows a user to detect patterns that assist to help match, locate, and manage a text or string data.

What classifies as text?

Text can be made up of multiple different characters including:

  • Alphabets: (Characters A to Z)
  • Numeric: (Numbers 0 to 9)
  • National characters: (@, $, #)
  • Special characters: (!, %, ^, &, *, (, ), -, _, =, +, [, ], {, }, \, |, :, ;, ‘, “, <, >, ?, / )

Regex in everyday applications

Now that we know what regex is, the question would then be – how is this relevant and what can it be used for? Funnily enough, you’ve likely already been using it.

Well, everyone has used the ‘CTL + F’ function at least once in a spreadsheet right? Does the ‘Find’ or ‘Find and Replace’ feature sound familiar?

Or, maybe you’ve seen the ‘Match Case’ or ‘Match entire cell contents’ options.

That is regex in its simplest form! This regex use case allows you to search for a string of characters that matches a text or matches an entire cell content in a table.

Image of Microsoft Excel Find and Replace window with Match Case higlighted.

Simple use-case of regex

In the introduction, we explained that regex allows a user to create patterns that assist to help match a text or string of data.

A quick and simple use case of regex is email validation. An email address is a perfect example of how we can create a pattern to help match a string of data. The text can be broken down into a few patterns to be deemed valid.

Let’s create a sample email address: regex@billigence.com

From the above example we can derive that any valid email address consists of a few variable checkpoints.

  • regex
  • @billigence
  • .com

Item 1: This is the email address name, in this case, it is ‘regex

Item 2: The email address name is followed by the ‘@’ character and the domain name

Item 3: The domain name is followed by the domain name extension .com

Now to perform an email validation using regex, we would use the following code:

\w+@\w+.\w+

It may look complex but  it can be easily broken down into three parts:

  1. Any characters with alphanumeric characters are denoted by the code : \w+
  2. An @ followed by the domain name with alphanumeric characters is denoted by the code: @\w+
  3. A ‘.’ is needed, followed by the domain name extension with alphanumeric characters denoted by the code : .\w+

Regex Cheat Sheet

Summarised in the table below are some common regex expressions that might be useful. This cheat sheet can help you get started on your regex journey. However, by utilising Alteryx you can skip the cheat sheet as their regex tool has it all stored and ready for use.

Syntax

Description

.

Any single character

*

Zero or more

+

One or more

\<

Start of word

\>

End of word

^

Beginning of line

$

End of line

\n

New line

\t

Tab

\s

White space character

\

Escape special character

\d

Digit

\l

Lower case letter

\u

Upper case letter

[::alpha::]

Any letter

\w

Word Character

\W

Non-word character

\<\w+\>

Entire word

[]

Any one character in the set

[^]

Any one character not in the set

|

Or

()

Marked group

(?:)

Unmarked group

Gif of Alteryx interface workflow with regex tool in workflow

Introduction to Alteryx

Alteryx is an advanced analytics platform which delivers end-to-end automation including, but not limited to, access, manipulation, analysis and output of data. It is a low to no code environment and the simple drag-and-drop functionality makes it easy for users with little coding experience to perform advanced analytics. In this blog, we will be exploring their regex tool and how we can utilise it to simplify data cleaning and preparation.

Regex in Alteryx

Regex tool icon in Alteryx's Platform

Now that we have learned the basic of regex, how to use it and some syntax, let’s dive into how Alteryx can make it easier.

What can Alteryx’s regex tool do?

In Alteryx, we can use the regex tool to parse, match, tokenize or replace data.

Parse: Separate the expression into new columns and set the name, type and size of these new columns.

Match: Append a column containing a number:

  • True if the expressions match
  • False if the expressions do not match

Replace: Replace the expression you searched for with another text.

Tokenize: Split a column based on a regex function.

Regex in Alteryx – Use Case Example

In this section we will demonstrate how to use regex, using the simple email validation use case again.

We will be using two datasets for this analysis:

  1. Customers Dataset containing email addresses of customers who subscribed to an email newsletter
  2. Company Dataset containing the details of existing corporate businesses

Regex Match – Validating email address

Step 1: Open Alteryx Designer > Drag an Input Data Tool > choose the two files listed above (Emails) and Customers).

Image of Alteryx workflow showing Regex Match step one.

Step 2: Click on run and the output anchor of the input data tool to verify the data is populated.

Image of Alteryx workflow showing Regex Match step two.

Step 3:

3.1 Drag regex tool to the customers input data tool.

3.2 Change the column to parse to ‘Email Address’ as this is the column we want to search.

3.3 Key in the regular expression ‘\w+@\w+.\w+’ and output method as ‘Match’.

3.4 Rename the column for ‘Valid Email?’ and click on Run.

Image of Alteryx workflow showing Regex Match step three.
Regex Match Results with 'Valid Email?' column highlighting the True or False result

From the results, we can see that rows one and two are ‘True‘ while the result for the last row is ‘False’. This is because the email address for Jennifer contains special characters such as $ and #, whereas the search parameters used was \w which indicates that only alphanumeric characters are being searched for.

Regex Parse – Parsing Out Domain Name

With Alteryx, we can further process the data by making use of the parse function, where we are able to extract text based on a regex function.

To parse any expression, wrap the regex function with a set of round brackets ( and ) for the expression you want to parse.

Step 1: Drag and drop another regex function and change the column to parse to email address.

Regex Parse GIF step one outlining parsing procedure

Step 2: Insert the regular expression \w+@(\w+.\w+).

In this case, the expression we want to parse is the domain name after the @ so we wrap it in round brackets ( ). Next, change the output method to ‘Parse‘ and you will be able to change the Output Column name to ‘Domain Name’.

Regex Parse GIF step two outlining parsing procedure

Step 3: Click ‘Run’ then check the output anchor of the new regex tool.

Regex Parse GIF step three outlining parsing procedure
Results window in Regex showing the parse results in the Domain Name column.

The results show that the domain name has been successfully extracted.

Advanced Cleansing of Datasets

We have used regex to perform a match to check for valid email addresses and used regex parse to extract the domain name.

Remember how we have a secondary dataset of the existing corporate clients? With Alteryx, we can incorporate the various tools available to perform further data cleansing and do a match on whether the customers who have subscribed to the mailing list are existing clients.

Step 1: When we do a check on the column ‘Company Website’, we will need to change them to lowercase so that we are able to do a match with the results from the domain name we extracted.

To do this, drop a Data Cleansing Tool and select the ‘Company Website‘ column. Check the modify case and select ‘Lower Case‘. Click run and preview the results by clicking on the output anchor of the data cleansing tool.

Gif depicting advanced data cleansing step one in Alteryx

Step 2: To perform a match, we use the join tool which will connect the data from both datasets. Drag a join tool and point the ‘L input anchor’ to the ‘regex output anchor’ and point the ‘R input anchor’ of the join tool to the ‘data cleansing output anchor’ and click run.

From here, we can see that the L output anchor of the join tool contains Jennifer, whose domain name is not included in the list of existing corporate customers excel. And in the J output anchor of the join tool contains the domain name of existing corporate clients, billigence.com and alteryx.com.

Gif depicting advanced data cleansing step two in Alteryx

Step 3: Now that we can determine which are existing clients and which are not via the join tool, we want to consolidate that list and create a column to indicate whether they are existing clients or not. Drop a union tool and join the L and J output anchor from the join tool into the input anchor of the union tool. Click run to preview the results.

Gif depicting advanced data cleansing step three in Alteryx
Alteryx Advanced Data Cleansing Results highlighting successful cleansing within the 'Company Name' column.

Step 4: Having successfully joined the results, we can drop a formula tool to create a new column based on whether the join tool has a company name or not. Drop a formula tool and click ‘Add Column’ in the output column and enter ‘Existing Client?’. Paste the below code into the formula field:

if IsNull([Company Name])

then “No”

else “Yes”

endif

Click run and preview the results.

We can verify that the companies Billigence and Alteryx, are existing clients and the customers who have subscribed to the mailing list are successfully shown as existing clients in the new column created ‘Existing client?’.

Conclusion

Regex is just one example of the advanced data capabilities within Alteryx. As shown in the cleansing of this blog’s datasets, we can see that by incorporating the multiple tools available in Alteryx, we are able to extract, manipulate and output data to perform a quick comparison of two datasets.

If you’re interested in getting to know more about Alteryx and how you can use automated analytics to better your analytics get in contact below.

Leave a Comment

Scroll to Top