What is Regex?
A regular expression (shortened as โregexโ) is a sequence of characters that specify or define a search pattern in text. Regex allows a user to detect patterns that assist to help match, locate, and manage a text or string data.
What classifies as text?
Text can be made up of multiple different characters including:
- Alphabets: (Characters A to Z)
- Numeric: (Numbers 0 to 9)
- National characters: (@, $, #)
- Special characters: (!, %, ^, &, *, (, ), -, _, =, +, [, ], {, }, \, |, :, ;, โ, โ, <, >, ?, / )
Regex in everyday applications
Now that we know what regex is, the question would then be – how is this relevant and what can it be used for? Funnily enough, youโve likely already been using it.
Well, everyone has used the โCTL + Fโ function at least once in a spreadsheet right? Does the โFindโ or โFind and Replaceโ feature sound familiar?
Or, maybe youโve seen the โMatch Caseโ or โMatch entire cell contentsโ options.
That is regex in its simplest form! This regex use case allows you to search for a string of characters that matches a text or matches an entire cell content in a table.

Simple use-case of regex
In the introduction, we explained that regex allows a user to create patterns that assist to help match a text or string of data.
A quick and simple use case of regex is email validation. An email address is a perfect example of how we can create a pattern to help match a string of data. The text can be broken down into a few patterns to be deemed valid.
Letโs create a sample email address: regex@billigence.com
From the above example we can derive that any valid email address consists of a few variable checkpoints.
- regex
- @billigence
- .com
Item 1: This is the email address name, in this case, it is โregexโ
Item 2: The email address name is followed by the โ@โ character and the domain name
Item 3: The domain name is followed by the domain name extension .com
Now to perform an email validation using regex, we would use the following code:
\[email protected]\w+.\w+
It may look complex but ย it can be easily broken down into three parts:
- Any characters with alphanumeric characters are denoted by the code : \w+
- An @ followed by the domain name with alphanumeric characters is denoted by the code: @\w+
- A โ.โ is needed, followed by the domain name extension with alphanumeric characters denoted by the code : .\w+
Regex Cheat Sheet
Summarised in the table below are some common regex expressions that might be useful. This cheat sheet can help you get started on your regex journey. However, by utilising Alteryx you can skip the cheat sheet as their regex tool has it all stored and ready for use.
Syntax | Description |
. | Any single character |
* | Zero or more |
+ | One or more |
\< | Start of word |
\> | End of word |
^ | Beginning of line |
$ | End of line |
\n | New line |
\t | Tab |
\s | White space character |
\ | Escape special character |
\d | Digit |
\l | Lower case letter |
\u | Upper case letter |
[::alpha::] | Any letter |
\w | Word Character |
\W | Non-word character |
\<\w+\> | Entire word |
[] | Any one character in the set |
[^] | Any one character not in the set |
| | Or |
() | Marked group |
(?:) | Unmarked group |

Introduction to Alteryx
Alteryx is an advanced analytics platform which delivers end-to-end automation including, but not limited to, access, manipulation, analysis and output of data. It is a low to no code environment and the simple drag-and-drop functionality makes it easy for users with little coding experience to perform advanced analytics. In this blog, we will be exploring their regex tool and how we can utilise it to simplify data cleaning and preparation.
Regex in Alteryx
Now that we have learned the basic of regex, how to use it and some syntax, let’s dive into how Alteryx can make it easier.
What can Alteryxโs regex tool do?
In Alteryx, we can use the regex tool to parse, match, tokenize or replace data.
Parse:ย Separate the expression into new columns and set the name, type and size of these new columns.
Match:ย Append a column containing a number:
- True if the expressions match
- False if the expressions do not match
Replace:ย Replace the expression you searched for with another text.
Tokenize:ย Split a column based on a regex function.
Regex in Alteryx โ Use Case Example
In this section we will demonstrate how to use regex, using the simple email validation use case again.
We will be using two datasets for this analysis:
- Customers Dataset containing email addresses of customers who subscribed to an email newsletter
- Company Dataset containing the details of existing corporate businesses
Regex Match – Validating email address
Step 1: Open Alteryx Designer > Drag an Input Data Tool > choose the two files listed above (Emails) and Customers).

Step 2: Click on run and the output anchor of the input data tool to verify the data is populated.

Step 3:
3.1 Drag regex tool to the customers input data tool.
3.2 Change the column to parse to โEmail Addressโ as this is the column we want to search.
3.3 Key in the regular expression โ\[email protected]\w+.\w+โ and output method as โMatchโ.
3.4 Rename the column for โValid Email?โ and click on Run.


From the results, we can see that rows one and two are โTrueโ while the result for the last row is โFalseโ. This is because the email address for Jennifer contains special characters such as $ and #, whereas the search parameters used was \w which indicates that only alphanumeric characters are being searched for.
Regex Parse โ Parsing Out Domain Name
With Alteryx, we can further process the data by making use of the parse function, where we are able to extract text based on a regex function.
To parse any expression, wrap the regex function with a set of round brackets ( and ) for the expression you want to parse.
Step 1: Drag and drop another regex function and change the column to parse to email address.

Step 2: Insert the regular expression \[email protected](\w+.\w+).
In this case, the expression we want to parse is the domain name after the @ so we wrap it in round brackets ( ). Next, change the output method to โParseโ and you will be able to change the Output Column name to โDomain Nameโ.

Step 3: Click โRunโ then check the output anchor of the new regex tool.


The results show that the domain name has been successfully extracted.
Advanced Cleansing of Datasets
We have used regex to perform a match to check for valid email addresses and used regex parse to extract the domain name.
Remember how we have a secondary dataset of the existing corporate clients? With Alteryx, we can incorporate the various tools available to perform further data cleansing and do a match on whether the customers who have subscribed to the mailing list are existing clients.
Step 1: When we do a check on the column โCompany Websiteโ, we will need to change them to lowercase so that we are able to do a match with the results from the domain name we extracted.
To do this, drop a Data Cleansing Tool and select the โCompany Websiteโ column. Check the modify case and select โLower Caseโ. Click run and preview the results by clicking on the output anchor of the data cleansing tool.

Step 2: To perform a match, we use the join tool which will connect the data from both datasets. Drag a join tool and point the โL input anchorโ to the โregex output anchorโ and point the โR input anchorโ of the join tool to the โdata cleansing output anchorโ and click run.
From here, we can see that the L output anchor of the join tool contains Jennifer, whose domain name is not included in the list of existing corporate customers excel. And in the J output anchor of the join tool contains the domain name of existing corporate clients, billigence.com and alteryx.com.

Step 3: Now that we can determine which are existing clients and which are not via the join tool, we want to consolidate that list and create a column to indicate whether they are existing clients or not. Drop a union tool and join the L and J output anchor from the join tool into the input anchor of the union tool. Click run to preview the results.


Step 4: Having successfully joined the results, we can drop a formula tool to create a new column based on whether the join tool has a company name or not. Drop a formula tool and click โAdd Columnโ in the output column and enter โExisting Client?โ. Paste the below code into the formula field:
if IsNull([Company Name])
then โNoโ
else โYesโ
endif
Click run and preview the results.


We can verify that the companies Billigence and Alteryx, are existing clients and the customers who have subscribed to the mailing list are successfully shown as existing clients in the new column created โExisting client?โ.
Conclusion
Regex is just one example of the advanced data capabilities within Alteryx. As shown in the cleansing of this blog’s datasets, we can see that by incorporating the multiple tools available in Alteryx, we are able to extract, manipulate and output data to perform a quick comparison of two datasets.
If youโre interested in getting to know more about Alteryx and how you can use automated analytics to better your analytics get in contact below.