Automated Document Processing in Alteryx

Companies spend countless hours and resources managing invoices and processing documents. This critical yet time-consuming task, which involves verifying and paying a wide range of invoices, poses significant challenges, especially in countries where paper-based systems are still prevalent. This blog explores how Alteryx simplifies invoice processing by enabling users to build solutions tailored to their specific needs. Its real-world impact is illustrated by a case study of a customs department that successfully automated the verification of customs decisions using Alteryx.

How Does Alteryx Help with Invoice Processing?

Alteryx Designer

Designer, often considered a ‘Swiss Army Knife’ in the world of data wrangling and analytics, is the main Alteryx product and simplifies complex tasks. Its user-friendly, no-code interface allows for easy creation of computational workflows through drag-and-drop actions. These workflows, once set up, can be reused with new data, making repetitive tasks more efficient. Additionally, its compatibility with a wide range of file formats and systems streamlines data integration.

Alteryx Intelligence Suite

The Intelligence Suite extends the capabilities of Designer with advanced features such as computer vision, text mining and machine learning. Its computer vision tools are particularly important for processing PDF invoices: Optical Character Recognition (OCR) is used to extract content from documents accurately. With keyword-based or machine learning-based document classification, documents can be sorted with high precision, a vital feature when managing diverse invoice types.

Alteryx Server

Alteryx Server is designed to enhance scalability and automation. It allows for efficient management of workflow versions and scheduling, which is important for handling large volumes of invoices. The server facilitates seamless information exchange with invoice issuers, and its automated workflow scheduling helps identify cases that need clarification promptly. Options include direct API connections or automated email notifications to caseworkers, ensuring smooth and continuous invoice and document processing.

Case Study | Verification of Import Duties with Alteryx

In the following case study, we will refer to the company by the pseudonym ‘Ipsum’. Please note that ‘Ipsum’ is a fabricated name chosen solely for this narrative; it has no connection to any real business currently operating under this name.


Ipsum faces significant challenges in managing imports from non-domestic territories. Accurate customs registration and processing of import duties are crucial to avoid penalties. Although Ipsum outsources its customs declarations, the risk of human error persists. To mitigate this, Ipsum meticulously reviews all customs notices in two key steps: calculating the duties (import VAT and customs duties) and comparing these calculations with the customs decisions.

Initial Situation 

Initially, Ipsum relied on a manual, Excel-based method for calculating customs and tax duties. This process involved pulling information from up to ten different sources, including supplier and freight invoices, certificates of origin, freight surcharges and exchange rates. All this data, along with the applicable customs rates stored in a separate sheet, required extensive manual input and comparison, making the process error-prone and inefficient.

Solutions with Alteryx

To enhance both speed and accuracy of this process, Ipsum, in collaboration with Billigence, developed an Alteryx-based solution consisting of three main workflows:

  1. Data Extraction and Calculation: This workflow automates the extraction of necessary data from various sources, inputting it directly into the Excel template.
  2. Invoice Comparison: A second workflow extracts VAT and customs duties data from scanned customs invoices, comparing them with the company’s own calculations. This ensures accuracy and helps in identifying discrepancies.
  3. Customs Master Data Management: A third workflow is dedicated to updating and managing customs master data within the Excel template. This setup still allows for manual adjustments in cases of anomalies or errors in source data.
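At their core, the first two workflows perform a calculation and a comparison. The Python sketch below shows that arithmetic under heavily simplified, hypothetical assumptions that are not Ipsum's actual rules: the customs value is taken as invoice amount plus freight, converted at one exchange rate quoted as foreign-currency units per domestic unit; customs duty is customs value times the duty rate; import VAT is charged on customs value plus duty. Real customs valuation involves further adjustments. The function names, rates and the 1.00 tolerance are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(value: Decimal) -> Decimal:
    """Round a monetary amount to two decimal places (commercial rounding)."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def calculate_duties(invoice_amount, freight, exchange_rate, duty_rate, vat_rate):
    """Return customs duty and import VAT in domestic currency.

    Simplified, hypothetical assumptions (not a complete customs valuation):
    - exchange_rate = foreign-currency units per 1 domestic unit
    - customs value = (invoice amount + freight) / exchange_rate
    - customs duty  = customs value * duty_rate
    - import VAT    = (customs value + customs duty) * vat_rate
    Pass amounts and rates as strings or Decimals to avoid float rounding.
    """
    customs_value = to_cents((Decimal(invoice_amount) + Decimal(freight)) / Decimal(exchange_rate))
    customs_duty = to_cents(customs_value * Decimal(duty_rate))
    import_vat = to_cents((customs_value + customs_duty) * Decimal(vat_rate))
    return {"customs_duty": customs_duty, "import_vat": import_vat}


def compare_with_notice(calculated, notice, tolerance="1.00"):
    """Return True per amount if it matches the customs notice within the tolerance."""
    tol = Decimal(tolerance)
    return {
        name: abs(amount - Decimal(notice[name])) <= tol
        for name, amount in calculated.items()
    }


own = calculate_duties("10000", "800", "1.08", "0.04", "0.19")
print(own)  # {'customs_duty': Decimal('400.00'), 'import_vat': Decimal('1976.00')}
print(compare_with_notice(own, {"customs_duty": "400.00", "import_vat": "1990.00"}))
# {'customs_duty': True, 'import_vat': False}
```

Amounts that fail the comparison are the discrepancies a caseworker would then investigate. Using `Decimal` rather than floats keeps cent-level comparisons exact.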

5 Benefits of Alteryx for Invoice Processing

1. Automated Classification and Renaming of Documents

Because so many parties are involved in the import process, document processing is often challenging, particularly when it comes to establishing uniform standards for the format, transmission or naming of documents. While the shift towards electronic invoices is gaining momentum, many supplier and freight invoices still arrive as PDFs or scanned images. Processing these documents efficiently requires accurate sorting and renaming, a task that demands extensive human resources if done manually. Ideally, importers should agree on uniform standards and naming conventions with their business partners to reduce the need for manual preparation.

Keyword-Based Classification

By identifying unique keywords in the document text, files can be pre-sorted and renamed for further processing. This approach is simple and quick to implement, predictable and traceable, as the system will always deliver the same result for the same text.
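A minimal sketch of keyword-based classification in plain Python (an illustration, not an Alteryx tool configuration), assuming the document text has already been extracted via OCR. The document types, keyword lists and the `classify` / `suggest_filename` helpers are hypothetical. Rules are checked in a fixed order, specific ones before generic ones, so the same text always yields the same result.

```python
# Hypothetical rules: (document type, keywords). The first matching rule wins,
# so specific types (e.g. freight invoices) must come before generic ones.
KEYWORD_RULES = [
    ("customs_notice", ["customs duty", "import vat"]),
    ("freight_invoice", ["freight", "bill of lading"]),
    ("certificate_of_origin", ["certificate of origin"]),
    ("supplier_invoice", ["invoice"]),
]


def classify(text: str) -> str:
    """Return the first document type whose keywords appear in the text."""
    lowered = text.lower()
    for doc_type, keywords in KEYWORD_RULES:
        if any(keyword in lowered for keyword in keywords):
            return doc_type
    return "unclassified"  # left for manual sorting


def suggest_filename(text: str, original_name: str) -> str:
    """Prefix the original file name with the detected document type."""
    return f"{classify(text)}_{original_name}"


print(classify("Freight invoice, Bill of Lading BL-123"))  # freight_invoice
print(suggest_filename("Commercial Invoice No. 4711", "scan_001.pdf"))
# supplier_invoice_scan_001.pdf
```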

Machine Learning-Based Classification

The Alteryx Intelligence Suite offers various options for training custom classification models. For instance, the Image Recognition Tool classifies documents based on the image pixels of PDF documents, while the Text Classifier Tool classifies them based on text previously extracted via OCR. For very large volumes of documents, an AI model can significantly reduce the effort of integrating additional suppliers and the need for manual readjustments. However, creating a useful model requires many manually classified PDF documents, which usually means greater development effort upfront.
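To make the idea of training on manually classified documents concrete, here is a tiny multinomial Naive Bayes text classifier in plain Python. It is not how the Text Classifier Tool works internally; it only illustrates the principle that a model learns word patterns from labelled examples instead of relying on hand-written keywords. The training texts and labels are made up, and a real model would need far more examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # documents per class, for the prior
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training carry no information, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denominator) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up, manually labelled training examples.
train_texts = [
    "commercial invoice unit price quantity total amount",
    "invoice number supplier article quantity price",
    "freight invoice container shipping bill of lading",
    "freight charges sea freight surcharge container",
]
train_labels = ["supplier_invoice", "supplier_invoice", "freight_invoice", "freight_invoice"]

clf = NaiveBayesClassifier().fit(train_texts, train_labels)
print(clf.predict("Sea freight surcharge per container"))  # freight_invoice
print(clf.predict("Invoice: quantity and unit price"))  # supplier_invoice
```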

2. Automated Data Extraction from PDF Documents

As previously mentioned, the Computer Vision Tool palette of the Intelligence Suite allows for the conversion of scanned documents into editable text. The Image Template Tool enables the creation of a template for the invoice layout and allows for marking the desired part of the text to be extracted and assigning it to a column in the output. This template, in combination with the Image to Text Tool, can then be used for every subsequent invoice of the same layout. This makes it simple to extract import dates, invoice numbers, or customs tariff numbers from the documents.
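Conceptually, a layout template is a set of named regions on the page: every OCR-recognised word whose position falls inside a region is assigned to that region's output column. The sketch below assumes OCR output in the form of words with pixel coordinates. The `Word` / `Region` structures, coordinates and field names are hypothetical and do not reflect the internal format of the Image Template Tool.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    width: float
    height: float


@dataclass
class Region:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        # Test the word's centre so slightly shifted scans still match.
        cx = word.x + word.width / 2
        cy = word.y + word.height / 2
        return self.left <= cx <= self.right and self.top <= cy <= self.bottom


def apply_template(words, template):
    """Return one output column per region, joining words in reading order."""
    record = {}
    for field, region in template.items():
        hits = sorted((w for w in words if region.contains(w)), key=lambda w: (w.y, w.x))
        record[field] = " ".join(w.text for w in hits)
    return record


# Hypothetical template for one invoice layout.
TEMPLATE = {
    "invoice_number": Region(400, 50, 600, 80),
    "import_date": Region(400, 90, 600, 120),
}

words = [
    Word("Invoice", 50, 55, 80, 20),  # label text outside both regions
    Word("INV-4711", 420, 55, 90, 20),
    Word("2024-03-15", 420, 95, 100, 20),
]
print(apply_template(words, TEMPLATE))
# {'invoice_number': 'INV-4711', 'import_date': '2024-03-15'}
```

Because the regions are fixed, this approach only works reliably for invoices that share the same layout, which is exactly the limitation discussed next.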

Marking also allows for automatic table recognition. If the tables always have the same size and the lines are clearly recognisable, only minimal post-processing is necessary. In reality, however, tables on PDF invoices involve several moving parts: the number of items varies, and with it the number of pages; the number of lines per item also varies, because the length of each column's content differs from item to item.

Image depicts invoice example
Image depicts Image to Text Tool configuration in Alteryx

Alteryx’s data processing tools make it possible to manage this variability with little effort. For instance, instead of using a template, the entire text of a document can be read with the Image to Text Tool. The tables can then be isolated with the RegEx Tool using start and end words and split into rows with the Text to Columns Tool. The Multi-Row Formula Tool can assign a position number that increments whenever a defined condition is met, such as the presence of a product number. Individual details per position can then be extracted quickly with the RegEx Tool.
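The four steps of this approach can be sketched in plain Python, with comments naming the Alteryx tool each step corresponds to. The OCR text, the product-number format (a capital letter, a hyphen and four digits) and the column layout are hypothetical; real invoices will need their own patterns.

```python
import re
from collections import defaultdict

# Hypothetical OCR output of a two-item invoice; item 1 wraps onto a second line.
OCR_TEXT = """Example Supplier Ltd
Invoice No. 4711
Pos Article Description Qty Price
A-1001 Steel bolt M8
galvanised, 50 mm 200 0.15
A-2002 Hex nut M8 500 0.05
Total 55.00
"""

PRODUCT_NO = re.compile(r"^[A-Z]-\d{4}\b")  # assumed product-number format
ITEM = re.compile(
    r"^(?P<article>[A-Z]-\d{4}) (?P<description>.+) (?P<qty>\d+) (?P<price>\d+\.\d{2})$"
)


def extract_items(text):
    # 1. Isolate the table between a start and an end word (RegEx Tool).
    table = re.search(r"Pos Article[^\n]*\n(.*?)\nTotal", text, re.S).group(1)

    # 2. Split the table into one row per line (Text to Columns Tool, split to rows).
    lines = table.split("\n")

    # 3. Increment a position number whenever a line starts with a product number
    #    (Multi-Row Formula Tool); continuation lines keep the current number.
    grouped = defaultdict(list)
    position = 0
    for line in lines:
        if PRODUCT_NO.match(line):
            position += 1
        if position:  # ignore anything before the first item
            grouped[position].append(line)

    # 4. Join each position's lines and parse the details (RegEx Tool again).
    items, needs_review = [], []
    for pos, parts in grouped.items():
        match = ITEM.match(" ".join(parts))
        if match:
            items.append({"pos": pos, **match.groupdict()})
        else:
            needs_review.append((pos, parts))  # hand over to a caseworker
    return items, needs_review


items, needs_review = extract_items(OCR_TEXT)
for item in items:
    print(item)
```

Positions whose text does not fit the expected pattern are collected in `needs_review` rather than silently dropped, which mirrors the clarification cases mentioned earlier.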

Alteryx RegEx workflow example
Image depicts workflow output / extracted table in Alteryx

3. Integration of Various Data Sources

Effortless Data Retrieval from ERP Systems

When data such as customs rates and exchange rates are not available in a ready-to-use format in the ERP system, Alteryx can fetch them directly from other data sources. Connecting to databases is streamlined by the Visual Query Builder in Alteryx, which lets users build queries through an intuitive drag-and-drop interface without any SQL knowledge.
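The Visual Query Builder assembles a SQL statement from the drag-and-drop selection. For readers curious what such a query might look like, here is a sketch using Python's built-in `sqlite3` module against an in-memory database. The tables `customs_rates` and `invoice_items`, their columns, and all values are hypothetical dummies standing in for whatever the ERP system or master data provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for ERP/master data; all values are dummies.
conn.executescript(
    """
    CREATE TABLE customs_rates (tariff_number TEXT PRIMARY KEY, duty_rate REAL);
    CREATE TABLE invoice_items (invoice_no TEXT, tariff_number TEXT, amount REAL);
    INSERT INTO customs_rates VALUES ('1000000000', 0.04), ('2000000000', 0.02);
    INSERT INTO invoice_items VALUES
        ('4711', '1000000000', 1000.0),
        ('4711', '3000000000', 500.0);
    """
)

# A LEFT JOIN keeps items without a known rate (NULL duty_rate),
# so they can be flagged for review instead of disappearing.
QUERY = """
SELECT i.invoice_no,
       i.tariff_number,
       i.amount,
       r.duty_rate,
       ROUND(i.amount * r.duty_rate, 2) AS customs_duty
FROM invoice_items AS i
LEFT JOIN customs_rates AS r ON r.tariff_number = i.tariff_number
ORDER BY i.invoice_no, i.tariff_number
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```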

For those working with SAP systems, integrating SAP data into workflows is straightforward with the Theobald or DVW connectors. This removes the need for an SAP developer, since all data a user can access through the SAP GUI, within their own permissions, can be integrated into Alteryx workflows.

Leveraging Public APIs for External Data

If data are not available in internal company systems, external sources must be used. For instance, accurate currency conversion requires the latest exchange rates, which can be fetched from customs using the Alteryx Download Tool. This tool retrieves data through APIs, for example the monthly exchange rates published by customs. By supplying the currency code and the import date (data points previously extracted from the PDF documents), Alteryx can automate this data collection. Moreover, this process can be encapsulated in a reusable Alteryx Macro, making it easy to integrate into a range of workflows.
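What such a macro does can be sketched in Python: take a currency code and an import date, build the request, and parse the rate from the response. The text does not name the API, so the endpoint URL, the query parameters and the JSON response shape below are hypothetical placeholders; adapt them to the actual source. Only the URL building and parsing are exercised here; `fetch_rate` shows where the live request would happen.

```python
import json
from datetime import date
from decimal import Decimal
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: a placeholder, not a real exchange-rate API.
BASE_URL = "https://example.org/api/monthly-exchange-rates"


def build_rate_url(currency: str, import_date: date) -> str:
    """Monthly rates apply to a whole month, so only year and month are sent."""
    query = urlencode({"currency": currency.upper(), "month": import_date.strftime("%Y-%m")})
    return f"{BASE_URL}?{query}"


def parse_rate(payload: str) -> Decimal:
    """Assumes a JSON response shaped like {"currency": "USD", "rate": "1.0856"}."""
    return Decimal(json.loads(payload)["rate"])


def fetch_rate(currency: str, import_date: date) -> Decimal:
    """Perform the live request (the Download Tool's job in Alteryx)."""
    with urlopen(build_rate_url(currency, import_date), timeout=10) as response:
        return parse_rate(response.read().decode("utf-8"))


print(build_rate_url("usd", date(2024, 3, 15)))
# https://example.org/api/monthly-exchange-rates?currency=USD&month=2024-03
print(parse_rate('{"currency": "USD", "rate": "1.0856"}'))  # 1.0856
```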

Alteryx Macro example

Webscraping and Automatic File Download

With the Download Tool in Alteryx, it’s also possible to build a web scraper. For example, a web scraper can read all available information for each customs tariff number from the source code of EZT-Online and convert it into a table. It’s even simpler if the web scraper only needs to download files with current customs rates that are already published for download. For websites with dynamic content, Alteryx allows Python code to be integrated directly into workflows as a tool. Using Python libraries such as Selenium and BeautifulSoup, it’s possible to write a web scraper that extracts data from dynamic websites. With the support of ChatGPT, such a program can be created even without in-depth programming knowledge.
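The text mentions Selenium and BeautifulSoup; to keep this sketch dependency-free, it swaps in Python's standard-library `html.parser` for the parsing step. It turns the rows of an HTML table into records, which is the "convert it into a table" part of a scraper. The sample markup is invented and does not reflect the actual structure of EZT-Online; fetching the page (and rendering dynamic content with Selenium) is left out.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse line breaks and repeated spaces inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str) -> list[list[str]]:
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_records(rows):
    """Use the first row as column names."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Invented sample markup, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Tariff number</th><th>Description</th><th>Duty rate</th></tr>
  <tr><td>1000 00 00</td><td>Example product,
      variant A</td><td>4 %</td></tr>
</table>
"""
print(to_records(parse_table(SAMPLE_HTML)))
```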

Python tool in Alteryx example

Through these robust data integration capabilities, Alteryx not only ensures that customs departments have access to the most current and relevant data but also greatly reduces the time and effort required for data collection and document processing as a whole.

4. Integration of Suppliers and Service Providers with Alteryx Server, APIs, and Email Tool

Upstream companies can be more deeply integrated into the process through Alteryx Server. Alteryx Server is a platform that allows users to share and manage Alteryx workflows, apps and macros in a secure and centralised environment. Most importantly, it enables the scheduled execution of workflows.

This allows the systems of service providers to be integrated automatically into ongoing processes. For example, it’s conceivable to send the values required for customs registration to the service providers in advance, based on the documents transmitted. This can be done with the Download Tool via existing interfaces or simply by email with the Email Tool. The recipient is fully customisable, so emails can be routed to the responsible clerk depending on the supplier, and the content of the email can be adjusted as well. It’s also conceivable to set up notifications when data are incorrect or incomplete.
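In Alteryx, the Email Tool handles this without code; the Python sketch below only illustrates the routing logic, using the standard library's `email.message.EmailMessage`. The supplier names, addresses and message text are hypothetical, and actually sending the message (for example with `smtplib.SMTP(...).send_message(msg)`) is left out.

```python
from email.message import EmailMessage

# Hypothetical mapping of suppliers to the responsible clerks.
CLERKS = {
    "Supplier A": "clerk.a@example.com",
    "Supplier B": "clerk.b@example.com",
}
DEFAULT_RECIPIENT = "customs-team@example.com"  # fallback for unknown suppliers


def build_notification(supplier: str, invoice_no: str, issues: list[str]) -> EmailMessage:
    """Build (but do not send) a clarification email for one invoice."""
    msg = EmailMessage()
    msg["From"] = "alteryx-server@example.com"
    msg["To"] = CLERKS.get(supplier, DEFAULT_RECIPIENT)
    msg["Subject"] = f"Clarification needed: invoice {invoice_no} ({supplier})"
    issue_list = "\n".join(f"- {issue}" for issue in issues)
    msg.set_content(f"The automated check found the following issues:\n{issue_list}\n")
    return msg


msg = build_notification("Supplier B", "4711", ["Missing certificate of origin"])
print(msg["To"], "|", msg["Subject"])
```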

5. Support in the Tariff Classification of Products

In the review process, it’s also important to determine whether the service provider has declared the product under the correct customs tariff number. Poor product descriptions, cumbersome definitions in the customs tariff and differing interpretations are just some of the challenges of tariff classification. Alteryx can help automate this step of the process.

Fuzzy Matching

If comprehensive master data with already classified products are available and the products are highly similar, the Fuzzy Match Tool can be helpful. With fuzzy matching (approximate search) in Alteryx, the descriptions of new products can be checked for similarity with already classified products. If the similarity exceeds a threshold, the customs tariff number of the matching classified product could be adopted for the new product. However, this method quickly reaches its limits when it relies on supplier product descriptions, as these can vary significantly in language and detail.
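The Fuzzy Match Tool has its own match functions and settings; as a stand-in, this sketch uses Python's standard `difflib.SequenceMatcher` to score a new product description against already classified products and adopts the tariff number only above a threshold. The master data, the dummy tariff numbers and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical master data: classified product descriptions -> dummy tariff numbers.
MASTER_DATA = {
    "stainless steel hex bolt m8 x 50": "1000000000",
    "plastic storage box 20 l": "2000000000",
}


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def suggest_tariff_number(description, master=MASTER_DATA, threshold=0.8):
    """Return (tariff number, score) of the most similar known product,
    or (None, score) if even the best match is below the threshold."""
    query = normalise(description)
    best_desc, best_score = None, 0.0
    for known in master:
        score = SequenceMatcher(None, query, known).ratio()  # 0.0 to 1.0
        if score > best_score:
            best_desc, best_score = known, score
    if best_score >= threshold:
        return master[best_desc], best_score
    return None, best_score


print(suggest_tariff_number("Stainless Steel Hex Bolt M8x50"))
print(suggest_tariff_number("Garden chair"))
```

A suggested number is a starting point for the reviewer rather than a final classification, since descriptions that read alike can still belong under different tariff numbers.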

Machine Learning-Based Solutions

Achieving high accuracy in tariff classification requires more advanced, machine learning-based methods. As mentioned earlier, Alteryx offers a tool palette in this area that can already produce good results with a relatively small number of products to classify. The Python Tool adds further flexibility, since specialised deep learning libraries are available for this purpose.


The example of customs departments clearly shows why Alteryx is rightly referred to as the ‘Swiss Army Knife’ among analytics tools. Its intuitive drag-and-drop interface enables end-to-end automation of complex invoice processing. The result: enhanced compliance and optimised use of human resources. Whether it’s processing PDFs, combining different data sources or exchanging data with business partners, Alteryx has the potential to fundamentally change internal business processes.

Not sure where to start? Get in contact below to find out how.
