Time Series Forecasting

What is Time Series Forecasting?

Time Series Forecasting is a technique that predicts how a target variable will change in the future by observing historical data and assuming that future patterns will resemble those of the past. These patterns are identified and captured to yield short- or long-term predictions of the changes that will occur.

Time series data is collected at adjacent periods; hence, there is potential for correlation between observations, which distinguishes time series data from cross-sectional data.

Applications of Time Series Forecasting

There are many use cases of time series forecasting that leverage the power of time series data to make informed decisions and anticipate future trends. Some examples include:

Demand Forecasting

In the retail sector, improving demand forecasting accuracy is a vital component of supply chain management, as it allows businesses to promptly meet evolving customer needs and optimise their inventory accordingly. Accurate demand forecasting helps model the delivery of products, affecting costs, sales, customer satisfaction, experience and brand value.

Anomaly Detection for Fraud and Cyber Security

Time series analysis can also be used to detect anomalies in the data by discovering irregular spikes or drops that deviate significantly from regular seasonal patterns and trends. For example, PayPal utilises time series analysis to track irregular transaction activity, which is then checked against other suspicious signals, such as a recent change in the shipping address, to identify likely fraudulent transactions.

Financial Forecasting

Time series analysis is used to predict stock prices, exchange rates, market trends and asset price movements. It helps investors and financial institutions make informed decisions about trading, risk management and portfolio optimisation.

Components of Time Series

When analysing the temporal data of a time series, the components that can be observed and analysed include seasonality, trend and noise. These components can be explored individually to gain a clearer understanding of the data. However, by looking at the data as a whole, two other components can be observed: autocorrelation and stationarity. Understanding these components is crucial in selecting an appropriate time series model to achieve accurate forecast results.

Seasonality

Seasonality in time series data appears as regular, predictable patterns that recur at intervals smaller than a year. For example, the temperature of a place will be higher in the summer months and lower in the winter months.

Graph showing Seasonality in Time Series Forecasting

Trend

Trends in time series data exhibit an upward or downward trajectory over time. For example, while a place experiences summer vs winter seasonality, it may also be experiencing an overall increase in average temperature over time as a result of global warming.

Time Series Forecasting Trend Graph

Noise

Noise is the variability in a time series that can’t be explained by seasonality or trend. Time series models are usually built by incorporating the seasonal and trend components to achieve accuracy. However, such a model will never be 100% accurate due to noise, also known as error, which will always remain.

Besides these individual components, it is also important to explore the time series in its entirety to gain a clear understanding of how its observations behave and interact with one another.


Stationarity

A time series is stationary when all statistical characteristics of that series (e.g., mean and variance) are unchanged by shifts in time. Many time series analysis techniques, such as those used for forecasting, are based on the idea that the underlying patterns and relationships in the data are stable over time.

Augmented Dickey-Fuller (ADF) test

The Augmented Dickey-Fuller test is a statistical hypothesis test that allows you to detect non-stationarity. It tests the null hypothesis that a time series has a unit root (i.e., that it is non-stationary). Rejecting the null hypothesis suggests the time series can be treated as stationary.
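
As an illustration, here is a minimal sketch of the ADF test using Python’s statsmodels library; the random-walk data is hypothetical and serves only to demonstrate the call.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# A hypothetical random walk, which by construction has a unit root.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))

adf_statistic, p_value, *_ = adfuller(series)

# A p-value below 0.05 would reject the null hypothesis of a unit root,
# suggesting stationarity; a random walk should fail to reject it.
print(f"ADF statistic: {adf_statistic:.3f}, p-value: {p_value:.3f}")
```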

Differencing

Once the time series data is found to be non-stationary, differencing can be used to transform it. Differencing refers to subtracting consecutive observations from one another (e.g., subtracting the value at time period t-1 from the value at t). Many advanced time series models include a differencing component that transforms non-stationary data according to user-specified parameters.
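
A minimal sketch of differencing with pandas, using hypothetical values:

```python
import pandas as pd

# A small hypothetical series.
series = pd.Series([112, 118, 132, 129, 121, 135, 148, 148])

# First-order differencing: the value at t minus the value at t-1.
differenced = series.diff().dropna()

# If the result is still non-stationary, difference again (second order).
twice_differenced = differenced.diff().dropna()
```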

Autocorrelation

Autocorrelation is the correlation between a time series’ current value and its past values. For example, if an entertainment venue’s revenue rises on the Saturday of the current week just as it did on the Saturday of the previous week, this indicates an autocorrelation between those two data points.

Autocorrelation and Partial Autocorrelation Plots

As autocorrelation might be hard to determine from the observations of the data alone, Autocorrelation and Partial Autocorrelation plots are used to aid in this process.

The x-axis represents the number of time steps back in time, also called the number of lags. The y-axis represents the degree of autocorrelation between each lag and the ‘present’ time. For example, in the following autocorrelation function, it can be observed that the degree of autocorrelation between the ‘present time’ and ‘one period back in time’ is roughly 0.45.

Time Series Forecasting ACF Graph
Time Series Forecasting PACF Graph
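
Both plots can be produced with statsmodels; a minimal sketch follows, using a hypothetical series constructed to have a 12-period cycle.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# A hypothetical series with a 12-period cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(120)
series = np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.3, size=t.size)

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series, lags=24, ax=axes[0])    # autocorrelation at each lag
plot_pacf(series, lags=24, ax=axes[1])   # partial autocorrelation at each lag
plt.show()
```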

Common Time Series Models

Once there is a clear understanding of and familiarity with the nature of the time series data, it is time to explore the models that can be used with it. There is a plethora of time series forecasting models available, making use of various mathematical concepts and spanning varying levels of complexity. The two most common families of time series forecasting models are ARIMA and smoothing.

The ARIMA family

ARIMA, short for ‘AutoRegressive Integrated Moving Average’, is a class of models that predicts the future values of a given time series based on its own past values (lags) and the lagged forecast errors.

Any ‘non-seasonal’ time series that exhibits patterns, and is not simply random white noise, can be modelled with ARIMA.

Additionally, if the time series exhibits seasonal components as well, seasonal terms are added to the ARIMA model which then becomes SARIMA (Seasonal ARIMA).

Finally, if there are external variables to consider that could help explain and improve the model, it can be upgraded to SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors).
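
As a sketch of how such a model might be fitted with Python’s statsmodels, the example below uses hypothetical monthly data; the (p, d, q) and seasonal orders are illustrative, not a recommendation, and passing exog=... would turn the seasonal ARIMA into SARIMAX.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# A hypothetical monthly series with trend and yearly seasonality.
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
t = np.arange(96)
series = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
                   + rng.normal(scale=2, size=96), index=idx)

# Orders are illustrative; in practice they are chosen from ACF/PACF
# plots or by information criteria.
model = SARIMAX(series,
                order=(1, 1, 1),               # (p, d, q)
                seasonal_order=(1, 1, 1, 12))  # (P, D, Q, s)
results = model.fit(disp=False)
forecast = results.forecast(steps=12)          # one year ahead
```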

When should you use an ARIMA model?

ARIMA models are well studied and easy to understand, which is beneficial when working with stakeholders who may not be keen to explore complex machine learning models that they don’t understand. ARIMA is also very agile in adapting to many different types of time series data and can be trained on relatively small datasets, which is a significant advantage over neural networks or deep learning models.

However, if the data experiences a large mean shift or displays multiple seasonalities, the ARIMA family may not be well equipped to handle those cases. As such, alternative models are worth exploring: for example, Facebook’s Prophet model for large mean shifts or the TBATS model for multiple seasonalities.

Smoothing

Smoothing is a basic statistical technique used to smooth out time series data that often exhibits both long-term variability and short-term variability. This is done by applying uniform or exponential weights to past observations. By smoothing your data, long-term variability becomes more evident as short-term variability (noise) is removed.

The Simple Moving Average Method

Calculates the current value as the average of past values over a specified window.
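
A one-line sketch with pandas, using hypothetical values and a 3-period window:

```python
import pandas as pd

# Hypothetical monthly values.
series = pd.Series([20, 22, 25, 23, 27, 30, 28, 33, 35, 34])

# 3-period simple moving average: each point is the mean of the
# current and two preceding observations.
sma = series.rolling(window=3).mean()
```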

Exponential Smoothing

This is an adaptation of the simple moving average in which a weight is applied to each past value, and the weights decrease exponentially the further the values lie from the current period.

Double Exponential Smoothing

Also known as Holt’s method, this is an extension of the simple exponential smoothing method that allows the user to model data that has a trend.

Triple Exponential Smoothing

Also known as the Holt-Winters method, this further extends the double exponential smoothing method by allowing the user to model data that has both trend and seasonal patterns.
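
The whole family is available through statsmodels; a minimal sketch follows, with hypothetical monthly data and an additive trend and season.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# A hypothetical monthly series with trend and yearly seasonality.
rng = np.random.default_rng(0)
t = np.arange(72)
series = pd.Series(0.8 * t + 5 * np.sin(2 * np.pi * t / 12)
                   + rng.normal(size=72),
                   index=pd.date_range("2018-01-01", periods=72, freq="MS"))

# trend/seasonal select the family member: simple smoothing uses neither,
# Holt's method uses trend only, Holt-Winters uses both.
model = ExponentialSmoothing(series, trend="add", seasonal="add",
                             seasonal_periods=12)
fitted = model.fit()
forecast = fitted.forecast(12)  # forecast the next 12 months
```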

When should you use exponential smoothing?

Just like ARIMA, Exponential Smoothing models are simple enough to be understood by non-technical stakeholders. They are a particularly good option when you have data that is not stationary and cannot easily be transformed into stationary data.

However, exponential smoothing models are not the optimal choice if the use case demands peak predictive performance. For example, if even the slightest inaccuracy in predictions would result in a large difference in delivered business value, it is best to transform the data to be stationary and opt for a more sophisticated model. Nonetheless, the exponential smoothing family is great for making time series forecasting accessible and easy to understand for beginners.

Advanced Time Series Models

Besides ARIMA and exponential smoothing, which make up the classical family of time series models, there are far more advanced time-series-specific and deep learning models available for complex use cases.

GARCH

GARCH stands for Generalised Autoregressive Conditional Heteroskedasticity. It is an approach to estimating the volatility of financial markets and is generally used specifically for this use case.
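
For illustration, a GARCH(1, 1) model can be fitted with the third-party arch package; the returns below are hypothetical.

```python
import numpy as np
from arch import arch_model

# Hypothetical daily percentage returns.
rng = np.random.default_rng(0)
returns = rng.normal(scale=1.0, size=1000)

model = arch_model(returns, vol="Garch", p=1, q=1)  # a GARCH(1, 1) model
result = model.fit(disp="off")

# Forecast the conditional variance five steps ahead.
variance_forecast = result.forecast(horizon=5).variance
```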

TBATS

TBATS, an acronym for its key components (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components), is a forecasting method that incorporates multiple components such as trend, seasonality and irregularities. It is particularly useful for data with multiple seasonalities. For instance, electricity consumption is likely to experience yearly seasonality (summer vs winter usage), weekly seasonality (weekday vs weekend usage) and daily seasonality (daytime vs nighttime usage). In such a scenario, TBATS is suitable for accurately modelling the time series data as it has no seasonal constraints, unlike ARIMA or Exponential Smoothing.
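
A minimal sketch using the third-party tbats package, with hypothetical daily data that carries both weekly and yearly cycles:

```python
import numpy as np
from tbats import TBATS

# A hypothetical daily series with weekly and yearly cycles.
rng = np.random.default_rng(0)
t = np.arange(730)
y = (10 + 2 * np.sin(2 * np.pi * t / 7)
     + 5 * np.sin(2 * np.pi * t / 365.25)
     + rng.normal(size=t.size))

estimator = TBATS(seasonal_periods=[7, 365.25])  # weekly and yearly seasons
fitted_model = estimator.fit(y)
forecast = fitted_model.forecast(steps=14)       # two weeks ahead
```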

N-BEATS

N-BEATS, which stands for Neural Basis Expansion Analysis for Interpretable Time Series Forecasting, is a deep learning-based forecasting model. It aims to provide interpretable predictions by decomposing the time series into a set of interpretable basis functions to capture different patterns and trends in the data. By combining these basis functions, N-BEATS can make predictions for future time points.

PROPHET

Prophet is a black-box model that generates forecasts without much user specification. While it is advantageous that the model does most of the heavy lifting without requiring much knowledge from the user, there is a risk that the generated model may not work well for the data or specific use case. Therefore, extensive model validation and evaluation are required when working with black-box models.
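
A minimal sketch with the prophet package, which expects a DataFrame with columns ds (dates) and y (values); the data here is hypothetical.

```python
import numpy as np
import pandas as pd
from prophet import Prophet

# Hypothetical daily data in Prophet's required ds/y format.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=365, freq="D")
df = pd.DataFrame({"ds": dates,
                   "y": np.linspace(10, 20, 365) + rng.normal(size=365)})

model = Prophet()
model.fit(df)

future = model.make_future_dataframe(periods=30)  # 30 days ahead
forecast = model.predict(future)                  # yhat plus uncertainty bounds
```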

Time Series Model Selection

After performing the necessary data exploration and decomposition steps, the user will have a clearer understanding of their data, which can then narrow down the list of appropriate models to choose from. The following decision tree serves as a guide to achieve that:

Model Evaluation Metrics

Once the list of suitable models has been narrowed down, an evaluation metric needs to be defined to assess the performance of each model. There are many popular evaluation metrics used in forecasting, such as:

Mean Squared Error (MSE): Measures the average of the squares of the errors at each point in time, where the error is the difference between the estimated value and the actual value. By squaring the differences before taking the mean, MSE puts more emphasis on larger errors, making it sensitive to outliers or large prediction errors.

Root Mean Squared Error (RMSE): Measures the square root of the mean of the squared differences between the observed values and the predicted values. It is in the same unit as the predicted and observed values, making it easier to interpret and compare directly with the actual values.

Mean Absolute Error (MAE): Measures the average of the absolute differences between the observed values and the predicted values. Unlike MSE and RMSE, it gives equal weight to all errors, regardless of their magnitude. Hence, it is more suitable for applications where the magnitude of the error is important.

Mean Absolute Percent Error (MAPE): Measures the average of the absolute percentage differences between the observed values and the predicted values. It captures the relative error as a percentage of the observed value, allowing for easier interpretation of the model’s accuracy. However, MAPE may become unreliable when the observed values are close to zero, as division by zero is undefined.

Mean Absolute Scaled Error (MASE): Compares the mean absolute error of the model with the mean absolute error of a naive baseline model. This is particularly useful for time series forecasting as it allows for more meaningful and interpretable comparisons across different time series or forecasting tasks.
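
Most of these metrics are available in scikit-learn; MASE has no built-in helper, so the sketch below scales MAE by a naive one-step forecast. The observed, predicted and training values are hypothetical.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error)

# Hypothetical observed and predicted values.
y_true = np.array([112, 118, 132, 129, 121, 135])
y_pred = np.array([110, 120, 128, 131, 124, 132])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)

# MASE: scale MAE by the MAE of a naive one-step forecast over the
# (hypothetical) training series.
y_train = np.array([104, 109, 115, 113, 110, 116])
mase = mae / np.mean(np.abs(np.diff(y_train)))
```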

Cross-Validation in Time Series

Besides the simple train-test split, cross-validation can be used to prevent overfitting of the data and to evaluate model accuracy in a more robust way. One example is k-fold cross-validation, where the dataset is split into several folds; the model is trained on all folds except one remaining fold, on which it is then tested. This is repeated until every fold has served as the test set once.

However, this method is unsuitable for time series modelling, as splitting random samples into train and test sets can result in future values being used to predict past values, which is undesirable. Instead, cross-validation can be performed on time series data on a rolling basis. The process involves taking a small portion of the data to train a model, then using that model to make predictions for the following data points and measuring their accuracy. The same predicted data points are then included in the next training dataset, and the process is repeated to make predictions for subsequent data points.

The final model accuracy will be determined by the average of all the evaluation scores from each of the models.
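
A minimal sketch of this rolling scheme using scikit-learn’s TimeSeriesSplit; the series and the naive stand-in model are hypothetical.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# A hypothetical series in time order.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=120))

# Each split trains only on past observations and tests on the block
# immediately after them, mimicking the rolling procedure described above.
scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(y):
    y_train, y_test = y[train_idx], y[test_idx]
    # Stand-in model: a naive forecast repeating the last training value;
    # in practice, fit any of the models above on y_train instead.
    y_pred = np.full(len(y_test), y_train[-1])
    scores.append(np.mean(np.abs(y_test - y_pred)))  # MAE for this fold

print("average MAE across folds:", np.mean(scores))
```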

Univariate vs Multivariate Time Series models

Univariate time series models are forecasting models that use only one variable (the target variable) and are specific to time series.

However, it is possible that time series data will contain other variables that could serve as explanatory data about the future. For example, if users are trying to predict future sales for a store, factors such as consumers’ purchasing preferences or the economic climate (e.g., higher disposable income leads to higher purchasing power) play an important role in determining future sales. In such cases, users can employ multivariate time series models to integrate such external variables into their forecast.

Univariate Time Series models: use only one variable; cannot use external data; based only on relationships between past and present values.

Multivariate Time Series models: use multiple variables; can use external data; based on relationships between past and present values, and between variables.

Multivariate Time Series models

One common multivariate time series model is the Vector Autoregression (VAR) model where each variable is regressed on its lagged values as well as the lagged values of other variables in the system. This means that the current value of each variable is influenced by its own past values, as well as the past values of other variables in the data.

Imagine you have several economic variables such as inflation, GDP growth and interest rates and their historical data have been collected over a specific period. VAR considers the past values of each variable and observes how they interact with each other. It assumes that each variable depends on its own past values (lags) as well as the past values of the other variables in the system. For instance, in our example, the current inflation rate might be influenced by the past inflation rate, past GDP growth, and past interest rates. By estimating the coefficients and parameters of the VAR model based on the historical data, we can analyse the dynamic relationships between the variables. This allows us to make predictions or forecast future values of the variables based on their historical patterns and the estimated relationships.

In short, VAR is a time series analysis technique that accounts for the interdependencies among multiple variables by modelling their past values to predict future values.
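
A minimal sketch with statsmodels, using hypothetical quarterly data for the three economic variables from the example above; the lag order of 2 is illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Hypothetical quarterly data for three economic variables.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(80, 3)),
                  columns=["inflation", "gdp_growth", "interest_rate"],
                  index=pd.date_range("2000-01-01", periods=80, freq="QS"))

model = VAR(df)
results = model.fit(2)  # regress each variable on 2 lags of every variable

# Forecast 8 quarters ahead, seeded with the most recent observed lags.
lag_order = results.k_ar
forecast = results.forecast(df.values[-lag_order:], steps=8)
```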

Summary

Time Series Forecasting is a crucial tool that empowers businesses to anticipate future trends, make informed decisions and optimise their operations. Moreover, with the emergence of drag-and-drop tools such as Alteryx that offer efficient and reliable predictive analytics capabilities, organisations can now easily access and leverage the power of time series to gain a competitive edge in a rapidly evolving marketplace. Contact us to unlock the potential of time series forecasting and propel your business towards success in an increasingly data-centric world!