Verteego | Why XGBoost is not enough for extrapolating time series

XGBoost for Time Series
The math underlying XGBoost
XGBoost cannot extrapolate !!!
Why should you bother with extrapolation?
Can we hack XGBoost to overcome this?
Conclusion

XGBoost for Time Series

XGBoost has even been used profitably for forecasting time series. The secret is to feed it time-related features: lags, frequencies, wavelet coefficients, periods…

As XGBoost is very good at identifying patterns in data, it will provide very decent predictions if you have enough temporal features describing your dataset.
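As an illustration of the simplest of these features, here is a minimal sketch of how lagged values can be turned into a feature matrix. The helper name `make_lag_features` and the toy series are assumptions made for this example, not part of any library.

```python
from typing import List, Sequence, Tuple


def make_lag_features(
    series: Sequence[float], lags: Sequence[int]
) -> Tuple[List[List[float]], List[float]]:
    """Build (features, targets): each row holds past values of the series."""
    max_lag = max(lags)
    features, targets = [], []
    # Start at max_lag so that every requested lag is available.
    for t in range(max_lag, len(series)):
        features.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return features, targets


# Toy series: the value at t is described by the values at t-1 and t-2.
X, y = make_lag_features([10, 11, 12, 13, 14], lags=(1, 2))
print(X)  # [[11, 10], [12, 11], [13, 12]]
print(y)  # [12, 13, 14]
```

Each row of `X` can then be handed to a regressor as the description of one time step.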

However, XGBoost lacks an essential feature that is absolutely critical for time series. Let’s analyse the math that underlies this model to understand what is missing for XGBoost to be a good model for time series forecasting.

The math underlying XGBoost

The XGBoost documentation includes a very didactic article that explains in detail how the XGBoost model is derived mathematically. I strongly advise you to read it carefully, as it is essential to truly understand the role of hyperparameters like gamma, alpha, …

As you are all aware, XGBoost is a tree-based model. It stacks as many trees as you want, each additional tree trying to reduce the error. The overall idea is to combine many simple, weak predictors to build a strong one.

But let’s focus on the most important formula of the XGBoost documentation: how predictions are computed. It’s a pretty simple formula:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$

Extract from XGBoost doc.

Here $\hat{y}_i$ is the prediction, $x_i$ is a vector of features, $f_k(x_i)$ is the value computed by the $k$-th tree, $K$ is the total number of trees, and $\mathcal{F}$ is the set of all possible regression trees.

As you can see, the XGBoost model is essentially an additive model with respect to each tree. Let’s have a look at f_k to understand how tree scores are computed and see what kind of function we are talking about here.

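Again, the XGBoost doc gives us the answer, and once again it’s quite easy to understand:

$$f_t(x) = w_{q(x)}, \quad w \in \mathbb{R}^T, \quad q: \mathbb{R}^d \rightarrow \{1, 2, \dots, T\}$$

Extract from XGBoost doc.

$q(x)$ is a function that assigns the features $x$ to a specific leaf of the current tree $t$, $w$ is the vector of leaf scores, and $T$ is the number of leaves. $w_{q(x)}$ is then the leaf score for the current tree $t$ and the current features $x$.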

To summarize, once you have trained your model, which is the hardest part of the problem, predicting simply boils down to identifying the right leaf in each tree, based on the features, and summing up the values attached to those leaves.
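The summary above can be sketched in a few lines of plain Python. The nested-dict tree layout, the function names and the leaf values are all made up for illustration; this is not XGBoost's internal representation, and it ignores the global bias (base score) that the library adds to the sum.

```python
from typing import Sequence


def leaf_weight(tree: dict, x: Sequence[float]) -> float:
    """Walk one tree down to a leaf and return its weight, i.e. w_{q(x)}."""
    node = tree
    while "weight" not in node:
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["weight"]


def predict(trees: Sequence[dict], x: Sequence[float]) -> float:
    """Additive prediction: the sum of one leaf weight per tree."""
    return sum(leaf_weight(tree, x) for tree in trees)


# Two hand-written stumps on a single feature (values are made up).
trees = [
    {"feature": 0, "threshold": 5.0,
     "left": {"weight": 1.0}, "right": {"weight": 4.0}},
    {"feature": 0, "threshold": 8.0,
     "left": {"weight": -0.5}, "right": {"weight": 0.5}},
]

print(predict(trees, [3.0]))  # 0.5
print(predict(trees, [9.0]))  # 4.5
print(predict(trees, [1e6]))  # 4.5, the same leaves as for 9.0
```

Whatever the input, the output is always one of the finitely many sums of leaf weights stored in the trees.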

Now let’s see the concrete consequence of such a model and its impact on time series forecasting.

XGBoost cannot extrapolate !!!

Once again, XGBoost is a very powerful and efficient tool for classification and regression, but it lacks a very critical feature: it cannot extrapolate! Or at least, it cannot extrapolate anything better than a simple constant. No linear, quadratic, or cubic extrapolation is possible.

As we have seen in the previous formulas, XGBoost predictions are only based on a sum of values attached to tree leaves. No transformations are applied to these values: no scaling, no log, no exponential, nothing.

This means that XGBoost can only make a good prediction for situations already encountered in the training history. It won’t capture trends!

The few lines of code below are very eloquent, and should be enough to illustrate this limitation and convince you that XGBoost fails at extrapolating:

These few lines of code use an XGBoost model to forecast the values of a very basic, purely linear system whose output is just proportional to time. As shown in the plot below, XGBoost is very good when interpolating, as you can see from the predictions for t between 0 and 10.

But it completely fails when trying to extrapolate, as we expected after analysing the underlying mathematical model. Indeed, as stated above, an XGBoost model cannot predict a situation that was not part of its training data.

Why should you bother with extrapolation?

Unfortunately, time series, or at least the ones that are worthy of interest, are usually non-stationary. This means that their statistical characteristics, such as the mean, variance, and standard deviation, change over time.

And accurately forecasting this kind of time series requires models that not only capture variations with respect to time but can also extrapolate.

We can study two examples to illustrate this. In the first one, we want to estimate the amount of solar energy received per square metre at a specific location where the sky is never cloudy, depending on the day. With a few years of data, XGBoost will be able to make a very decent estimation, as the quantity of energy received is essentially a geometric problem, and as the motion of the earth around the sun is almost perfectly periodic. We are then facing a stationary system.

Conversely, let’s say that we no longer want to predict solar irradiance but temperature. As we are now (all?) aware, the earth is undergoing global warming due to human activities, and the average temperature on earth has been rising for more than a century. See figure below:

Global average temperatures between 1850 and 2019: the global average temperature is increasing. Extracted from Berkeley Earth.

Even though we observe seasonal effects for a given location, the average temperature is not steady in time. Building an XGBoost model with as many meteorological or climatic features as you can imagine will never produce good estimations for the future.

Can we hack XGBoost to overcome this?

With some models, it is sometimes possible to hack the underlying math to expand their scope of application.

For instance, you can use simple linear regression models to model and predict non-linear systems by simply feeding them non-linear features. Hence, by feeding a linear model with the first seven powers of wind speed, you can achieve good performance when predicting wind turbine energy production.
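The trick can be sketched with NumPy's least-squares solver. The logistic "power curve" below is made-up data standing in for a real turbine, and `design_matrix` and `fit_rmse` are hypothetical helper names introduced for this example.

```python
import numpy as np

# Hypothetical, made-up power curve (logistic shape), NOT real turbine data.
wind_speed = np.linspace(3.0, 25.0, 200)                      # m/s
power = 2000.0 / (1.0 + np.exp(-(wind_speed - 12.0) / 3.0))   # kW

# Rescale to (0, 1] so that the high powers stay numerically well behaved.
u = wind_speed / wind_speed.max()


def design_matrix(u: np.ndarray, degree: int) -> np.ndarray:
    """Columns: 1, u, u^2, ..., u^degree."""
    return np.column_stack([u ** p for p in range(degree + 1)])


def fit_rmse(degree: int) -> float:
    """Ordinary least squares on polynomial features; returns the training RMSE."""
    X = design_matrix(u, degree)
    coef, *_ = np.linalg.lstsq(X, power, rcond=None)
    return float(np.sqrt(np.mean((X @ coef - power) ** 2)))


rmse_linear = fit_rmse(1)  # plain linear model on wind speed
rmse_poly7 = fit_rmse(7)   # same linear model, fed the first seven powers
print(rmse_linear, rmse_poly7)
```

The model stays linear in its coefficients; only the features are non-linear, which is why the same solver handles both cases.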

Unfortunately, it’s not possible to tweak the formulas used for prediction in the XGBoost model to introduce support for extrapolation.

One option to combine the powerful pattern identification of XGBoost with extrapolation is to augment XGBoost with a side model in charge of this.

Another one could be to normalize data to remove non-stationary effects and fall back to the stationary case.
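First-order differencing is one common way to do this; it is named here as an example of such a normalization, not as the only option. A minimal sketch in plain Python, with hypothetical helper names:

```python
from itertools import accumulate
from typing import List, Sequence


def difference(series: Sequence[float]) -> List[float]:
    """First-order differencing: d[t] = y[t+1] - y[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value: float, diffs: Sequence[float]) -> List[float]:
    """Invert differencing by cumulative summation from a known starting value."""
    return list(accumulate(diffs, initial=first_value))


series = [3, 5, 7, 9, 11]               # linear trend: not stationary
diffs = difference(series)
print(diffs)                            # [2, 2, 2, 2]
print(undifference(series[0], diffs))   # [3, 5, 7, 9, 11]
```

A model trained on `diffs` sees a target with a constant mean, and its predictions can be summed back to recover the original scale.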

Conclusion

XGBoost, like any other tree-based model with constant leaf values, cannot mathematically perform any kind of extrapolation of order greater than 0, i.e. it can only extrapolate a constant value. This is a huge limitation to consider when trying to apply this kind of model to non-stationary time series.

However, XGBoost remains a very attractive tool for bringing out structure in complex data with many features. Using it for forecasting time series can be a good win, as long as your target is stationary. If that is not the case, you need to either preprocess your data to make it stationary or couple XGBoost with another model responsible for handling trends.
