Collected data at the heart of the sales forecasting strategy

What is sales forecasting, if not a valuable probabilistic method for supporting a company's strategy and helping it achieve its objectives? Because forecasting relies on the analysis of past information, data is its raw material: how can you make a prediction from nothing? This dependence is also one of its limitations, especially when little or no history exists, for example when launching a new product or opening a new store.

But data, or should we say big data, is everywhere and constantly evolving. A predictive model therefore cannot be static: it must evolve as circumstances or resources change. Moreover, the relevance of the many data points collected is an indispensable criterion for calculating expected results accurately and reliably. As a result, sales forecasting is a new challenge for companies today.

Let's take a look at the data needed for sales forecasting.

The two main types of data crucial to sales forecasting

A sales forecast must integrate a large amount of data: the more abundant and better qualified the data, the more accurate the forecast. These data fall into two main categories: internal (endogenous) variables and external (exogenous) variables.

Endogenous data from the company

Endogenous data is the information that comes from the company itself and is directly linked to the flows to be predicted. These variables are known, controlled and recorded in a database, or at least easy to collect when needed.

Data is omnipresent in the business world, however. The key issue in processing internal data for a sales forecast is therefore extraction, which must be targeted and serve precise objectives if the model is to be accurate. Including too many variables in the probability calculation is useless, or even counterproductive.

Here is a non-exhaustive list of the most commonly used internal data for sales forecasting:

product variables (product category, brand, packaging, etc.);
price data (sales price, production cost, promotion, price changes, etc.);
information concerning the points of sale (surface area, stocks, location, average turnover, etc.);
sales team information (headcount, training, qualifications, etc.);
marketing data (promotion, catalog, social networks, etc.);
sales channel variables (delivery, pick-up, point of sale, online sales, etc.).

Exogenous data related to the company's environment

Exogenous data refers to all variables external to the company and to the forecasting process. They are specific to the company's direct environment and can affect sales. Because this data is not known in advance, it must be collected and analyzed so that only the most relevant and reliable variables are kept and integrated into the model; the others can introduce errors (noise) into the modeling. Moreover, the company is subject to these variables: it has no influence on them.

The relevance of exogenous data depends on the company's sector of activity. Weather or traffic, for instance, affect the sales of only some firms, whereas other variables are common to most:

seasonality (time of year, month, day of the week, and recurring events such as paydays, weekends or vacations);
competition (products, prices, location, target customers, catchment area, etc.);
current events, such as the regulatory context;
consumer behavior and buying habits;
emerging trends.
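
As an illustration of how the seasonality variables listed above might be derived, here is a minimal Python sketch that turns a calendar date into a few numeric features. The function name and the choice of features are assumptions made for this example; paydays and vacations would require an external calendar and are not covered here.

```python
from datetime import date


def calendar_features(day: date) -> dict:
    """Derive simple seasonality features from a calendar date."""
    return {
        "month": day.month,                     # 1..12
        "quarter": (day.month - 1) // 3 + 1,    # 1..4
        "day_of_week": day.weekday(),           # Monday=0 .. Sunday=6
        "is_weekend": int(day.weekday() >= 5),  # 1 on Saturday or Sunday
    }


# Example: features for one sales date
features = calendar_features(date(2021, 12, 25))
print(features)
```

Each sales record can be enriched with these columns before it is passed to the model, so that recurring calendar patterns become visible to the algorithm.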

Internal and external data alike change over time and from place to place. Sales forecasts must therefore be updated whenever a change occurs or at regular intervals (monthly, quarterly or yearly). This also explains why data considered "noise" in one period may be worth integrating into the model in a different context.

Finally, some specific, temporary variables with a strong impact, such as the health crisis linked to Covid-19, can be integrated into the predictive model through binary classifications (yes/no flags) that tell the algorithm the situation is exceptional rather than normal.
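
One possible way to express such a binary flag is sketched below in Python. The period boundaries, the field names and the sales figures are all hypothetical, chosen only to show the mechanism: each record receives a 1 when its date falls inside an exceptional period and a 0 otherwise.

```python
from datetime import date

# Hypothetical exceptional period (start, end), for illustration only.
EXCEPTIONAL_PERIODS = [(date(2020, 3, 17), date(2020, 5, 10))]


def exceptional_flag(day, periods=EXCEPTIONAL_PERIODS):
    """Return 1 if the date falls inside an exceptional period, else 0."""
    return int(any(start <= day <= end for start, end in periods))


# Illustrative records (made-up values, not real sales data)
rows = [
    {"date": date(2020, 2, 1), "units_sold": 120},
    {"date": date(2020, 4, 1), "units_sold": 45},
]
for row in rows:
    row["is_exceptional"] = exceptional_flag(row["date"])

print([row["is_exceptional"] for row in rows])
```

With this extra column, a model can in principle attribute the unusual sales level to the flagged context instead of treating it as the normal pattern.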

Historical data for accurate sales forecasting

Advances in artificial intelligence have improved the accuracy of forecasts, particularly through machine learning, which mimics the ability of living beings to learn: algorithms, too, can now learn from experience.

If human actions and behavior result from lifelong learning through past experiences, which shape our personality and influence our daily life, the same is true of artificial intelligence. Machine learning does not simply analyze data from a single defined period. For its predictions to be as reliable and accurate as possible, the calculation must integrate past experience, called historical data, to capture the average trend.

As mentioned earlier, the relevance and reliability of the internal and external variables fed into the model are decisive for training it. How can we hope to obtain accurate forecasts, free of algorithmic bias (distorted reality, discrimination, lack of neutrality, etc.), if training relies on data that is already biased or erroneous? This is one of the challenges that data intelligence must address. According to a Gartner survey, 70% of companies say that the poor quality of processed data has a negative impact on their business.

Nevertheless, machine learning can reduce the risk that data errors (or past actions) distort the prediction. To do so, training runs through thousands of iterations in order to narrow the gap between the prediction and the real data (a gap called the penalty). Only when the penalty has been reduced as far as possible is the model considered optimal and usable.
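
The iterative reduction of the penalty described above can be sketched as follows. This is only a toy illustration under stated assumptions: the text does not name a specific algorithm, so the example assumes a one-variable linear model, a mean squared error as the penalty, and plain gradient descent as the update rule. The data points are synthetic.

```python
def mean_squared_penalty(w, b, xs, ys):
    """Average squared gap between predictions (w*x + b) and real values."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit y = w*x + b by repeatedly nudging w and b to shrink the penalty."""
    w, b = 0.0, 0.0
    history = []
    n = len(xs)
    for _ in range(iterations):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared penalty with respect to w and b
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append(mean_squared_penalty(w, b, xs, ys))
    return w, b, history


# Synthetic data generated from y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b, history = train(xs, ys)
print(round(w, 3), round(b, 3), history[0] > history[-1])
```

The recorded history makes it possible to check that the penalty really shrinks from one iteration to the next; production forecasting models follow the same principle with far more variables and more elaborate algorithms.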

To ensure forecast quality, the data history must be sufficient and cover an acceptable period (2 years minimum) so that it captures as many factors as possible, such as the many fluctuations of internal or external origin (seasonality, promotions, context, etc.).
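
A quick check of this history requirement could look like the sketch below. The helper name is hypothetical, and the two-year minimum is approximated as 730 days, which is a simplifying assumption.

```python
from datetime import date

MIN_HISTORY_DAYS = 2 * 365  # "2 years minimum", approximated in days


def has_enough_history(dates, min_days=MIN_HISTORY_DAYS):
    """Return True if the span between oldest and newest date is long enough."""
    if not dates:
        return False
    return (max(dates) - min(dates)).days >= min_days


long_history = [date(2019, 1, 1), date(2020, 6, 15), date(2021, 1, 1)]
short_history = [date(2020, 1, 1), date(2021, 6, 1)]
print(has_enough_history(long_history), has_enough_history(short_history))
```

Such a check only measures the time span covered; it says nothing about gaps or data quality inside that span, which would need separate validation.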

Finally, it is important to remember that any sudden change can render the data history useless if the artificial intelligence is not given the knowledge it needs to understand a particular context, hence the value of binary classifications. The Covid-19 health crisis is, once again, a good example: it radically changed consumer habits, with online orders increasing at the expense of physical stores' turnover and drive-through pickup being used more often.
