ExMetrix – building predictive models – advanced mode

Through the ExMetrix platform it is possible to build a model of a process or phenomenon occurring around us.
There are two modelling modes to choose from:
1. Simple Mode – available from the main menu; building a forecasting model is almost immediate. It is intended for people taking their first steps in data analytics. The user enters just a few parameters, and the system sets everything else automatically.
2. Advanced Mode – intended mainly for users who are somewhat more familiar with the platform and with modelling. In this mode the user determines the parameters and the mathematical form of the model, which makes the process flexible.
This guide presents the Advanced Mode. The process starts on the “Series” subpage, in the “Data groups” tab.

M1

Each model is built on the basis of pre-selected variables (searching for related variables in simple and advanced mode is described in separate articles).
The user chooses the variable for which the model will be built from a list they have prepared themselves.

M2

After entering the group, select and mark a reference variable in the list of variables (if you have not selected one yet, or if you want to change it to another).

M3

Then make sure that a matrix of correlation coefficients has been created for the group (it appears at the end of the group). If the matrix has not been created yet, or if the group was modified or the reference variable was changed in the meantime, launch a recalculation of the matrix of correlation coefficients.

M4

The first column and the last row of the matrix displayed on the platform include correlation coefficients of all the variables against the reference variable.

M5

The displayed matrix is calculated for the maximum shift of variables set in the selection process. If you update the matrix, the maximum shift is 20. A more detailed interpretation of the shifts is given in the article about advanced-mode variable selection. You can view the correlation matrices for all shifts (from 0 to the maximum) by exporting them to an *.xlsx file. The last worksheet contains the correlation coefficients calculated for the optimum shifts, together with the values of those optimum shifts (in square brackets).
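The search for an optimum shift can be pictured with a short sketch. It is an illustration only, not ExMetrix code: the helper names (`pearson`, `shifted_corr`, `optimal_shift`), the toy data, the direction of the shift, and the assumption that "optimum" means the largest absolute coefficient are all introduced here.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def shifted_corr(x, ref, shift):
    """Correlate x delayed by `shift` periods with the reference series."""
    if shift == 0:
        return pearson(x, ref)
    return pearson(x[:-shift], ref[shift:])

def optimal_shift(x, ref, max_shift=20):
    """Return (shift, coefficient) with the largest absolute correlation."""
    # Keep at least three overlapping observations for every shift tried.
    limit = min(max_shift, len(x) - 3)
    scores = {s: shifted_corr(x, ref, s) for s in range(limit + 1)}
    best = max(scores, key=lambda s: abs(scores[s]))
    return best, scores[best]

# Toy data (hypothetical): the reference repeats x with a delay of 3 periods.
x = [math.sin(0.5 * t) for t in range(60)]
ref = [0.0, 0.0, 0.0] + x[:-3]
shift, coef = optimal_shift(x, ref)
print(shift, round(coef, 3))
```

The default `max_shift=20` mirrors the maximum shift mentioned above; everything else is a simplification.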

The first column and the first row of each worksheet in the exported file contain the correlation coefficients of all variables against the reference variable.

M6

For a zero shift (no shift) the matrix is symmetric. For shifts greater than zero the matrix is asymmetric, because the columns hold the correlation coefficients of, e.g., variable X (shifted in a specific direction) against variable Y, while the rows hold those of variable Y (shifted in the same direction) against variable X. These two values are usually not equal.
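The asymmetry described above is easy to reproduce on two toy series. The sketch below is not the platform's implementation; `corr_matrix`, the row/column convention and the sine data are assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def corr_matrix(series, shift):
    """Entry [i][j]: series j delayed by `shift` periods against series i."""
    n = len(series[0]) - shift
    return [[pearson(row[shift:], col[:n]) for col in series] for row in series]

# Hypothetical data: y follows x with a delay of two periods.
x = [math.sin(0.5 * t) for t in range(60)]
y = [math.sin(0.5 * (t - 2)) for t in range(60)]

m0 = corr_matrix([x, y], shift=0)  # no shift: symmetric
m2 = corr_matrix([x, y], shift=2)  # shift of 2: asymmetric

print(m0[0][1] == m0[1][0])
print(round(m2[0][1], 3), round(m2[1][0], 3))
```

With the shift of 2, delaying x lines it up with y, while delaying y moves it further away from x, so the two off-diagonal entries differ.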
After calculating the matrix of correlation coefficients, we must decide whether the group of variables will be modified before the model is built.

M7

Variables are transformed, for example, to obtain a distribution closer to normal, to obtain a stationary process (e.g. by differencing), or to bring all variables into the same known interval (scaling). Scaling is used in particular for variables with large nominal values (during the annealing process variables cannot take values outside the allowable intervals) and for “transferring” variables into intervals of positive values, which makes it possible to use functions whose domain is restricted to positive values (logarithm, extraction of roots, etc.) in the model.
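As a rough illustration of two of these transformations, the sketch below differences a series and rescales it into a positive interval so that a logarithm can be applied. The function names, the target interval [1, 2] and the sample values are assumptions; the transformations offered on the platform may be defined differently.

```python
import math

def difference(values):
    """First differences: change between consecutive observations."""
    return [b - a for a, b in zip(values, values[1:])]

def scale_to_interval(values, low=1.0, high=2.0):
    """Linearly rescale values so that they span [low, high]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled")
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]

# Hypothetical series with negative and large values.
raw = [-120.0, -40.0, 15.0, 230.0, 410.0]

diffs = difference(raw)
scaled = scale_to_interval(raw)          # now strictly positive
logs = [math.log(v) for v in scaled]     # safe: every value is > 0

print(diffs)
print([round(v, 3) for v in scaled])
```

Rescaling before taking the logarithm is what keeps the negative raw values inside the domain of the function.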

M8

As a result we obtain a new group whose name is extended with “_(postfix)” and which contains the transformed data.

M9
Now you are ready to create the model.

M10

Model building is carried out in four steps, in the following sub-tabs:
Model> Settings> Initialize> Learning

In the sub-tab Model> we first set the frequency (here Month: the model will be created from variables aggregated to monthly data, and the forecast horizon, set in the next step as a number of periods, will be expressed in months).
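To make the effect of the frequency setting concrete, here is a small sketch of aggregating dated observations to months. Averaging within the month is an assumption made for the example (the guide does not say which aggregation ExMetrix applies), and `aggregate_monthly` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date

def aggregate_monthly(observations):
    """Average (date, value) pairs within each calendar month."""
    buckets = defaultdict(list)
    for day, value in observations:
        buckets[(day.year, day.month)].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}

# Hypothetical daily quotations.
daily = [
    (date(2017, 1, 2), 10.0),
    (date(2017, 1, 20), 14.0),
    (date(2017, 2, 3), 20.0),
]
monthly = aggregate_monthly(daily)
print(monthly)
```

Once the data are monthly, a forecast horizon of 3 periods means three months ahead.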

We can choose the dependent (reference) variable, although by default the system sets it to the variable previously selected in the variable group.

M11

We do not need to use all the variables in the group. The number of variables should certainly not exceed the number of observations (quotations). We can therefore uncheck the variables that should not enter the model.

M12

Checking the variable TREND means that we will build a hybrid model based on the classical decomposition of the reference variable into trend and cycles. Selecting the variable TIME makes it possible to introduce additional functions of time into the model.
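The idea of splitting a series into trend and cycles can be sketched as follows. This is a simplified stand-in, assuming a linear least-squares trend; the guide does not state which trend form the platform uses, and `linear_trend` is a name invented for the example.

```python
def linear_trend(values):
    """Least-squares straight line fitted against time 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n)]

# Hypothetical series: a rising line with an alternating component.
series = [2.0 * t + 5.0 + (1.0 if t % 2 else -1.0) for t in range(12)]

trend = linear_trend(series)
cycle = [y - tr for y, tr in zip(series, trend)]  # what the trend leaves over

print([round(c, 2) for c in cycle])
```

In a hybrid model of this kind, the remaining component (`cycle`) is what the explanatory variables would be asked to describe.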

M13

In addition to the current values of the variables, the model can use their past values, as nominal values and/or in the form of changes. This usually enriches the “knowledge” supplied to the learning process, but it also significantly increases the number of model parameters. The lag (how many intervals back, here months) can be entered manually; ticking the “Auto” box applies the optimum shift obtained from the previously calculated matrix of correlation coefficients.
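The trade-off mentioned above (more inputs, fewer usable observations) shows up clearly in a small sketch. The helper `add_lags` and the sample values are hypothetical; the platform's own handling of past values may differ.

```python
def add_lags(values, lags):
    """Rows of [current value, value at each lag, change versus each lag]."""
    start = max(lags)  # earlier rows lack the longest lag
    rows = []
    for t in range(start, len(values)):
        lagged = [values[t - k] for k in lags]
        changes = [values[t] - values[t - k] for k in lags]
        rows.append([values[t]] + lagged + changes)
    return rows

# Hypothetical monthly values of one explanatory variable.
x = [10.0, 12.0, 11.0, 15.0, 18.0, 17.0]
rows = add_lags(x, lags=[1, 3])

for row in rows:
    print(row)
print(len(x), "observations, 1 column ->", len(rows), "observations,", len(rows[0]), "columns")
```

One variable with two lags becomes five columns while the longest lag removes the first three observations, which is why the number of variables has to be watched against the number of observations.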

M14

M15

Then go to the sub-tab Settings>. It is used to define the mathematical form of the model (the function of the model), to generate the appropriate number of structural parameters, to give them initial values (in the form of intervals from which values are drawn after clicking the sub-tab Initialize>), and to impose constraints on the parameters.

M16

M17
The meaning of the individual symbols (notation) entered in the “Function of the model” box:

M18

We can enter the function of the model by selecting one of the three ready-made forms available on the platform (root, logarithmic, sinusoidal) or by typing in our own form. In both cases, remember that all variables must belong to the domain of the function (e.g. roots and logarithms require positive values, division requires values different from zero, etc.). Otherwise the model may generate an error message at the initialization and learning stages.
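A domain check of this kind can be run on the data before a function is chosen. The sketch below is independent of the platform; the operation labels, `DOMAIN_CHECKS` and `domain_violations` are hypothetical names.

```python
# Conditions follow the guide: roots and logarithms need positive values,
# division needs values different from zero. (Zero is mathematically valid
# for a square root but is rejected here to match the guide.)
DOMAIN_CHECKS = {
    "log": lambda v: v > 0,
    "root": lambda v: v > 0,
    "divide": lambda v: v != 0,
}

def domain_violations(data, usage):
    """Return {variable: [offending values]} for the declared operations."""
    problems = {}
    for name, operation in usage.items():
        bad = [v for v in data[name] if not DOMAIN_CHECKS[operation](v)]
        if bad:
            problems[name] = bad
    return problems

# Hypothetical variables and the operation each one would be passed to.
data = {
    "x1": [1.5, 2.0, 0.0],
    "x2": [4.0, 9.0, 16.0],
    "x3": [3.0, -1.0, 2.0],
}
usage = {"x1": "log", "x2": "root", "x3": "divide"}

problems = domain_violations(data, usage)
print(problems)
```

A variable reported here would need to be rescaled into a positive interval (or used with a different function) before the model is initialized.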

M19

M20

Editing the number of parameters, the intervals of their initial values and the boundary conditions for any form of the model function:

M21

M22

In the next step we go to the sub-tab Initialize>, where the initial parameters for the simulated annealing algorithm are drawn and, optionally, an initial learning run is started from a few (four) sets of initial parameters. At the end of initialization we generally obtain the best preliminary model for further learning.
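The drawing of initial parameters can be pictured as below. This is a minimal sketch, assuming a uniform draw from each interval, a toy linear model y = a·x + b and a sum-of-squares error; none of these details is confirmed by the guide, and the initial learning run is left out.

```python
import random

def draw_parameters(intervals, rng):
    """One random value per parameter, taken from its (low, high) interval."""
    return [rng.uniform(lo, hi) for lo, hi in intervals]

def sum_squared_error(params, xs, ys):
    """Error of the toy model y = a * x + b."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

def best_initial_set(intervals, xs, ys, n_sets=4, seed=0):
    """Draw n_sets parameter sets and return (best set, all sets)."""
    rng = random.Random(seed)
    candidates = [draw_parameters(intervals, rng) for _ in range(n_sets)]
    best = min(candidates, key=lambda p: sum_squared_error(p, xs, ys))
    return best, candidates

# Hypothetical initial intervals and data.
intervals = [(0.0, 5.0), (-10.0, 10.0)]
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

best, candidates = best_initial_set(intervals, xs, ys)
print(best)
```

Starting from several sets and keeping the best one lowers the risk that further learning begins in a poor region of the parameter space.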

M23

M24

The illustration below explains each setting of the simulated annealing algorithm. It also shows how to specify the observations used for learning (one or more intervals from … to …) and the observations used for testing (validation) of the model (intervals from … to …). Do not exceed the maximum available number of observations (shown at the bottom; here it is 53 months) or the quotation periods available for the model.
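The split into learning and testing observations can be sketched as follows. The interval boundaries and the helper `select_observations` are made up for the example; only the total of 53 monthly observations comes from the text above.

```python
def select_observations(series, intervals):
    """Collect observations from 1-based, inclusive (start, end) intervals."""
    picked = []
    for start, end in intervals:
        if start < 1 or end > len(series) or start > end:
            raise ValueError(f"interval ({start}, {end}) is outside 1..{len(series)}")
        picked.extend(series[start - 1:end])
    return picked

series = list(range(1, 54))  # stand-in for 53 monthly observations

# Hypothetical choice: two learning intervals and one testing interval.
learning = select_observations(series, [(1, 24), (31, 47)])
testing = select_observations(series, [(48, 53)])

print(len(learning), len(testing))
```

An interval that reaches beyond observation 53 raises an error here, which corresponds to the warning about not exceeding the available number of observations.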

M25

maxDe is an additional multiplier for the Boltzmann distribution. It allows the algorithm to increase the probability of accepting a worse solution, which makes it easier to leave a local extremum of the function and continue searching for its global extremum.
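How such a multiplier can raise the chance of accepting a worse solution is shown in the sketch below. The exact place where maxDe enters the platform's formula is not given in this guide, so the `multiplier` argument and its position in the exponent are assumptions.

```python
import math
import random

def acceptance_probability(delta, temperature, multiplier=1.0):
    """Boltzmann-type probability of accepting a solution worse by `delta`."""
    if delta <= 0:
        return 1.0  # an improvement is always accepted
    # Assumption: the multiplier stretches the temperature scale.
    return math.exp(-delta / (multiplier * temperature))

def accept(delta, temperature, multiplier, rng):
    """Random accept/reject decision based on the probability above."""
    return rng.random() < acceptance_probability(delta, temperature, multiplier)

p_plain = acceptance_probability(delta=1.0, temperature=0.5, multiplier=1.0)
p_boosted = acceptance_probability(delta=1.0, temperature=0.5, multiplier=3.0)
print(round(p_plain, 3), round(p_boosted, 3))

rng = random.Random(0)
accepted = sum(accept(1.0, 0.5, 3.0, rng) for _ in range(1000))
print(accepted, "of 1000 worse moves accepted")
```

With a larger multiplier the same deterioration is accepted more often, so the search is more likely to climb out of a local extremum.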
Then we go to the sub-tab Learning> and set the appropriate properties of the annealing process, settings and initialization.

The “Finish” button starts the process of optimizing the structural parameters of the model:

M26

M27

M28

The estimated model:

M29

You can re-optimise the estimated model (without re-drawing the initial parameters) or start the process again by drawing the initial parameters once again with all other settings unchanged.
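The difference between the two options can be sketched with a bare-bones annealing loop. This is not the ExMetrix algorithm: the cooling schedule, the Gaussian step, the toy linear model and all names are assumptions chosen to keep the example short.

```python
import math
import random

def anneal(start, error, rng, steps=2000, temperature=1.0, cooling=0.995, step=0.1):
    """Minimal simulated annealing; returns the best parameters visited."""
    current, best = list(start), list(start)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, step) for p in current]
        delta = error(candidate) - error(current)
        # Accept improvements always, worse candidates with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if error(current) < error(best):
                best = list(current)
        temperature *= cooling
    return best

def error(params):
    """Sum of squared errors of the toy model y = a * x + b."""
    a, b = params
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

rng = random.Random(1)
intervals = [(0.0, 5.0), (-10.0, 10.0)]  # hypothetical initial intervals

first_draw = [rng.uniform(lo, hi) for lo, hi in intervals]
estimated = anneal(first_draw, error, rng)

# Option 1: re-optimise, starting from the parameters already estimated.
reoptimised = anneal(estimated, error, rng)

# Option 2: draw new initial parameters and run the process again.
new_draw = [rng.uniform(lo, hi) for lo, hi in intervals]
restarted = anneal(new_draw, error, rng)

print(error(estimated), error(reoptimised), error(restarted))
```

Re-optimising can only refine the current solution, while a fresh draw gives the search a chance to end up in a different region.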

M30

M31

M32

After a database update, the model response can be updated (recomputed on the basis of the previously estimated values of the structural parameters).

M33

It is also possible to use the estimated parameters of the model to examine its responses to individual changes of the given variables. In this way we can check whether the model is sensitive to a change in each variable, which may be a basis for reducing the model by eliminating variables of weak influence. The model can then be estimated again.

M34

M35

Green means a positive response of the model and red a negative one. The variables are arranged in decreasing order of the model's response to individual changes.
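A response ranking of this kind can be approximated outside the platform as follows. The sketch assumes a 1% relative change applied to one variable at a time and ordering by absolute response; the toy linear model, the parameter values and the function names are hypothetical.

```python
def sensitivities(model, params, baseline, change=0.01):
    """Model response to a relative change of each variable in turn."""
    base_output = model(params, baseline)
    responses = {}
    for name, value in baseline.items():
        bumped = dict(baseline)
        bumped[name] = value * (1.0 + change)
        responses[name] = model(params, bumped) - base_output
    # Largest absolute response first.
    return sorted(responses.items(), key=lambda item: abs(item[1]), reverse=True)

def linear_model(params, inputs):
    """Toy model: weighted sum of the input variables."""
    return sum(params[name] * value for name, value in inputs.items())

# Hypothetical estimated parameters and current values of the variables.
params = {"x1": 2.0, "x2": -0.5, "x3": 0.01}
baseline = {"x1": 10.0, "x2": 100.0, "x3": 5.0}

ranking = sensitivities(linear_model, params, baseline)
for name, response in ranking:
    colour = "green" if response > 0 else "red"
    print(name, round(response, 4), colour)
```

A variable at the bottom of such a ranking (here `x3`) is a candidate for removal before the model is estimated again.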
M36

Our model is here:

M37

File to download:
models_advanced_mode