ExMetrix – searching for variables related to each other – advanced mode

The ExMetrix platform makes it easy to analyze the nature and strength of relations among data.
It is possible to select a group of data that is strongly related to a specified market, economic process, or resource price.
There are two modes for variable selection:
1. Simple mode – available from the main menu; building a forecasting model is almost immediate. It is intended for people taking their first steps in data analytics. The user enters just a few parameters, and the system sets the remaining properties automatically.
2. Advanced mode – intended mainly for users with knowledge of analytics and modelling. Here the user defines the parameters and the mathematical form of the model.
The following description presents the advanced mode.
The process starts after choosing the option “Make prognosis” from the main menu.
The first step is to open the “Series” subpage and its “List” tab.

1

The next step is to choose a reference variable, for which similar data will be found. In advanced mode, the reference variable is chosen directly from the database by entering its name or keywords that the name contains.

2

Additionally, it is possible to specify attributes of the chosen variable (e.g. the frequency of ratings, the source of ratings, or the category the variable belongs to), which results in a faster search. The search starts after clicking the ‘search’ button.

3

After the search is complete, the results are displayed as a list of variables that fulfill the search conditions.

4

The user selects the referential variable from the list (in the above example only one variable was found).

To use automatic selection of related variables, click the ‘magnifying glass’ button.

5

The next stage is selecting variables related to the referential variable. The parameters and logical conditions of the selection must be set before starting the process. Every variable in the database is then compared to the referential variable in accordance with these parameters.
The ‘Date from’ and ‘Date to’ conditions for scanned variables define a range of dates for which rating values must exist in the database. The rating values may extend beyond this range. The main goal here is to eliminate variables that were listed for too short a period.

6
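
The date-range condition can be illustrated with a minimal Python sketch. It is not ExMetrix code: the catalogue, the series names and the helper `covers_range` are hypothetical, and it assumes that a variable passes when its first observation is no later than ‘Date from’ and its last observation is no earlier than ‘Date to’.

```python
from datetime import date


def covers_range(first_obs: date, last_obs: date,
                 date_from: date, date_to: date) -> bool:
    """Return True if the observations span at least [date_from, date_to]."""
    return first_obs <= date_from and last_obs >= date_to


# Hypothetical catalogue: name -> (first observation, last observation).
catalogue = {
    "series_a": (date(2005, 1, 3), date(2020, 12, 31)),
    "series_b": (date(2016, 6, 1), date(2020, 12, 31)),  # starts too late
    "series_c": (date(2010, 1, 4), date(2018, 3, 30)),   # ends too early
}

date_from, date_to = date(2012, 1, 1), date(2019, 12, 31)

# Keep only the variables whose history covers the whole required range.
eligible = [
    name
    for name, (first_obs, last_obs) in catalogue.items()
    if covers_range(first_obs, last_obs, date_from, date_to)
]
print(eligible)
```

Only `series_a` remains, although its data extend beyond the required range on both sides.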

Additional requirements can be set for attributes of scanned variables, such as name or category. These requirements are selected from a list (for example, a variable may or may not belong to a specified category, or its name must or must not contain keywords entered by the user). This directs the search to particular branches, markets or categories of data and ignores data that the user regards as unrelated to the referential variable.

7

The selection mechanism allows any number of conditions to be set and, if needed, grouped. Conditions and groups of conditions may also be deleted.

8

Operators (AND – logical conjunction, OR – logical disjunction) must be inserted between all elements (conditions and groups of conditions). To do so, click one of the blue buttons, ‘AND’ or ‘OR’; the active button changes to a darker shade.

9

Using the correct logical operators is very important. For example, requiring the correlation coefficient to be both smaller than -0.2 and bigger than 0.5 will return no related variables, because no value can satisfy both conditions. The same happens when the first listing is required to be later than the end of the listing. In general, it is a good idea to check the logic of each setting.
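
The effect of the operator choice can be checked with a small Python sketch. The candidate names and correlation values below are made up for illustration, and `select` is a hypothetical helper, not part of ExMetrix.

```python
# Hypothetical correlation coefficients of scanned variables
# with the referential variable.
candidates = {
    "series_a": 0.82,
    "series_b": -0.35,
    "series_c": 0.10,
    "series_d": -0.61,
}


def select(predicate):
    """Return the sorted names of candidates whose coefficient passes."""
    return sorted(name for name, r in candidates.items() if predicate(r))


# AND: no number is both below -0.2 and above 0.5, so nothing is selected.
and_result = select(lambda r: r < -0.2 and r > 0.5)

# OR: strongly negative or strongly positive relations are both accepted.
or_result = select(lambda r: r < -0.2 or r > 0.5)

print(and_result)
print(or_result)
```

With AND the result is empty for any input data, while OR keeps `series_a`, `series_b` and `series_d`.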

The next stage is to choose the measure used to determine the level of relation between a scanned variable and the referential variable. Currently there are two options: the correlation coefficient, or the difference between the correlation coefficient calculated for the optimal shift and the correlation coefficient calculated for zero shift. The shift terms are explained below.
10

The next step is to determine the time frame over which the level of relation between variables is examined. For example, the coefficient may be calculated for the last five years, the last few months, etc. The time frame is set using the calendar.

11

Then the maximal shift in time between the referential variable and a scanned variable is specified.
In the selection process, both variables are first aligned so that their quotation dates match (dates are synchronized where a rating is missing). This is the ‘zero shift’: no shift in time between the compared variables. Next, the process shifts the compared variables against each other step by step up to the specified maximal shift, calculating the value of the chosen relation measure at each step. The shift for which the relation measure reaches its extremum is the ‘optimal shift’.

12
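
The shift search described above can be sketched in Python. This is only an illustration under stated assumptions, not the platform's implementation: the Pearson coefficient is used as the relation measure, a shift of k compares the scanned variable at step t with the referential variable at step t + k (the scanned variable leads), the extremum is taken as the largest absolute value, and all function names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def shifted_correlation(scanned, reference, shift):
    """Correlate scanned[t] with reference[t + shift]; shift 0 is 'zero shift'."""
    if shift == 0:
        return pearson(scanned, reference)
    return pearson(scanned[:-shift], reference[shift:])


def optimal_shift(scanned, reference, max_shift):
    """Return (optimal shift, coefficient at that shift, coefficient at zero shift)."""
    scores = {k: shifted_correlation(scanned, reference, k)
              for k in range(max_shift + 1)}
    # Assumption: 'extremum' means the largest absolute correlation.
    best = max(scores, key=lambda k: abs(scores[k]))
    return best, scores[best], scores[0]


# Made-up data: the referential variable repeats the scanned one two steps later.
scanned = [1, 3, 2, 5, 4, 7, 6, 9, 8, 10]
reference = [2, 1] + scanned[:8]

best, r_opt, r_zero = optimal_shift(scanned, reference, max_shift=4)
gain = r_opt - r_zero  # the second measure: optimal-shift minus zero-shift value
print(best, round(r_opt, 3), round(r_zero, 3), round(gain, 3))
```

Here the search finds a shift of two steps, which is how the sample data were built. The variable `gain` corresponds to the second relation measure, the difference between the coefficient at the optimal shift and at zero shift.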

There is also an option to set the minimal required value of the optimal shift. With a minimal optimal shift set, variables whose optimal shift is less than or equal to that value are ignored. Variables whose changes in trajectory precede changes of the referential variable are the best for forecasting models, and setting the minimal optimal shift is an important part of searching for them.

13
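
The boundary behaviour of the minimal optimal shift can be shown with a short Python sketch. The result records and field names are hypothetical; the only rule taken from the description is that a variable is ignored when its optimal shift is less than or equal to the set value.

```python
# Hypothetical outcome of the shift search for three scanned variables.
results = [
    {"name": "series_a", "optimal_shift": 0, "correlation": 0.91},
    {"name": "series_b", "optimal_shift": 2, "correlation": 0.78},
    {"name": "series_c", "optimal_shift": 5, "correlation": -0.66},
]

min_optimal_shift = 2

# Strict inequality: a shift equal to the minimum is still ignored.
leading = [r["name"] for r in results
           if r["optimal_shift"] > min_optimal_shift]
print(leading)
```

Note that `series_a` is dropped despite having the highest correlation, because it does not precede the referential variable and so offers no lead time for a forecast.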

Other parameters of the search are the number of results in the outcome group of related variables and the frequency of quotations.

14

15

Additional search conditions:

16

With the conditions set as in the above example, the selection process takes the variables that fulfill the previously set conditions and keeps only those whose correlation coefficient at the optimal shift is higher than 0.7 or lower than -0.5.
The beginning of the selection process:

17

Confirmation of the selection task:

18

The product of the selection process is a group of variables related to the referential variable, selected in accordance with all specified parameters and conditions.
19

File to download:

searching_advanced_mode