normalise
Normalises the elementary indicators using a suitable normalisation method
(e.g., ranking, min-max, dichotomisation).
Arguments
- data
data matrix of elementary indicators (as returned by create_indicator_matrix()).
- method
normalisation method, to be chosen among:
"binary": each elementary indicator is dichotomised (0/1) using a suitable threshold, specified through the argument cutoff. Specifically, the normalised indicator is equal to 1 if the original indicator is greater than the threshold, and 0 otherwise;
"ranking": each elementary indicator is normalised according to its ranking (see rank());
"z-score": each elementary indicator is standardised into z-scores (see scale()). Let \(x_{qc}\) be the original value of elementary indicator \(q\) for target unit \(c\), and let \(\mu_q\) and \(\sigma_q\) denote the mean and standard deviation of indicator \(q\) across target units. Then, the z-score is obtained as follows:
$$I_{qc} = \frac{x_{qc} - \mu_q(x_{qc})}{\sigma_q(x_{qc})}$$
"minmax": each elementary indicator is normalised using the 'min-max' criterion:
$$I_{qc} = \frac{x_{qc} - \min(x_{qc})}{\max(x_{qc}) - \min(x_{qc})}$$
"distref": each elementary indicator is normalised by dividing it by its maximum;
"catscale": each elementary indicator is discretised into five categories, according to suitable sample quantiles.
- cutoff
threshold for dichotomising the indicators (when method = "binary").
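The methods listed above can be illustrated on a single toy indicator using base R only. This is a minimal sketch, not the package's implementation: the vector x and all object names are hypothetical, and the quintile breaks used for "catscale" are an assumption, since the documentation only says "suitable sample quantiles".

```r
# Toy indicator: one column of an indicator matrix (hypothetical values)
x <- c(0.10, 0.50, 0.97, 0.99, 0.30)

# "binary": 1 if strictly greater than the cutoff, 0 otherwise
cutoff <- 0.95
x_binary <- as.integer(x > cutoff)

# "ranking": rank of each value in ascending order (see rank())
x_rank <- rank(x)

# "z-score": centre on the mean and divide by the standard deviation;
# as.numeric(scale(x)) gives the same values
x_z <- (x - mean(x)) / sd(x)

# "minmax": rescale so that the minimum maps to 0 and the maximum to 1
x_minmax <- (x - min(x)) / (max(x) - min(x))

# "distref": divide by the maximum, so the largest value maps to 1
x_distref <- x / max(x)

# "catscale": five categories from sample quantiles
# (ASSUMPTION: quintile breaks; the package may use different quantiles)
breaks <- quantile(x, probs = seq(0, 1, by = 0.2))
x_cat <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
```

Only "binary" uses cutoff; the other methods depend solely on the observed values of each indicator.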
Details
In the CO.R.E. project, given the proposed set of elementary indicators,
a suitable normalisation method is dichotomisation (i.e., method = "binary"),
which makes all the indicators binary. In particular, if a given normalised elementary
indicator is equal to 1, the target unit is considered at risk on the basis
of that indicator; if it is equal to 0, the target unit is considered not at risk.
In fact, some elementary indicators perform a statistical test and return, as risk metric, one minus the p-value of the test (hence, a continuous scale, even if bounded in \([0,1]\)); some others do not consider any statistical tests, are binary by nature and directly return the target unit as 'not at risk/at risk' (0/1).
In order to bring all the elementary indicators to the same metric, that is, 'not at risk/at risk' (0/1),
the group of indicators that rely on statistical testing must be normalised (i.e., dichotomised)
by specifying a suitable threshold through the argument cutoff.
This threshold corresponds to one minus the significance level of the test performed
by the indicator. Example: cutoff = 0.95 means that the significance level for the p-values
is the usual 0.05.
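The relation between cutoff and the significance level can be checked with a short base-R sketch. The p-values and object names below are hypothetical, and the code only mimics the dichotomisation rule described above rather than calling the package.

```r
# Hypothetical p-values returned by a test-based indicator
p_values <- c(0.20, 0.04, 0.001, 0.50)

# Such indicators report one minus the p-value as their risk metric
risk_metric <- 1 - p_values

# Dichotomise with cutoff = 0.95, i.e. a 0.05 significance level
cutoff <- 0.95
at_risk <- as.integer(risk_metric > cutoff)

# Same result expressed directly on the p-values
at_risk_from_p <- as.integer(p_values < 1 - cutoff)

# Indicators that are binary by nature are left unchanged by this rule
binary_indicator <- c(0, 1, 1, 0)
binary_after <- as.integer(binary_indicator > cutoff)
```

Here the second and third units are flagged as at risk, because their p-values fall below 0.05.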
Examples
if (FALSE) {
if (interactive()) {
  # sample of 100k contracts
  set.seed(12345)
  i <- sample(seq_len(nrow(mock_data_core)), size = 1e5)
  mock_sample0 <- mock_data_core[sort(i), ]
  # indicators for companies
  mock_sample <- tidyr::unnest(mock_sample0, aggiudicatari, keep_empty = TRUE)
  mock_sample_variants <- tidyr::unnest(mock_sample, varianti, keep_empty = TRUE)
  out_companies <- ind_all(
    data = mock_sample,
    data_ind8 = mock_sample_variants,
    emergency_name = "coronavirus",
    target_unit = "companies"
  )
  # build the indicator matrix, then dichotomise it (significance level 0.01)
  indicator_data_matrix <- create_indicator_matrix(out_companies)
  indicator_data_matrix_norm <- normalise(indicator_data_matrix, method = "binary", cutoff = 0.99)
}
}