Normalise the elementary indicators — normalise • coresoi

normalise() normalises the elementary indicators using a chosen normalisation method (e.g., ranking, min-max, or dichotomisation).

Usage

normalise(data, method = "binary", cutoff = 0.95)

Arguments

data

data matrix of elementary indicators (as returned by create_indicator_matrix())

method

normalisation method, to be chosen among:

"binary": each elementary indicator is dichotomised (0/1) using a threshold specified through the argument cutoff. Specifically, the normalised indicator is equal to 1 if the original indicator is greater than the threshold, and 0 otherwise;
"ranking": each elementary indicator is normalised according to the ranking (see rank());
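"z-score": each elementary indicator is standardised into z-scores (see scale()). Let $x_{qc}$ be the original value of elementary indicator $q$ for target unit $c$, and let $\mu_q(\cdot)$ and $\sigma_q(\cdot)$ denote the mean and standard deviation of indicator $q$ across target units. Then the z-score is obtained as follows: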

$$I_{qc} = \frac{x_{qc} - \mu_q(x_{qc})}{\sigma_q(x_{qc})}$$

"minmax": each elementary indicator is normalised using the 'min-max' criterion:

$$I_{qc} = \frac{x_{qc} - \min(x_{qc})}{\max(x_{qc}) - \min(x_{qc})}$$

"distref": each elementary indicator is normalised by dividing it by its maximum;
"catscale": each elementary indicator is discretised into five categories, according to suitable sample quantiles.
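
The methods listed above can be illustrated on a toy matrix with base R alone. This is only a sketch of the formulas as described here, not the internal code of normalise(): the matrix x and all object names are hypothetical, and "catscale" is left out because the quantiles it uses are not specified in this page.

```r
# Toy indicator matrix: 5 target units (rows) x 2 elementary indicators (columns).
# The values are made up for illustration only.
x <- matrix(
  c(0.10, 0.50, 0.97, 0.99, 0.30,
    0.20, 0.40, 0.60, 0.80, 1.00),
  ncol = 2,
  dimnames = list(paste0("unit", 1:5), c("ind1", "ind2"))
)

# "binary": 1 if the value is strictly greater than the cutoff, 0 otherwise
cutoff <- 0.95
x_binary <- (x > cutoff) * 1

# "ranking": rank of each unit within each indicator (column)
x_rank <- apply(x, 2, rank)

# "z-score": centre each column on its mean and divide by its standard deviation
x_z <- scale(x)

# "minmax": rescale each column to the [0, 1] interval
x_minmax <- apply(x, 2, function(v) (v - min(v)) / (max(v) - min(v)))

# "distref": divide each column by its maximum
x_distref <- apply(x, 2, function(v) v / max(v))
```

Every transformation is applied column by column, so each elementary indicator is normalised independently of the others.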

cutoff

threshold for dichotomising the indicators (when method = "binary").

Value

data matrix of normalised indicators according to the chosen method.

Details

In the CO.R.E. project, given the proposed set of elementary indicators, the suitable normalisation method is 'dichotomisation' (i.e., method = "binary"), which makes all the indicators binary. In particular, if a given normalised elementary indicator is equal to 1, the target unit is considered at risk on the basis of that indicator. Otherwise (when it is equal to 0), the target unit is considered not at risk.

In fact, some elementary indicators perform a statistical test and return, as risk metric, one minus the p-value of the test (hence a continuous scale, even if bounded in $[0,1]$); others involve no statistical test, are binary by nature, and directly classify the target unit as 'not at risk/at risk' (0/1).

In order to bring all the elementary indicators to the same metric, that is, 'not at risk/at risk' (0/1), the group of indicators that rely on statistical testing must be normalised (i.e., dichotomised) by specifying a suitable threshold for the significance of the tests involved. The cutoff corresponds to one minus the significance level applied to the p-value of the test performed by the indicator. For example, cutoff = 0.95 means that the significance threshold for the p-values is the usual 0.05.
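
The relationship between cutoff and the significance level can be checked with a few lines of base R. This is a minimal sketch with made-up p-values and hypothetical object names. It follows the rule stated above (indicator strictly greater than the threshold) and does not call normalise() itself.

```r
# Hypothetical p-values from the test behind one elementary indicator
p_values <- c(0.001, 0.03, 0.06, 0.20, 0.70)

# Risk metric as described above: one minus the p-value
risk <- 1 - p_values

# cutoff = 0.95 implies a significance level of 1 - 0.95 = 0.05
cutoff <- 0.95
alpha <- 1 - cutoff

# Dichotomise: 1 = at risk, 0 = not at risk
at_risk <- as.integer(risk > cutoff)

# The same classification, expressed directly on the p-values
at_risk_from_p <- as.integer(p_values < alpha)
```

Here the first two units are flagged as at risk, since their p-values fall below 0.05, and the remaining three are not.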

Examples

if (FALSE) {
if (interactive()) {
  # sample of 100k contracts
  set.seed(12345)
  i <- sample(seq_len(nrow(mock_data_core)), size = 1e5)
  mock_sample0 <- mock_data_core[sort(i), ]

  # indicators for companies
  mock_sample <- tidyr::unnest(mock_sample0, aggiudicatari, keep_empty = TRUE)
  mock_sample_variants <- tidyr::unnest(mock_sample, varianti, keep_empty = TRUE)

  out_companies <- ind_all(
    data = mock_sample,
    data_ind8 = mock_sample_variants,
    emergency_name = "coronavirus",
    target_unit = "companies"
  )
  indicator_data_matrix <- create_indicator_matrix(out_companies)
  # normalise the matrix built above (dichotomise at a 0.99 cutoff)
  indicator_data_matrix_norm <- normalise(indicator_data_matrix, method = "binary", cutoff = 0.99)
}
}