Perform a complete sensitivity analysis of the composite indicator

composite_sensitivity performs a complete sensitivity analysis of the composite indicator. It computes the composite under all possible combinations of methodological choices (normalisation, management of missing values, weighting and aggregation) and evaluates the contribution of each elementary indicator to the final composite by removing one indicator at a time from the computation. Note: the only normalisation method considered within the CO.R.E. project is 'dichotomisation'; hence, sensitivity to normalisation concerns the choice of the threshold. See Details.

Usage

composite_sensitivity(
  indicator_list,
  cutoff = c(0.9, 0.95, 0.99, 0.995),
  expert_weights = NULL,
  ...
)

Arguments

indicator_list: list of outputs, one for each indicator that can be computed for the target unit (e.g., company or contracting authority), as returned by ind_all().
cutoff: vector of thresholds for normalising the indicators (i.e., for their dichotomisation).
expert_weights: user-provided expert weights (not available at the moment for sensitivity analysis).
...: optional arguments of mirt::mirt() function (for getting IRT weights).

Value

a list with two versions (wide and long) of a dataframe (sens_wide and sens_long), which includes, for each target unit, all the possible values of the composite indicator obtained by combining methodological choices and indicator removals. See Details.

Details

This is the main function for carrying out the sensitivity analysis of the composite indicator. It requires a list of indicator outputs as returned by ind_all(). This list is given to the internal function create_indicator_matrix() for obtaining the data matrix of elementary indicators.

Thereafter, several steps for the sensitivity analysis are performed, as follows.

1. Elementary indicators are normalised through the 'dichotomisation' method, using the thresholds provided through argument cutoff.
2. Missing values in the elementary indicators (if any) are managed with both of the proposed methods, that is, by replacing missing values with '0' ('not at risk') or by means of logistic regression models (see internal function manage_missing()).
3. Vectors of weights are obtained in all three of the proposed ways, that is, equal weights, expert weights and IRT weights (see get_weights()). The last method can be time-consuming, since it depends on the data at hand, which change according to the choices made in steps 1 and 2. Moreover, for weights provided by experts, the user can supply their own set of expert weights through argument expert_weights.
4. Composite indicator computation. For each target unit, the composite indicator is computed for each combination of the above methodological choices (steps 1-3), hence \(k \times 2 \times 3\) combinations, where \(k\) is the number of normalisation thresholds (cutoff). In addition, the composite indicator is recomputed after removing one elementary indicator at a time. Overall, given \(Q\) elementary indicators, the composite indicator is computed, for each target unit, \(k \times 2 \times 3 \times (Q+1)\) times.
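
The count in the last step can be checked with a small base R sketch. The value of Q and the removal labels below are illustrative assumptions, not taken from the package; only the structure of the grid follows the steps above.

```r
# Illustrative values (assumptions): thresholds as in the default `cutoff`
# and a hypothetical number of elementary indicators Q
cutoff <- c(0.9, 0.95, 0.99, 0.995)
Q <- 5

# One row per combination of methodological choices and indicator removal
grid <- expand.grid(
  cutoff = cutoff,
  miss = c(0, 1),                                      # two missing-value methods
  weights = c("eq", "exp", "irt"),                     # three weighting schemes
  ind_removed = c("none", paste0("-ind", seq_len(Q))), # no removal + Q removals
  stringsAsFactors = FALSE
)

k <- length(cutoff)
n_expected <- k * 2 * 3 * (Q + 1)

# The grid has exactly k * 2 * 3 * (Q + 1) rows
nrow(grid) == n_expected
```

With four thresholds and five indicators this gives 144 evaluations per target unit, which helps to anticipate the running time before adding more thresholds.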

Results are returned in two dataframes. In the wide version (sens_wide), we have the target unit in the rows and as many columns as the number of the above combinations, which report the computed composite. Specifically, the first column is the target unit ID, whereas the subsequent columns contain the composite computed according to the different combinations (steps 1-4 above). The names of these columns have the following structure: cx.my.w_abc.rrr, where

x is the cut-off value for normalising the elementary indicators (e.g., 0.95);
y is the label for the missing management method (0 or 1, see manage_missing());
abc is the weighting scheme ('eq' for equal weights; 'exp' for expert weights; 'irt' for IRT weights);
rrr is the indication of the removed indicator ('all' means that no indicator is removed).

For example, column labelled as c0.95.m0.w_eq.all contains the composite indicators computed using: 0.95 as cut-off value for the normalisation; method '0' for missing management; equal weights (w_eq); without removal of elementary indicators (all).
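
As a rough illustration of this naming scheme, the sketch below splits such column names back into their components with base R regular expressions. The column names are made up for the example and the pattern is an assumption based only on the structure described above; it is not part of the package.

```r
# Hypothetical column names following the cx.my.w_abc.rrr structure
cols <- c("c0.95.m0.w_eq.all", "c0.99.m1.w_irt.all")

# Capture cut-off, missing-value method, weighting scheme and removal label
pattern <- "^c([0-9.]+)\\.m([01])\\.w_([a-z]+)\\.(.+)$"
parts <- regmatches(cols, regexec(pattern, cols))

# Each element of `parts` holds the full match followed by the four groups
parsed <- data.frame(
  cutoff = as.numeric(vapply(parts, `[`, character(1), 2)),
  miss = vapply(parts, `[`, character(1), 3),
  weights = vapply(parts, `[`, character(1), 4),
  removed = vapply(parts, `[`, character(1), 5),
  stringsAsFactors = FALSE
)
parsed
```

In practice sens_long already provides these components as separate columns, so parsing is only needed when working directly with sens_wide.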

The function also returns the long version of the above dataframe (sens_long), where the target unit is repeated for each 'sensitivity combination'. Here, the columns refer to the variables that enter the sensitivity analysis. In particular, we have:

aggregation_name: target unit ID
ci: value of the composite
cutoff: possible cut-off values for the normalisation
miss: '0' or '1'
weights: 'eq', 'exp' or 'irt'
ind_removed: 'none', '-ind1', '-ind2', ...

The long version can be useful for further specific analysis.
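
For instance, one simple follow-up analysis is the spread of the composite across combinations for each target unit. The sketch below uses a toy data frame with made-up values and the column names listed above; it only illustrates the idea and does not use real package output.

```r
# Toy long-format data (made-up values) with the columns listed above
toy_long <- data.frame(
  aggregation_name = rep(c("A", "B"), each = 4),
  ci = c(0.10, 0.20, 0.15, 0.25, 0.50, 0.50, 0.55, 0.45),
  cutoff = rep(c(0.9, 0.95), times = 4),
  miss = rep(c("0", "0", "1", "1"), times = 2),
  weights = "eq",
  ind_removed = "none"
)

# Range of the composite across combinations, per target unit
spread <- aggregate(ci ~ aggregation_name,
  data = toy_long,
  FUN = function(x) max(x) - min(x)
)
names(spread)[2] <- "ci_range"
spread
```

A large range for a unit suggests that its composite depends strongly on the methodological choices.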

Examples

if (FALSE) {
if (interactive()) {
  # sample of 100k contracts
  set.seed(12345)
  i <- sample(1:nrow(mock_data_core), size = 1e5)
  mock_sample0 <- mock_data_core[sort(i), ]

  # indicators for companies
  mock_sample <- tidyr::unnest(mock_sample0, aggiudicatari, keep_empty = TRUE)
  mock_sample_variants <- tidyr::unnest(mock_sample, varianti, keep_empty = TRUE)

  out_companies <- ind_all(
    data = mock_sample,
    data_ind8 = mock_sample_variants,
    emergency_name = "coronavirus",
    target_unit = "companies"
  )
  out_sens <- composite_sensitivity(
    indicator_list = out_companies,
    cutoff = c(0.90, 0.95, 0.99),
    TOL = 0.1 # argument for mirt::mirt function
  )
  View(out_sens$sens_wide)
  View(out_sens$sens_long)

  # regression and ANOVA on sensitivity output
  library(magrittr) # provides the %>% pipe used below
  datl <- out_sens$sens_long
  datl$ci100 <- 100 * datl$ci
  datl$ind_removed2 <- factor(datl$ind_removed) %>%
    relevel(ref = "none")
  X <- model.matrix(ci100 ~ factor(cutoff) + factor(miss) + factor(weights) + ind_removed2,
    data = datl
  )
  y <- na.omit(datl$ci100)
  dat <- data.frame(y, X)
  formula <- paste(names(dat)[-1], collapse = "+")
  formula <- paste("y~0+", formula)
  mod <- lm(formula, data = dat)
  summary(mod)
  mod_anova <- anova(mod)
  mod_anova

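  # graphical visualisation of the results
  # median/mean of CI for each target unit ID
  datw <- out_sens$sens_wide
  # fix the composite columns first, so that meanCI does not average over medianCI
  ci_cols <- names(datw)[-c(1, 2)]
  datw$medianCI <- apply(datw[, ci_cols], MARGIN = 1, FUN = median)
  datw$meanCI <- apply(datw[, ci_cols], MARGIN = 1, FUN = mean)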
  datw <- datw %>%
    dplyr::relocate(medianCI, .after = aggregation_name) %>%
    dplyr::relocate(meanCI, .after = medianCI)
  datw_no0 <- datw %>%
    dplyr::filter(meanCI != 0)

  # long data about the first 400 units
  datl_no0 <- datl %>%
    dplyr::filter(aggregation_name %in% datw_no0$aggregation_name[1:400]) %>%
    dplyr::left_join(datw_no0 %>%
      dplyr::select(aggregation_name, medianCI, meanCI)) %>%
    dplyr::arrange(meanCI, aggregation_name)
  aggr <- datl_no0$aggregation_name %>%
    unique()
  aggr <- data.frame(id = 1:length(aggr), aggregation_name = aggr)
  datl_no0 <- datl_no0 %>%
    dplyr::left_join(aggr)
  boxplot(datl_no0$ci ~ datl_no0$id,
    ylab = "Composite indicator",
    xlab = "Company index (1:400)"
  )
}
}