Perform a complete sensitivity analysis of the composite indicator
Source: R/composite.R
composite_sensitivity() performs a complete sensitivity analysis of the composite
indicator. It computes the composite under all possible combinations of methodological choices
(normalisation, management of missing values, weighting and aggregation) and evaluates
the contribution of each elementary indicator to the final composite by removing one indicator
at a time from the computation.
Note: the only normalisation method considered within the CO.R.E. project is 'dichotomisation'.
Sensitivity to normalisation therefore concerns only the choice of the threshold. See Details.
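To illustrate how the choice of threshold can change a dichotomised indicator, here is a minimal sketch. It is not the package's implementation: the function dichotomise(), the indicator values and the rule "flag as 1 when the value is at or above the threshold" are all assumptions made for illustration.

```r
# Hypothetical indicator values for five units (made up for illustration)
x <- c(0.20, 0.91, 0.96, 0.992, NA)

# Assumed rule (not taken from the package): a unit is flagged as
# 'at risk' (1) when its value is at or above the threshold, 0 otherwise.
# Missing values stay missing at this stage.
dichotomise <- function(x, threshold) as.integer(x >= threshold)

# Default thresholds of composite_sensitivity()
cutoff <- c(0.9, 0.95, 0.99, 0.995)

# One column of 0/1 flags per threshold
flags <- sapply(cutoff, function(th) dichotomise(x, th))
colnames(flags) <- paste0("c", cutoff)
flags
```

Under this assumed rule, the second unit is flagged at threshold 0.9 but not at 0.95, which is the kind of variation the sensitivity analysis is meant to expose.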
Usage
composite_sensitivity(
  indicator_list,
  cutoff = c(0.9, 0.95, 0.99, 0.995),
  expert_weights = NULL,
  ...
)
Arguments
- indicator_list
list of outputs about each indicator computable for the target unit (e.g., company or contracting authority), as returned by ind_all().
- cutoff
vector of thresholds for normalising the indicators (i.e., for their dichotomisation).
- expert_weights
user-provided expert weights (not available at the moment for sensitivity analysis).
- ...
optional arguments of the mirt::mirt() function (for getting IRT weights).
Value
a list with two versions (wide and long) of a dataframe (sens_wide and sens_long),
which includes, for each target unit, all the possible values of the composite indicator obtained
by combining methodological choices and indicator removals. See Details.
Details
This is the main function for carrying out the sensitivity analysis of the composite
indicator. It requires a list of indicator outputs as returned by ind_all(). This list is passed to the internal function create_indicator_matrix() to obtain the data matrix of elementary indicators.
The sensitivity analysis then proceeds through the following steps.
1. Elementary indicators are normalised through the 'dichotomisation' method, using each of the thresholds provided through argument cutoff.
2. Missing values in the elementary indicators (if any) are managed with both of the proposed methods, that is, by replacing missing values with '0' ('not at risk') or by means of logistic regression models (see internal function manage_missing()).
3. Vectors of weights are obtained in all three proposed ways, that is, equal weights, expert weights and IRT weights (see get_weights()). The IRT weights can take a long time to compute, because they are estimated from the data at hand, which change with every combination of choices in steps 1 and 2. For expert weights, the user can supply their own set through argument expert_weights.
4. Composite indicator computation. For each target unit, the composite indicator is computed for each combination of the above methodological choices (steps 1-3), hence \(k \times 2 \times 3\) combinations, where \(k\) is the number of normalisation thresholds (cutoff). In addition, the composite indicator is recomputed after removing one elementary indicator at a time. Overall, given \(Q\) elementary indicators, the composite indicator is computed, for each target unit, \(k \times 2 \times 3 \times (Q+1)\) times.
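The count in step 4 can be checked by enumerating the combinations explicitly. The sketch below is only an illustration: the value of Q and the labels used for the removed indicators are hypothetical, and the grid is not the object the function builds internally.

```r
cutoff  <- c(0.9, 0.95, 0.99, 0.995)  # k = 4 thresholds (the default)
miss    <- c(0, 1)                    # two methods for missing values
weights <- c("eq", "exp", "irt")      # three weighting schemes
Q       <- 5                          # hypothetical number of indicators

# 'all' = no indicator removed, plus one entry per removed indicator
# (the labels ind1, ..., indQ are hypothetical)
removed <- c("all", paste0("ind", seq_len(Q)))

# One row per combination of choices
grid <- expand.grid(
  cutoff = cutoff, miss = miss, weights = weights, removed = removed,
  stringsAsFactors = FALSE
)

nrow(grid)                        # number of enumerated combinations
length(cutoff) * 2 * 3 * (Q + 1)  # k x 2 x 3 x (Q + 1)
```

With these values both expressions give 144, so each target unit would receive 144 values of the composite.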
Results are returned in two dataframes. In the wide version (sens_wide), target units are
in the rows and there are as many columns as the number of the above combinations, each reporting
the computed composite. Specifically, the first column is the target unit ID, whereas the subsequent
columns contain the composite computed according to the different combinations (steps 1-4 above).
The names of these columns have the following structure: cx.my.w_abc.rrr, where
- x is the cut-off value for normalising the elementary indicators (e.g., 0.95);
- y is the label for the missing management method (0 or 1, see manage_missing());
- abc is the weighting scheme ('eq' for equal weights; 'exp' for expert weights; 'irt' for IRT weights);
- rrr indicates the removed indicator ('all' means that no indicator is removed).
For example, the column labelled c0.95.m0.w_eq.all contains the composite indicator computed using:
- 0.95 as cut-off value for the normalisation;
- method '0' for missing management;
- equal weights (w_eq);
- no removal of elementary indicators (all).
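Because the cut-off itself contains dots, splitting a column name on "." is not enough to recover its parts. A regular expression that follows the cx.my.w_abc.rrr structure is one possible approach, sketched below. The second column name and its removed-indicator label (ind3) are hypothetical, and this helper code is not part of the package.

```r
# First name is the example from the text; the second is hypothetical
col_names <- c("c0.95.m0.w_eq.all", "c0.995.m1.w_irt.ind3")

# c<cutoff>.m<0|1>.w_<eq|exp|irt>.<removed indicator>
pattern <- "^c(.+)\\.m([01])\\.w_(eq|exp|irt)\\.(.+)$"

# regexec() returns the full match followed by the four captured groups
parts <- regmatches(col_names, regexec(pattern, col_names))

parsed <- do.call(rbind, lapply(parts, function(p) {
  data.frame(
    cutoff  = as.numeric(p[2]),
    miss    = p[3],
    weights = p[4],
    removed = p[5],
    stringsAsFactors = FALSE
  )
}))
parsed
```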
The function also directly returns the long version of the above dataframe (sens_long), where
the target unit is repeated for each 'sensitivity combination'. Here, the columns refer to the variables
that enter the sensitivity analysis. In particular, we have:
- aggregation_name: target unit ID
- ci: value of the composite
- cutoff: possible cut-off values for the normalisation
- miss: '0' or '1'
- weights: 'eq', 'exp' or 'irt'
- ind_removed: 'none', '-ind1', '-ind2', ...
The long version can be useful for further specific analysis.
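As a small illustration of such further analysis, the sketch below summarises a toy data frame with the same column names as sens_long. The values are made up and cover only a few combinations; real output from composite_sensitivity() would have one row per unit and sensitivity combination.

```r
# Toy stand-in for out_sens$sens_long (values are made up)
sens_long <- data.frame(
  aggregation_name = rep(c("A", "B"), each = 4),
  ci = c(0.10, 0.20, 0.30, 0.40, 0.00, 0.50, 0.50, 1.00),
  cutoff = rep(c(0.9, 0.95), times = 4),
  miss = "0",
  weights = rep(c("eq", "eq", "irt", "irt"), times = 2),
  ind_removed = "none",
  stringsAsFactors = FALSE
)

# Range of the composite per target unit: a wide range means the unit's
# score depends strongly on the methodological choices
ci_range <- aggregate(
  ci ~ aggregation_name,
  data = sens_long,
  FUN = function(v) max(v) - min(v)
)

# Mean composite under each weighting scheme
ci_by_weights <- aggregate(ci ~ weights, data = sens_long, FUN = mean)

ci_range
ci_by_weights
```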
Examples
if (FALSE) {
  if (interactive()) {
    library(magrittr) # provides the %>% pipe used below

    # sample of 100k contracts
    set.seed(12345)
    i <- sample(seq_len(nrow(mock_data_core)), size = 1e5)
    mock_sample0 <- mock_data_core[sort(i), ]

    # indicators for companies
    mock_sample <- tidyr::unnest(mock_sample0, aggiudicatari, keep_empty = TRUE)
    mock_sample_variants <- tidyr::unnest(mock_sample, varianti, keep_empty = TRUE)
    out_companies <- ind_all(
      data = mock_sample,
      data_ind8 = mock_sample_variants,
      emergency_name = "coronavirus",
      target_unit = "companies"
    )

    out_sens <- composite_sensitivity(
      indicator_list = out_companies,
      cutoff = c(0.90, 0.95, 0.99),
      TOL = 0.1 # argument passed on to mirt::mirt()
    )
    View(out_sens$sens_wide)
    View(out_sens$sens_long)

    # regression and ANOVA on sensitivity output
    datl <- out_sens$sens_long
    datl$ci100 <- 100 * datl$ci
    datl$ind_removed2 <- factor(datl$ind_removed) %>%
      relevel(ref = "none")
    X <- model.matrix(
      ci100 ~ factor(cutoff) + factor(miss) + factor(weights) + ind_removed2,
      data = datl
    )
    y <- na.omit(datl$ci100)
    dat <- data.frame(y, X)
    # regress y on all columns of the design matrix, without an extra intercept
    formula <- paste(names(dat)[-1], collapse = "+")
    formula <- paste("y~0+", formula)
    mod <- lm(as.formula(formula), data = dat)
    summary(mod)
    mod_anova <- anova(mod)
    mod_anova

    # graphical visualisation of the results
    datw <- out_sens$sens_wide
    # columns to summarise, selected once so that medianCI is not included in the mean
    ci_cols <- names(datw)[-c(1, 2)]
    # median/mean of CI for each target unit ID
    datw$medianCI <- apply(datw[, ci_cols], MARGIN = 1, FUN = median)
    datw$meanCI <- apply(datw[, ci_cols], MARGIN = 1, FUN = mean)
    datw <- datw %>%
      dplyr::relocate(medianCI, .after = aggregation_name) %>%
      dplyr::relocate(meanCI, .after = medianCI)
    datw_no0 <- datw %>%
      dplyr::filter(meanCI != 0)

    # long data about the first 400 units
    datl_no0 <- datl %>%
      dplyr::filter(aggregation_name %in% datw_no0$aggregation_name[1:400]) %>%
      dplyr::left_join(
        datw_no0 %>%
          dplyr::select(aggregation_name, medianCI, meanCI),
        by = "aggregation_name"
      ) %>%
      dplyr::arrange(meanCI, aggregation_name)
    # numeric index of the units, in order of increasing mean CI
    aggr <- datl_no0$aggregation_name %>%
      unique()
    aggr <- data.frame(id = seq_along(aggr), aggregation_name = aggr)
    datl_no0 <- datl_no0 %>%
      dplyr::left_join(aggr, by = "aggregation_name")
    boxplot(datl_no0$ci ~ datl_no0$id,
      ylab = "Composite indicator",
      xlab = "Company index (1:400)"
    )
  }
}