dimensionality_check performs a dimensionality assessment of a set of elementary indicators using either the Item Response Theory (IRT) framework or exploratory Factor Analysis (FA).

Usage

dimensionality_check(
  indicator_list,
  dim_method = "IRT",
  cutoff = 0.95,
  missing = 0,
  max_ndim = length(indicator_list),
  nrep = 5,
  seed = NULL,
  arg_tech_list = NULL,
  ...
)

Arguments

indicator_list

list of outputs, one for each indicator that can be computed for the target unit (e.g., company or contracting authority), as returned by, for example, ind_1(), ind_2(), etc.

dim_method

method for the dimensionality assessment, either "IRT" or "FA". With "IRT", the dimensionality of the elementary indicators is evaluated in the IRT framework using the mirt::mirt() function. With "FA", exploratory factor analysis is carried out using the psych::fa() function. See Details.

cutoff

threshold for dichotomising the indicators (see normalise()).

missing

method for imputing missing values (see manage_missing()):

  • missing = 0: missing values are replaced with '0' (not at risk);

  • missing = 1: missing values are imputed using logistic regression.

max_ndim

maximum number of dimensions to check in the IRT framework (not greater than the number of elementary indicators).

nrep

number of replicates for random initialisation of the algorithm for fitting IRT models.

seed

seed number used during estimation. The signature default is NULL; 12345 is the default documented for the SEED technical option of mirt::mirt().

arg_tech_list

a list containing lower level technical parameters for estimation. May be:

  • NCYCLES: maximum number of EM or MH-RM cycles; defaults are 500 and 2000, respectively;

  • MAXQUAD: maximum number of quadratures, which you can increase if you have more than 4GB of RAM on your PC; default is 20000;

  • theta_lim: range of integration grid for each dimension; default is c(-6, 6).

Note that when itemtype = 'ULL' a log-normal distribution is used and the range is changed to c(.01, 6^2), where the second term is the square of the theta_lim input.

...

optional arguments for mirt::mirt() (e.g., estimation algorithm, convergence threshold, etc.) or psych::fa() (e.g., method for factor extraction, rotation method, etc.).

Value

different objects according to dim_method:

  • dim_method = "IRT": a list of IRT models (as returned by mirt::mirt()) for each possible dimensional solution, from one to max_ndim dimensions;

  • dim_method = "FA": best factorial solution (as returned by psych::fa()).

Details

The dimensionality assessment of a set of elementary indicators is implemented as follows. First, the function works on dichotomised indicators (such as those proposed in CO.R.E.) without missing values. Consequently, along with the list of indicators (argument indicator_list), the user has to supply two further arguments that control their dichotomisation (cutoff) and the handling of missing values (missing).
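The preprocessing described above can be sketched in base R. This is only an illustration: dichotomise_sketch() is a hypothetical helper, not a function of the package, and the rule "flag a unit as at risk when its value is at or above the cutoff" is an assumption; the actual rules are those of normalise() and manage_missing().

```r
# Hypothetical helper, not part of the package: dichotomise one indicator.
# ASSUMPTION: a unit is flagged "at risk" (1) when its value is >= cutoff.
dichotomise_sketch <- function(x, cutoff = 0.95) {
  flag <- as.integer(x >= cutoff)  # NA stays NA at this point
  flag[is.na(flag)] <- 0L          # mirrors missing = 0: NA -> 0 (not at risk)
  flag
}

# Made-up indicator values for five units, one of them missing
x <- c(0.99, 0.40, NA, 0.95, 0.10)
flags <- dichotomise_sketch(x)
print(flags)
```

After this step every indicator is a 0/1 vector with no missing values, which is the input that the dimensionality check requires.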

Then, the dimensionality check is performed according to the chosen method (dim_method).

If dim_method = "IRT", the IRT framework is used (by means of the mirt::mirt() function). As a first step, the function compares the fit of the Rasch model with that of the two-parameter logistic (2PL) model, two of the most widely used IRT models for binary data. The comparison is carried out in the unidimensional setting, to establish which type of model fits the data at hand better, using common penalised likelihood metrics such as AIC, SABIC and BIC.
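As an illustration of how such penalised likelihood metrics are compared, the sketch below computes AIC, BIC and SABIC from a maximised log-likelihood using their standard formulas. The helper info_criteria(), the log-likelihood values and the parameter counts are all made up for the example; they are not output of mirt::mirt() or of this function.

```r
# Hypothetical helper: penalised likelihood criteria from a maximised
# log-likelihood, the number of free parameters and the sample size.
info_criteria <- function(loglik, npar, n) {
  c(
    AIC   = -2 * loglik + 2 * npar,
    BIC   = -2 * loglik + npar * log(n),
    SABIC = -2 * loglik + npar * log((n + 2) / 24)
  )
}

# Made-up values: 10 binary indicators observed on 1000 units.
# ASSUMED parameter counts: Rasch = 10 difficulties + 1 latent variance,
# 2PL = 10 difficulties + 10 discriminations.
rasch <- info_criteria(loglik = -5230, npar = 11, n = 1000)
twopl <- info_criteria(loglik = -5215, npar = 20, n = 1000)

comparison <- rbind(Rasch = rasch, `2PL` = twopl)
print(round(comparison, 1))

# For each criterion, the model with the smaller (better) value
preferred <- rownames(comparison)[apply(comparison, 2, which.min)]
names(preferred) <- colnames(comparison)
print(preferred)
```

With these made-up numbers AIC favours the 2PL model while BIC favours the Rasch model, which shows why it is useful to look at several criteria rather than a single one.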

As a second step, multidimensional models are estimated, increasing the number of dimensions by one each time, from two up to max_ndim. For a given number of dimensions, say \(d\), several estimates of the IRT model (i.e., Rasch or 2PL, according to step 1) are obtained from different initialisations of the estimation algorithm:

  • a first initialisation is deterministic, based on observed data;

  • the others (nrep of them) are random, so that the likelihood function to be maximised is explored more thoroughly.

In total, 1 + nrep estimates of the IRT model with \(d\) dimensions are obtained, and the one with the largest maximised likelihood is saved in the list to be returned.
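The multi-start strategy above can be sketched in base R as follows. This is a minimal illustration under stated assumptions: neg_loglik() is a toy objective with several local minima standing in for a negative log-likelihood (it is not an IRT likelihood), multi_start() is a hypothetical helper, and the random starts are drawn uniformly on an arbitrary range.

```r
# Toy objective with several local minima (NOT an IRT likelihood)
neg_loglik <- function(par) sum(par^2) / 10 + sum(sin(3 * par))

# Hypothetical helper: one deterministic start plus nrep random starts
multi_start <- function(fn, start_det, nrep = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  starts <- c(
    list(start_det),
    replicate(nrep, runif(length(start_det), -4, 4), simplify = FALSE)
  )
  fits <- lapply(starts, function(s) optim(s, fn, method = "BFGS"))
  values <- vapply(fits, function(f) f$value, numeric(1))
  # Smallest negative log-likelihood = largest maximised likelihood
  list(best = fits[[which.min(values)]], values = values)
}

res <- multi_start(neg_loglik, start_det = c(2, 2), nrep = 5, seed = 123)
print(res$values)      # objective reached from each of the 1 + nrep starts
print(res$best$value)  # value of the retained solution
```

Because the retained solution is the best over all 1 + nrep runs, it can never be worse than the one obtained from the deterministic start alone.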

Step 2 is repeated starting from \(d = 2\) until \(d =\) max_ndim. When the function ends, a list of max_ndim IRT models is returned, one for each candidate number of dimensions. In addition, a summary of the dimensionality check is displayed, showing, for each \(d\), the fit metrics of the best model with \(d\) dimensions. This summary helps in selecting the most suitable dimensional solution.

If dim_method = "FA", exploratory factor analysis is used (by means of the psych::fa() function). To find a suitable number of factors to extract, the function computes the eigenvalues of the correlation matrix of the elementary indicators (tetrachoric correlations are used, given the binary nature of the indicators). The number of factors is then chosen according to the "eigenvalues > 1" rule and passed to psych::fa(). A plot of the eigenvalues against the number of factors is also displayed.
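The "eigenvalues > 1" rule can be illustrated with base R on a made-up correlation matrix. The matrix below is invented for the example; with real binary indicators it would be a tetrachoric correlation matrix, for instance one estimated with psych::tetrachoric(), which is an assumption about how the function obtains it.

```r
# Made-up correlation matrix: two blocks of three indicators,
# correlation 0.6 within a block and 0 between blocks.
block <- matrix(0.6, nrow = 3, ncol = 3)
diag(block) <- 1
R <- rbind(
  cbind(block, matrix(0, 3, 3)),
  cbind(matrix(0, 3, 3), block)
)

# Eigenvalues of the correlation matrix, in decreasing order
eig <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

# "eigenvalues > 1" rule: retain as many factors as eigenvalues above 1
n_factors <- sum(eig > 1)
print(round(eig, 2))
print(n_factors)
```

Each block contributes one eigenvalue equal to 1 + 2 × 0.6 = 2.2 and two equal to 0.4, so the rule retains two factors, matching the two groups of indicators built into the matrix.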

Examples

if (FALSE) {
if (interactive()) {
  # sample of 200k contracts
  set.seed(456)
  i <- sample(seq_len(nrow(mock_data_core)), size = 2e5)
  mock_sample0 <- mock_data_core[sort(i), ]

  # indicators for companies
  mock_sample <- tidyr::unnest(mock_sample0, aggiudicatari, keep_empty = TRUE)
  mock_sample_variants <- tidyr::unnest(mock_sample, varianti, keep_empty = TRUE)

  out_companies <- ind_all(
    data = mock_sample,
    data_ind8 = mock_sample_variants,
    emergency_name = "coronavirus",
    target_unit = "companies"
  )
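  # dimensionality check in the IRT framework, up to 4 dimensions
  out_dim <- dimensionality_check(
    indicator_list = out_companies,
    dim_method = "IRT",
    max_ndim = 4,
    cutoff = 0.95,
    missing = 0,
    nrep = 3,
    # the arguments below are passed on to mirt::mirt() through ...
    TOL = 0.1,
    verbose = TRUE,
    method = "QMCEM"
  )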
}
}