dimensionality_check
Performs a dimensionality assessment of a set of elementary indicators
using either the Item Response Theory (IRT) framework or Factor Analysis (FA).
Usage
dimensionality_check(
indicator_list,
dim_method = "IRT",
cutoff = 0.95,
missing = 0,
max_ndim = length(indicator_list),
nrep = 5,
seed = NULL,
arg_tech_list = NULL,
...
)
Arguments
- indicator_list
list of outputs about each indicator computable for the target unit (e.g., company or contracting authority), as returned by, for example, ind_1(), ind_2(), etc.
- dim_method
method for the dimensionality assessment, to be chosen between "IRT" and "FA". If the former is selected, the dimensionality of the elementary indicators is evaluated in the IRT framework using the mirt::mirt() function; if the latter is selected, exploratory factor analysis is carried out by means of the psych::fa() function. See Details.
- cutoff
threshold for dichotomising the indicators (see normalise()).
- missing
method for imputing missing values (see manage_missing()):
missing = 0: missing values are replaced with '0' (not at risk);
missing = 1: missing values are imputed using logistic regression.
- max_ndim
maximum number of dimensions to check in the IRT framework (not greater than the number of elementary indicators).
- nrep
number of replicates for random initialisation of the algorithm for fitting IRT models.
- seed
seed number used during estimation. Default is 12345
- arg_tech_list
a list containing lower-level technical parameters for estimation. It may include:
NCYCLES: maximum number of EM or MH-RM cycles; defaults are 500 and 2000, respectively.
MAXQUAD: maximum number of quadratures, which you can increase if you have more than 4GB of RAM on your PC; default is 20000.
theta_lim: range of the integration grid for each dimension; default is c(-6, 6). Note that when itemtype = 'ULL' a log-normal distribution is used and the range is changed to c(.01, 6^2), where the second term is the square of the theta_lim input instead.
- ...
optional arguments for mirt::mirt() (e.g., estimation algorithm, convergence threshold, etc.) or psych::fa() (e.g., method for factor extraction, rotation method, etc.).
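For example, a list with the components described for arg_tech_list might be built as follows. The values are arbitrary and only illustrate the format; the call in the final comment assumes the out_companies object created in the Examples section.

```r
# Illustrative (arbitrary) values for the technical estimation parameters
tech <- list(
  NCYCLES   = 1000,     # maximum number of EM cycles
  MAXQUAD   = 20000,    # maximum number of quadratures
  theta_lim = c(-4, 4)  # integration range for each dimension
)
str(tech)

# The list would then be passed as, e.g.:
# dimensionality_check(out_companies, dim_method = "IRT", arg_tech_list = tech)
```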
Value
different objects according to dim_method:
dim_method = "IRT": a list of IRT models (as returned by mirt::mirt()), one for each possible dimensional solution, from one to max_ndim dimensions;
dim_method = "FA": best factorial solution (as returned by psych::fa()).
Details
The dimensionality assessment of a set of elementary indicators is implemented as follows. First, the function works with dichotomised indicators (such as those proposed in CO.R.E.) without missing values. Consequently, before the dimensionality assessment is carried out, the user has to provide the list of indicators (argument indicator_list) together with two further arguments that control their dichotomisation (cutoff) and the handling of missing values (missing).
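The preprocessing described above can be sketched in base R as follows. This is not the code of normalise() or manage_missing(): the dichotomisation rule (a value strictly above cutoff is flagged as at risk) and the units-by-indicators data frame layout are assumptions made only for this illustration. Only the missing = 0 behaviour (missing values replaced with 0, i.e. not at risk) follows the description of the missing argument.

```r
# Hypothetical dichotomisation: values strictly above `cutoff` are flagged as at risk (1).
# The rule actually used by normalise() may differ.
dichotomise <- function(x, cutoff = 0.95) as.integer(x > cutoff)

# missing = 0: replace missing values with 0 (not at risk)
impute_zero <- function(x) replace(x, is.na(x), 0L)

# Hypothetical units-by-indicators data frame (rows = target units)
ind <- data.frame(ind_1 = c(0.99, 0.20, NA),
                  ind_2 = c(0.96, NA, 0.50))

# Dichotomise each indicator, then impute its missing values with 0
binary <- as.data.frame(lapply(ind, function(x) impute_zero(dichotomise(x, cutoff = 0.95))))
binary
```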
Then, the dimensionality check is performed according to the chosen method (dim_method).
If dim_method = "IRT", the IRT framework is considered (by means of the mirt::mirt() function). In this case,
as a first step, the function compares the fit of the Rasch model against that of the 2PL (two-parameter logistic)
model, two of the most widely used IRT models for binary data. The comparison is carried out under the
unidimensional setting, in order to understand which type of model fits the data at hand better
(using common penalised likelihood metrics, such as AIC, SABIC, BIC, etc.).
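The comparison in the first step can be illustrated with the standard definitions of the penalised-likelihood criteria, \(\mathrm{AIC} = -2\ell + 2k\), \(\mathrm{BIC} = -2\ell + k\log n\) and \(\mathrm{SABIC} = -2\ell + k\log((n+2)/24)\), where \(\ell\) is the maximised log-likelihood, \(k\) the number of free parameters and \(n\) the number of units. The sketch below assumes \(J\) binary items and counts \(J + 1\) parameters for the unidimensional Rasch model (\(J\) intercepts plus the latent variance) and \(2J\) for the 2PL (\(J\) slopes and \(J\) intercepts). Since the Rasch model is nested in the 2PL (equal slopes), a likelihood-ratio test with \(J - 1\) degrees of freedom is also shown. The log-likelihood values are made-up inputs, not results obtained on CO.R.E. data.

```r
# Penalised-likelihood criteria from a maximised log-likelihood `loglik`,
# `npar` free parameters and `n` units (lower values indicate a better trade-off)
info_criteria <- function(loglik, npar, n) {
  c(AIC   = -2 * loglik + 2 * npar,
    BIC   = -2 * loglik + npar * log(n),
    SABIC = -2 * loglik + npar * log((n + 2) / 24))
}

# Hypothetical inputs: J = 10 binary indicators observed on n = 1000 units
J <- 10
n <- 1000
ll_rasch <- -5000  # maximised log-likelihood of the Rasch model (made up)
ll_2pl   <- -4980  # maximised log-likelihood of the 2PL model (made up)

ic <- rbind(Rasch = info_criteria(ll_rasch, npar = J + 1, n = n),
            `2PL` = info_criteria(ll_2pl,   npar = 2 * J, n = n))
print(ic)

# Likelihood-ratio test: the Rasch model is the 2PL with all slopes equal
lr_stat <- 2 * (ll_2pl - ll_rasch)
lr_df   <- (2 * J) - (J + 1)
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
```

With these made-up numbers AIC and SABIC favour the 2PL, while BIC, which penalises extra parameters more heavily when n = 1000, favours the Rasch model. This is one reason to look at several criteria side by side.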
As a second step, multidimensional models are estimated by increasing the number of dimensions by one each time, from two
onwards (up to max_ndim). For a given number of dimensions, say \(d\), several estimates of the IRT model
(i.e., Rasch or 2PL, according to step 1) are obtained on the data at hand from different initialisations
of the estimation algorithm:
a first initialisation is deterministic, based on the observed data;
the others (nrep in total) are random, in order to explore the likelihood function to be maximised more thoroughly.
Overall, 1 + nrep estimates of the IRT model with \(d\) dimensions are obtained, and the one with the largest
maximised likelihood is saved in the list to be returned.
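The idea of keeping the best of 1 + nrep estimates can be shown on a toy problem. In the base R sketch below, a one-parameter function with two local maxima stands in for the IRT log-likelihood, and stats::optim() plays the role of the estimation algorithm. The deterministic start value (-1), the uniform range for the random starts and nrep = 20 are arbitrary assumptions. The sketch does not reproduce how mirt::mirt() generates its starting values.

```r
# Toy log-likelihood: local maximum near x = -1.97, global maximum near x = 2.03
toy_loglik <- function(x) -(x^2 - 4)^2 + x

# Fit from one deterministic start plus `nrep` random starts and keep the fit
# with the largest maximised value (fnscale = -1 makes optim() maximise)
best_of_starts <- function(loglik, det_start, nrep, lower = -3, upper = 3, seed = 12345) {
  set.seed(seed)
  starts <- c(det_start, runif(nrep, lower, upper))
  fits <- lapply(starts, function(s) {
    optim(s, loglik, method = "BFGS", control = list(fnscale = -1))
  })
  values <- vapply(fits, function(f) f$value, numeric(1))
  fits[[which.max(values)]]
}

best <- best_of_starts(toy_loglik, det_start = -1, nrep = 20)
best$par    # location of the best maximum found
best$value  # corresponding maximised value
```

A single start may stop at the local maximum; with several random starts it becomes very likely that at least one reaches the global maximum, which is then the one retained.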
Step 2 is repeated from \(d = 2\) up to \(d\) = max_ndim. When the function ends, it returns a list of max_ndim
IRT models, one for each potential number of dimensions. Moreover, a summary of the dimensionality check
is displayed, showing, for each \(d\), the model fit metrics of the best model with \(d\) dimensions.
This summary helps in selecting the most suitable dimensional solution.
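A summary of the kind described above can be sketched as a table with one row per number of dimensions. The sketch assumes exploratory multidimensional 2PL models, for which the usual parameter count with \(J\) items and \(d\) dimensions is \(J(d + 1) - d(d - 1)/2\) (\(J\) intercepts and \(Jd\) slopes, minus \(d(d-1)/2\) slopes fixed to zero for rotational identification); whether this matches the count used by mirt in every setting is an assumption. The log-likelihoods are made-up inputs, and looking at the minimum BIC is only one possible reading of the summary, not a rule applied by the function.

```r
# Free parameters of an exploratory d-dimensional 2PL with J binary items:
# J intercepts + J * d slopes - d * (d - 1) / 2 slopes fixed for identification
npar_2pl <- function(J, d) J * (d + 1) - d * (d - 1) / 2

# One row per number of dimensions d = 1, ..., length(loglik)
dim_summary <- function(loglik, J, n) {
  d <- seq_along(loglik)
  k <- npar_2pl(J, d)
  data.frame(d = d, logLik = loglik, npar = k,
             AIC = -2 * loglik + 2 * k,
             BIC = -2 * loglik + k * log(n))
}

# Hypothetical best log-likelihoods for d = 1, ..., 4 (J = 10 items, n = 1000 units)
s <- dim_summary(c(-4980, -4940, -4930, -4925), J = 10, n = 1000)
print(s)
s$d[which.min(s$BIC)]  # number of dimensions preferred by BIC
```

With these inputs BIC prefers two dimensions and AIC three, which illustrates why the metrics are inspected together rather than read off a single criterion.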
If dim_method = "FA", exploratory factor analysis is carried out (using the psych::fa() function). In particular,
to find a suitable number of factors to extract, the function computes the eigenvalues of the
correlation matrix among the elementary indicators (estimated with the tetrachoric correlation, given the binary
nature of the indicators). Then, following the "eigenvalues > 1" rule, the number of factors to retain is determined and
used in the call to psych::fa(). A plot of the eigenvalues against the number of factors is also displayed.
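The factor-retention rule can be sketched in base R as follows. In practice the input would be the tetrachoric correlation matrix of the dichotomised indicators (for example psych::tetrachoric(x)$rho). Here, as an assumption for illustration, a synthetic matrix is used instead: two blocks of three indicators, correlated at 0.6 within blocks and 0 between blocks. Each block has eigenvalues 1 + 2(0.6) = 2.2 and 1 - 0.6 = 0.4 (twice), so the rule retains two factors.

```r
# Eigenvalues of a correlation matrix and the number exceeding 1 ("eigenvalues > 1" rule)
n_factors_kaiser <- function(R) {
  ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  list(eigenvalues = ev, nfactors = sum(ev > 1))
}

# Synthetic correlation matrix: two blocks of three indicators (r = 0.6 within, 0 between)
R <- kronecker(diag(2), matrix(0.6, nrow = 3, ncol = 3))
diag(R) <- 1

res <- n_factors_kaiser(R)
res$nfactors
# A plot of the eigenvalues could be drawn with plot(res$eigenvalues, type = "b"), and the
# retained number of factors passed on, e.g. psych::fa(R, nfactors = res$nfactors)
```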
Examples
if (FALSE) {
  if (interactive()) {
    # sample of 200k contracts
    set.seed(456)
    i <- sample(seq_len(nrow(mock_data_core)), size = 2e5)
    mock_sample0 <- mock_data_core[sort(i), ]

    # indicators for companies
    mock_sample <- tidyr::unnest(mock_sample0, aggiudicatari, keep_empty = TRUE)
    mock_sample_variants <- tidyr::unnest(mock_sample, varianti, keep_empty = TRUE)
    out_companies <- ind_all(
      data = mock_sample,
      data_ind8 = mock_sample_variants,
      emergency_name = "coronavirus",
      target_unit = "companies"
    )

    # dimensionality check in the IRT framework;
    # TOL, verbose and method are passed on to mirt::mirt()
    out_dim <- dimensionality_check(
      indicator_list = out_companies,
      dim_method = "IRT",
      max_ndim = 4,
      cutoff = 0.95,
      missing = 0,
      nrep = 3,
      TOL = 0.1,
      verbose = TRUE,
      method = "QMCEM"
    )
  }
}