Replication methods (JK1 and JK2) for multilevel linear regression models and trend estimation.

Compute multilevel linear models for complex cluster designs with multiply imputed variables based on the jackknife (JK1, JK2) procedure. Conceptually, the function combines replication methods with methods for multiply imputed data. Technically, it is a wrapper for the BIFIE.twolevelreg function of the BIFIEsurvey package; repLmer only adds functionality for trend estimation. Please note that the function is not suitable for logistic (logit/probit) models.

repLmer(datL, ID, wgt = NULL, L1wgt=NULL, L2wgt=NULL, type = c("JK2", "JK1"),
            PSU = NULL, repInd = NULL, jkfac = NULL, rho = NULL, imp=NULL,
            group = NULL, trend = NULL, dependent, formula.fixed, formula.random,
            doCheck = TRUE, na.rm = FALSE, clusters, verbose = TRUE)

Arguments

datL: Data frame in the long format (i.e. each line represents one ID unit in one imputation of one nest) containing all variables for analysis.
ID: Variable name or column number of student identifier (ID) variable. ID variable must not contain any missing values.
wgt: Optional: Variable name or column number of case weighting variable. If no weighting variable is specified, all cases will be equally weighted.
L1wgt: Name of Level 1 weight variable. This is optional. If it is not provided, L1wgt is calculated from the total weight (i.e., wgt) and L2wgt.
L2wgt: Name of Level 2 weight variable
type: Defines the replication method for cluster replicates which is to be applied. Depending on type, additional arguments must be specified (e.g., PSU and/or repInd or repWgt).
PSU: Variable name or column number of variable indicating the primary sampling unit (PSU). When a jackknife procedure is applied, the PSU is the jackknife zone variable. If NULL, no cluster structure is assumed and standard errors are computed according to a random sample.
repInd: Variable name or column number of variable indicating replicate ID. In a jackknife procedure, this is the jackknife replicate variable. If NULL, no cluster structure is assumed and standard errors are computed according to a random sample.
jkfac: Argument is passed to BIFIE.data.jack and specifies the factor for multiplying jackknife replicate weights.
rho: Fay factor for statistical inference. The argument is passed to the fayfac argument of the BIFIE.data.jack function from the BIFIEsurvey package. See the corresponding help page for further details. For convenience, if rho = NULL (the default) and type = "JK1", BIFIE.data.jack is called with jktype="JK_GROUP" and fayfac set to \(\rho = (N_{cluster} - 1) \times N_{cluster}^{-1}\).
imp: Name or column number of the imputation variable.
group: Optional: column number or name of one grouping variable. Note: in contrast to repMean, only one grouping variable can be specified.
trend: Optional: name or column number of the trend variable which contains the measurement time of the survey. repLmer computes differences for all pairwise contrasts defined by the levels of the trend variable. For three measurement occasions, e.g. 2010, 2015, and 2020, contrasts (i.e. trends) are computed for 2010 vs. 2015, 2010 vs. 2020, and 2015 vs. 2020.
dependent: Name or column number of the dependent variable
formula.fixed: An R formula for fixed effects
formula.random: An R formula for random effects
doCheck: Logical: Check the data for consistency before analysis? If TRUE, groups with insufficient data are excluded from analysis to prevent subsequent functions from crashing.
na.rm: Logical: Should cases with missing values be dropped?
clusters: Variable name or column number of cluster variable.
verbose: Logical: Show analysis information on console?
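
Two of the argument descriptions above (trend and rho) involve small computations that can be reproduced in base R. The following is only an illustrative sketch, not the code repLmer runs internally: the object names and the cluster count of 50 are assumptions made for this example.

```r
# Pairwise trend contrasts for three measurement occasions,
# as described for the 'trend' argument
years <- c(2010, 2015, 2020)
contrasts <- t(combn(years, 2))          # one row per pair of occasions
colnames(contrasts) <- c("from", "to")
contrasts

# Default Fay factor for type = "JK1" when rho = NULL,
# assuming a hypothetical design with 50 clusters
n_cluster <- 50
rho <- (n_cluster - 1) / n_cluster
rho
```

With three occasions, combn yields the three pairs listed in the trend description; the Fay factor approaches 1 as the number of clusters grows.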

Value

A list of data frames in the long format. The output can be summarized using the report function. The first element of the list is a list with either one (no trend analyses) or two (trend analyses) data frames with at least six columns each. For each subpopulation defined by the group argument, each dependent variable, each parameter, and each coefficient, the corresponding value is given.

group: Denotes the group an analysis belongs to. If no groups were specified and/or an analysis for the whole sample was requested, the value of ‘group’ is ‘wholeGroup’.
depVar: Denotes the name of the dependent variable in the analysis.
modus: Denotes the mode of the analysis. For example, if a JK2 analysis without sampling weights was conducted, ‘modus’ takes the value ‘jk2.unweighted’. If an analysis without any replicates but with sampling weights was conducted, ‘modus’ takes the value ‘weighted’.
parameter: Denotes the parameter of the regression model for which the corresponding value is given. Among others, the ‘parameter’ column takes the values ‘(Intercept)’ and ‘gendermale’ if ‘gender’ was an independent variable, for instance. See example 1 for further details.
coefficient: Denotes the coefficient for which the corresponding value is given. Takes the values ‘est’ (estimate) and ‘se’ (standard error of the estimate).
value: The value of the parameter estimate in the corresponding group.

If groups were specified, further columns which are denoted by the group names are added to the data frame.
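
To make the long format concrete, the sketch below builds a small mock data frame with the six columns described above and reshapes it so that each parameter has its estimate and standard error side by side. The data frame is hypothetical and its numbers are made up purely for illustration; they are not output of repLmer. The report function remains the documented way to summarize real results.

```r
# Hypothetical mock of the long-format result (values invented for illustration)
res <- data.frame(
  group       = "wholeGroup",
  depVar      = "score",
  modus       = "jk2.unweighted",
  parameter   = rep(c("(Intercept)", "mig"), each = 2),
  coefficient = rep(c("est", "se"), times = 2),
  value       = c(500, 5, -20, 4),
  stringsAsFactors = FALSE
)

# One row per parameter, one column per coefficient (value.est, value.se)
wide <- reshape(res,
                idvar     = c("group", "depVar", "modus", "parameter"),
                timevar   = "coefficient",
                direction = "wide")

# Ratio of estimate to standard error, shown only as an example of
# working with the reshaped columns
wide$ratio <- wide$value.est / wide$value.se
wide
```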

Examples

### load example data (long format)
data(lsa)
### use only the first nest, use only reading
btRead <- subset(lsa, nest==1 & domain=="reading")

# \donttest{
### random intercept model with groups
mod1 <- repLmer(datL = btRead, ID = "idstud", wgt = "wgt", L1wgt="L1wgt", L2wgt="L2wgt",
        type = "jk2", PSU = "jkzone", repInd = "jkrep", imp = "imp",trend="year",
        group="country", dependent="score", formula.fixed = ~as.factor(sex)+mig,
        formula.random=~1, clusters="idclass")
#> Logical variable 'mig' will be transformed into numeric.
#> 
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 1'.
#> Assume unnested structure with 3 imputations.
#> 
#> `BIFIEsurvey::BIFIE.data.jack`(data = "datL", wgt = "wgt", jktype = "JK_TIMSS", 
#>     jkzone = "jkzone", jkrep = "jkrep", jkfac = NULL, fayfac = NULL, 
#>     cdata = FALSE)
#> MI data with 3 datasets || 92 replication weights with fayfac=1  || 3079 cases and 14 variables 
#>  
#> Imputation 1 | Group 1 |---------- 
#> Imputation 1 | Group 2 |---------- 
#> Imputation 1 | Group 3 |---------- 
#> Imputation 2 | Group 1 |---------- 
#> Imputation 2 | Group 2 |---------- 
#> Imputation 2 | Group 3 |---------- 
#> Imputation 3 | Group 1 |---------- 
#> Imputation 3 | Group 2 |---------- 
#> Imputation 3 | Group 3 |---------- 
#> 
#> 
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 1'.
#> Assume unnested structure with 3 imputations.
#> 
#> `BIFIEsurvey::BIFIE.data.jack`(data = "datL", wgt = "wgt", jktype = "JK_TIMSS", 
#>     jkzone = "jkzone", jkrep = "jkrep", jkfac = NULL, fayfac = NULL, 
#>     cdata = FALSE)
#> MI data with 3 datasets || 73 replication weights with fayfac=1  || 2928 cases and 14 variables 
#>  
#> Imputation 1 | Group 1 |-------- 
#> Imputation 1 | Group 2 |-------- 
#> Imputation 1 | Group 3 |-------- 
#> Imputation 2 | Group 1 |-------- 
#> Imputation 2 | Group 2 |-------- 
#> Imputation 2 | Group 3 |-------- 
#> Imputation 3 | Group 1 |-------- 
#> Imputation 3 | Group 2 |-------- 
#> Imputation 3 | Group 3 |-------- 
#> 
#> Note: No linking error was defined. Linking error will be defaulted to '0'.
res1 <- report(mod1)

### random slope without groups and without trend
mod2 <- repLmer(datL = subset(btRead, country=="countryA" & year== 2010),
        ID = "idstud", wgt = "wgt", L1wgt="L1wgt", L2wgt="L2wgt", type = "jk2",
        PSU = "jkzone", repInd = "jkrep", imp = "imp", dependent="score",
        formula.fixed = ~as.factor(sex)*mig, formula.random=~mig, clusters="idclass")
#> Logical variable 'mig' will be transformed into numeric.
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> 
#> `BIFIEsurvey::BIFIE.data.jack`(data = "datL", wgt = "wgt", jktype = "JK_TIMSS", 
#>     jkzone = "jkzone", jkrep = "jkrep", jkfac = NULL, fayfac = NULL, 
#>     cdata = FALSE)
#> MI data with 3 datasets || 32 replication weights with fayfac=1  || 1034 cases and 15 variables 
#>  
#> Imputation 1 | Group 1 |--- 
#> Imputation 2 | Group 1 |--- 
#> Imputation 3 | Group 1 |--- 
#> 
res2 <- report(mod2)
# }