repGlm.Rd
Compute generalized linear models for complex cluster designs with multiple imputed variables based
on the Jackknife (JK1, JK2) or balanced repeated replicates (BRR) procedure. Conceptually, the function combines replication
methods and methods for multiple imputed data. Technically, this is a wrapper for the svyglm
function
of the survey
package.
repGlm(datL, ID, wgt = NULL, type = c("none", "JK2", "JK1", "BRR", "Fay"), PSU = NULL,
repInd = NULL, repWgt = NULL, nest=NULL, imp=NULL, groups = NULL,
group.splits = length(groups), group.delimiter = "_",
cross.differences = FALSE, trend = NULL, linkErr = NULL, formula,
family=gaussian, forceSingularityTreatment = FALSE,
glmTransformation = c("none", "sdY"), doCheck = TRUE, na.rm = FALSE,
poolMethod = c("mice", "scalar"), useWec = FALSE,
scale = 1, rscales = 1, mse=TRUE, rho=NULL, hetero=TRUE,
se_type = c("HC3", "HC0", "HC1", "HC2", "CR0", "CR2"),
clusters = NULL, crossDiffSE.engine= c("lavaan", "lm"),
stochasticGroupSizes = FALSE, verbose = TRUE, progress = TRUE,
nCores=NULL)
Data frame in the long format (i.e. each line represents one ID unit in one imputation of one nest) containing all variables for analysis.
Variable name or column number of student identifier (ID) variable. ID variable must not contain any missing values.
Optional: Variable name or column number of weighting variable. If no weighting variable is specified, all cases will be equally weighted.
Defines the replication method for cluster replicates which is to be applied. Depending on type
, additional
arguments must be specified (e.g., PSU
and/or repInd
or repWgt
).
Variable name or column number of variable indicating the primary sampling unit (PSU). When a jackknife procedure is applied,
the PSU is the jackknife zone variable. If NULL
, no cluster structure is assumed and
standard errors are computed according to a random sample.
Variable name or column number of variable indicating replicate ID. In a jackknife procedure, this is the jackknife replicate
variable. If NULL
, no cluster structure is assumed and standard errors are computed according to a random sample.
Normally, replicate weights are created by repGlm
directly from PSU
and repInd
variables. Alternatively,
if replicate weights are included in the data.frame, specify the variable names or column number in the repWgt
argument.
Optional: name or column number of the nesting variable. Only applies in nested multiple imputed data sets.
Optional: name or column number of the imputation variable. Only applies in multiple imputed data sets.
Optional: vector of names or column numbers of one or more grouping variables.
Optional: If groups are defined, group.splits
optionally specifies whether analysis should be done also
in the whole group or overlying groups. See examples for more details.
Character string which separates the group names in the output frame.
Either a list of vectors, specifying the pairs of levels for which cross-level differences should be computed.
Alternatively, if TRUE
, cross-level differences for all pairs of levels are computed. If FALSE
, no cross-level
differences are computed. (see examples 2a, 3, and 4 in the help file of the repMean
function)
Optional: name or column number of the trend variable which contains the measurement time of the survey.
Note: Levels of all grouping variables and predictors must be equal in all 'sub populations' partitioned by the discrete trend variable.
repGlm
computes differences for all pairwise contrasts defined by trend variable levels. or three measurement
occasions, i.e. 2010, 2015, and 2020, contrasts (i.e. trends) are computed for 2010 vs. 2015, 2010 vs. 2020, and
2015 vs. 2020.
Optional: name or column number of the linking error variable. If NULL
, a linking error of 0 will be assumed in trend estimation.
Model formula, see help page of glm
for details.
A description of the error distribution and link function to be used in the model. See help page of glm
for details.
Logical: Forces the function to use the workaround to handle singularities in regression models.
Optional: Allows for transformation of parameters from linear regression and logistic regression before pooling.
Useful to compare parameters from different glm models, see Mood (2010). Note: This argument applies only if
forceSingularityTreatment
is set to 'TRUE'.
Logical: Check the data for consistency before analysis? If TRUE
groups with insufficient data are excluded from
analysis to prevent subsequent functions from crashing.
Logical: Should cases with missing values be dropped?
Which pooling method should be used? The “mice” method is recommended.
Logical: use weighted effect coding?
scaling constant for variance, for details, see help page of svrepdesign
from the survey
package
scaling constant for variance, for details, see help page of svrepdesign
from the survey
package
Logical: If TRUE
, compute variances based on sum of squares around the point estimate, rather than the mean of the replicates.
See help page of svrepdesign
from the survey
package for further details.
Shrinkage factor for weights in Fay's method. See help page of svrepdesign
from the survey
package for further details.
Logical: Assume heteroscedastic variance for weighted effect coding? Only applies for random samples, i.e. if no replication analyses are executed.
The sort of standard error sought for cross level differences. Only applies if crossDiffSE == "wec"
and hetero == TRUE
and crossDiffSE.engine == "lm"
. See the help page of lm_robust
from the estimatr
package for further details.
Optional: Variable name or column number of cluster variable. Only necessary if weighted effecting coding
should be performed using heteroscedastic variances. See the help page of lm_robust
from the estimatr
package for further details.
Optional: Sort of estimator which should be used for standard error estimation in weighted effect coding regression.
Only applies if useWec == TRUE
. To date, only lavaan allows for stochastic group sizes.
Logical: Assume stochastic group sizes for using weighted effect coding regression with categorical predictors? Note: To date, only lavaan allows for stochastic group sizes. Stochastic group sizes cannot be assumed if any replication method (jackknife, BRR) is applied.
Logical: Show analysis information on console?
Logical: Show progress bar on console?
integer (default: NULL), number of cores to use for parallel processing, if engine = "survey"
. If NULL
,
single core processing is used.
Function first creates replicate weights based on PSU
and repInd
variables according to JK2 or
BRR procedure. According to multiple imputed data sets, a workbook with several analyses is created.
The function afterwards serves as a wrapper for svyglm
implemented in the survey
package.
The results of the several analyses are then pooled according to Rubin's rule, which is adapted for nested
imputations if the nest
argument implies a nested structure.
A list of data frames in the long format. The output can be summarized using the report
function.
The first element of the list is a list with either one (no trend analyses) or two (trend analyses)
data frames with at least six columns each. For each subpopulation denoted by the groups
statement, each dependent variable, each parameter and each coefficient the corresponding value is given.
Denotes the group an analysis belongs to. If no groups were specified and/or analysis for the whole sample were requested, the value of ‘group’ is ‘wholeGroup’.
Denotes the name of the dependent variable in the analysis.
Denotes the mode of the analysis. For example, if a JK2 analysis without sampling weights was conducted, ‘modus’ takes the value ‘jk2.unweighted’. If a analysis without any replicates but with sampling weights was conducted, ‘modus’ takes the value ‘weighted’.
Denotes the parameter of the regression model for which the corresponding value is given further. Amongst others, the ‘parameter’ column takes the values ‘(Intercept)’ and ‘gendermale’ if ‘gender’ was the dependent variable, for instance. See example 1 for further details.
Denotes the coefficient for which the corresponding value is given further. Takes the values ‘est’ (estimate) and ‘se’ (standard error of the estimate).
The value of the parameter estimate in the corresponding group.
If groups were specified, further columns which are denoted by the group names are added to the data frame.
te Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017). When size matters: advantages of weighted effect coding in observational studies. International Journal of Public Health. 62, 163–167.
### load example data (long format)
data(lsa)
### use only the first nest
bt <- lsa[which(lsa[,"nest"] == 1),]
### use only data from 2010
bt2010 <- bt[which(bt[,"year"] == 2010),]
## use only reading data
bt2010read <- bt2010[which(bt2010[,"domain"] == "reading"),]
### Example 1: Computes linear regression from reading score on gender separately
### for each country. Assume no nested structure.
mod1 <- repGlm(datL = bt2010read, ID = "idstud", wgt = "wgt", type = "jk2",
PSU = "jkzone", repInd = "jkrep", imp = "imp", groups = "country",
formula = score~sex, family ="gaussian")
#> 1 analyse(s) overall according to: 'group.splits = 1'.
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
res1 <- report(mod1, printGlm = TRUE)
#> Warning: `report()` was deprecated in eatRep 0.15.0.
#> ℹ For the original behavior of report() please use eatRep version 0.14.7:
#> 'https://cran.r-project.org/src/contrib/Archive/eatRep/'
#> Trend group: 'noTrend'.
#> groups: type = point; country = countryA; row = 1; id = 21583614_1
#> dependent Variable: score
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) 508.406 4.321 117.651 0.000 ***
#> 2 sexmale 6.088 5.831 1.044 0.297
#>
#> R-squared: 0.002; SE(R-squared): NA
#> 1034 observations and 1032 degrees of freedom.
#> ------------------------------------------------------------------
#> groups: type = point; country = countryB; row = 4; id = 21583614_4
#> dependent Variable: score
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) 502.607 5.049 99.554 0.000 ***
#> 2 sexmale 11.005 7.177 1.533 0.126
#>
#> R-squared: 0.005; SE(R-squared): NA
#> 959 observations and 957 degrees of freedom.
#> ------------------------------------------------------------------
#> groups: type = point; country = countryC; row = 7; id = 21583614_7
#> dependent Variable: score
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) 526.287 3.949 133.267 0.000 ***
#> 2 sexmale 15.268 5.873 2.600 0.009 **
#>
#> R-squared: 0.009; SE(R-squared): NA
#> 1086 observations and 1084 degrees of freedom.
# \donttest{
### Example 2: Computes log linear regression from pass/fail on ses and gender
### separately for each country in a nested structure. Assuming equally weighted
### cases by omitting "wgt" argument
dat <- lsa[intersect(which(lsa[,"year"] == 2010), which(lsa[,"domain"] == "reading")),]
mod2 <- repGlm(datL = dat, ID = "idstud", type = "JK2", PSU = "jkzone",
repInd = "jkrep", imp = "imp", nest="nest", groups = "country",
formula = passReg~sex*ses, family = quasibinomial(link="logit"))
#> Method 'mice' is not available for nested imputation. Switch to method 'scalar'.
#> 1 analyse(s) overall according to: 'group.splits = 1'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#> Create 92 replicate weights according to JK2 procedure.
#> No sample sizes given. Will not compute standard error of pooled R squared.
#> No sample sizes given. Will not compute standard error of pooled R squared.
#> No sample sizes given. Will not compute standard error of pooled R squared.
#>
res2 <- report(mod2, printGlm = TRUE)
#> Trend group: 'noTrend'.
#> groups: type = point; country = countryA; row = 1; id = 21583862_1
#> dependent Variable: passReg
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) -1.765 0.357 -4.949 NA <NA>
#> 2 ses 0.027 0.005 4.929 NA <NA>
#> 3 sexmale -0.055 0.443 -0.125 NA <NA>
#> 4 sexmale:ses 0.004 0.008 0.435 NA <NA>
#>
#> R-squared: 0.098; SE(R-squared): 0.098
#> observations and degrees of freedom.
#> ------------------------------------------------------------------
#> groups: type = point; country = countryB; row = 4; id = 21583862_4
#> dependent Variable: passReg
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) -2.464 0.361 -6.834 NA <NA>
#> 2 ses 0.033 0.007 4.697 NA <NA>
#> 3 sexmale 0.179 0.528 0.339 NA <NA>
#> 4 sexmale:ses 0.000 0.010 0.025 NA <NA>
#>
#> R-squared: 0.13; SE(R-squared): 0.13
#> observations and degrees of freedom.
#> ------------------------------------------------------------------
#> groups: type = point; country = countryC; row = 7; id = 21583862_7
#> dependent Variable: passReg
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) -1.373 0.313 -4.386 NA <NA>
#> 2 ses 0.028 0.006 4.594 NA <NA>
#> 3 sexmale 0.453 0.413 1.097 NA <NA>
#> 4 sexmale:ses -0.003 0.008 -0.356 NA <NA>
#>
#> R-squared: 0.091; SE(R-squared): 0.091
#> observations and degrees of freedom.
### Example 3: Like example 1, but without any replication methods
### trend estimation (without linking error) and nested imputation
dat <- lsa[which(lsa[,"domain"] == "reading"),]
mod3 <- repGlm(datL = dat, ID = "idstud", wgt = "wgt", imp = "imp", nest = "nest",
groups = "country", formula = score~sex, trend = "year")
#> Method 'mice' is not available for nested imputation. Switch to method 'scalar'.
#>
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 1'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#> No sample sizes given. Will not compute standard error of pooled R squared.
#> No sample sizes given. Will not compute standard error of pooled R squared.
#> No sample sizes given. Will not compute standard error of pooled R squared.
#>
#>
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 1'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#> No sample sizes given. Will not compute standard error of pooled R squared.
#> No sample sizes given. Will not compute standard error of pooled R squared.
#> No sample sizes given. Will not compute standard error of pooled R squared.
#>
#> Note: No linking error was defined. Linking error will be defaulted to '0'.
res3 <- report(mod3, printGlm = TRUE)
#> Trend group: '2010'.
#> groups: type = point; country = countryA; row = 1; id = 21583940_1; year = 2010
#> dependent Variable: score
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) 508.643 3.611 140.859 NA <NA>
#> 2 sexmale 6.003 4.962 1.210 NA <NA>
#>
#> R-squared: 0.002; SE(R-squared): 0.002
#> observations and degrees of freedom.
#> ------------------------------------------------------------------
#> groups: type = point; country = countryB; row = 4; id = 21583940_4; year = 2010
#> dependent Variable: score
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) 503.053 4.483 112.221 NA <NA>
#> 2 sexmale 10.068 6.164 1.633 NA <NA>
#>
#> R-squared: 0.004; SE(R-squared): 0.004
#> observations and degrees of freedom.
#> ------------------------------------------------------------------
#> groups: type = point; country = countryC; row = 7; id = 21583940_7; year = 2010
#> dependent Variable: score
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) 526.381 3.684 142.874 NA <NA>
#> 2 sexmale 14.263 5.383 2.649 NA <NA>
#>
#> R-squared: 0.008; SE(R-squared): 0.008
#> observations and degrees of freedom.
#> ============================================================================
#> Trend group: '2015'.
#> groups: type = point; country = countryA; row = 1; id = 21583960_1; year = 2015
#> dependent Variable: score
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) 507.311 3.676 138.010 NA <NA>
#> 2 sexmale 1.158 5.018 0.231 NA <NA>
#>
#> R-squared: 0; SE(R-squared): 0
#> observations and degrees of freedom.
#> ------------------------------------------------------------------
#> groups: type = point; country = countryB; row = 4; id = 21583960_4; year = 2015
#> dependent Variable: score
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) 492.666 4.757 103.563 NA <NA>
#> 2 sexmale 6.548 6.390 1.025 NA <NA>
#>
#> R-squared: 0.002; SE(R-squared): 0.002
#> observations and degrees of freedom.
#> ------------------------------------------------------------------
#> groups: type = point; country = countryC; row = 7; id = 21583960_7; year = 2015
#> dependent Variable: score
#>
#> parameter est se t.value p.value sig
#> 1 (Intercept) 514.175 3.444 149.311 NA <NA>
#> 2 sexmale 12.004 4.813 2.494 NA <NA>
#>
#> R-squared: 0.005; SE(R-squared): 0.005
#> observations and degrees of freedom.
# }