repQuantile.Rd
Compute quantiles with standard errors for complex cluster designs with multiply imputed variables
(e.g., plausible values), based on jackknife (JK1, JK2) or balanced repeated replicates (BRR) procedures. Conceptually,
the function combines replication methods with methods for multiply imputed data. Technically, it is a wrapper for
the svyquantile()
function of the survey package.
repQuantile(datL, ID, wgt = NULL, type = c("none", "JK2", "JK1", "BRR", "Fay"),
            PSU = NULL, repInd = NULL, repWgt = NULL, nest = NULL, imp = NULL,
            groups = NULL, group.splits = length(groups), cross.differences = FALSE,
            group.delimiter = "_", trend = NULL, linkErr = NULL, dependent,
            probs = c(0.25, 0.50, 0.75), na.rm = FALSE, nBoot = NULL,
            bootMethod = c("wSampling", "wQuantiles"), doCheck = TRUE,
            scale = 1, rscales = 1, mse = TRUE,
            rho = NULL, verbose = TRUE, progress = TRUE)
Data frame in the long format (i.e. each line represents one ID unit in one imputation of one nest) containing all variables for analysis.
Variable name or column number of student identifier (ID) variable. ID variable must not contain any missing values.
Optional: Variable name or column number of weighting variable. If no weighting variable is specified, all cases will be equally weighted.
Defines the replication method for cluster replicates. Depending on type, additional arguments must be specified (e.g., PSU and/or repInd, or repWgt).
Variable name or column number of the variable indicating the primary sampling unit (PSU). When a jackknife procedure is applied, the PSU is the jackknife zone variable. If NULL, no cluster structure is assumed and standard errors are computed as for a simple random sample.
Variable name or column number of the variable indicating the replicate ID. In a jackknife procedure, this is the jackknife replicate variable. If NULL, no cluster structure is assumed and standard errors are computed as for a simple random sample.
Normally, replicate weights are created by repQuantile directly from the PSU and repInd variables. Alternatively, if replicate weights are already included in the data frame, specify their variable names or column numbers in the repWgt argument.
Optional: name or column number of the nesting variable. Only applies to nested multiply imputed data sets.
Optional: name or column number of the imputation variable. Only applies to multiply imputed data sets.
Optional: vector of names or column numbers of one or more grouping variables.
Optional: If groups are defined, group.splits specifies whether the analysis should also be conducted in the whole group or in overlying groups. See examples for more details.
Either a list of vectors specifying the pairs of hierarchy levels for which cross-level differences should be computed, or a logical: if TRUE, cross-level differences for all pairs of levels are computed; if FALSE, no cross-level differences are computed. (See examples 2a, 3, and 4 in the help file of the repMean function.)
Character string which separates the group names in the output frame.
Optional: name or column number of the trend variable which contains the measurement time of the survey.
Note: Levels of all grouping variables must be equal in all 'sub-populations' partitioned by the discrete trend variable.
repQuantile computes differences for all pairwise contrasts defined by the trend variable levels. For three measurement
occasions, i.e. 2010, 2015, and 2020, contrasts (i.e., trends) are computed for 2010 vs. 2015, 2010 vs. 2020, and
2015 vs. 2020.
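For illustration, the pairwise contrasts for three measurement occasions can be listed with base R's combn() (a sketch of the contrast logic; repQuantile derives these contrasts internally):

```r
# All pairwise trend contrasts for three measurement occasions
years <- c(2010, 2015, 2020)
contrasts <- combn(years, 2)   # each column is one pairwise contrast
t(contrasts)
#      [,1] [,2]
# [1,] 2010 2015
# [2,] 2010 2020
# [3,] 2015 2020
```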
Optional: Either the name or column number of the linking error variable. If NULL, a linking error of 0 is assumed in trend estimation.
Alternatively, linking errors may be given as a data.frame with the following specification: two columns, named trendLevel1 and trendLevel2, contain the levels of the trend variable; the contrast between both values indicates which trend is meant. For only two measurement occasions, i.e.
2010 and 2015, trendLevel1 should be 2010 and trendLevel2 should be 2015. For three measurement occasions,
i.e. 2010, 2015, and 2020, additional lines are necessary where trendLevel1 is 2010 and trendLevel2 is 2020, to mark the contrast between 2010 and 2020, and further lines where trendLevel1 is 2015 and trendLevel2 is 2020. The column depVar must contain the name of the dependent variable; this string must correspond to the name of the dependent variable in the data. The column parameter indicates the parameter the linking error belongs to. The column linkingError contains the linking error value. Providing linking errors in a data.frame is necessary for
more than two measurement occasions.
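A linking-error data.frame for three measurement occasions might be constructed as follows (a sketch; the object name linkErrFrame and all numeric values are invented for illustration):

```r
# Hypothetical linking-error data.frame for occasions 2010, 2015, 2020
# (column names as required by repQuantile; values are invented)
linkErrFrame <- data.frame(
  trendLevel1  = c(2010, 2010, 2015),
  trendLevel2  = c(2015, 2020, 2020),
  depVar       = "score",             # must match the dependent variable name in the data
  parameter    = "0.5",               # the parameter the linking error belongs to, e.g. the median
  linkingError = c(1.2, 1.4, 0.9)     # invented linking error values
)
linkErrFrame
```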
Variable name or column number of the dependent variable.
Numeric vector with probabilities for which to compute quantiles.
Logical: Should cases with missing values be dropped?
Optional: Without replicates, standard errors cannot be computed in a weighted sample. Alternatively, standard errors may be computed via bootstrap using the boot package; nBoot then specifies the number of bootstrap samples. If not specified, no standard errors will be given. In analyses containing replicates, or in samples without person weights, nBoot is ignored.
Optional: If standard errors are computed in a bootstrap, one of two methods may be applied. wSampling requests the function to draw nBoot weighted bootstrap samples for which unweighted quantiles are computed. wQuantiles requests the function to draw nBoot unweighted bootstrap samples for which weighted quantiles are computed.
Logical: Check the data for consistency before analysis? If TRUE, groups with insufficient data are excluded from the analysis to prevent subsequent functions from crashing.
Scaling constant for variance; for details, see the help page of svrepdesign from the survey package.
Scaling constant for variance; for details, see the help page of svrepdesign from the survey package.
Logical: If TRUE, compute variances based on the sum of squares around the point estimate, rather than around the mean of the replicates. See the help page of svrepdesign from the survey package for further details.
Shrinkage factor for weights in Fay's method. See the help page of svrepdesign from the survey package for further details.
Logical: Show analysis information on console?
Logical: Show progress bar on console?
The function first creates replicate weights from the PSU and repInd variables according to the JK2 or BRR procedure as implemented in WesVar. For multiply imputed data sets, the analysis is repeated for each imputation. The function then serves as a wrapper for svyquantile, called by svyby from the survey package. The results of the several analyses are pooled according to Rubin's rules, which are adapted for nested imputations if the dependent argument implies a nested structure.
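The pooling step for the non-nested case can be sketched in a few lines of base R (an illustration of Rubin's rules, not the package's internal code; the function name poolRubin and the input values are invented):

```r
# Rubin's rules for m (non-nested) imputations:
# est = point estimates, se2 = squared standard errors, one per imputation
poolRubin <- function(est, se2) {
  m       <- length(est)
  pooled  <- mean(est)                     # pooled point estimate
  within  <- mean(se2)                     # within-imputation variance
  between <- var(est)                      # between-imputation variance
  total   <- within + (1 + 1 / m) * between
  c(est = pooled, se = sqrt(total))
}
poolRubin(est = c(500.2, 498.7, 501.1), se2 = c(4.0, 3.8, 4.2))
```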
A list of data frames in the long format. The output can be summarized using the report function.
The first element of the list is a list with either one (no trend analyses) or two (trend analyses)
data frames with at least six columns each. For each subpopulation defined by the groups statement, each
dependent variable, each parameter (i.e., the values of the corresponding categories of the dependent variable),
and each coefficient (i.e., the estimate and the corresponding standard error), the corresponding value is given.
Denotes the group an analysis belongs to. If no groups were specified and/or analyses for the whole sample were requested, the value of ‘group’ is ‘wholeGroup’.
Denotes the name of the dependent variable in the analysis.
Denotes the mode of the analysis. For example, if a JK2 analysis without sampling weights was conducted, ‘modus’ takes the value ‘jk2.unweighted’. If an analysis without any replicates but with sampling weights was conducted, ‘modus’ takes the value ‘weighted’.
Denotes the parameter for which the corresponding value is given. For frequency tables, this is the category of the dependent variable whose relative frequency is given.
Denotes the coefficient for which the corresponding value is given further. Takes the values ‘est’ (estimate) and ‘se’ (standard error of the estimate).
The value of the parameter, i.e. the relative frequency or its standard error.
If groups were specified, further columns which are denoted by the group names are added to the data frame.
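An illustrative excerpt of such a long-format output is sketched below (all rows and values are invented; the actual column contents depend on the analysis):

```r
# Hypothetical excerpt of the long-format output (all values invented):
#        group depVar        modus parameter coefficient value
# 1 wholeGroup  score jk2.weighted       0.1         est 398.6
# 2 wholeGroup  score jk2.weighted       0.1          se   2.3
# 3   countryA  score jk2.weighted       0.1         est 401.2
```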
# \donttest{
data(lsa)
### Example 1: percentiles of reading scores for each country
### We only consider domain 'reading'
rd <- lsa[which(lsa[,"domain"] == "reading"),]
### We only consider the first "nest".
rdN1 <- rd[which(rd[,"nest"] == 1),]
### First, we only consider year 2010
rdN1y10<- rdN1[which(rdN1[,"year"] == 2010),]
### First example: compute percentiles in a nested data structure for reading
### scores, conditional on country and for the whole group
perzent <- repQuantile(datL = rd, ID = "idstud", wgt = "wgt", type = "JK2",
PSU = "jkzone", repInd = "jkrep", imp = "imp", nest="nest",
groups = "country", group.splits = c(0:1), dependent = "score",
probs = seq(0.1,0.9,0.2) )
#> 2 analyse(s) overall according to: 'group.splits = 0 1'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 NA
#> 2 1 country NA
#>
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#> Create 98 replicate weights according to JK2 procedure.
#>
res <- report(perzent, add = list(domain = "reading"))
### Second example: Computes percentile for reading scores conditionally on country,
### use 100 bootstrap samples, assume no nested structure
perzent2 <- repQuantile(datL = rdN1y10, ID = "idstud", wgt = "wgt",
imp = "imp", groups = "country", dependent = "score",
probs = seq(0.1,0.9,0.2), nBoot = 100 )
#> 1 analyse(s) overall according to: 'group.splits = 1'.
#> Assume unnested structure with 3 imputations.
#>
res2 <- report(perzent2, add = list(domain = "reading"))
# }