repMean.Rd
Compute totals, means, adjusted means, mean differences, variances and standard deviations
with standard errors in random or clustered or complex samples. Variance estimation in complex cluster
designs based on Jackknife (JK1, JK2) or Balanced Repeated Replicates (BRR) procedure. Moreover, analyses
can be customized for multiple or nested imputed variables, applying the combination rules of Rubin (1987)
for imputed data and Rubin (2003) for nested imputed data. Conceptually, the function combines replication
methods and methods for multiple imputed data. Trend estimation as usual in large-scale assessments is supported as well.
Technically, this is a wrapper for the svymean
and svyvar
functions of the survey
package.
repMean (datL, ID, wgt = NULL, type = c("none", "JK2", "JK1", "BRR", "Fay"), PSU = NULL,
repInd = NULL, jkfac=NULL, repWgt = NULL, nest=NULL, imp=NULL, groups = NULL,
group.splits = length(groups), group.differences.by = NULL,
cross.differences = FALSE, crossDiffSE = c("wec", "rep","old"),
adjust = NULL, useEffectLiteR = FALSE, nBoot = 100, group.delimiter = "_",
trend = NULL, linkErr = NULL, dependent, na.rm = FALSE, doCheck = TRUE,
engine = c("survey", "BIFIEsurvey"), scale = 1, rscales = 1, mse=TRUE,
rho=NULL, hetero=TRUE, se_type = c("HC3", "HC0", "HC1", "HC2", "CR0", "CR2"),
clusters = NULL, crossDiffSE.engine= c("lavaan", "lm"),
stochasticGroupSizes = FALSE, verbose = TRUE, progress = TRUE, nCores=NULL)
Data frame in the long format (i.e. each line represents one ID unit in one imputation of one nest) containing all variables for analysis.
Variable name or column number of student identifier (ID) variable. ID variable must not contain any missing values.
Optional: Variable name or column number of weighting variable. If no weighting variable is specified, all cases will be equally weighted.
Defines the replication method for cluster replicates which is to be applied. Depending on type
, additional
arguments must be specified (e.g., PSU
and/or repInd
or repWgt
).
Variable name or column number of variable indicating the primary sampling unit (PSU). When a jackknife procedure is applied,
the PSU is the jackknife zone variable. If NULL
, no cluster structure is assumed and
standard errors are computed according to a random sample.
Variable name or column number of variable indicating replicate ID. In a jackknife procedure, this is the jackknife replicate
variable. If NULL
, no cluster structure is assumed and standard errors are computed according to a random sample.
Only applies if engine = "BIFIEsurvey"
. Argument is passed to BIFIE.data.jack
and specifies
the factor for multiplying jackknife replicate weights.
Normally, replicate weights are created by repMean
directly from PSU
and repInd
variables. Alternatively,
if replicate weights are included in the data.frame, specify the variable names or column number in the repWgt
argument.
Optional: name or column number of the nesting variable. Only applies in nested multiple imputed data sets.
Optional: name or column number of the imputation variable. Only applies in multiple imputed data sets.
Optional: vector of names or column numbers of one or more grouping variables.
Optional: If groups are defined, group.splits
optionally specifies whether analysis should be done also
in the whole group or overlying groups. See examples for more details.
Optional: Specifies variable group differences should be computed for. The corresponding variable must be included in
the groups
statement. Exception: choose 'wholePop' if you want to estimate each's group difference from the
overall sample mean. See examples for further details.
Either a list of vectors, specifying the pairs of levels for which cross-level differences should be computed.
Alternatively, if TRUE
, cross-level differences for all pairs of levels are computed. If FALSE
, no cross-level
differences are computed. (see example 2a, 3, and 4)
Method for standard error estimation for cross level differences, where groups are dependent.
wec
uses weighted effect coding, rep
uses replication methods (bootstrap or jackknife) to
estimate the standard error between the total mean and group-specific means. old
does not account for dependent
groups and treat the groups as if they were independent from each other.
Variable name or column number of variable(s) for which adjusted means should be computed. Non-numeric variables (factors) will be converted to 0/1 dichotomous variables.
Logical: use the lavaan
-wrapper EffectLiteR
to compute adjusted means? Alternatively, adjusted means
are computed by applying a simple linear regression model in each group, using the variables in adjust
as
independent variables. Afterwards, the coefficients are weighted with the (weighted) means of the independent
variables. Standard errors for this procedure are received using the delta method by applying an augmented
variance-covariance matrix which assumes zero covariances between independent variable means and regression
coefficients. We recommend to set useEffectLiteR = TRUE
if no replication methods are applied. When
replication methods are used (jackknife-1, jackknife-2, BRR), we recommend to set useEffectLiteR = FALSE
,
because otherwise the estimation is very slow.
Without replicates (i.e., for completely random samples), the rep
method for standard error estimation for
cross level differences needs a bootstrap. nBoot
therefore specifies the number of bootstrap samples.
This argument is only necessary, if crossDiffSE = "rep"
and none of the replicate methods
(JK1, JK2, or BRR) is applied. Otherwise, nBoot
will be ignored.
Character string which separates the group names in the output frame.
Optional: name or column number of the trend variable which contains the measurement time of the survey.
Note: Levels of all grouping variables must be equal in all 'sub populations' partitioned by the discrete trend variable.
repMean
computes differences for all pairwise contrasts defined by trend variable levels. or three measurement
occasions, i.e. 2010, 2015, and 2020, contrasts (i.e. trends) are computed for 2010 vs. 2015, 2010 vs. 2020, and
2015 vs. 2020.
Optional: Either the name or column number of the linking error variable. If NULL
, a linking error of 0 will be assumed in trend estimation.
Alternatively, linking errors may be
given as data.frame
with following specifications: Two columns, named trendLevel1
and trendLevel2
which contain the
levels of the trend variable. The contrasts between both values indicates which trend is meant. For only two measurement occasions, i.e.
2010 and 2015, trendLevel1
should be 2010
, and trendLevel2
should be 2015
. For three measurement occasions,
i.e. 2010, 2015, and 2020, additional lines are necessary where trendLevel1
should be 2010
, and trendLevel2
should be
2020
, to mark the contrast between 2010 and 2020, and further additional lines are necessary where trendLevel1
should be
2015
, and trendLevel2
should be 2020
. The column depVar
must include the name of the dependent variable. This
string must correspond to the name of the dependent variable in the data. The column parameter
indicates the parameter the linking
error belongs to. Column linkingError
includes the linking error value. Providing linking error in a data.frame is necessary for
more than two measurement occasions. See the example 3a for further details.
Variable name or column number of the dependent variable.
Logical: Should cases with missing values be dropped?
Logical: Check the data for consistency before analysis? If TRUE
groups with insufficient data are excluded from
analysis to prevent subsequent functions from crashing.
Which package should be used for estimation?
scaling constant for variance, for details, see help page of svrepdesign
from the survey
package
scaling constant for variance, for details, see help page of svrepdesign
from the survey
package
Logical: If TRUE
, compute variances based on sum of squares around the point estimate, rather than the mean of the replicates.
See help page of svrepdesign
from the survey
package for further details.
Shrinkage factor for weights in Fay's method. If engine = "survey"
, argument is passed to the rho
argument of
the svrepdesign
function from the survey
package. See the corresponding help page for further
details. If engine = "BIFIEsurvey"
, argument is passed to the fayfac
argument of the
BIFIE.data.jack
function from the BIFIEsurvey
package. See the
corresponding help page for further details. For convenience, if rho = NULL
(the default)
and engine = "BIFIEsurvey"
and type = "JK1"
, BIFIE.data.jack
is called with jktype="JK_GROUP"
and fayfac = rho
, where \(\rho = (N_{cluster} - 1) \times N_{cluster}^{-1}\)
Logical: Assume heteroscedastic variance for weighted effect coding?
The sort of standard error sought for cross level differences. Only applies if crossDiffSE == "wec"
and hetero == TRUE
and crossDiffSE.engine == "lm"
. See the help page of lm_robust
from the estimatr
package for further details.
Optional: Variable name or column number of cluster variable. Only necessary if weighted effecting coding
should be performed using heteroscedastic variances. See the help page of lm_robust
from the estimatr
package for further details.
Software implementation used for estimating cross-level differences. Choices are either "lavaan"
(required if
stochasticGroupSites == "TRUE"
) or R function lm
. "lavaan"
is the default.
Logical: Assume stochastic group sizes for using weighted effect coding in cross-level differences? Note: To date,
only crossDiffSE.engine = "lavaan"
allows for stochastic group sizes. Stochastic group sizes are not yet
implemented for any replication method (jackknife, BRR).
Logical: Show analysis information on console?
Logical: Show progress bar on console?
integer (default: NULL), number of cores to use for parallel processing, if engine = "survey"
. If NULL
,
single core processing is used.
Function first creates replicate weights based on PSU and repInd variables (if defined) according to JK2 or BRR procedure
as implemented in WesVar. According to multiple imputed data sets, a workbook with several analyses is created.
The function afterwards serves as a wrapper for svymean
called by svyby
implemented in
the ‘survey’ package. The results of the several analyses are then pooled according to Rubin's rule.
A list of data frames in the long format. The output can be summarized using the report
function.
The first element of the list is a list with either one (no trend analyses) or two (trend analyses)
data frames with at least six columns each. For each subpopulation denoted by the groups
statement, each parameter (i.e., mean, variance, or group differences) and each coefficient (i.e., the estimate
and the corresponding standard error) the corresponding value is given.
Denotes the group an analysis belongs to. If no groups were specified and/or analysis for the whole sample were requested, the value of ‘group’ is ‘wholeGroup’.
Denotes the name of the dependent variable in the analysis.
Denotes the mode of the analysis. For example, if a JK2 analysis without sampling weights was conducted, ‘modus’ takes the value ‘jk2.unweighted’. If a analysis without any replicates but with sampling weights was conducted, ‘modus’ takes the value ‘weighted’.
Denotes the parameter of the regression model for which the corresponding value is given further. Amongst others, the ‘parameter’ column takes the values ‘mean’, ‘sd’, ‘var’ and ‘meanGroupDiff’ if group differences were requested.
Denotes the coefficient for which the corresponding value is given further. Takes the values ‘est’ (estimate) and ‘se’ (standard error of the estimate).
The value of the parameter estimate in the corresponding group.
If groups were specified, further columns which are denoted by the group names are added to the data frame.
te Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017). When size matters: advantages of weighted effect coding in observational studies. International Journal of Public Health. 62, 163–167.
Sachse, K. A. & Haag, N. (2017). Standard errors for national trends in international large-scale assessments in the case of cross-national differential item functioning. Applied Measurement in Education, 30, (2), 102-116. http://dx.doi.org/10.1080/08957347.2017.1283315
Weirich, S., Hecht, M., Becker, B. et al. Comparing group means with the total mean in random samples, surveys, and large-scale assessments: A tutorial and software illustration. Behav Res (2021). https://doi.org/10.3758/s13428-021-01553-1
data(lsa)
### Example 1: only means, SD and variances for each country
### We only consider domain 'reading'
rd <- lsa[which(lsa[,"domain"] == "reading"),]
### We only consider the first "nest".
rdN1 <- rd[which(rd[,"nest"] == 1),]
### First, we only consider year 2010
rdN1y10<- rdN1[which(rdN1[,"year"] == 2010),]
# \donttest{
### mean estimation
means1 <- repMean(datL = rdN1y10, ID="idstud", wgt="wgt", type = "JK2", PSU = "jkzone",
repInd = "jkrep", imp="imp", groups = "country", dependent = "score",
na.rm=FALSE, doCheck=TRUE, engine = "BIFIEsurvey")
#> 1 analyse(s) overall according to: 'group.splits = 1'.
#> Assume unnested structure with 3 imputations.
#>
#> `BIFIEsurvey::BIFIE.data.jack`(data = "datL", wgt = "wgt", jktype = "JK_TIMSS",
#> jkzone = "jkzone", jkrep = "jkrep", jkfac = NULL, fayfac = NULL,
#> cdata = FALSE)
#> MI data with 3 datasets || 92 replication weights with fayfac=1 || 3079 cases and 10 variables
#> 'BIFIE.univar' for 'call = mean'. dependent = 'score'. group(s) = 'country'.
#>
### reporting function: the function does not know which content domain is being considered,
### so the user may add new columns in the output using the 'add' argument
res1 <- report(means1, add = list(domain = "reading"))
### Example 1a: Additionally to example 1, we decide to estimate whether
### each country's mean differ significantly from the overall mean as well
### as from the individual means of the other contries
means1a<- repMean(datL = rdN1y10, ID="idstud", wgt="wgt", type = "JK2", PSU = "jkzone",
repInd = "jkrep", imp="imp", groups = "country", group.splits = 0:1,
group.differences.by = "country", cross.differences = TRUE,
dependent = "score", na.rm=FALSE, doCheck=TRUE, hetero=FALSE)
#> 2 analyse(s) overall according to: 'group.splits = 0 1'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country country
#>
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
#> Compute cross level differences using 'wec' method. Assume homoscedastic variances.
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
res1a <- report(means1a, add = list(domain = "reading"))
### See that the means of all countries significantly differ from the overall mean.
print(res1a[intersect(which(res1a[,"comparison"] == "crossDiff"),
which(res1a[,"parameter"] == "mean")),], digits = 3)
#> parameter modus depVar comparison group
#> 4 mean JK2.mean__survey score crossDiff crossDiff (countryA - total)
#> 10 mean JK2.mean__survey score crossDiff crossDiff (countryB - total)
#> 17 mean JK2.mean__survey score crossDiff crossDiff (countryC - total)
#> domain country es est p se
#> 4 reading countryA.vs.wholeGroup -0.139 -11.1 0.000 2.95
#> 10 reading countryB.vs.wholeGroup -0.169 -14.1 0.001 4.43
#> 17 reading countryC.vs.wholeGroup 0.141 11.6 0.000 2.07
### Example 2: Sex differences by country. Assume equally weighted cases by omitting
### 'wgt' argument.
means2 <- repMean(datL = rdN1y10, ID="idstud", type = "JK2", PSU = "jkzone",
repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2,
group.differences.by="sex", dependent = "score", na.rm=FALSE, doCheck=TRUE,
cross.differences =TRUE, crossDiffSE.engine= "lm")
#> 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country <NA>
#> 3 1 sex sex
#> 4 2 country + sex sex
#>
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
#> Compute cross level differences using 'wec' method. Assume heteroscedastic variances.
#> 'wec' method: Assume equally weighted cases.
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
#> 'wec' method: Assume equally weighted cases.
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
#> 'wec' method: Assume equally weighted cases.
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> Create 32 replicate weights according to JK2 procedure.
#>
#> 'wec' method: Assume equally weighted cases.
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> Create 60 replicate weights according to JK2 procedure.
#>
#> 'wec' method: Assume equally weighted cases.
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> Create 40 replicate weights according to JK2 procedure.
#>
#> 'wec' method: Assume equally weighted cases.
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> Create 91 replicate weights according to JK2 procedure.
#>
#> 'wec' method: Assume equally weighted cases.
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
res2 <- report(means2,add = list(domain = "reading"))
#> Warning: Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
#> Warning: Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.
### Example 2a: Additionally to example 2, we decide to estimate whether
### each country's mean differ significantly from the overall mean. (Note: by default,
### such cross level differences are estimated using 'weighted effect coding'. Use the
### 'crossDiffSE' argument to choose alternative methods.) Moreover, we estimate whether
### each country's sex difference differ significantly from the sex difference in the
### whole population.
means2a<- repMean(datL = rdN1y10, ID="idstud", wgt="wgt", type = "JK2", PSU = "jkzone",
repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2,
group.differences.by="sex", cross.differences = list(c(0,1), c(0,2)),
dependent = "score", na.rm=FALSE, doCheck=TRUE,
crossDiffSE.engine= "lm", clusters = "idclass")
#> 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country <NA>
#> 3 1 sex sex
#> 4 2 country + sex sex
#>
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
#> Warning: Computation of cross level differences using 'wec' method is only possible for differences according to adjacent levels. Non-adjacent levels will be ignored.
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
res2a <- report(means2a,add = list(domain = "reading"))
#> Warning: Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
#> Warning: Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.
### Third example: like example 2a, but using nested imputations of dependent variable,
### and additionally estimating trend: use 'rd' instead of 'rdN1y10'
### assume equally weighted cases by omitting 'wgt' argument
### ignoring jackknife by omitting 'type', 'PSU' and 'repInd' argument
means3T<- repMean(datL = rd, ID="idstud", imp="imp", nest="nest",
groups = c("country", "sex"), group.splits = 0:2, group.differences.by="sex",
cross.differences = list(c(0,1), c(0,2)), dependent = "score", na.rm=FALSE,
doCheck=TRUE, trend = "year", linkErr = "leScore",
crossDiffSE = "wec", crossDiffSE.engine= "lavaan")
#>
#> Trend group: '2010'
#> 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country <NA>
#> 3 1 sex sex
#> 4 2 country + sex sex
#>
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country <NA>
#> 3 1 sex sex
#> 4 2 country + sex sex
#>
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Warning: Computation of cross level differences using 'wec' method is only possible for differences according to adjacent levels. Non-adjacent levels will be ignored.
#> 'wec' method: Assume equally weighted cases.
#>
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Note: No linking error was defined. Linking error will be defaulted to '0'.
#> 'wec' method: Assume equally weighted cases.
#>
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Note: No linking error was defined. Linking error will be defaulted to '0'.
res3T <- report(means3T, add = list(domain = "reading"))
#> Warning: Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
#> Warning: Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.
### Example 3a: like example 3, but providing linking errors in an additional data.frame
### This is optional for two measurement occasions but mandatory if the analysis contains
### more than two measurement occasions
linkErr<- data.frame ( trendLevel1 = 2010, trendLevel2 = 2015, depVar = "score",
parameter = "mean", unique(lsa[,c("domain", "leScore")]),
stringsAsFactors = FALSE)
colnames(linkErr) <- car::recode(colnames(linkErr), "'leScore'='linkingError'")
### note that the linking errors for the specified domain have to be chosen via
### subsetting
means3a<- repMean(datL = rd, ID="idstud", imp="imp", nest="nest",
groups = c("country", "sex"),
group.splits = 0:2, group.differences.by="sex",
cross.differences = list(c(0,1), c(0,2)),
dependent = "score", na.rm=FALSE, doCheck=TRUE, trend = "year",
linkErr = linkErr[which(linkErr[,"domain"] == "reading"),],
crossDiffSE = "wec", crossDiffSE.engine= "lavaan")
#>
#> Trend group: '2010'
#> 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country <NA>
#> 3 1 sex sex
#> 4 2 country + sex sex
#>
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country <NA>
#> 3 1 sex sex
#> 4 2 country + sex sex
#>
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Warning: Computation of cross level differences using 'wec' method is only possible for differences according to adjacent levels. Non-adjacent levels will be ignored.
#> 'wec' method: Assume equally weighted cases.
#>
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Note: No linking error was defined. Linking error will be defaulted to '0'.
#> 'wec' method: Assume equally weighted cases.
#>
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Note: No linking error was defined. Linking error will be defaulted to '0'.
res3a <- report(means3a, add = list(domain = "reading"))
#> Warning: Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
#> Warning: Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.
#> Warning: No linking errors for parameters 'Ncases', 'NcasesValid', 'sd', 'var'. Linking errors for these parameters will be defaulted to 0.
#> Warning: Found 23 missing linking errors for dependent variable 'score' and parameter(s) 'sd'. Assume linking error of 0 for these cases.
### Fourth example: using a loop do analyse 'reading' and 'listening' comprehension
### in one function call. Again with group and cross differences and trends, and
### trend differences
### we use weights but omit jackknife analysis by omitting 'type', 'PSU' and 'repInd'
### argument
means4T<- by ( data = lsa, INDICES = lsa[,"domain"], FUN = function (sub.dat) {
repMean(datL = sub.dat, ID="idstud", wgt="wgt", imp="imp", nest="nest",
groups = c("country", "sex"), group.splits = 0:2,
group.differences.by="sex",
cross.differences = list(c(0,1), c(0,2)), dependent = "score",
na.rm=FALSE, doCheck=TRUE,
trend = "year", linkErr = "leScore", crossDiffSE.engine= "lm") })
#>
#> Trend group: '2010'
#> 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country <NA>
#> 3 1 sex sex
#> 4 2 country + sex sex
#>
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country <NA>
#> 3 1 sex sex
#> 4 2 country + sex sex
#>
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Warning: Computation of cross level differences using 'wec' method is only possible for differences according to adjacent levels. Non-adjacent levels will be ignored.
#>
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Note: No linking error was defined. Linking error will be defaulted to '0'.
#>
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Note: No linking error was defined. Linking error will be defaulted to '0'.
#>
#> Trend group: '2010'
#> 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country <NA>
#> 3 1 sex sex
#> 4 2 country + sex sex
#>
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
#>
#> analysis.number hierarchy.level groups.divided.by group.differences.by
#> 1 0 <NA>
#> 2 1 country <NA>
#> 3 1 sex sex
#> 4 2 country + sex sex
#>
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Warning: Computation of cross level differences using 'wec' method is only possible for differences according to adjacent levels. Non-adjacent levels will be ignored.
#>
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Note: No linking error was defined. Linking error will be defaulted to '0'.
#>
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#>
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 0'.
#> Assume nested structure with 2 nests and 3 imputations in each nest. This will result in 2 x 3 = 6 imputation replicates.
#>
#> Note: No linking error was defined. Linking error will be defaulted to '0'.
ret4T <- do.call("rbind", lapply(names(means4T), FUN = function ( domain ) {
report(means4T[[domain]], add = list(domain = domain))}))
#> Warning: Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
#> Warning: Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.
#> Warning: Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
#> Warning: Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.
### Fifth example: compute adjusted means, also with trend estimation
### Note: all covariates must be numeric or 0/1 dichotomous
rdN1[,"mignum"] <- as.numeric(rdN1[,"mig"])
rdN1[,"sexnum"] <- car::recode(rdN1[,"sex"], "'male'=0; 'female'=1", as.numeric=TRUE,
as.factor=FALSE)
means5 <- repMean(datL = rdN1, ID="idstud", wgt="wgt", type = "JK2", PSU = "jkzone",
repInd = "jkrep", imp="imp", groups = "country",
adjust = c("sexnum", "ses", "mignum"), useEffectLiteR = FALSE,
dependent = "score", na.rm=FALSE, doCheck=TRUE, trend = "year",
linkErr = "leScore")
#>
#> Trend group: '2010'
#> 1 analyse(s) overall according to: 'group.splits = 1'.
#> Assume unnested structure with 3 imputations.
#> Create 92 replicate weights according to JK2 procedure.
#>
#>
#> Trend group: '2015'
#> 1 analyse(s) overall according to: 'group.splits = 1'.
#> Assume unnested structure with 3 imputations.
#> Create 73 replicate weights according to JK2 procedure.
#>
res5 <- report(means5, add = list(domain = "reading"))
#> Cannot find standard deviations in output. Skip computation of effect sizes.
if (FALSE) { # \dontrun{
############################################################################################
# Example 6: R code for running the PISA 2015 science example to compare group means #
# with the total mean using weighted effect coding #
############################################################################################
# Warning: large PISA data set requires at least 16 GB free working memory (RAM):
### define necessary directories (note: writing permissions required)
folder <- tempdir()
### download PISA 2015 zipped student questionnaire data (420 MB) to a folder with
### writing permissions
download.file(url = "https://webfs.oecd.org/pisa/PUF_SPSS_COMBINED_CMB_STU_QQQ.zip",
destfile = file.path(folder, "pisa2015.zip"))
### unzip PISA 2015 student questionnaire data (1.5 GB) to temporary folder
zip::unzip(zipfile = file.path(folder, "pisa2015.zip"), files= "CY6_MS_CMB_STU_QQQ.sav",
exdir=folder)
### read data
pisa <- foreign::read.spss(file.path (folder, "CY6_MS_CMB_STU_QQQ.sav"),
to.data.frame=TRUE, use.value.labels = FALSE, use.missings = TRUE)
# dependent variables
measure.vars <- paste0("PV", 1:10, "SCIE")
### choose desired variables and reshape into the long format
# 'CNTSTUID' = individual student identifier
# 'CNT' = country identifier
# 'SENWT' = senate weight (assume a population of 5000 in each country)
# 'W_FSTUWT' = final student weight
# 'OECD' = dummy variable indicating which country is part of the OECD
# 'W_FSTURWT' (1 to 80) = balanced repeated replicate weights
# 'PV1SCIE' to 'PV10SCIE' = 10 plausible values of (latent) science performance
pisaLong <- reshape2::melt(pisa, id.vars = c("CNTSTUID", "CNT", "SENWT", "W_FSTUWT",
"OECD", paste0("W_FSTURWT", 1:80)),
measure.vars = measure.vars, value.name = "value", variable.name="imp",
na.rm=TRUE)
### choose OECD countries
oecd <- pisaLong[which(pisaLong[,"OECD"] == 1),]
### analyze data
### analysis takes approximately 30 minutes on an Intel i5-6500 machine with 32 GB RAM
means <- repMean( datL = oecd, # data.frame in the long format
ID = "CNTSTUID", # student identifier
dependent = "value", # the dependent variable in the data
groups = "CNT", # the grouping variable
wgt = "SENWT", # (optional) weighting variable. We use senate
# weights (assume a population of 5000 in each
# country)
type = "Fay", # type of replication method. Corresponding to
# the PISA sampling method, we use "Fay"
rho = 0.5, # shrinkage factor for weights in Fay's method
scale = NULL, # scaling constant for variance, set to NULL
# according to PISA's sampling method
rscales = NULL, # scaling constant for variance, set to NULL
# according to PISA's sampling method
repWgt = paste0("W_FSTURWT", 1:80), # the replicate weights,
# provided by the OECD
imp = "imp", # the imputation variable
mse = FALSE, # if TRUE, compute variances based on sum of
# squares around the point estimate, rather
# than the mean of the replicates.
group.splits = 0:1, # defining the 'levels' for which means should
# be computed. 0:1 implies that means for the
# whole sample (level 0) as well as for groups
# (level 1) are computed
cross.differences = TRUE, # defines whether (and which) cross level mean
# differences should be computed. TRUE means
# that all cross level mean differences are
# computed
crossDiffSE = "wec", # method for standard errors of mean
# differences
crossDiffSE.engine = "lm", # software implementation for standard
# errors of mean differences
hetero = TRUE, # assume heteroscedastic group variances
stochasticGroupSizes = FALSE # assume fixed group sizes
)
### call a reporting function to generate user-friendly output
results <- report(means, exclude = c("Ncases", "NcasesValid", "var", "sd"))
} # }# }