Parametric bootstrap for mixed-effects models

Author

Phillip Alday, Douglas Bates, and Reinhold Kliegl

Published

2024-06-27

The speed of MixedModels.jl relative to its predecessors makes the parametric bootstrap much more computationally tractable. This is valuable because the parametric bootstrap can be used to produce more accurate confidence intervals than methods based on standard errors or profiling of the likelihood surface.

This page is adapted from the MixedModels.jl docs

1 The parametric bootstrap

Bootstrapping is a family of procedures for generating sample values of a statistic, allowing for visualization of the distribution of the statistic or for inference from this sample of values. Bootstrapping also belongs to a larger family of procedures called resampling, which are based on creating new samples of data from an existing one, then computing statistics on the new samples, in order to examine the distribution of the relevant statistics.

A parametric bootstrap is used with a parametric model, m, that has been fit to data. The procedure is to simulate n response vectors from m using the estimated parameter values and refit m to these responses in turn, accumulating the statistics of interest at each iteration.

The parameters of a LinearMixedModel object are the fixed-effects parameters, β, the standard deviation, σ, of the per-observation noise, and the covariance parameter, θ, that defines the variance-covariance matrices of the random effects. A technical description of the covariance parameter can be found in the MixedModels.jl docs. Lisa Schwetlick and Daniel Backhaus have provided a more beginner-friendly description of the covariance parameter in the documentation for MixedModelsSim.jl. For today’s purposes – looking at the uncertainty in the estimates from a fitted model – we can simply use values from the fitted model, but we will revisit the parametric bootstrap as a convenient way to simulate new data, potentially with different parameter values, for power analysis.

Attach the packages to be used

Code

using AlgebraOfGraphics
using CairoMakie
using DataFrames
using MixedModels
using MixedModelsMakie
using Random
using SMLP2023: dataset

using AlgebraOfGraphics: AlgebraOfGraphics as AoG
CairoMakie.activate!(; type="svg") # use SVG (other options include PNG)

import ProgressMeter
ProgressMeter.ijulia_behavior(:clear);

Note that the precise stream of random numbers generated for a given seed can change between Julia versions. For exact reproducibility, you either need to have the exact same Julia version or use the StableRNGs package.

2 A model of moderate complexity

The kb07 data (Kronmüller & Barr, 2007) are one of the datasets provided by the MixedModels package.

kb07 = dataset(:kb07)

Arrow.Table with 1789 rows, 7 columns, and schema:
 :subj      String
 :item      String
 :spkr      String
 :prec      String
 :load      String
 :rt_trunc  Int16
 :rt_raw    Int16

Convert the table to a DataFrame for summary.

kb07 = DataFrame(kb07)
describe(kb07)

7×7 DataFrame

Row	variable	mean	min	median	max	nmissing	eltype
	Symbol	Union…	Any	Union…	Any	Int64	DataType
1	subj		S030		S103	0	String
2	item		I01		I32	0	String
3	spkr		new		old	0	String
4	prec		break		maintain	0	String
5	load		no		yes	0	String
6	rt_trunc	2182.2	579	1940.0	5171	0	Int16
7	rt_raw	2226.24	579	1940.0	15923	0	Int16

The experimental factors; spkr, prec, and load, are two-level factors.

contrasts = Dict(:spkr => EffectsCoding(),
                 :prec => EffectsCoding(),
                 :load => EffectsCoding(),
                 :subj => Grouping(),
                 :item => Grouping())

The EffectsCoding contrast is used with these to create a ±1 encoding. Furthermore, Grouping contrasts are assigned to the subj and item factors. This is not a contrast per-se but an indication that these factors will be used as grouping factors for random effects and, therefore, there is no need to create a contrast matrix. For large numbers of levels in a grouping factor, an attempt to create a contrast matrix may cause memory overflow.

It is not important in these cases but a good practice in any case.

We can look at an initial fit of moderate complexity:

form = @formula(rt_trunc ~ 1 + spkr * prec * load +
                          (1 + spkr + prec + load | subj) +
                          (1 + spkr + prec + load | item))
m0 = fit(MixedModel, form, kb07; contrasts)

Minimizing 874    Time: 0:00:01 ( 1.27 ms/it)
  objective:  28637.971010073215

	Est.	SE	z	p	σ_subj	σ_item
(Intercept)	2181.6424	77.3515	28.20	<1e-99	301.8721	362.4695
spkr: old	67.7496	17.9607	3.77	0.0002	33.0588	41.1159
prec: maintain	-333.9200	47.1563	-7.08	<1e-11	58.8512	247.3299
load: yes	78.8007	19.7269	3.99	<1e-04	66.9526	43.3991
spkr: old & prec: maintain	-21.9960	15.8191	-1.39	0.1644
spkr: old & load: yes	18.3832	15.8191	1.16	0.2452
prec: maintain & load: yes	4.5327	15.8191	0.29	0.7745
spkr: old & prec: maintain & load: yes	23.6377	15.8191	1.49	0.1351
Residual	669.0515

The default display in Quarto uses the pretty MIME show method for the model and omits the estimated correlations of the random effects.

The VarCorr extractor displays these.

VarCorr(m0)

	Column	Variance	Std.Dev	Corr.
subj	(Intercept)	91126.7545	301.8721
	spkr: old	1092.8862	33.0588	+1.00
	prec: maintain	3463.4603	58.8512	-0.62	-0.62
	load: yes	4482.6493	66.9526	+0.36	+0.36	+0.51
item	(Intercept)	131384.1084	362.4695
	spkr: old	1690.5210	41.1159	+0.42
	prec: maintain	61172.0781	247.3299	-0.69	+0.37
	load: yes	1883.4785	43.3991	+0.29	+0.14	-0.13
Residual		447629.9270	669.0515

None of the two-factor or three-factor interaction terms in the fixed-effects are significant. In the random-effects terms only the scalar random effects and the prec random effect for item appear to be warranted, leading to the reduced formula

# formula f4 from https://doi.org/10.33016/nextjournal.100002
form = @formula(rt_trunc ~ 1 + spkr * prec * load + (1 | subj) + (1 + prec | item))

m1 = fit(MixedModel, form, kb07; contrasts)

	Est.	SE	z	p	σ_item	σ_subj
(Intercept)	2181.7582	77.4709	28.16	<1e-99	364.7286	298.1109
spkr: old	67.8114	16.0526	4.22	<1e-04
prec: maintain	-333.8582	47.4629	-7.03	<1e-11	252.6687
load: yes	78.6849	16.0525	4.90	<1e-06
spkr: old & prec: maintain	-21.8802	16.0525	-1.36	0.1729
spkr: old & load: yes	18.3214	16.0526	1.14	0.2537
prec: maintain & load: yes	4.4710	16.0526	0.28	0.7806
spkr: old & prec: maintain & load: yes	23.5219	16.0525	1.47	0.1428
Residual	678.9318

VarCorr(m1)

	Column	Variance	Std.Dev	Corr.
item	(Intercept)	133026.918	364.729
	prec: maintain	63841.496	252.669	-0.70
subj	(Intercept)	88870.081	298.111
Residual		460948.432	678.932

These two models are nested and can be compared with a likelihood-ratio test.

MixedModels.likelihoodratiotest(m0, m1)

	model-dof	deviance	χ²	χ²-dof	P(>χ²)
rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 \| subj) + (1 + prec \| item)	13	28658
rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load \| subj) + (1 + spkr + prec + load \| item)	29	28638	21	16	0.1979

The p-value of approximately 20% leads us to prefer the simpler model, m1, to the more complex, m0.

3 Bootstrap basics

To bootstrap the model parameters, first initialize a random number generator then create a bootstrap sample and extract the table of parameter estimates from it.

const RNG = MersenneTwister(42)
samp = parametricbootstrap(RNG, 5_000, m1)
tbl = samp.tbl

Table with 18 columns and 5000 rows:
      obj      β1       β2       β3        β4       β5        β6        ⋯
    ┌────────────────────────────────────────────────────────────────────
 1  │ 28691.9  2049.88  71.6398  -268.333  75.1566  -17.8459  -1.90968  ⋯
 2  │ 28642.6  2197.6   73.0832  -313.409  68.018   -18.062   30.2431   ⋯
 3  │ 28748.1  2175.11  85.4784  -311.786  48.8342  -27.3483  4.97874   ⋯
 4  │ 28683.7  2158.48  86.0498  -301.013  71.0117  -45.3329  27.3747   ⋯
 5  │ 28692.6  2342.6   81.9722  -305.764  78.9518  -37.0137  -5.59876  ⋯
 6  │ 28691.5  2198.64  90.4229  -387.855  93.4767  -37.2244  3.13351   ⋯
 7  │ 28687.7  2239.49  74.5619  -364.151  93.9154  -43.441   19.1572   ⋯
 8  │ 28673.4  2217.89  76.4077  -409.228  52.5047  -11.5777  26.4348   ⋯
 9  │ 28688.6  2184.89  81.3252  -298.693  89.8482  -5.99902  21.3162   ⋯
 10 │ 28597.6  2227.36  57.3436  -332.635  102.565  -3.85194  7.23222   ⋯
 11 │ 28700.8  2131.61  58.1253  -363.018  88.0576  -24.1592  2.26268   ⋯
 12 │ 28644.8  2145.05  94.7216  -242.292  103.158  -27.2788  27.0664   ⋯
 13 │ 28678.8  2113.56  55.4491  -382.425  89.8252  -12.9669  -2.74384  ⋯
 14 │ 28773.5  2123.38  117.383  -258.883  78.8998  -63.4576  26.0778   ⋯
 15 │ 28566.3  2168.89  52.8287  -289.968  71.5994  -14.7045  23.0165   ⋯
 16 │ 28764.1  2344.66  54.5199  -356.183  54.3082  -36.4798  32.3478   ⋯
 17 │ 28547.7  2230.3   56.2337  -329.448  88.8918  -27.7721  23.5031   ⋯
 18 │ 28729.3  2268.71  77.0147  -328.11   73.1191  -15.0292  17.2227   ⋯
 19 │ 28686.4  2138.38  88.8036  -273.697  68.418   -40.8522  17.7392   ⋯
 20 │ 28620.7  1955.46  97.2794  -334.827  92.7866  -1.49191  33.0718   ⋯
 21 │ 28664.1  2115.45  71.7916  -375.888  76.0984  -27.8142  24.5494   ⋯
 22 │ 28742.2  2173.11  83.8428  -351.655  76.0148  -15.7201  -8.95264  ⋯
 23 │ 28720.3  2192.85  62.7893  -327.012  47.9618  -23.188   16.7164   ⋯
 ⋮  │    ⋮        ⋮        ⋮        ⋮         ⋮        ⋮         ⋮      ⋱

An empirical density plot of the estimates of the residual standard deviation is obtained as

plt = data(tbl) * mapping(:σ) * AoG.density()
draw(plt; axis=(;title="Parametric bootstrap estimates of σ"))

A density plot of the estimates of the standard deviation of the random effects is obtained as

plt = data(tbl) * mapping(
  [:σ1, :σ2, :σ3] .=> "Bootstrap replicates of standard deviations";
  color=dims(1) => renamer(["Item intercept", "Item speaker", "Subj"])
) * AoG.density()
draw(plt; figure=(;supertitle="Parametric bootstrap estimates of variance components"))

The bootstrap sample can be used to generate intervals that cover a certain percentage of the bootstrapped values. We refer to these as “coverage intervals”, similar to a confidence interval. The shortest such intervals, obtained with the shortestcovint extractor, correspond to a highest posterior density interval in Bayesian inference.

We generate these for all random and fixed effects:

confint(samp)

DictTable with 2 columns and 13 rows:
 par   lower      upper
 ────┬─────────────────────
 β1  │ 2032.48    2340.42
 β2  │ 34.3982    97.8079
 β3  │ -428.09    -244.474
 β4  │ 48.642     110.179
 β5  │ -53.4336   8.42791
 β6  │ -11.3574   51.5299
 β7  │ -27.1799   36.3332
 β8  │ -9.28905   53.5917
 ρ1  │ -0.912778  -0.471128
 σ   │ 654.965    700.928
 σ1  │ 265.276    456.967
 σ2  │ 179.402    321.017
 σ3  │ 232.096    359.744

draw(
  data(samp.β) * mapping(:β; color=:coefname) * AoG.density();
  figure=(; resolution=(800, 450)),
)

For the fixed effects, MixedModelsMakie provides a convenience interface to plot the combined coverage intervals and density plots

ridgeplot(samp)

Often the intercept will be on a different scale and potentially less interesting, so we can stop it from being included in the plot:

ridgeplot(samp; show_intercept=false, xlabel="Bootstrap density and 95%CI")

4 Singularity

Let’s consider the classic dysetuff dataset:

dyestuff = dataset(:dyestuff)
mdye = fit(MixedModel, @formula(yield ~ 1 + (1 | batch)), dyestuff)

	Est.	SE	z	p	σ_batch
(Intercept)	1527.5000	17.6946	86.33	<1e-99	37.2603
Residual	49.5101

sampdye = parametricbootstrap(MersenneTwister(1234321), 10_000, mdye)
tbldye = sampdye.tbl

Table with 5 columns and 10000 rows:
      obj      β1       σ        σ1        θ1
    ┌────────────────────────────────────────────────
 1  │ 339.022  1509.13  67.4315  14.312    0.212245
 2  │ 322.689  1538.08  47.9831  25.5673   0.53284
 3  │ 324.002  1508.02  50.1346  21.7622   0.434076
 4  │ 331.887  1538.47  53.2238  41.0559   0.771383
 5  │ 317.771  1520.62  45.2975  19.1802   0.423428
 6  │ 315.181  1536.94  36.7556  49.1832   1.33812
 7  │ 333.641  1519.88  53.8161  46.712    0.867993
 8  │ 325.729  1528.43  47.8989  37.6367   0.785752
 9  │ 311.601  1497.46  41.4     15.1257   0.365355
 10 │ 335.244  1532.65  64.616   0.0       0.0
 11 │ 327.935  1552.54  57.2036  0.485275  0.00848329
 12 │ 323.861  1519.28  49.355   24.3703   0.493776
 13 │ 332.736  1509.04  59.6272  18.2905   0.306747
 14 │ 328.243  1531.7   51.5431  32.4743   0.630042
 15 │ 336.186  1536.17  64.0205  15.243    0.238096
 16 │ 329.468  1526.42  58.6856  0.0       0.0
 17 │ 320.086  1517.67  43.218   35.9663   0.832207
 18 │ 325.887  1497.86  50.8753  25.9059   0.509205
 19 │ 311.31   1529.24  33.8976  49.6557   1.46487
 20 │ 309.404  1549.71  33.987   41.1105   1.20959
 21 │ 327.973  1512.27  51.9135  29.8809   0.57559
 22 │ 310.973  1523.54  40.5191  16.8259   0.415258
 23 │ 323.794  1545.11  47.1842  32.9669   0.698686
 ⋮  │    ⋮        ⋮        ⋮        ⋮          ⋮

plt = data(tbldye) * mapping(:σ1) * AoG.density()
draw(plt; axis=(;title="Parametric bootstrap estimates of σ_batch"))

Notice that this density plot has a spike, or mode, at zero. Although this mode appears to be diffuse, this is an artifact of the way that density plots are created. In fact, it is a pulse, as can be seen from a histogram.

plt = data(tbldye) * mapping(:σ1) * AoG.histogram(;bins=100)
draw(plt; axis=(;title="Parametric bootstrap estimates of σ_batch"))

A value of zero for the standard deviation of the random effects is an example of a singular covariance. It is easy to detect the singularity in the case of a scalar random-effects term. However, it is not as straightforward to detect singularity in vector-valued random-effects terms.

For example, if we bootstrap a model fit to the sleepstudy data

sleepstudy = dataset(:sleepstudy)
msleep = fit(MixedModel, @formula(reaction ~ 1 + days + (1 + days | subj)),
             sleepstudy)

	Est.	SE	z	p	σ_subj
(Intercept)	251.4051	6.6323	37.91	<1e-99	23.7805
days	10.4673	1.5022	6.97	<1e-11	5.7168
Residual	25.5918

sampsleep = parametricbootstrap(MersenneTwister(666), 10_000, msleep)
tblsleep = sampsleep.tbl

Table with 10 columns and 10000 rows:
      obj      β1       β2       σ        σ1       σ2       ρ1         ⋯
    ┌───────────────────────────────────────────────────────────────────
 1  │ 1721.95  252.488  11.0328  22.4544  29.6185  6.33343  0.233383   ⋯
 2  │ 1760.85  260.763  8.55352  27.3836  20.806   4.32887  0.914702   ⋯
 3  │ 1750.88  246.709  12.4613  25.9951  15.8702  6.33404  0.200358   ⋯
 4  │ 1777.33  247.683  12.9824  27.7966  27.5413  4.9878   0.121411   ⋯
 5  │ 1738.05  245.649  10.5792  25.3596  21.5208  4.26131  0.052677   ⋯
 6  │ 1751.25  255.669  10.1984  26.1432  22.5389  4.58209  0.225968   ⋯
 7  │ 1727.51  248.986  7.62095  24.6451  19.0858  4.34881  0.212916   ⋯
 8  │ 1754.18  246.075  11.0469  26.9407  19.8341  4.55961  -0.202146  ⋯
 9  │ 1757.47  245.407  13.7475  25.8265  20.0014  7.7647   -0.266385  ⋯
 10 │ 1752.8   253.911  11.4977  25.7077  20.6409  6.27298  0.171494   ⋯
 11 │ 1707.8   248.887  10.1608  23.9684  10.5923  4.32041  1.0        ⋯
 12 │ 1773.69  252.542  10.7379  26.8795  27.7956  6.20553  0.156472   ⋯
 13 │ 1761.27  254.712  11.0373  25.7998  23.2005  7.30831  0.368175   ⋯
 14 │ 1737.0   260.299  10.5659  24.6504  29.0113  4.26877  -0.078572  ⋯
 15 │ 1760.12  258.949  10.1464  27.2088  8.02851  7.01925  0.727257   ⋯
 16 │ 1723.7   249.204  11.7868  24.9861  18.6887  3.08433  0.633219   ⋯
 17 │ 1734.14  262.586  8.96611  24.0011  26.7969  5.37598  0.297089   ⋯
 18 │ 1788.8   260.376  11.658   28.6099  26.1245  5.85587  0.0323618  ⋯
 19 │ 1752.44  239.962  11.0195  26.2388  23.2242  4.45586  0.482511   ⋯
 20 │ 1752.92  258.171  11.6339  25.7146  27.3026  4.87036  0.22569    ⋯
 21 │ 1740.81  254.09   7.91985  25.2195  16.2247  6.08679  0.462549   ⋯
 22 │ 1756.6   245.791  10.3434  26.2627  23.289   5.50225  -0.143374  ⋯
 23 │ 1759.01  256.131  9.10794  27.136   27.5008  3.51226  1.0        ⋯
 ⋮  │    ⋮        ⋮        ⋮        ⋮        ⋮        ⋮         ⋮      ⋱

the singularity can be exhibited as a standard deviation of zero or as a correlation of ±1.

confint(sampsleep)

DictTable with 2 columns and 6 rows:
 par   lower      upper
 ────┬───────────────────
 β1  │ 237.905    264.231
 β2  │ 7.44545    13.3516
 ρ1  │ -0.409926  1.0
 σ   │ 22.6345    28.5125
 σ1  │ 10.6756    33.3567
 σ2  │ 3.07053    7.82206

A histogram of the estimated correlations from the bootstrap sample has a spike at +1.

plt = data(tblsleep) * mapping(:ρ1) * AoG.histogram(;bins=100)
draw(plt; axis=(;title="Parametric bootstrap samples of correlation of random effects"))

or, as a count,

count(tblsleep.ρ1  .≈ 1)

Close examination of the histogram shows a few values of -1.

count(tblsleep.ρ1 .≈ -1)

Furthermore there are even a few cases where the estimate of the standard deviation of the random effect for the intercept is zero.

count(tblsleep.σ1 .≈ 0)

There is a general condition to check for singularity of an estimated covariance matrix or matrices in a bootstrap sample. The parameter optimized in the estimation is θ, the relative covariance parameter. Some of the elements of this parameter vector must be non-negative and, when one of these components is approximately zero, one of the covariance matrices will be singular.

The issingular method for a MixedModel object that tests if a parameter vector θ corresponds to a boundary or singular fit.

This operation is encapsulated in a method for the issingular function that works on MixedModelBootstrap objects.

count(issingular(sampsleep))

5 References

Kronmüller, E., & Barr, D. J. (2007). Perspective-free pragmatics: Broken precedents and the recovery-from-preemption hypothesis. Journal of Memory and Language, 56(3), 436–455. https://doi.org/10.1016/j.jml.2006.05.002