Bootstrapping a fitted model

Author

Phillip Alday, Douglas Bates, and Reinhold Kliegl

Published

2025-05-12

Begin by loading the packages to be used.

Code
using AlgebraOfGraphics
using CairoMakie
using DataFrames
using MixedModels
using Random
using SMLP2025: dataset

const progress=false

1 Data set and model

The kb07 data (Kronmüller & Barr, 2007) are one of the datasets provided by the MixedModels package.

kb07 = dataset(:kb07)
Arrow.Table with 1789 rows, 7 columns, and schema:
 :subj      String
 :item      String
 :spkr      String
 :prec      String
 :load      String
 :rt_trunc  Int16
 :rt_raw    Int16

Convert the table to a DataFrame for summary.

describe(DataFrame(kb07))
7×7 DataFrame
Row variable mean min median max nmissing eltype
Symbol Union… Any Union… Any Int64 DataType
1 subj S030 S103 0 String
2 item I01 I32 0 String
3 spkr new old 0 String
4 prec break maintain 0 String
5 load no yes 0 String
6 rt_trunc 2182.2 579 1940.0 5171 0 Int16
7 rt_raw 2226.24 579 1940.0 15923 0 Int16

The experimental factors; spkr, prec, and load, are two-level factors. The EffectsCoding contrast is used with these to create a \(\pm1\) encoding.

contrasts = Dict{Symbol,Any}(nm => EffectsCoding() for nm in (:spkr, :prec, :load))

The display of an initial model fit

kbm01 = let
  form = @formula(
    rt_trunc ~
      1 +
      spkr * prec * load +
      (1 + spkr + prec + load | subj) +
      (1 + spkr + prec + load | item)
  )
  fit(MixedModel, form, kb07; contrasts, progress)
end
Est. SE z p σ_subj σ_item
(Intercept) 2181.6729 77.3131 28.22 <1e-99 301.6736 362.3182
spkr: old 67.7490 18.2747 3.71 0.0002 42.5441 40.7073
prec: maintain -333.9206 47.1388 -7.08 <1e-11 61.9978 246.8078
load: yes 78.7702 19.5387 4.03 <1e-04 64.9874 42.5139
spkr: old & prec: maintain -21.9655 15.8070 -1.39 0.1646
spkr: old & load: yes 18.3838 15.8070 1.16 0.2448
prec: maintain & load: yes 4.5333 15.8070 0.29 0.7743
spkr: old & prec: maintain & load: yes 23.6072 15.8070 1.49 0.1353
Residual 668.5371

does not include the estimated correlations of the random effects.

The VarCorr extractor displays these.

VarCorr(kbm01)
Column Variance Std.Dev Corr.
subj (Intercept) 91006.9887 301.6736
spkr: old 1810.0012 42.5441 +0.79
prec: maintain 3843.7268 61.9978 -0.59 +0.02
load: yes 4223.3647 64.9874 +0.36 +0.85 +0.54
item (Intercept) 131274.4558 362.3182
spkr: old 1657.0830 40.7073 +0.44
prec: maintain 60914.1141 246.8078 -0.69 +0.35
load: yes 1807.4303 42.5139 +0.32 +0.15 -0.14
Residual 446941.7901 668.5371

None of the two-factor or three-factor interaction terms in the fixed-effects are significant. In the random-effects terms only the scalar random effects and the prec random effect for item appear to be warranted, leading to the reduced formula

kbm02 = let
  form = @formula(
    rt_trunc ~
      1 + spkr + prec + load + (1 | subj) + (1 + prec | item)
  )
  fit(MixedModel, form, kb07; contrasts, progress)
end
Est. SE z p σ_item σ_subj
(Intercept) 2181.8526 77.4681 28.16 <1e-99 364.7125 298.0259
spkr: old 67.8790 16.0785 4.22 <1e-04
prec: maintain -333.7906 47.4472 -7.03 <1e-11 252.5212
load: yes 78.5904 16.0785 4.89 <1e-05
Residual 680.0319
VarCorr(kbm02)
Column Variance Std.Dev Corr.
item (Intercept) 133015.242 364.713
prec: maintain 63766.937 252.521 -0.70
subj (Intercept) 88819.438 298.026
Residual 462443.388 680.032

These two models are nested and can be compared with a likelihood-ratio test.

MixedModels.likelihoodratiotest(kbm02, kbm01)
model-dof deviance χ² χ²-dof P(>χ²)
rt_trunc ~ 1 + spkr + prec + load + (1 | subj) + (1 + prec | item) 9 28664
rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load | subj) + (1 + spkr + prec + load | item) 29 28637 27 20 0.1431

The p-value of approximately 17% leads us to prefer the simpler model, kbm02, to the more complex, kbm01.

2 A bootstrap sample

Create a bootstrap sample of a few thousand parameter estimates from the reduced model. The pseudo-random number generator is initialized to a fixed value for reproducibility.

Random.seed!(1234321)
kbm02samp = parametricbootstrap(2000, kbm02)
kbm02tbl = kbm02samp.tbl
Table with 14 columns and 2000 rows:
      obj      β1       β2       β3        β4       σ        σ1       σ2       ⋯
    ┌───────────────────────────────────────────────────────────────────────────
 1  │ 28772.3  2163.91  60.5609  -312.083  53.8854  702.338  372.786  244.827  ⋯
 2  │ 28715.1  2232.6   83.7997  -383.283  47.4501  693.167  391.838  214.007  ⋯
 3  │ 28575.3  2194.67  65.0581  -348.837  78.2132  660.998  380.111  241.468  ⋯
 4  │ 28728.5  2261.27  66.174   -351.407  89.6083  690.344  334.599  306.404  ⋯
 5  │ 28631.4  2309.15  77.5687  -341.306  72.5969  676.497  323.766  269.468  ⋯
 6  │ 28627.8  2091.9   54.8119  -295.505  74.487   667.146  426.8    278.072  ⋯
 7  │ 28685.7  2291.62  73.7219  -396.564  84.4817  687.627  393.806  194.632  ⋯
 8  │ 28654.4  2019.04  72.7075  -203.806  87.8736  679.826  304.365  208.355  ⋯
 9  │ 28664.1  1999.72  60.8485  -308.63   74.0233  680.412  357.133  305.422  ⋯
 10 │ 28618.3  2249.85  52.1815  -394.464  89.8091  670.656  385.063  226.332  ⋯
 11 │ 28748.9  2166.95  77.9659  -326.597  81.0316  696.842  378.964  234.475  ⋯
 12 │ 28731.4  2153.68  74.5144  -341.393  44.1758  701.902  323.245  177.602  ⋯
 13 │ 28623.9  2155.23  93.193   -373.267  73.2941  671.256  335.979  217.312  ⋯
 14 │ 28615.9  2344.1   62.8553  -314.684  88.7448  665.313  410.652  259.437  ⋯
 15 │ 28636.2  2286.5   35.9889  -349.732  62.0524  670.473  361.649  304.037  ⋯
 16 │ 28676.6  2289.43  76.9805  -298.765  87.139   686.466  316.203  232.853  ⋯
 17 │ 28614.8  2044.53  53.2336  -304.859  112.995  675.065  354.352  220.038  ⋯
 ⋮  │    ⋮        ⋮        ⋮        ⋮         ⋮        ⋮        ⋮        ⋮     ⋱

One of the uses of such a sample is to form “confidence intervals” on the parameters by obtaining the shortest interval that covers a given proportion (95%, by default) of the sample.

confint(kbm02samp)
DictTable with 2 columns and 9 rows:
 par   lower      upper
 ────┬─────────────────────
 β1  │ 2022.91    2334.0
 β2  │ 34.0592    96.72
 β3  │ -430.238   -239.803
 β4  │ 46.5349    109.526
 ρ1  │ -0.907256  -0.490089
 σ   │ 657.341    702.655
 σ1  │ 270.067    451.968
 σ2  │ 181.737    325.124
 σ3  │ 233.834    364.512

A sample like this can be used for more than just creating an interval because it approximates the distribution of the estimator. For the fixed-effects parameters the estimators are close to being normally distributed, Figure 1.

Code
draw(
  data(kbm02samp.β) * mapping(:β; color=:coefname) * AlgebraOfGraphics.density();
  figure=(; resolution=(800, 450)),
)
Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
@ Makie ~/.julia/packages/Makie/KcEWO/src/scenes.jl:238
Figure 1: Comparative densities of the fixed-effects coefficients in kbm02samp
Code
let pars = ["σ1", "σ2", "σ3"]
  draw(
    data(kbm02tbl) *
    mapping(pars .=> "σ"; color=dims(1) => renamer(pars)) *
    AlgebraOfGraphics.density();
    figure=(; resolution=(800, 450)),
  )
end
Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
@ Makie ~/.julia/packages/Makie/KcEWO/src/scenes.jl:238
Figure 2: Density plot of bootstrap samples standard deviation of random effects
Code
draw(
  data(kbm02tbl) *
  mapping(:ρ1 => "Correlation") *
  AlgebraOfGraphics.density();
  figure=(; resolution=(800, 450)),
)
Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
@ Makie ~/.julia/packages/Makie/KcEWO/src/scenes.jl:238
Figure 3: Density plot of correlation parameters in bootstrap sample from model kbm02

3 References

Kronmüller, E., & Barr, D. J. (2007). Perspective-free pragmatics: Broken precedents and the recovery-from-preemption hypothesis. Journal of Memory and Language, 56(3), 436–455. https://doi.org/10.1016/j.jml.2006.05.002

This page was rendered from git revision 0e4cb20 using Quarto 1.7.29.

Back to top