Code
using AlgebraOfGraphics
using CairoMakie
using DataFrames
using MixedModels
using ProgressMeter
using Random
using SMLP2023: dataset
activate!(; type="svg")
CairoMakie.
ijulia_behavior(:clear) ProgressMeter.
Phillip Alday, Douglas Bates, and Reinhold Kliegl
2024-06-27
Begin by loading the packages to be used.
The kb07
data (Kronmüller & Barr, 2007) are one of the datasets provided by the MixedModels
package.
Arrow.Table with 1789 rows, 7 columns, and schema:
:subj String
:item String
:spkr String
:prec String
:load String
:rt_trunc Int16
:rt_raw Int16
Convert the table to a DataFrame for summary.
Row | variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|---|
Symbol | Union… | Any | Union… | Any | Int64 | DataType | |
1 | subj | S030 | S103 | 0 | String | ||
2 | item | I01 | I32 | 0 | String | ||
3 | spkr | new | old | 0 | String | ||
4 | prec | break | maintain | 0 | String | ||
5 | load | no | yes | 0 | String | ||
6 | rt_trunc | 2182.2 | 579 | 1940.0 | 5171 | 0 | Int16 |
7 | rt_raw | 2226.24 | 579 | 1940.0 | 15923 | 0 | Int16 |
The experimental factors; spkr
, prec
, and load
, are two-level factors. The EffectsCoding
contrast is used with these to create a \(\pm1\) encoding. Furthermore, Grouping
contrasts are assigned to the subj
and item
factors. This is not a contrast per-se but an indication that these factors will be used as grouping factors for random effects and, therefore, there is no need to create a contrast matrix. For large numbers of levels in a grouping factor, an attempt to create a contrast matrix may cause memory overflow.
It is not important in these cases but a good practice in any case.
The display of an initial model fit
kbm01 = let
form = @formula(
rt_trunc ~
1 +
spkr * prec * load +
(1 + spkr + prec + load | subj) +
(1 + spkr + prec + load | item)
)
fit(MixedModel, form, kb07; contrasts)
end
Minimizing 874 Time: 0:00:01 ( 1.38 ms/it)
objective: 28637.971010073215
Est. | SE | z | p | σ_subj | σ_item | |
---|---|---|---|---|---|---|
(Intercept) | 2181.6424 | 77.3515 | 28.20 | <1e-99 | 301.8721 | 362.4695 |
spkr: old | 67.7496 | 17.9607 | 3.77 | 0.0002 | 33.0588 | 41.1159 |
prec: maintain | -333.9200 | 47.1563 | -7.08 | <1e-11 | 58.8512 | 247.3299 |
load: yes | 78.8007 | 19.7269 | 3.99 | <1e-04 | 66.9526 | 43.3991 |
spkr: old & prec: maintain | -21.9960 | 15.8191 | -1.39 | 0.1644 | ||
spkr: old & load: yes | 18.3832 | 15.8191 | 1.16 | 0.2452 | ||
prec: maintain & load: yes | 4.5327 | 15.8191 | 0.29 | 0.7745 | ||
spkr: old & prec: maintain & load: yes | 23.6377 | 15.8191 | 1.49 | 0.1351 | ||
Residual | 669.0515 |
does not include the estimated correlations of the random effects.
The VarCorr
extractor displays these.
Column | Variance | Std.Dev | Corr. | |||
---|---|---|---|---|---|---|
subj | (Intercept) | 91126.7545 | 301.8721 | |||
spkr: old | 1092.8862 | 33.0588 | +1.00 | |||
prec: maintain | 3463.4603 | 58.8512 | -0.62 | -0.62 | ||
load: yes | 4482.6493 | 66.9526 | +0.36 | +0.36 | +0.51 | |
item | (Intercept) | 131384.1084 | 362.4695 | |||
spkr: old | 1690.5210 | 41.1159 | +0.42 | |||
prec: maintain | 61172.0781 | 247.3299 | -0.69 | +0.37 | ||
load: yes | 1883.4785 | 43.3991 | +0.29 | +0.14 | -0.13 | |
Residual | 447629.9270 | 669.0515 |
None of the two-factor or three-factor interaction terms in the fixed-effects are significant. In the random-effects terms only the scalar random effects and the prec
random effect for item
appear to be warranted, leading to the reduced formula
kbm02 = let
form = @formula(
rt_trunc ~
1 + spkr + prec + load + (1 | subj) + (1 + prec | item)
)
fit(MixedModel, form, kb07; contrasts)
end
Est. | SE | z | p | σ_item | σ_subj | |
---|---|---|---|---|---|---|
(Intercept) | 2181.8526 | 77.4681 | 28.16 | <1e-99 | 364.7125 | 298.0259 |
spkr: old | 67.8790 | 16.0785 | 4.22 | <1e-04 | ||
prec: maintain | -333.7906 | 47.4472 | -7.03 | <1e-11 | 252.5212 | |
load: yes | 78.5904 | 16.0785 | 4.89 | <1e-05 | ||
Residual | 680.0319 |
Column | Variance | Std.Dev | Corr. | |
---|---|---|---|---|
item | (Intercept) | 133015.244 | 364.713 | |
prec: maintain | 63766.937 | 252.521 | -0.70 | |
subj | (Intercept) | 88819.437 | 298.026 | |
Residual | 462443.388 | 680.032 |
These two models are nested and can be compared with a likelihood-ratio test.
model-dof | deviance | χ² | χ²-dof | P(>χ²) | |
---|---|---|---|---|---|
rt_trunc ~ 1 + spkr + prec + load + (1 | subj) + (1 + prec | item) | 9 | 28664 | |||
rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load | subj) + (1 + spkr + prec + load | item) | 29 | 28638 | 26 | 20 | 0.1698 |
The p-value of approximately 17% leads us to prefer the simpler model, kbm02
, to the more complex, kbm01
.
Create a bootstrap sample of a few thousand parameter estimates from the reduced model. The pseudo-random number generator is initialized to a fixed value for reproducibility.
Table with 14 columns and 2000 rows:
obj β1 β2 β3 β4 σ σ1 σ2 ⋯
┌───────────────────────────────────────────────────────────────────────────
1 │ 28769.9 2152.69 60.3921 -359.134 54.1231 702.318 421.681 212.916 ⋯
2 │ 28775.3 2221.81 61.7318 -349.417 71.1992 697.028 442.812 278.931 ⋯
3 │ 28573.6 2325.31 53.9668 -404.225 81.3407 657.746 364.354 274.392 ⋯
4 │ 28682.9 2274.65 52.9438 -384.66 69.6788 682.801 398.901 319.725 ⋯
5 │ 28616.1 2191.34 54.7785 -357.919 111.829 673.878 253.244 235.912 ⋯
6 │ 28634.1 2176.73 95.9793 -378.383 82.1559 671.685 510.996 309.876 ⋯
7 │ 28620.4 2128.07 57.6928 -267.747 60.4621 666.986 396.878 295.009 ⋯
8 │ 28665.9 2152.59 55.8208 -274.382 76.5128 684.746 369.091 274.446 ⋯
9 │ 28651.2 2088.3 76.4636 -296.14 95.6124 680.282 265.684 186.041 ⋯
10 │ 28679.9 2375.23 72.2422 -423.922 94.0833 680.729 425.72 287.686 ⋯
11 │ 28686.8 2034.11 59.9149 -224.859 59.5448 682.433 308.677 272.183 ⋯
12 │ 28702.2 2130.98 83.3953 -271.61 79.974 689.679 401.98 264.258 ⋯
13 │ 28625.0 2055.35 55.3526 -363.937 67.7128 676.817 309.149 265.788 ⋯
14 │ 28570.3 2160.73 81.5575 -321.913 108.979 661.525 426.862 259.819 ⋯
15 │ 28613.6 2237.73 36.9996 -362.443 82.1812 673.873 312.859 241.592 ⋯
16 │ 28681.4 2306.64 43.8801 -429.983 103.082 686.437 360.536 204.514 ⋯
17 │ 28577.2 2143.78 66.2959 -373.598 93.157 668.15 321.469 183.119 ⋯
18 │ 28616.1 2197.03 55.9111 -356.06 74.1476 675.709 302.678 212.917 ⋯
19 │ 28771.6 2067.71 56.5032 -324.984 64.804 699.136 344.776 225.602 ⋯
20 │ 28674.3 2181.99 40.6566 -251.315 63.1301 683.81 340.569 295.054 ⋯
21 │ 28677.6 2129.89 72.7198 -245.006 84.7693 681.415 359.093 287.654 ⋯
22 │ 28582.1 2179.8 74.8364 -405.798 68.2978 664.168 357.696 231.402 ⋯
23 │ 28714.9 2326.34 74.7091 -400.441 67.79 692.381 354.032 259.052 ⋯
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
One of the uses of such a sample is to form “confidence intervals” on the parameters by obtaining the shortest interval that covers a given proportion (95%, by default) of the sample.
DictTable with 2 columns and 9 rows:
par lower upper
────┬─────────────────────
β1 │ 2028.01 2337.92
β2 │ 38.431 99.5944
β3 │ -439.321 -245.864
β4 │ 46.0262 107.511
ρ1 │ -0.897984 -0.445605
σ │ 655.25 701.497
σ1 │ 261.196 448.51
σ2 │ 175.489 312.045
σ3 │ 228.099 357.789
A sample like this can be used for more than just creating an interval because it approximates the distribution of the estimator. For the fixed-effects parameters the estimators are close to being normally distributed, Figure 1.