I have stated that the likelihood criterion used to fit linear mixed-effects models can be considered as balancing fidelity to the data (i.e. how well the model fits the observed data) against model complexity.
This is similar to some of the criteria used in Machine Learning (ML), except that the criterion for LMMs has a rigorous mathematical basis.
In a shrinkage plot we compare the values of the random-effects coefficients from the fitted model to those from a model in which there is no penalty for model complexity.
If there is strong subject-to-subject variation then the fitted random effects will tend toward values similar to those from the unpenalized model.
If the random-effects term is not contributing much (i.e. it is "inert") then the random effects will be shrunk considerably towards zero in some directions.
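For intuition, in the simplest balanced one-way random-effects model the predicted (BLUP) random effect for a group is the group's mean deviation from the grand mean multiplied by a shrinkage factor. The helper below is a hypothetical illustration of that factor, not part of MixedModels.jl:

```julia
# Shrinkage factor for a balanced one-way random-effects model:
# the predicted random effect for group i is its mean deviation
# times σ²_b / (σ²_b + σ²/n), where σ²_b is the between-group
# variance, σ² the residual variance, and n the group size.
shrinkage_factor(σ²b, σ², n) = σ²b / (σ²b + σ² / n)

shrinkage_factor(100.0, 400.0, 20)  # strong grouping: ≈ 0.83, little shrinkage
shrinkage_factor(1.0, 400.0, 20)    # "inert" term: ≈ 0.05, heavy shrinkage
```

A factor near 1 leaves the unpenalized estimates almost unchanged; a factor near 0 pulls them towards zero.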
Code

```julia
using CairoMakie
using DataFrames
using LinearAlgebra
using MixedModels
using MixedModelsMakie
using ProgressMeter
using Random

ProgressMeter.ijulia_behavior(:clear);
```
Load the kb07 data set (don’t tell Reinhold that I used these data).
```julia
kb07 = MixedModels.dataset(:kb07)
```

```
Arrow.Table with 1789 rows, 7 columns, and schema:
 :subj      String
 :item      String
 :spkr      String
 :prec      String
 :load      String
 :rt_trunc  Int16
 :rt_raw    Int16
```
```julia
contrasts = Dict(
    :subj => Grouping(),
    :item => Grouping(),
    :spkr => HelmertCoding(),
    :prec => HelmertCoding(),
    :load => HelmertCoding(),
)
```
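As an aside, with only two levels HelmertCoding reduces to a ±1 coding. This can be checked directly with StatsModels (the package on which MixedModels builds its formula handling); the levels "new" and "old" here are just an example:

```julia
using StatsModels

# With two levels, Helmert contrasts assign -1 to the first level
# and +1 to the second, so the model-matrix columns are all ±1.
cm = StatsModels.ContrastsMatrix(HelmertCoding(), ["new", "old"])
cm.matrix  # 2×1 matrix: -1.0 for "new", +1.0 for "old"
```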
```julia
m1 = let
    form = @formula(
        rt_trunc ~
            1 +
            spkr * prec * load +
            (1 + spkr + prec + load | subj) +
            (1 + spkr + prec + load | item)
    )
    fit(MixedModel, form, kb07; contrasts)
end
```

```
Minimizing 934    Time: 0:00:01 ( 1.52 ms/it)
  objective:  28637.97101024215
```
```
Linear mixed model fit by maximum likelihood
 rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load | subj) + (1 + spkr + prec + load | item)
    logLik   -2 logLik       AIC         AICc          BIC
-14318.9855  28637.9710  28695.9710  28696.9602  28855.1640

Variance components:
            Column          Variance    Std.Dev.   Corr.
subj     (Intercept)      91124.0779  301.8676
         spkr: old         1092.4628   33.0524  +1.00
         prec: maintain    3464.5848   58.8607  -0.62  -0.62
         load: yes         4482.3662   66.9505  +0.36  +0.36  +0.51
item     (Intercept)     131384.0733  362.4694
         spkr: old         1690.7347   41.1185  +0.42
         prec: maintain   61171.7458  247.3292  -0.69  +0.37
         load: yes         1882.5594   43.3885  +0.29  +0.14  -0.13
Residual                 447630.0111  669.0516
 Number of obs: 1789; levels of grouping factors: 56, 32

  Fixed-effects parameters:
───────────────────────────────────────────────────────────────────────────────
                                             Coef.  Std. Error      z  Pr(>|z|)
───────────────────────────────────────────────────────────────────────────────
(Intercept)                             2181.64        77.3512   28.20    <1e-99
spkr: old                                 67.7496      17.9607    3.77    0.0002
prec: maintain                          -333.92        47.1564   -7.08    <1e-11
load: yes                                 78.8007      19.726     3.99    <1e-04
spkr: old & prec: maintain               -21.996       15.8191   -1.39    0.1644
spkr: old & load: yes                     18.3832      15.8191    1.16    0.2452
prec: maintain & load: yes                 4.53274     15.8191    0.29    0.7745
spkr: old & prec: maintain & load: yes    23.6377      15.8191    1.49    0.1351
───────────────────────────────────────────────────────────────────────────────
```
Expressing the covariance of random effects
Earlier today we mentioned that the parameters being optimized are from a “matrix square root” of the covariance matrix for the random effects. There is one such lower triangular matrix for each grouping factor.
```julia
l1 = first(m1.λ)  # Cholesky factor of relative covariance for subj
```

```
4×4 LowerTriangular{Float64, Matrix{Float64}}:
  0.451187      ⋅            ⋅          ⋅
  0.0494019    0.0           ⋅          ⋅
 -0.0547554   -0.00478924   0.0686931   ⋅
  0.0359018   -0.00650109   0.0931791  0.0
```
Notice the zeros on the diagonal. A triangular matrix with a zero on the diagonal is singular.
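A quick way to see why: the determinant of a triangular matrix is the product of its diagonal elements, so a single zero there forces singularity. A small illustrative check with made-up numbers, not the fitted factor:

```julia
using LinearAlgebra

# A lower-triangular matrix with a zero on the diagonal has
# determinant zero (the product of its diagonal) and reduced rank.
L = LowerTriangular([0.45 0.0 0.0; 0.05 0.0 0.0; -0.05 0.01 0.07])
det(L)   # 0.0
rank(L)  # 2, one less than the dimension
```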
```julia
l2 = last(m1.λ)  # this one is not singular
```

```
4×4 LowerTriangular{Float64, Matrix{Float64}}:
  0.541766      ⋅            ⋅          ⋅
  0.0259253    0.0557222     ⋅          ⋅
 -0.253781     0.267843     0.0226331   ⋅
  0.0189249    0.000871109  0.0620218  0.0
```
To regenerate the covariance matrix we need to know that the covariance is not the square of l1; it is l1 * l1' (so that the result is symmetric), multiplied by the estimated residual variance σ̂².
```julia
Σ₁ = varest(m1) .* (l1 * l1')
```

```
4×4 Matrix{Float64}:
  91124.1    9977.46  -11058.7    7250.91
   9977.46   1092.46   -1210.85    793.924
 -11058.7   -1210.85    3464.58   1999.15
   7250.91    793.924   1999.15   4482.37
```
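The same construction can be checked on a toy lower-triangular factor (the numbers below are arbitrary, chosen only for illustration):

```julia
using LinearAlgebra

# σ² * (λ * λ') is symmetric positive semi-definite by construction,
# which λ^2 (the matrix square) generally is not.
λ = LowerTriangular([0.5 0.0; 0.1 0.3])
σ² = 4.0
Σ = σ² .* (λ * λ')
issymmetric(Σ)  # true
diag(Σ)         # variances: 4 * 0.25 ≈ 1.0 and 4 * (0.01 + 0.09) ≈ 0.4
```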
```julia
diag(Σ₁)  # compare to the variance column in the VarCorr output
```

```
4-element Vector{Float64}:
 91124.07791836817
  1092.4628467755617
  3464.584800597221
  4482.366224216353
```

Taking square roots gives the standard deviations:

```julia
sqrt.(diag(Σ₁))  # compare to the Std.Dev. column
```

```
4-element Vector{Float64}:
 301.86764967178607
  33.05242573209358
  58.86072375189776
  66.95047590731788
```
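The correlations shown in the VarCorr display come from scaling such a covariance matrix by its standard deviations. A toy example with arbitrary numbers, not the fitted values:

```julia
using LinearAlgebra

# Convert a covariance matrix to a correlation matrix by dividing
# each element by the product of the row and column std. deviations.
Σ = [4.0 1.2; 1.2 9.0]
sd = sqrt.(diag(Σ))   # standard deviations 2.0 and 3.0
Σ ./ (sd * sd')       # unit diagonal; off-diagonal correlation 0.2
```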
Shrinkage plots
The upper left panel shows the perfect correlation (the +1.00 in the variance components for subj) for those two components of the random effects, which is why the shrunken values fall exactly on a line.
The fixed-effects model matrix (shown transposed) is entirely ±1, a consequence of the Helmert coding:

```
8×1789 Matrix{Int64}:
  1   1   1   1   1   1   1   1   1   1  …   1   1   1   1   1   1   1   1   1
 -1   1   1  -1  -1   1   1  -1  -1   1      1  -1  -1   1   1  -1  -1   1   1
 -1   1  -1   1  -1   1  -1   1  -1   1     -1   1  -1   1  -1   1  -1   1  -1
  1  -1  -1  -1  -1   1   1   1   1  -1      1   1   1  -1  -1  -1  -1   1   1
  1   1  -1  -1   1   1  -1  -1   1   1     -1  -1   1   1  -1  -1   1   1  -1
 -1  -1  -1   1   1   1   1  -1  -1  -1  …   1  -1  -1  -1  -1   1   1   1   1
 -1  -1   1  -1   1   1  -1   1  -1  -1     -1   1  -1  -1   1  -1   1   1  -1
  1  -1   1   1  -1   1  -1  -1   1  -1     -1  -1   1  -1   1   1  -1   1  -1
```

and its crossproduct is dominated by the diagonal (1789, the number of observations), indicating a nearly balanced design:

```
8×8 Matrix{Int64}:
 1789    -1    -1     3    -3     1     1     3
   -1  1789    -3     1    -1     3     3     1
   -1    -3  1789     1    -1     3     3     1
    3     1     1  1789     3    -1    -1    -3
   -3    -1    -1     3  1789     1     1     3
    1     3     3    -1     1  1789    -3    -1
    1     3     3    -1     1    -3  1789    -1
    3     1     1    -3     3    -1    -1  1789
```
How to interpret a shrinkage plot
- Extreme shrinkage (shrunk to a line or to a point) is easy to interpret: the term is not providing benefit and can be removed.
- When the range of the blue dots (shrunk values) is comparable to that of the red dots (unshrunk values), the term is about as strong after shrinkage as without it.
- By itself, this doesn't mean that the term is important. You also need to develop a feeling for the absolute magnitude of the random effects, in addition to the relative magnitude.
- Small absolute magnitude together with small relative magnitude indicates that you can drop that term.
Conclusions from these plots
- Only the intercept for subj appears to be contributing explanatory power.
- For item, both the intercept and the prec coefficient appear to be contributing.
```julia
m2 = let
    form = @formula(
        rt_trunc ~
            1 + prec * spkr * load + (1 | subj) + (1 + prec | item)
    )
    fit(MixedModel, form, kb07; contrasts)
end
```
```
                                            Est.       SE        z  p        σ_item    σ_subj
(Intercept)                              2181.7582  77.4709  28.16   <1e-99  364.7286  298.1109
prec: maintain                           -333.8582  47.4629  -7.03   <1e-11  252.6687
spkr: old                                  67.8114  16.0526   4.22   <1e-04
load: yes                                  78.6849  16.0525   4.90   <1e-06
prec: maintain & spkr: old                -21.8802  16.0525  -1.36   0.1729
prec: maintain & load: yes                  4.4710  16.0526   0.28   0.7806
spkr: old & load: yes                      18.3214  16.0526   1.14   0.2537
prec: maintain & spkr: old & load: yes     23.5219  16.0525   1.47   0.1428
Residual                                  678.9318
```

```
Variance components:
            Column          Variance    Std.Dev.  Corr.
item     (Intercept)     133026.916   364.729
         prec: maintain   63841.497   252.669   -0.70
subj     (Intercept)      88870.081   298.111
Residual                 460948.432   678.932
```
```julia
m3 = let
    form = @formula(
        rt_trunc ~
            1 + prec + spkr + load + (1 | subj) + (1 + prec | item)
    )
    fit(MixedModel, form, kb07; contrasts)
end
```
```
                  Est.       SE        z  p        σ_item    σ_subj
(Intercept)    2181.8526  77.4681  28.16   <1e-99  364.7125  298.0259
prec: maintain -333.7906  47.4472  -7.03   <1e-11  252.5212
spkr: old        67.8790  16.0785   4.22   <1e-04
load: yes        78.5904  16.0785   4.89   <1e-05
Residual        680.0319
```

```
Variance components:
            Column          Variance    Std.Dev.  Corr.
item     (Intercept)     133015.243   364.713
         prec: maintain   63766.936   252.521   -0.70
subj     (Intercept)      88819.436   298.026
Residual                 462443.388   680.032
```
```julia
rng = Random.seed!(1234321);
m3btstrp = parametricbootstrap(rng, 2000, m3);
DataFrame(shortestcovint(m3btstrp))
```
9 rows × 5 columns

```
   type  group     names                         lower       upper
 1 β     missing   (Intercept)                 2013.95     2319.53
 2 β     missing   prec: maintain              -429.807    -241.429
 3 β     missing   spkr: old                     35.3339     95.7272
 4 β     missing   load: yes                     47.067     111.04
 5 σ     item      (Intercept)                  267.788     452.9
 6 σ     item      prec: maintain               171.547     314.702
 7 ρ     item      (Intercept), prec: maintain   -0.893081   -0.457084
 8 σ     subj      (Intercept)                  235.922     364.717
 9 σ     residual  missing                      657.736     703.054
```
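The shortest coverage interval differs from the usual equal-tail quantile interval: it is the narrowest interval containing the requested fraction of the bootstrap sample. A minimal sketch of the idea, as a hypothetical helper rather than the MixedModels implementation:

```julia
# Narrowest interval covering `level` of the sample: sort, then slide
# a window of the required length and keep the shortest one.
function shortest_interval(x::AbstractVector{<:Real}; level=0.95)
    s = sort(x)
    n = length(s)
    k = ceil(Int, level * n)             # number of points to cover
    widths = s[k:n] .- s[1:(n - k + 1)]  # width of each candidate window
    i = argmin(widths)
    return (s[i], s[i + k - 1])
end

shortest_interval(collect(1.0:100.0); level=0.9)  # (1.0, 90.0)
```

For skewed bootstrap distributions (such as those of variance components) the shortest interval is pulled towards the mode, which is why it is preferred here over equal-tail quantiles.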
```julia
m4 = let
    form = @formula(
        rt_trunc ~
            1 + prec + spkr + load + (1 + prec | item) + (1 | subj)
    )
    fit(MixedModel, form, kb07; contrasts)
end
```
```
                  Est.       SE        z  p        σ_item    σ_subj
(Intercept)    2181.8526  77.4681  28.16   <1e-99  364.7125  298.0259
prec: maintain -333.7906  47.4472  -7.03   <1e-11  252.5212
spkr: old        67.8790  16.0785   4.22   <1e-04
load: yes        78.5904  16.0785   4.89   <1e-05
Residual        680.0319
```
```julia
m4bstrp = parametricbootstrap(rng, 2000, m4);
ridgeplot(m4bstrp; show_intercept=false)
DataFrame(shortestcovint(m4bstrp))
```

9 rows × 5 columns

```
   type  group     names                         lower       upper
 1 β     missing   (Intercept)                 2008.72     2319.8
 2 β     missing   prec: maintain              -433.808    -248.858
 3 β     missing   spkr: old                     35.4729     97.9576
 4 β     missing   load: yes                     47.0078    108.437
 5 σ     item      (Intercept)                  261.52      444.426
 6 σ     item      prec: maintain               177.438     318.854
 7 ρ     item      (Intercept), prec: maintain   -0.898522   -0.477346
 8 σ     subj      (Intercept)                  229.031     356.407
 9 σ     residual  missing                      656.918     701.946
```
```
Variance components:
            Column          Variance    Std.Dev.  Corr.
item     (Intercept)     133015.243   364.713
         prec: maintain   63766.936   252.521   -0.70
subj     (Intercept)      88819.436   298.026
Residual                 462443.388   680.032
```
Code

```julia
let mods = [m1, m2, m4]
    DataFrame(;
        geomdof=(sum ∘ leverage).(mods),
        npar=dof.(mods),
        deviance=deviance.(mods),
        AIC=aic.(mods),
        BIC=bic.(mods),
        AICc=aicc.(mods),
    )
end
```
3 rows × 6 columns

```
   geomdof  npar  deviance      AIC      BIC     AICc
 1 130.125    29   28638.0  28696.0  28855.2  28697.0
 2 107.543    13   28658.5  28684.5  28755.8  28684.7
 3 103.478     9   28663.9  28681.9  28731.3  28682.0
```
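The geomdof column, the sum of the leverage values, is the trace of the hat matrix: a geometric measure of model complexity that need not equal the parameter count for a mixed model. For ordinary least squares the two coincide, which a toy check (not using the models above) makes concrete:

```julia
using LinearAlgebra

# For OLS the hat matrix is H = X * (X'X)⁻¹ * X', and its trace
# (the sum of the leverage values) equals the number of coefficients.
X = [ones(5) collect(1.0:5.0)]   # intercept + slope: 2 coefficients
H = X * ((X'X) \ X')
sum(diag(H))  # ≈ 2.0
```

For the mixed models above, shrinkage makes the geometric degrees of freedom larger than npar, and the gap narrows as inert random-effects terms are removed.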
```julia
scatter(fitted(m4), residuals(m4))
```