I have stated that the likelihood criterion used to fit linear mixed-effects can be considered as balancing fidelity to the data (i.e. fits the observed data well) versus model complexity.
This is similar to some of the criterion used in Machine Learning (ML), except that the criterion for LMMs has a rigorous mathematical basis.
In the shrinkage plot we consider the values of the random-effects coefficients for the fitted values of the model versus those from a model in which there is no penalty for model complexity.
If there is strong subject-to-subject variation then the model fit will tend to values of the random effects similar to those without a penalty on complexity.
If the random effects term is not contributing much (i.e. it is “inert”) then the random effects will be shrunk considerably towards zero in some directions.
Code
using CairoMakie
using DataFrames
using LinearAlgebra
using MixedModels
using MixedModelsMakie
using Random
using ProgressMeter
ProgressMeter.ijulia_behavior (: clear);
Load the kb07 data set (don’t tell Reinhold that I used these data).
kb07 = MixedModels.dataset (: kb07)
Arrow.Table with 1789 rows, 7 columns, and schema:
:subj String
:item String
:spkr String
:prec String
:load String
:rt_trunc Int16
:rt_raw Int16
contrasts = Dict (
: subj => Grouping (),
: item => Grouping (),
: spkr => HelmertCoding (),
: prec => HelmertCoding (),
: load => HelmertCoding (),
)
m1 = let
form = @formula (
rt_trunc ~
1 +
spkr * prec * load +
(1 + spkr + prec + load | subj) +
(1 + spkr + prec + load | item)
)
fit (MixedModel, form, kb07; contrasts)
end
Minimizing 874 Time: 0:00:01 ( 1.43 ms/it)
objective: 28637.971010073215
(Intercept)
2181.6424
77.3515
28.20
<1e-99
301.8721
362.4695
spkr: old
67.7496
17.9607
3.77
0.0002
33.0588
41.1159
prec: maintain
-333.9200
47.1563
-7.08
<1e-11
58.8512
247.3299
load: yes
78.8007
19.7269
3.99
<1e-04
66.9526
43.3991
spkr: old & prec: maintain
-21.9960
15.8191
-1.39
0.1644
spkr: old & load: yes
18.3832
15.8191
1.16
0.2452
prec: maintain & load: yes
4.5327
15.8191
0.29
0.7745
spkr: old & prec: maintain & load: yes
23.6377
15.8191
1.49
0.1351
Residual
669.0515
subj
(Intercept)
91126.7545
301.8721
spkr: old
1092.8862
33.0588
+1.00
prec: maintain
3463.4603
58.8512
-0.62
-0.62
load: yes
4482.6493
66.9526
+0.36
+0.36
+0.51
item
(Intercept)
131384.1084
362.4695
spkr: old
1690.5210
41.1159
+0.42
prec: maintain
61172.0781
247.3299
-0.69
+0.37
load: yes
1883.4785
43.3991
+0.29
+0.14
-0.13
Residual
447629.9270
669.0515
Linear mixed model fit by maximum likelihood
rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load | subj) + (1 + spkr + prec + load | item)
logLik -2 logLik AIC AICc BIC
-14318.9855 28637.9710 28695.9710 28696.9602 28855.1640
Variance components:
Column Variance Std.Dev. Corr.
subj (Intercept) 91126.7545 301.8721
spkr: old 1092.8862 33.0588 +1.00
prec: maintain 3463.4603 58.8512 -0.62 -0.62
load: yes 4482.6493 66.9526 +0.36 +0.36 +0.51
item (Intercept) 131384.1084 362.4695
spkr: old 1690.5210 41.1159 +0.42
prec: maintain 61172.0781 247.3299 -0.69 +0.37
load: yes 1883.4785 43.3991 +0.29 +0.14 -0.13
Residual 447629.9270 669.0515
Number of obs: 1789; levels of grouping factors: 56, 32
Fixed-effects parameters:
───────────────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
───────────────────────────────────────────────────────────────────────────────
(Intercept) 2181.64 77.3515 28.20 <1e-99
spkr: old 67.7496 17.9607 3.77 0.0002
prec: maintain -333.92 47.1563 -7.08 <1e-11
load: yes 78.8007 19.7269 3.99 <1e-04
spkr: old & prec: maintain -21.996 15.8191 -1.39 0.1644
spkr: old & load: yes 18.3832 15.8191 1.16 0.2452
prec: maintain & load: yes 4.53273 15.8191 0.29 0.7745
spkr: old & prec: maintain & load: yes 23.6377 15.8191 1.49 0.1351
───────────────────────────────────────────────────────────────────────────────
Expressing the covariance of random effects
Earlier today we mentioned that the parameters being optimized are from a “matrix square root” of the covariance matrix for the random effects. There is one such lower triangular matrix for each grouping factor.
l1 = first (m1.λ) # Cholesky factor of relative covariance for subj
4×4 LowerTriangular{Float64, Matrix{Float64}}:
0.451194 ⋅ ⋅ ⋅
0.0494115 0.0 ⋅ ⋅
-0.0547474 -0.0357256 0.0588535 ⋅
0.0359075 -0.0484787 0.0798414 0.0
Notice the zero on the diagonal. A triangular matrix with zeros on the diagonal is singular.
l2 = last (m1.λ) # this one is also singular
4×4 LowerTriangular{Float64, Matrix{Float64}}:
0.541766 ⋅ ⋅ ⋅
0.0259255 0.0557178 ⋅ ⋅
-0.253795 0.26783 0.0226493 ⋅
0.0189316 0.000867924 0.0620364 0.0
To regenerate the covariance matrix we need to know that the covariance is not the square of l1
, it is l1 * l1'
(so that the result is symmetric) and multiplied by σ̂²
Σ₁ = varest (m1) .* (l1 * l1' )
4×4 Matrix{Float64}:
91126.8 9979.54 -11057.2 7252.16
9979.54 1092.89 -1210.91 794.203
-11057.2 -1210.91 3463.46 1998.68
7252.16 794.203 1998.68 4482.65
diag (Σ₁) # compare to the variance column in the VarCorr output
4-element Vector{Float64}:
91126.75451781915
1092.8862206704753
3463.4602904990634
4482.649272250181
4-element Vector{Float64}:
301.8720830381954
33.05882969299542
58.85117068078649
66.95258973520129
Shrinkage plots
The upper left panel shows the perfect negative correlation for those two components of the random effects.
8×1789 Matrix{Int64}:
1 1 1 1 1 1 1 1 1 1 … 1 1 1 1 1 1 1 1 1
-1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1
-1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1
1 -1 -1 -1 -1 1 1 1 1 -1 1 1 1 -1 -1 -1 -1 1 1
1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1
-1 -1 -1 1 1 1 1 -1 -1 -1 … 1 -1 -1 -1 -1 1 1 1 1
-1 -1 1 -1 1 1 -1 1 -1 -1 -1 1 -1 -1 1 -1 1 1 -1
1 -1 1 1 -1 1 -1 -1 1 -1 -1 -1 1 -1 1 1 -1 1 -1
8×8 Matrix{Int64}:
1789 -1 -1 3 -3 1 1 3
-1 1789 -3 1 -1 3 3 1
-1 -3 1789 1 -1 3 3 1
3 1 1 1789 3 -1 -1 -3
-3 -1 -1 3 1789 1 1 3
1 3 3 -1 1 1789 -3 -1
1 3 3 -1 1 -3 1789 -1
3 1 1 -3 3 -1 -1 1789
How to interpret a shrinkage plot
Extreme shrinkage (shrunk to a line or to a point) is easy to interpret - the term is not providing benefit and can be removed.
When the range of the blue dots (shrunk values) is comparable to those of the red dots (unshrunk) it indicates that the term after shrinkage is about as strong as without shrinkage.
By itself, this doesn’t mean that the term is important. In some ways you need to get a feeling for the absolute magnitude of the random effects in addition to the relative magnitude.
Small magnitude and small relative magnitude indicate you can drop that term
Conclusions from these plots
Only the intercept for the subj
appears to be contributing explanatory power
For the item
both the intercept and the spkr
appear to be contributing
m2 = let
form = @formula (
rt_trunc ~
1 + prec * spkr * load + (1 | subj) + (1 + prec | item)
)
fit (MixedModel, form, kb07; contrasts)
end
(Intercept)
2181.7582
77.4709
28.16
<1e-99
364.7286
298.1109
prec: maintain
-333.8582
47.4629
-7.03
<1e-11
252.6687
spkr: old
67.8114
16.0526
4.22
<1e-04
load: yes
78.6849
16.0525
4.90
<1e-06
prec: maintain & spkr: old
-21.8802
16.0525
-1.36
0.1729
prec: maintain & load: yes
4.4710
16.0526
0.28
0.7806
spkr: old & load: yes
18.3214
16.0526
1.14
0.2537
prec: maintain & spkr: old & load: yes
23.5219
16.0525
1.47
0.1428
Residual
678.9318
item
(Intercept)
133026.918
364.729
prec: maintain
63841.496
252.669
-0.70
subj
(Intercept)
88870.080
298.111
Residual
460948.432
678.932
m3 = let
form = @formula (
rt_trunc ~
1 + prec + spkr + load + (1 | subj) + (1 + prec | item)
)
fit (MixedModel, form, kb07; contrasts)
end
(Intercept)
2181.8526
77.4681
28.16
<1e-99
364.7126
298.0259
prec: maintain
-333.7906
47.4472
-7.03
<1e-11
252.5212
spkr: old
67.8790
16.0785
4.22
<1e-04
load: yes
78.5904
16.0785
4.89
<1e-05
Residual
680.0319
item
(Intercept)
133015.244
364.713
prec: maintain
63766.937
252.521
-0.70
subj
(Intercept)
88819.436
298.026
Residual
462443.388
680.032
rng = Random .seed! (1234321 );
m3btstrp = parametricbootstrap (rng, 2000 , m3);
DataFrame (shortestcovint (m3btstrp))
1
β
missing
(Intercept)
2013.95
2319.53
2
β
missing
prec: maintain
-429.807
-241.429
3
β
missing
spkr: old
35.3339
95.7272
4
β
missing
load: yes
47.067
111.04
5
σ
item
(Intercept)
267.788
452.9
6
σ
item
prec: maintain
171.547
314.7
7
ρ
item
(Intercept), prec: maintain
-0.893081
-0.457084
8
σ
subj
(Intercept)
235.921
364.717
9
σ
residual
missing
657.736
703.054
ridgeplot (m3btstrp; show_intercept= false )
m4 = let
form = @formula (
rt_trunc ~
1 + prec + spkr + load + (1 + prec | item) + (1 | subj)
)
fit (MixedModel, form, kb07; contrasts)
end
(Intercept)
2181.8526
77.4681
28.16
<1e-99
364.7126
298.0259
prec: maintain
-333.7906
47.4472
-7.03
<1e-11
252.5212
spkr: old
67.8790
16.0785
4.22
<1e-04
load: yes
78.5904
16.0785
4.89
<1e-05
Residual
680.0319
m4bstrp = parametricbootstrap (rng, 2000 , m4);
ridgeplot (m4bstrp; show_intercept= false )
DataFrame (shortestcovint (m4bstrp))
1
β
missing
(Intercept)
2030.75
2342.66
2
β
missing
prec: maintain
-432.619
-249.063
3
β
missing
spkr: old
35.4729
97.9576
4
β
missing
load: yes
47.0469
108.272
5
σ
item
(Intercept)
261.52
444.426
6
σ
item
prec: maintain
177.803
318.382
7
ρ
item
(Intercept), prec: maintain
-0.904897
-0.477346
8
σ
subj
(Intercept)
234.301
361.964
9
σ
residual
missing
657.11
701.879
item
(Intercept)
133015.244
364.713
prec: maintain
63766.937
252.521
-0.70
subj
(Intercept)
88819.436
298.026
Residual
462443.388
680.032
Code
let mods = [m1, m2, m4]
DataFrame (;
geomdof= (sum ∘ leverage).(mods),
npar= dof .(mods),
deviance= deviance .(mods),
AIC= aic .(mods),
BIC= bic .(mods),
AICc= aicc .(mods),
)
end
1
130.126
29
28638.0
28696.0
28855.2
28697.0
2
107.543
13
28658.5
28684.5
28755.8
28684.7
3
103.478
9
28663.9
28681.9
28731.3
28682.0
scatter (fitted (m4), residuals (m4))
Back to top