I have stated that the likelihood criterion used to fit linear mixed-effects models can be considered as balancing fidelity to the data (i.e. how well the model fits the observed data) against model complexity.
This is similar to some of the criteria used in Machine Learning (ML), except that the criterion for LMMs has a rigorous mathematical basis.
In the shrinkage plot we compare the values of the random-effects coefficients from the fitted model with those from a model in which there is no penalty for model complexity.
If there is strong subject-to-subject variation, then the fitted values of the random effects will be similar to those from the model without a penalty on complexity.
If the random effects term is not contributing much (i.e. it is “inert”) then the random effects will be shrunk considerably towards zero in some directions.
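As a toy analogy (not the MixedModels.jl machinery), consider a balanced one-way layout: the predicted group effects are the observed group means pulled toward the grand mean by a factor determined by the ratio of between-group variance to residual variance. All numbers below are made up for illustration.

```julia
using Statistics

# Made-up numbers: a balanced one-way layout, 3 groups of 10 observations each.
grand = 100.0                      # grand mean
group_means = [80.0, 95.0, 120.0]  # observed per-group means
n = 10                             # observations per group
var_e = 400.0                      # residual variance (assumed)
var_b = 100.0                      # between-group variance (assumed)

# Shrinkage factor for a balanced design: between-group variance relative to
# the total variance of a group mean.
shrink = var_b / (var_b + var_e / n)
shrunk = grand .+ shrink .* (group_means .- grand)
# Each shrunken effect lies strictly between its group mean and the grand mean.
```

When `var_b` is large relative to `var_e / n` the factor approaches 1 (little shrinkage); when the random-effects term is inert, the factor approaches 0 and the effects collapse toward zero.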
using CairoMakie
using DataFrames
using LinearAlgebra
using MixedModels
using MixedModelsMakie
using Random
using ProgressMeter
const progress = false
Load the kb07 data set (don’t tell Reinhold that I used these data).
kb07 = MixedModels.dataset(:kb07)
Arrow.Table with 1789 rows, 7 columns, and schema:
:subj String
:item String
:spkr String
:prec String
:load String
:rt_trunc Int16
:rt_raw Int16
contrasts = Dict(
    :spkr => HelmertCoding(),
    :prec => HelmertCoding(),
    :load => HelmertCoding(),
)
m1 = let
    form = @formula(
        rt_trunc ~
            1 +
            spkr * prec * load +
            (1 + spkr + prec + load | subj) +
            (1 + spkr + prec + load | item)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
                                             Est.       SE      z       p    σ_subj    σ_item
(Intercept)                             2181.6728  77.3040  28.22  <1e-99  301.7798  362.2071
spkr: old                                 67.7485  18.2919   3.70  0.0002   42.9978   40.6939
prec: maintain                          -333.9212  47.1520  -7.08  <1e-11   62.0047  246.8895
load: yes                                 78.7703  19.5373   4.03  <1e-04   65.1516   42.3617
spkr: old & prec: maintain               -21.9656  15.8060  -1.39  0.1646
spkr: old & load: yes                     18.3843  15.8060   1.16  0.2448
prec: maintain & load: yes                 4.5339  15.8060   0.29  0.7742
spkr: old & prec: maintain & load: yes    23.6074  15.8060   1.49  0.1353
Residual                                 668.4953
          Column          Variance     Std.Dev   Corr.
subj      (Intercept)      91071.0419  301.7798
          spkr: old         1848.8141   42.9978  +0.78
          prec: maintain    3844.5766   62.0047  -0.59  +0.03
          load: yes         4244.7317   65.1516  +0.36  +0.83  +0.53
item      (Intercept)     131193.9807  362.2071
          spkr: old         1655.9905   40.6939  +0.44
          prec: maintain   60954.4361  246.8895  -0.69  +0.35
          load: yes         1794.5158   42.3617  +0.32  +0.16  -0.14
Residual                  446885.9859  668.4953
Linear mixed model fit by maximum likelihood
rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load | subj) + (1 + spkr + prec + load | item)
logLik -2 logLik AIC AICc BIC
-14318.5613 28637.1227 28695.1227 28696.1119 28854.3156
Variance components:
Column Variance Std.Dev. Corr.
subj (Intercept) 91071.0419 301.7798
spkr: old 1848.8141 42.9978 +0.78
prec: maintain 3844.5766 62.0047 -0.59 +0.03
load: yes 4244.7317 65.1516 +0.36 +0.83 +0.53
item (Intercept) 131193.9807 362.2071
spkr: old 1655.9905 40.6939 +0.44
prec: maintain 60954.4361 246.8895 -0.69 +0.35
load: yes 1794.5158 42.3617 +0.32 +0.16 -0.14
Residual 446885.9859 668.4953
Number of obs: 1789; levels of grouping factors: 56, 32
Fixed-effects parameters:
───────────────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
───────────────────────────────────────────────────────────────────────────────
(Intercept) 2181.67 77.304 28.22 <1e-99
spkr: old 67.7485 18.2919 3.70 0.0002
prec: maintain -333.921 47.152 -7.08 <1e-11
load: yes 78.7703 19.5373 4.03 <1e-04
spkr: old & prec: maintain -21.9656 15.806 -1.39 0.1646
spkr: old & load: yes 18.3843 15.806 1.16 0.2448
prec: maintain & load: yes 4.53388 15.806 0.29 0.7742
spkr: old & prec: maintain & load: yes 23.6074 15.806 1.49 0.1353
───────────────────────────────────────────────────────────────────────────────
Expressing the covariance of random effects
Earlier today we mentioned that the parameters being optimized are from a “matrix square root” of the covariance matrix for the random effects. There is one such lower triangular matrix for each grouping factor.
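A quick illustration with the LinearAlgebra standard library (a toy 2×2 covariance matrix, not the fitted values):

```julia
using LinearAlgebra

# A lower-triangular "matrix square root": λ * λ' reproduces the covariance.
Σ = [4.0 2.0; 2.0 3.0]      # toy covariance matrix
λ = cholesky(Σ).L           # lower-triangular Cholesky factor
λ * λ' ≈ Σ                  # true: the factorization reproduces Σ
```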
l1 = first(m1.λ)  # Cholesky factor of relative covariance for subj
4×4 LowerTriangular{Float64, Matrix{Float64}}:
0.451431 ⋅ ⋅ ⋅
0.0502513 0.0401487 ⋅ ⋅
-0.0550302 0.0726991 0.0170162 ⋅
0.0351681 0.0850799 0.0319858 0.0
Notice the zero on the diagonal. A triangular matrix with zeros on the diagonal is singular.
l2 = last(m1.λ)  # this one is also singular
4×4 LowerTriangular{Float64, Matrix{Float64}}:
0.541824 ⋅ ⋅ ⋅
0.0268639 0.0546256 ⋅ ⋅
-0.253073 0.268004 0.0229392 ⋅
0.0200491 0.00127555 0.0601 0.0
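This can be checked directly: the determinant of a triangular matrix is the product of its diagonal elements, so a zero on the diagonal forces singularity. A sketch with illustrative values (not the fitted λ):

```julia
using LinearAlgebra

# Illustrative lower-triangular matrix with a zero on the diagonal.
l = LowerTriangular([0.45 0.0 0.0; 0.05 0.04 0.0; 0.03 0.08 0.0])
det(l)     # 0.0 — the product of the diagonal elements
rank(l)    # 2 — not full rank, hence singular
```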
To regenerate the covariance matrix, note that it is not the square of l1; it is l1 * l1' (so that the result is symmetric), scaled by the estimated residual variance, σ̂².
Σ₁ = varest(m1) .* (l1 * l1')
4×4 Matrix{Float64}:
91071.0 10137.6 -11101.7 7094.75
10137.6 1848.81 68.5686 2316.25
-11101.7 68.5686 3844.58 2142.46
7094.75 2316.25 2142.46 4244.73
diag(Σ₁)  # compare to the variance column in the VarCorr output
4-element Vector{Float64}:
91071.04188496091
1848.8141310725193
3844.576646699094
4244.73166099835
sqrt.(diag(Σ₁))  # and the square roots match the Std.Dev column
4-element Vector{Float64}:
 301.7797903852425
  42.99783867908385
  62.00465020221543
  65.15160520661291
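The same reconstruction can be sketched with a hypothetical λ and σ̂² (made-up values, not m1's):

```julia
using LinearAlgebra

s2 = 4.0                       # assumed estimate of the residual variance σ̂²
λ = LowerTriangular([0.5 0.0 0.0; 0.1 0.3 0.0; -0.2 0.1 0.2])
Σ = s2 .* (λ * λ')             # covariance: symmetric by construction
issymmetric(Σ)                 # true
sqrt.(diag(Σ))                 # standard deviations, as in a Std.Dev column
```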
Shrinkage plots
The upper-left panel of the shrinkage plot shows the perfect negative correlation between those two components of the random effects. The two matrices below are the transpose of the fixed-effects model matrix, whose Helmert-coded entries are ±1, and its crossproduct; the dominant diagonal of the crossproduct (each diagonal element is the number of observations, 1789) shows that the design is nearly balanced.
8×1789 Matrix{Int64}:
1 1 1 1 1 1 1 1 1 1 … 1 1 1 1 1 1 1 1 1
-1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1
-1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1
1 -1 -1 -1 -1 1 1 1 1 -1 1 1 1 -1 -1 -1 -1 1 1
1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1
-1 -1 -1 1 1 1 1 -1 -1 -1 … 1 -1 -1 -1 -1 1 1 1 1
-1 -1 1 -1 1 1 -1 1 -1 -1 -1 1 -1 -1 1 -1 1 1 -1
1 -1 1 1 -1 1 -1 -1 1 -1 -1 -1 1 -1 1 1 -1 1 -1
8×8 Matrix{Int64}:
1789 -1 -1 3 -3 1 1 3
-1 1789 -3 1 -1 3 3 1
-1 -3 1789 1 -1 3 3 1
3 1 1 1789 3 -1 -1 -3
-3 -1 -1 3 1789 1 1 3
1 3 3 -1 1 1789 -3 -1
1 3 3 -1 1 -3 1789 -1
3 1 1 -3 3 -1 -1 1789
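The near-orthogonality comes from the balanced design with ±1 coding. For a perfectly balanced 2×2×2 factorial the columns of the model matrix are exactly orthogonal, as this sketch shows:

```julia
using LinearAlgebra

# One row per factor combination: intercept, three main effects, three two-way
# interactions, and the three-way interaction, all coded ±1.
rows = [[1, a, b, c, a * b, a * c, b * c, a * b * c]
        for a in (-1, 1), b in (-1, 1), c in (-1, 1)]
X = reduce(vcat, permutedims.(vec(rows)))   # 8×8 model matrix
X'X == 8I                                    # true: the columns are orthogonal
```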
How to interpret a shrinkage plot
Extreme shrinkage (shrunk to a line or to a point) is easy to interpret: the term is not providing benefit and can be removed.
When the range of the blue dots (the shrunken values) is comparable to that of the red dots (the unshrunk values), the term is about as strong after shrinkage as before.
By itself, this does not mean that the term is important. You need to develop a feeling for the absolute magnitude of the random effects in addition to their relative magnitude.
Small absolute magnitude and small relative magnitude indicate that you can drop the term.
Conclusions from these plots
Only the intercept for the subj appears to be contributing explanatory power
For the item both the intercept and the spkr appear to be contributing
m2 = let
    form = @formula(
        rt_trunc ~
            1 + prec * spkr * load + (1 | subj) + (1 + prec | item)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
                                             Est.       SE      z       p    σ_item    σ_subj
(Intercept)                             2181.7582  77.4710  28.16  <1e-99  364.7293  298.1107
prec: maintain                          -333.8582  47.4631  -7.03  <1e-11  252.6694
spkr: old                                 67.8114  16.0526   4.22  <1e-04
load: yes                                 78.6849  16.0525   4.90  <1e-06
prec: maintain & spkr: old               -21.8802  16.0525  -1.36  0.1729
prec: maintain & load: yes                 4.4710  16.0526   0.28  0.7806
spkr: old & load: yes                     18.3214  16.0526   1.14  0.2537
prec: maintain & spkr: old & load: yes    23.5219  16.0525   1.47  0.1428
Residual                                 678.9319
          Column          Variance    Std.Dev  Corr.
item      (Intercept)     133027.438  364.729
          prec: maintain   63841.835  252.669  -0.70
subj      (Intercept)      88870.014  298.111
Residual                  460948.573  678.932
m3 = let
    form = @formula(
        rt_trunc ~
            1 + prec + spkr + load + (1 | subj) + (1 + prec | item)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
                     Est.       SE      z       p    σ_item    σ_subj
(Intercept)     2181.8526  77.4680  28.16  <1e-99  364.7121  298.0257
prec: maintain  -333.7906  47.4476  -7.03  <1e-11  252.5236
spkr: old         67.8790  16.0785   4.22  <1e-04
load: yes         78.5904  16.0785   4.89  <1e-05
Residual         680.0318
          Column          Variance    Std.Dev  Corr.
item      (Intercept)     133014.905  364.712
          prec: maintain   63768.184  252.524  -0.70
subj      (Intercept)      88819.302  298.026
Residual                  462443.261  680.032
rng = Random.seed!(1234321);
m3btstrp = parametricbootstrap(rng, 2000, m3);
DataFrame(shortestcovint(m3btstrp))
Row  type  group     names                          lower       upper
  1  β     missing   (Intercept)                  2022.91     2334.0
  2  β     missing   prec: maintain               -430.239    -239.802
  3  β     missing   spkr: old                      34.0592     96.72
  4  β     missing   load: yes                      46.5349    109.526
  5  σ     item      (Intercept)                   270.065     451.979
  6  σ     item      prec: maintain                181.739     325.127
  7  ρ     item      (Intercept), prec: maintain    -0.907255   -0.490094
  8  σ     subj      (Intercept)                   233.834     364.504
  9  σ     residual  missing                       657.341     702.655
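shortestcovint selects, among all intervals containing 95% of the bootstrap replicates, the narrowest one. A minimal sketch of that idea (`shortest_interval` is a hypothetical helper, not the MixedModels.jl implementation):

```julia
# Slide a window covering a fraction `level` of the sorted sample and keep the
# narrowest such window.
function shortest_interval(x; level=0.95)
    xs = sort(x)
    n = length(xs)
    k = ceil(Int, level * n)                       # points the window must cover
    widths = [xs[i + k - 1] - xs[i] for i in 1:(n - k + 1)]
    i = argmin(widths)
    return (xs[i], xs[i + k - 1])
end

shortest_interval([0.0, 0.1, 0.2, 0.3, 10.0]; level=0.8)   # (0.0, 0.3)
```

With a skewed sample like this one, the shortest interval excludes the long tail, which is why it can differ from an equal-tailed quantile interval.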
ridgeplot(m3btstrp; show_intercept=false)
m4 = let
    form = @formula(
        rt_trunc ~
            1 + prec + spkr + load + (1 + prec | item) + (1 | subj)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
                     Est.       SE      z       p    σ_item    σ_subj
(Intercept)     2181.8526  77.4680  28.16  <1e-99  364.7121  298.0257
prec: maintain  -333.7906  47.4476  -7.03  <1e-11  252.5236
spkr: old         67.8790  16.0785   4.22  <1e-04
load: yes         78.5904  16.0785   4.89  <1e-05
Residual         680.0318
m4bstrp = parametricbootstrap(rng, 2000, m4);
ridgeplot(m4bstrp; show_intercept=false)
DataFrame(shortestcovint(m4bstrp))
Row  type  group     names                          lower       upper
  1  β     missing   (Intercept)                  2034.15     2335.45
  2  β     missing   prec: maintain               -427.769    -248.337
  3  β     missing   spkr: old                      35.6938     97.8081
  4  β     missing   load: yes                      45.3095    107.368
  5  σ     item      (Intercept)                   260.498     451.649
  6  σ     item      prec: maintain                179.495     315.414
  7  ρ     item      (Intercept), prec: maintain    -0.901831   -0.471356
  8  σ     subj      (Intercept)                   236.684     360.363
  9  σ     residual  missing                       657.178     702.739
          Column          Variance    Std.Dev  Corr.
item      (Intercept)     133014.905  364.712
          prec: maintain   63768.184  252.524  -0.70
subj      (Intercept)      88819.302  298.026
Residual                  462443.261  680.032
let mods = [m1, m2, m4]
    DataFrame(;
        geomdof=(sum ∘ leverage).(mods),
        npar=dof.(mods),
        deviance=deviance.(mods),
        AIC=aic.(mods),
        BIC=bic.(mods),
        AICc=aicc.(mods),
    )
end
Row  geomdof  npar  deviance      AIC      BIC     AICc
  1  131.593    29   28637.1  28695.1  28854.3  28696.1
  2  107.543    13   28658.5  28684.5  28755.8  28684.7
  3  103.478     9   28663.9  28681.9  28731.3  28682.0
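The information criteria in the table are deterministic functions of the deviance, the parameter count, and the number of observations; checking m1's row by hand (values taken from the output above):

```julia
# m1: deviance (−2 log-likelihood), number of parameters, number of observations
dev, k, n = 28637.1227, 29, 1789

AIC  = dev + 2k                          # 28695.1227
BIC  = dev + k * log(n)                  # ≈ 28854.32
AICc = AIC + 2k * (k + 1) / (n - k - 1)  # ≈ 28696.11
```

BIC penalizes each parameter by log(n) ≈ 7.49 here, versus 2 for AIC, which is why the heavily parameterized m1 fares much worse under BIC.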
scatter(fitted(m4), residuals(m4))
This page was rendered from git revision f875caa
using Quarto 1.8.26.