I have stated that the likelihood criterion used to fit linear mixed-effects models can be considered as balancing fidelity to the data (i.e. fitting the observed data well) against model complexity.
This is similar to some of the criteria used in Machine Learning (ML), except that the criterion for LMMs has a rigorous mathematical basis.
In a shrinkage plot we compare the values of the random-effects coefficients from the fitted model with those from a model in which there is no penalty for model complexity.
If there is strong subject-to-subject variation, the model fit will tend toward values of the random effects similar to those without a penalty on complexity.
If a random-effects term is not contributing much (i.e. it is “inert”), then the random effects will be shrunk considerably towards zero in some directions.
Code
using CairoMakie
using DataFrames
using LinearAlgebra
using MixedModels
using MixedModelsMakie
using Random
using ProgressMeter
const progress = false
Load the kb07 data set (don’t tell Reinhold that I used these data).
kb07 = MixedModels.dataset(:kb07)
Arrow.Table with 1789 rows, 7 columns, and schema:
:subj String
:item String
:spkr String
:prec String
:load String
:rt_trunc Int16
:rt_raw Int16
contrasts = Dict(
    :spkr => HelmertCoding(),
    :prec => HelmertCoding(),
    :load => HelmertCoding(),
)
m1 = let
    form = @formula(
        rt_trunc ~
            1 +
            spkr * prec * load +
            (1 + spkr + prec + load | subj) +
            (1 + spkr + prec + load | item)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
                                             Est.       SE      z       p    σ_subj    σ_item
(Intercept)                             2181.6728  77.3025  28.22  <1e-99  301.7795  362.1969
spkr: old                                 67.7486  18.2889   3.70  0.0002   42.9237   40.6934
prec: maintain                          -333.9211  47.1525  -7.08  <1e-11   61.9966  246.8936
load: yes                                 78.7703  19.5367   4.03  <1e-04   65.1301   42.3692
spkr: old & prec: maintain               -21.9656  15.8062  -1.39  0.1646
spkr: old & load: yes                     18.3842  15.8062   1.16  0.2448
prec: maintain & load: yes                 4.5338  15.8062   0.29  0.7742
spkr: old & prec: maintain & load: yes    23.6074  15.8062   1.49  0.1353
Residual                                 668.5033
         Column             Variance   Std.Dev   Corr.
subj     (Intercept)      91070.8712  301.7795
         spkr: old         1842.4411   42.9237  +0.78
         prec: maintain    3843.5766   61.9966  -0.59  +0.03
         load: yes         4241.9308   65.1301  +0.36  +0.83  +0.53
item     (Intercept)     131186.5676  362.1969
         spkr: old         1655.9495   40.6934  +0.44
         prec: maintain   60956.4434  246.8936  -0.69  +0.35
         load: yes         1795.1488   42.3692  +0.32  +0.16  -0.14
Residual                 446896.6953  668.5033
Linear mixed model fit by maximum likelihood
rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load | subj) + (1 + spkr + prec + load | item)
logLik -2 logLik AIC AICc BIC
-14318.5614 28637.1227 28695.1227 28696.1119 28854.3157
Variance components:
Column Variance Std.Dev. Corr.
subj (Intercept) 91070.8712 301.7795
spkr: old 1842.4411 42.9237 +0.78
prec: maintain 3843.5766 61.9966 -0.59 +0.03
load: yes 4241.9308 65.1301 +0.36 +0.83 +0.53
item (Intercept) 131186.5676 362.1969
spkr: old 1655.9495 40.6934 +0.44
prec: maintain 60956.4434 246.8936 -0.69 +0.35
load: yes 1795.1488 42.3692 +0.32 +0.16 -0.14
Residual 446896.6953 668.5033
Number of obs: 1789; levels of grouping factors: 56, 32
Fixed-effects parameters:
──────────────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
──────────────────────────────────────────────────────────────────────────────
(Intercept) 2181.67 77.3025 28.22 <1e-99
spkr: old 67.7486 18.2889 3.70 0.0002
prec: maintain -333.921 47.1525 -7.08 <1e-11
load: yes 78.7703 19.5367 4.03 <1e-04
spkr: old & prec: maintain -21.9656 15.8062 -1.39 0.1646
spkr: old & load: yes 18.3842 15.8062 1.16 0.2448
prec: maintain & load: yes 4.5338 15.8062 0.29 0.7742
spkr: old & prec: maintain & load: yes 23.6074 15.8062 1.49 0.1353
──────────────────────────────────────────────────────────────────────────────
Expressing the covariance of random effects
Earlier today we mentioned that the parameters being optimized are from a “matrix square root” of the covariance matrix for the random effects. There is one such lower triangular matrix for each grouping factor.
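In symbols, writing λᵢ for the lower-triangular relative Cholesky factor of the i-th grouping factor and σ̂² for the estimated residual variance, the covariance matrix of the random effects for that factor is recovered as

```latex
\Sigma_i = \hat{\sigma}^2 \, \lambda_i \lambda_i^{\top}
```

which is the computation carried out numerically below.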
l1 = first(m1.λ) # Cholesky factor of relative covariance for subj
4×4 LowerTriangular{Float64, Matrix{Float64}}:
0.451426 ⋅ ⋅ ⋅
0.0502539 0.0399661 ⋅ ⋅
-0.0550269 0.0729273 0.0159447 ⋅
0.0351603 0.0857539 0.0300331 0.0
Notice the zero on the diagonal. A triangular matrix with zeros on the diagonal is singular.
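Because the determinant of a triangular matrix is the product of its diagonal elements, a zero anywhere on the diagonal forces singularity. A quick check with a made-up matrix (not one of m1's factors):

```julia
using LinearAlgebra

L = LowerTriangular([1.0 0.0 0.0; 0.5 2.0 0.0; 0.3 0.1 0.0])
det(L)   # product of the diagonal: 1.0 * 2.0 * 0.0 = 0.0
rank(L)  # 2, less than full rank, confirming singularity
```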
l2 = last(m1.λ) # this one is close to singular as well
4×4 LowerTriangular{Float64, Matrix{Float64}}:
0.541803 ⋅ ⋅ ⋅
0.0268569 0.0546274 ⋅ ⋅
-0.253088 0.267988 0.0229877 ⋅
0.0200493 0.00126987 0.0600951 0.00138194
To regenerate the covariance matrix we need to know that the covariance is not the square of l1; it is l1 * l1' (so that the result is symmetric), scaled by the estimated residual variance, σ̂².
Σ₁ = varest(m1) .* (l1 * l1')
4×4 Matrix{Float64}:
91070.9 10138.3 -11101.2 7093.26
10138.3 1842.44 66.7243 2321.27
-11101.2 66.7243 3843.58 2144.17
7093.26 2321.27 2144.17 4241.93
diag(Σ₁) # compare to the variance column in the VarCorr output
4-element Vector{Float64}:
91070.87115964481
1842.441054170415
3843.5765510736305
4241.930807293007
sqrt.(diag(Σ₁)) # and the square roots match the Std.Dev column
4-element Vector{Float64}:
301.7795075210456
42.92366543260739
61.99658499525301
65.13010676555818
Shrinkage plots
The upper left panel shows the perfect negative correlation for those two components of the random effects.
The design itself is nearly balanced, as the transposed fixed-effects model matrix (entries ±1 under the Helmert coding) and its near-diagonal crossproduct show:
8×1789 Matrix{Int64}:
1 1 1 1 1 1 1 1 1 1 … 1 1 1 1 1 1 1 1 1
-1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1
-1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1
1 -1 -1 -1 -1 1 1 1 1 -1 1 1 1 -1 -1 -1 -1 1 1
1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1
-1 -1 -1 1 1 1 1 -1 -1 -1 … 1 -1 -1 -1 -1 1 1 1 1
-1 -1 1 -1 1 1 -1 1 -1 -1 -1 1 -1 -1 1 -1 1 1 -1
1 -1 1 1 -1 1 -1 -1 1 -1 -1 -1 1 -1 1 1 -1 1 -1
8×8 Matrix{Int64}:
1789 -1 -1 3 -3 1 1 3
-1 1789 -3 1 -1 3 3 1
-1 -3 1789 1 -1 3 3 1
3 1 1 1789 3 -1 -1 -3
-3 -1 -1 3 1789 1 1 3
1 3 3 -1 1 1789 -3 -1
1 3 3 -1 1 -3 1789 -1
3 1 1 -3 3 -1 -1 1789
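The plots themselves are generated with MixedModelsMakie's shrinkageplot function, which draws one panel per pair of random-effects components, with the unshrunk values in red and the shrunken conditional means in blue:

```julia
shrinkageplot(m1, :subj)  # panels for the subj random effects
shrinkageplot(m1, :item)  # panels for the item random effects
```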
How to interpret a shrinkage plot
Extreme shrinkage (shrunk to a line or to a point) is easy to interpret: the term is not providing benefit and can be removed.
When the range of the blue dots (shrunk values) is comparable to that of the red dots (unshrunk), the term remains about as strong after shrinkage as before.
By itself this doesn’t mean the term is important. You need to get a feeling for the absolute magnitude of the random effects in addition to the relative magnitude.
Small magnitude and small relative magnitude indicate you can drop that term.
Conclusions from these plots
Only the intercept for the subj appears to be contributing explanatory power
For the item both the intercept and prec appear to be contributing
m2 = let
    form = @formula(
        rt_trunc ~
            1 + prec * spkr * load + (1 | subj) + (1 + prec | item)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
                                             Est.       SE      z       p    σ_item    σ_subj
(Intercept)                             2181.7582  77.4710  28.16  <1e-99  364.7293  298.1107
prec: maintain                          -333.8582  47.4631  -7.03  <1e-11  252.6694
spkr: old                                 67.8114  16.0526   4.22  <1e-04
load: yes                                 78.6849  16.0525   4.90  <1e-06
prec: maintain & spkr: old               -21.8802  16.0525  -1.36  0.1729
prec: maintain & load: yes                 4.4710  16.0526   0.28  0.7806
spkr: old & load: yes                     18.3214  16.0526   1.14  0.2537
prec: maintain & spkr: old & load: yes    23.5219  16.0525   1.47  0.1428
Residual                                 678.9319
         Column            Variance  Std.Dev  Corr.
item     (Intercept)     133027.439  364.729
         prec: maintain   63841.834  252.669  -0.70
subj     (Intercept)      88870.013  298.111
Residual                 460948.573  678.932
m3 = let
    form = @formula(
        rt_trunc ~
            1 + prec + spkr + load + (1 | subj) + (1 + prec | item)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
                     Est.       SE      z       p    σ_item    σ_subj
(Intercept)     2181.8526  77.4680  28.16  <1e-99  364.7121  298.0257
prec: maintain  -333.7906  47.4476  -7.03  <1e-11  252.5236
spkr: old         67.8790  16.0785   4.22  <1e-04
load: yes         78.5904  16.0785   4.89  <1e-05
Residual         680.0318
         Column            Variance  Std.Dev  Corr.
item     (Intercept)     133014.901  364.712
         prec: maintain   63768.182  252.524  -0.70
subj     (Intercept)      88819.305  298.026
Residual                 462443.261  680.032
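Since m3 is nested within m2, which is in turn nested within m1, the three fits can also be compared with likelihood-ratio tests (output not shown here):

```julia
MixedModels.likelihoodratiotest(m3, m2, m1)
```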
rng = Random.seed!(1234321);
m3btstrp = parametricbootstrap(rng, 2000, m3);
DataFrame(shortestcovint(m3btstrp))
 Row │ type    group     names                          lower       upper
     │ String  String?   String?                        Float64     Float64
─────┼──────────────────────────────────────────────────────────────────────
   1 │ β       missing   (Intercept)                   2022.91     2334.0
   2 │ β       missing   prec: maintain                -430.239    -239.802
   3 │ β       missing   spkr: old                       34.0592     96.72
   4 │ β       missing   load: yes                       46.5349    109.526
   5 │ σ       item      (Intercept)                    270.065     451.957
   6 │ σ       item      prec: maintain                 181.74      325.125
   7 │ ρ       item      (Intercept), prec: maintain     -0.907255   -0.490079
   8 │ σ       subj      (Intercept)                    233.835     364.504
   9 │ σ       residual  missing                        657.341     702.655
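The coverage intervals summarize the bootstrap sample, but the replicates themselves can be inspected directly. This sketch assumes the β property of the bootstrap object provided by MixedModels.jl, which holds one row per replicate and coefficient:

```julia
using Statistics

df = DataFrame(m3btstrp.β)  # columns iter, coefname, β
combine(groupby(df, :coefname), :β => mean => :mean_β)
```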
ridgeplot(m3btstrp; show_intercept=false)
The model m4 below is the same as m3 except for the order of the random-effects terms; the fit is identical and only the order of display changes.
m4 = let
    form = @formula(
        rt_trunc ~
            1 + prec + spkr + load + (1 + prec | item) + (1 | subj)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
                     Est.       SE      z       p    σ_item    σ_subj
(Intercept)     2181.8526  77.4680  28.16  <1e-99  364.7121  298.0257
prec: maintain  -333.7906  47.4476  -7.03  <1e-11  252.5236
spkr: old         67.8790  16.0785   4.22  <1e-04
load: yes         78.5904  16.0785   4.89  <1e-05
Residual         680.0318
m4bstrp = parametricbootstrap(rng, 2000, m4);
ridgeplot(m4bstrp; show_intercept=false)
DataFrame(shortestcovint(m4bstrp))
 Row │ type    group     names                          lower       upper
     │ String  String?   String?                        Float64     Float64
─────┼──────────────────────────────────────────────────────────────────────
   1 │ β       missing   (Intercept)                   2034.15     2335.45
   2 │ β       missing   prec: maintain                -427.769    -248.337
   3 │ β       missing   spkr: old                       35.6938     97.8081
   4 │ β       missing   load: yes                       45.3095    107.368
   5 │ σ       item      (Intercept)                    260.498     451.649
   6 │ σ       item      prec: maintain                 179.477     315.415
   7 │ ρ       item      (Intercept), prec: maintain     -0.901827   -0.471356
   8 │ σ       subj      (Intercept)                    236.684     360.363
   9 │ σ       residual  missing                        657.178     702.739
         Column            Variance  Std.Dev  Corr.
item     (Intercept)     133014.901  364.712
         prec: maintain   63768.182  252.524  -0.70
subj     (Intercept)      88819.305  298.026
Residual                 462443.261  680.032
Code
let mods = [m1, m2, m4]
    DataFrame(;
        geomdof=(sum ∘ leverage).(mods),
        npar=dof.(mods),
        deviance=deviance.(mods),
        AIC=aic.(mods),
        BIC=bic.(mods),
        AICc=aicc.(mods),
    )
end
 Row │ geomdof  npar   deviance  AIC      BIC      AICc
     │ Float64  Int64  Float64   Float64  Float64  Float64
─────┼────────────────────────────────────────────────────
   1 │ 131.555     29   28637.1  28695.1  28854.3  28696.1
   2 │ 107.543     13   28658.5  28684.5  28755.8  28684.7
   3 │ 103.478      9   28663.9  28681.9  28731.3  28682.0
The simplest model, m4 (the last row), has the smallest AIC, BIC, and AICc.
scatter(fitted(m4), residuals(m4))
This page was rendered from git revision 60d6e27
using Quarto 1.7.33.