I have stated that the likelihood criterion used to fit linear mixed-effects models can be considered as balancing fidelity to the data (i.e. how well the model fits the observed data) against model complexity.
This is similar to some of the criteria used in Machine Learning (ML), except that the criterion for LMMs has a rigorous mathematical basis.
In a shrinkage plot we compare the values of the random-effects coefficients from the fitted model against those from a model in which there is no penalty for model complexity.
If there is strong subject-to-subject variation then the fitted model will tend toward values of the random effects similar to those without a penalty on complexity.
If the random-effects term is not contributing much (i.e. it is “inert”) then the random effects will be shrunk considerably towards zero in some directions.
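The balance can be seen in a minimal numeric sketch (hypothetical values, not the kb07 data): in a balanced one-way random-effects model the penalized (BLUP) estimate of a group effect equals the unpenalized group-mean deviation times a shrinkage factor that goes to zero as the between-group variance goes to zero.

```julia
# Hypothetical illustration of shrinkage, not the MixedModels machinery.
unpenalized = [10.0, -4.0, 6.0]   # per-group deviations from the grand mean
n = 5                             # observations per group
var_resid = 4.0                   # residual variance
var_between = 1.0                 # between-group variance
shrink = var_between / (var_between + var_resid / n)
penalized = shrink .* unpenalized # shrinkage toward zero is strong when var_between is small
```

An “inert” term corresponds to `var_between` near zero, in which case `shrink` is near zero and the penalized coefficients collapse toward the origin.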
```julia
using CairoMakie
using DataFrames
using LinearAlgebra
using MixedModels
using MixedModelsMakie
using Random
using ProgressMeter

const progress = false
```
Load the kb07 data set (don’t tell Reinhold that I used these data).
```julia
kb07 = MixedModels.dataset(:kb07)
```

```
Arrow.Table with 1789 rows, 7 columns, and schema:
 :subj      String
 :item      String
 :spkr      String
 :prec      String
 :load      String
 :rt_trunc  Int16
 :rt_raw    Int16
```
```julia
contrasts = Dict(
    :spkr => HelmertCoding(),
    :prec => HelmertCoding(),
    :load => HelmertCoding(),
)
```
```julia
m1 = let
    form = @formula(
        rt_trunc ~
            1 +
            spkr * prec * load +
            (1 + spkr + prec + load | subj) +
            (1 + spkr + prec + load | item)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
```
| | Est. | SE | z | p | σ_subj | σ_item |
|---|---|---|---|---|---|---|
| (Intercept) | 2181.6729 | 77.3131 | 28.22 | <1e-99 | 301.6736 | 362.3182 |
| spkr: old | 67.7490 | 18.2747 | 3.71 | 0.0002 | 42.5441 | 40.7073 |
| prec: maintain | -333.9206 | 47.1388 | -7.08 | <1e-11 | 61.9978 | 246.8078 |
| load: yes | 78.7702 | 19.5387 | 4.03 | <1e-04 | 64.9874 | 42.5139 |
| spkr: old & prec: maintain | -21.9655 | 15.8070 | -1.39 | 0.1646 | | |
| spkr: old & load: yes | 18.3838 | 15.8070 | 1.16 | 0.2448 | | |
| prec: maintain & load: yes | 4.5333 | 15.8070 | 0.29 | 0.7743 | | |
| spkr: old & prec: maintain & load: yes | 23.6072 | 15.8070 | 1.49 | 0.1353 | | |
| Residual | 668.5371 | | | | | |
| | Column | Variance | Std.Dev | Corr. | | |
|---|---|---|---|---|---|---|
| subj | (Intercept) | 91006.9887 | 301.6736 | | | |
| | spkr: old | 1810.0012 | 42.5441 | +0.79 | | |
| | prec: maintain | 3843.7268 | 61.9978 | -0.59 | +0.02 | |
| | load: yes | 4223.3647 | 64.9874 | +0.36 | +0.85 | +0.54 |
| item | (Intercept) | 131274.4558 | 362.3182 | | | |
| | spkr: old | 1657.0830 | 40.7073 | +0.44 | | |
| | prec: maintain | 60914.1141 | 246.8078 | -0.69 | +0.35 | |
| | load: yes | 1807.4303 | 42.5139 | +0.32 | +0.15 | -0.14 |
| Residual | | 446941.7901 | 668.5371 | | | |
```
Linear mixed model fit by maximum likelihood
 rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load | subj) + (1 + spkr + prec + load | item)
     logLik   -2 logLik        AIC        AICc         BIC
 -14318.5619  28637.1239  28695.1239  28696.1131  28854.3168

Variance components:
            Column         Variance    Std.Dev.  Corr.
subj     (Intercept)      91006.9887  301.6736
         spkr: old         1810.0012   42.5441  +0.79
         prec: maintain    3843.7268   61.9978  -0.59 +0.02
         load: yes         4223.3647   64.9874  +0.36 +0.85 +0.54
item     (Intercept)     131274.4558  362.3182
         spkr: old         1657.0830   40.7073  +0.44
         prec: maintain   60914.1141  246.8078  -0.69 +0.35
         load: yes         1807.4303   42.5139  +0.32 +0.15 -0.14
Residual                 446941.7901  668.5371
 Number of obs: 1789; levels of grouping factors: 56, 32

  Fixed-effects parameters:
───────────────────────────────────────────────────────────────────────────────
                                            Coef.  Std. Error      z  Pr(>|z|)
───────────────────────────────────────────────────────────────────────────────
(Intercept)                             2181.67       77.3131   28.22    <1e-99
spkr: old                                 67.749      18.2747    3.71    0.0002
prec: maintain                          -333.921      47.1388   -7.08    <1e-11
load: yes                                 78.7702     19.5387    4.03    <1e-04
spkr: old & prec: maintain               -21.9655     15.807    -1.39    0.1646
spkr: old & load: yes                     18.3838     15.807     1.16    0.2448
prec: maintain & load: yes                 4.53334    15.807     0.29    0.7743
spkr: old & prec: maintain & load: yes    23.6072     15.807     1.49    0.1353
───────────────────────────────────────────────────────────────────────────────
```
Expressing the covariance of random effects
Earlier today we mentioned that the parameters being optimized are from a “matrix square root” of the covariance matrix for the random effects. There is one such lower triangular matrix for each grouping factor.
```julia
l1 = first(m1.λ)  # Cholesky factor of relative covariance for subj
```

```
4×4 LowerTriangular{Float64, Matrix{Float64}}:
  0.451244     ⋅          ⋅           ⋅
  0.0502516   0.0390451   ⋅           ⋅
 -0.0549019   0.074379   0.00732171   ⋅
  0.0352204   0.0894228  0.0145792   0.0
```
Notice the zero in the last position of the diagonal. A triangular matrix with a zero on the diagonal is singular.
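To see this numerically, a tiny example unrelated to the fitted model: the determinant of a triangular matrix is the product of its diagonal elements, so one zero on the diagonal forces singularity.

```julia
using LinearAlgebra

# A lower-triangular matrix with a zero diagonal element is singular.
L = LowerTriangular([1.0 0.0; 0.5 0.0])
det(L)            # 0.0 — product of the diagonal
rank(Matrix(L))   # 1 — rank-deficient
```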
```julia
l2 = last(m1.λ)  # this one is also singular
```

```
4×4 LowerTriangular{Float64, Matrix{Float64}}:
  0.541957    ⋅           ⋅          ⋅
  0.0268894  0.0546312    ⋅          ⋅
 -0.252892   0.267823    0.024643    ⋅
  0.0200364  0.00105896  0.0595843  0.0095462
```
To regenerate the covariance matrix we need to know that the covariance is not the square of l1; it is l1 * l1' (so that the result is symmetric), multiplied by the estimated residual variance σ̂².
```julia
Σ₁ = varest(m1) .* (l1 * l1')
```

```
4×4 Matrix{Float64}:
  91007.0    10134.8    -11072.6    7103.25
  10134.8     1810.0       64.9058  2351.54
 -11072.6       64.9058   3843.73   2156.16
   7103.25    2351.54     2156.16   4223.36
```
```julia
diag(Σ₁)  # compare to the variance column in the VarCorr output
```

```
4-element Vector{Float64}:
 91006.98870968402
  1810.0012414879016
  3843.7267826016114
  4223.364731548873
```
```julia
sqrt.(diag(Σ₁))  # compare to the standard deviation column
```

```
4-element Vector{Float64}:
 301.6736460310778
  42.54410936296471
  61.99779659473078
  64.98741979451772
```
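This reconstruction works for any lower-triangular factor, not just the fitted one: σ̂² · λλ' is always symmetric and positive semidefinite, even when λ is singular. A minimal self-contained check (toy numbers, not model output):

```julia
using LinearAlgebra

# For any lower-triangular λ and scale σ², σ² * (λ * λ') is symmetric PSD.
λ = LowerTriangular([0.5 0.0; 0.2 0.0])  # singular, like l1 above
σ² = 4.0
Σ = σ² .* (λ * λ')
issymmetric(Σ)                       # true
all(eigvals(Symmetric(Σ)) .>= -1e-12)  # true — PSD up to roundoff
```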
Shrinkage plots
A shrinkage plot for this model can be produced with `shrinkageplot(m1)` from MixedModelsMakie. Its upper-left panel shows the nearly perfect negative correlation between those two components of the random effects.
The design behind these random effects is nearly balanced, as can be seen from the transposed fixed-effects model matrix, whose ±1 entries are the Helmert-coded contrasts,

```
8×1789 Matrix{Int64}:
  1   1   1   1   1   1   1   1   1   1  …   1   1   1   1   1   1   1   1   1
 -1   1   1  -1  -1   1   1  -1  -1   1      1  -1  -1   1   1  -1  -1   1   1
 -1   1  -1   1  -1   1  -1   1  -1   1     -1   1  -1   1  -1   1  -1   1  -1
  1  -1  -1  -1  -1   1   1   1   1  -1      1   1   1  -1  -1  -1  -1   1   1
  1   1  -1  -1   1   1  -1  -1   1   1     -1  -1   1   1  -1  -1   1   1  -1
 -1  -1  -1   1   1   1   1  -1  -1  -1  …   1  -1  -1  -1  -1   1   1   1   1
 -1  -1   1  -1   1   1  -1   1  -1  -1     -1   1  -1  -1   1  -1   1   1  -1
  1  -1   1   1  -1   1  -1  -1   1  -1     -1  -1   1  -1   1   1  -1   1  -1
```

and from its cross-product, which is close to 1789 times the identity: the off-diagonal elements are tiny relative to the diagonal.

```
8×8 Matrix{Int64}:
 1789    -1    -1     3    -3     1     1     3
   -1  1789    -3     1    -1     3     3     1
   -1    -3  1789     1    -1     3     3     1
    3     1     1  1789     3    -1    -1    -3
   -3    -1    -1     3  1789     1     1     3
    1     3     3    -1     1  1789    -3    -1
    1     3     3    -1     1    -3  1789    -1
    3     1     1    -3     3    -1    -1  1789
```
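The same structure can be reproduced in miniature: with ±1 coding, a perfectly balanced design gives a model-matrix cross-product that is a multiple of the identity. A self-contained toy example (not the kb07 design):

```julia
# 4 trials of a balanced 2×2 design; columns: intercept, factor A, factor B (±1 coded)
X = [1 -1 -1;
     1  1 -1;
     1 -1  1;
     1  1  1]
X' * X   # 4 on the diagonal, 0 off the diagonal — perfect balance
```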
How to interpret a shrinkage plot
- Extreme shrinkage (shrunk to a line or to a point) is easy to interpret: the term is not providing benefit and can be removed.
- When the range of the blue dots (shrunk values) is comparable to that of the red dots (unshrunk values), the term is about as strong after shrinkage as before.
- By itself this does not mean the term is important. You need a feeling for the absolute magnitude of the random effects in addition to the relative magnitude.
- Small absolute magnitude combined with small relative magnitude indicates that you can drop the term.
Conclusions from these plots
- Only the intercept for subj appears to be contributing explanatory power.
- For item, both the intercept and the coefficient for prec appear to be contributing.
```julia
m2 = let
    form = @formula(
        rt_trunc ~
            1 + prec * spkr * load + (1 | subj) + (1 + prec | item)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
```
| | Est. | SE | z | p | σ_item | σ_subj |
|---|---|---|---|---|---|---|
| (Intercept) | 2181.7582 | 77.4709 | 28.16 | <1e-99 | 364.7286 | 298.1109 |
| prec: maintain | -333.8582 | 47.4629 | -7.03 | <1e-11 | 252.6687 | |
| spkr: old | 67.8114 | 16.0526 | 4.22 | <1e-04 | | |
| load: yes | 78.6849 | 16.0525 | 4.90 | <1e-06 | | |
| prec: maintain & spkr: old | -21.8802 | 16.0525 | -1.36 | 0.1729 | | |
| prec: maintain & load: yes | 4.4710 | 16.0526 | 0.28 | 0.7806 | | |
| spkr: old & load: yes | 18.3214 | 16.0526 | 1.14 | 0.2537 | | |
| prec: maintain & spkr: old & load: yes | 23.5219 | 16.0525 | 1.47 | 0.1428 | | |
| Residual | 678.9318 | | | | | |
| | Column | Variance | Std.Dev | Corr. |
|---|---|---|---|---|
| item | (Intercept) | 133026.918 | 364.729 | |
| | prec: maintain | 63841.496 | 252.669 | -0.70 |
| subj | (Intercept) | 88870.080 | 298.111 | |
| Residual | | 460948.432 | 678.932 | |
```julia
m3 = let
    form = @formula(
        rt_trunc ~
            1 + prec + spkr + load + (1 | subj) + (1 + prec | item)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
```
| | Est. | SE | z | p | σ_item | σ_subj |
|---|---|---|---|---|---|---|
| (Intercept) | 2181.8526 | 77.4681 | 28.16 | <1e-99 | 364.7125 | 298.0259 |
| prec: maintain | -333.7906 | 47.4472 | -7.03 | <1e-11 | 252.5212 | |
| spkr: old | 67.8790 | 16.0785 | 4.22 | <1e-04 | | |
| load: yes | 78.5904 | 16.0785 | 4.89 | <1e-05 | | |
| Residual | 680.0319 | | | | | |
| | Column | Variance | Std.Dev | Corr. |
|---|---|---|---|---|
| item | (Intercept) | 133015.242 | 364.713 | |
| | prec: maintain | 63766.935 | 252.521 | -0.70 |
| subj | (Intercept) | 88819.438 | 298.026 | |
| Residual | | 462443.388 | 680.032 | |
```julia
rng = Random.seed!(1234321);
m3btstrp = parametricbootstrap(rng, 2000, m3);
DataFrame(shortestcovint(m3btstrp))
```
| | type | group | names | lower | upper |
|---|---|---|---|---|---|
| 1 | β | missing | (Intercept) | 2035.14 | 2326.99 |
| 2 | β | missing | prec: maintain | -426.345 | -243.517 |
| 3 | β | missing | spkr: old | 36.6549 | 98.5666 |
| 4 | β | missing | load: yes | 44.5338 | 108.327 |
| 5 | σ | item | (Intercept) | 260.63 | 453.848 |
| 6 | σ | item | prec: maintain | 181.992 | 318.503 |
| 7 | ρ | item | (Intercept), prec: maintain | -0.888974 | -0.439872 |
| 8 | σ | subj | (Intercept) | 232.996 | 359.237 |
| 9 | σ | residual | missing | 656.681 | 702.679 |
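shortestcovint returns, for each parameter, the shortest interval containing a given proportion (95% by default) of the bootstrap replicates. A minimal self-contained version of that computation (a sketch, not the MixedModels implementation):

```julia
# Shortest interval covering `level` of the sample: slide a window of
# k = ceil(level * n) consecutive order statistics and keep the narrowest one.
function shortest_interval(x::AbstractVector, level::Real=0.95)
    s = sort(x)
    n = length(s)
    k = ceil(Int, level * n)           # points the interval must cover
    widths = s[k:n] .- s[1:(n - k + 1)]
    i = argmin(widths)                 # left endpoint of the narrowest window
    (s[i], s[i + k - 1])
end

shortest_interval([1.0, 2.0, 3.0, 4.0, 100.0], 0.8)  # (1.0, 4.0): excludes the outlier
```

Unlike an equal-tailed quantile interval, the shortest interval adapts to skewed bootstrap distributions, which is why it is preferred for variance components.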
```julia
ridgeplot(m3btstrp; show_intercept=false)
```
```julia
m4 = let
    form = @formula(
        rt_trunc ~
            1 + prec + spkr + load + (1 + prec | item) + (1 | subj)
    )
    fit(MixedModel, form, kb07; contrasts, progress)
end
```
| | Est. | SE | z | p | σ_item | σ_subj |
|---|---|---|---|---|---|---|
| (Intercept) | 2181.8526 | 77.4681 | 28.16 | <1e-99 | 364.7125 | 298.0259 |
| prec: maintain | -333.7906 | 47.4472 | -7.03 | <1e-11 | 252.5212 | |
| spkr: old | 67.8790 | 16.0785 | 4.22 | <1e-04 | | |
| load: yes | 78.5904 | 16.0785 | 4.89 | <1e-05 | | |
| Residual | 680.0319 | | | | | |
```julia
m4bstrp = parametricbootstrap(rng, 2000, m4);
ridgeplot(m4bstrp; show_intercept=false)
```

```julia
DataFrame(shortestcovint(m4bstrp))
```
| | type | group | names | lower | upper |
|---|---|---|---|---|---|
| 1 | β | missing | (Intercept) | 2036.61 | 2326.9 |
| 2 | β | missing | prec: maintain | -428.062 | -248.716 |
| 3 | β | missing | spkr: old | 33.4999 | 95.5996 |
| 4 | β | missing | load: yes | 47.776 | 109.14 |
| 5 | σ | item | (Intercept) | 257.632 | 446.425 |
| 6 | σ | item | prec: maintain | 177.382 | 318.478 |
| 7 | ρ | item | (Intercept), prec: maintain | -0.903413 | -0.46731 |
| 8 | σ | subj | (Intercept) | 226.312 | 359.338 |
| 9 | σ | residual | missing | 655.65 | 700.819 |
| | Column | Variance | Std.Dev | Corr. |
|---|---|---|---|---|
| item | (Intercept) | 133015.242 | 364.713 | |
| | prec: maintain | 63766.935 | 252.521 | -0.70 |
| subj | (Intercept) | 88819.438 | 298.026 | |
| Residual | | 462443.388 | 680.032 | |
```julia
let mods = [m1, m2, m4]
    DataFrame(;
        geomdof=(sum ∘ leverage).(mods),
        npar=dof.(mods),
        deviance=deviance.(mods),
        AIC=aic.(mods),
        BIC=bic.(mods),
        AICc=aicc.(mods),
    )
end
```
| | geomdof | npar | deviance | AIC | BIC | AICc |
|---|---|---|---|---|---|---|
| 1 | 131.391 | 29 | 28637.1 | 28695.1 | 28854.3 | 28696.1 |
| 2 | 107.543 | 13 | 28658.5 | 28684.5 | 28755.8 | 28684.7 |
| 3 | 103.478 | 9 | 28663.9 | 28681.9 | 28731.3 | 28682.0 |
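The geomdof column is the sum of the leverage values, i.e. the trace of the hat matrix: a "geometric" degrees of freedom that reflects how shrinkage reduces effective model complexity. For ordinary least squares, with no shrinkage, that trace is exactly the number of coefficients, which a small self-contained check confirms:

```julia
using LinearAlgebra

# OLS hat matrix H = X (X'X)⁻¹ X'; its trace equals the number of coefficients.
X = [ones(10) collect(1.0:10.0)]  # intercept + slope
H = X * ((X' * X) \ X')
sum(diag(H))                      # ≈ 2.0, one per fixed-effects coefficient
```

For the mixed models above, geomdof falls between npar of the reduced model and that of the full model because the shrunken random effects count as fractional parameters.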
```julia
scatter(fitted(m4), residuals(m4))
```