More on shrinkage plots

Author: Phillip Alday, Douglas Bates, and Reinhold Kliegl

Published: 2025-09-12

using CairoMakie
using DataFrames
using LinearAlgebra
using MixedModels
using MixedModelsMakie
using Random
using ProgressMeter

const progress = false
false

Load the kb07 data set (don’t tell Reinhold that I used these data).

kb07 = MixedModels.dataset(:kb07)
Arrow.Table with 1789 rows, 7 columns, and schema:
 :subj      String
 :item      String
 :spkr      String
 :prec      String
 :load      String
 :rt_trunc  Int16
 :rt_raw    Int16
contrasts = Dict(
  :spkr => HelmertCoding(),
  :prec => HelmertCoding(),
  :load => HelmertCoding(),
)
m1 = let
  form = @formula(
    rt_trunc ~
      1 +
      spkr * prec * load +
      (1 + spkr + prec + load | subj) +
      (1 + spkr + prec + load | item)
  )
  fit(MixedModel, form, kb07; contrasts, progress)
end
                                             Est.       SE      z       p    σ_subj    σ_item
(Intercept)                             2181.6728  77.3025  28.22  <1e-99  301.7795  362.1969
spkr: old                                 67.7486  18.2889   3.70  0.0002   42.9237   40.6934
prec: maintain                          -333.9211  47.1525  -7.08  <1e-11   61.9966  246.8936
load: yes                                 78.7703  19.5367   4.03  <1e-04   65.1301   42.3692
spkr: old & prec: maintain               -21.9656  15.8062  -1.39  0.1646
spkr: old & load: yes                     18.3842  15.8062   1.16  0.2448
prec: maintain & load: yes                 4.5338  15.8062   0.29  0.7742
spkr: old & prec: maintain & load: yes    23.6074  15.8062   1.49  0.1353
Residual                                 668.5033
VarCorr(m1)
             Column       Variance   Std.Dev   Corr.
subj     (Intercept)      91070.8712 301.7795
         spkr: old         1842.4411  42.9237 +0.78
         prec: maintain    3843.5766  61.9966 -0.59 +0.03
         load: yes         4241.9308  65.1301 +0.36 +0.83 +0.53
item     (Intercept)     131186.5676 362.1969
         spkr: old         1655.9495  40.6934 +0.44
         prec: maintain   60956.4434 246.8936 -0.69 +0.35
         load: yes         1795.1488  42.3692 +0.32 +0.16 -0.14
Residual                 446896.6953 668.5033
issingular(m1)
true
print(m1)
Linear mixed model fit by maximum likelihood
 rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load | subj) + (1 + spkr + prec + load | item)
    logLik   -2 logLik      AIC         AICc        BIC     
 -14318.5614  28637.1227  28695.1227  28696.1119  28854.3157

Variance components:
             Column       Variance  Std.Dev.   Corr.
subj     (Intercept)      91070.8712 301.7795
         spkr: old         1842.4411  42.9237 +0.78
         prec: maintain    3843.5766  61.9966 -0.59 +0.03
         load: yes         4241.9308  65.1301 +0.36 +0.83 +0.53
item     (Intercept)     131186.5676 362.1969
         spkr: old         1655.9495  40.6934 +0.44
         prec: maintain   60956.4434 246.8936 -0.69 +0.35
         load: yes         1795.1488  42.3692 +0.32 +0.16 -0.14
Residual                 446896.6953 668.5033
 Number of obs: 1789; levels of grouping factors: 56, 32

  Fixed-effects parameters:
──────────────────────────────────────────────────────────────────────────────
                                            Coef.  Std. Error      z  Pr(>|z|)
──────────────────────────────────────────────────────────────────────────────
(Intercept)                             2181.67       77.3025  28.22    <1e-99
spkr: old                                 67.7486     18.2889   3.70    0.0002
prec: maintain                          -333.921      47.1525  -7.08    <1e-11
load: yes                                 78.7703     19.5367   4.03    <1e-04
spkr: old & prec: maintain               -21.9656     15.8062  -1.39    0.1646
spkr: old & load: yes                     18.3842     15.8062   1.16    0.2448
prec: maintain & load: yes                 4.5338     15.8062   0.29    0.7742
spkr: old & prec: maintain & load: yes    23.6074     15.8062   1.49    0.1353
──────────────────────────────────────────────────────────────────────────────

1 Expressing the covariance of random effects

Earlier today we mentioned that the parameters being optimized are from a “matrix square root” of the covariance matrix for the random effects. There is one such lower triangular matrix for each grouping factor.

l1 = first(m1.λ)   # Cholesky factor of relative covariance for subj
4×4 LowerTriangular{Float64, Matrix{Float64}}:
  0.451426    ⋅          ⋅          ⋅ 
  0.0502539  0.0399661   ⋅          ⋅ 
 -0.0550269  0.0729273  0.0159447   ⋅ 
  0.0351603  0.0857539  0.0300331  0.0

Notice the zero in the last position on the diagonal. A triangular matrix with a zero on the diagonal is singular.

l2 = last(m1.λ)    # this one is also singular
4×4 LowerTriangular{Float64, Matrix{Float64}}:
  0.541803    ⋅           ⋅          ⋅ 
  0.0268569  0.0546274    ⋅          ⋅ 
 -0.253088   0.267988    0.0229877   ⋅ 
  0.0200493  0.00126987  0.0600951  0.00138194
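As a quick numeric check (a sketch using only the already-loaded LinearAlgebra, not part of the original analysis), we can compare the smallest diagonal element of each factor to the largest; a ratio at or very near zero marks a singular, or numerically singular, factor.

# ratio of smallest to largest diagonal element of each relative covariance factor;
# l1 gives exactly 0.0, l2 a value very close to zero
(minimum(diag(l1)) / maximum(diag(l1)), minimum(diag(l2)) / maximum(diag(l2)))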

To regenerate the covariance matrix, note that the covariance is not the elementwise square of l1; it is l1 * l1' (so that the result is symmetric), scaled by the estimated residual variance σ̂².

Σ₁ = varest(m1) .* (l1 * l1')
4×4 Matrix{Float64}:
  91070.9   10138.3     -11101.2     7093.26
  10138.3    1842.44        66.7243  2321.27
 -11101.2      66.7243    3843.58    2144.17
   7093.26   2321.27      2144.17    4241.93
diag(Σ₁)  # compare to the variance column in the VarCorr output
4-element Vector{Float64}:
 91070.87115964481
  1842.441054170415
  3843.5765510736305
  4241.930807293007
sqrt.(diag(Σ₁))
4-element Vector{Float64}:
 301.7795075210456
  42.92366543260739
  61.99658499525301
  65.13010676555818
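The correlations in the VarCorr output can be recovered the same way. As a small sketch (using only the packages already loaded), divide Σ₁ by the outer product of its standard deviations:

# convert the covariance matrix to a correlation matrix; the off-diagonal values
# should match the Corr. column for subj in VarCorr(m1)
s = sqrt.(diag(Σ₁))
Σ₁ ./ (s * s')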

2 Shrinkage plots

shrinkageplot(m1)
Figure 1: Shrinkage plot of model m1

The upper left panel shows the perfect negative correlation for those two components of the random effects.

shrinkageplot(m1, :item)
X1 = Int.(m1.X')
8×1789 Matrix{Int64}:
  1   1   1   1   1  1   1   1   1   1  …   1   1   1   1   1   1   1  1   1
 -1   1   1  -1  -1  1   1  -1  -1   1      1  -1  -1   1   1  -1  -1  1   1
 -1   1  -1   1  -1  1  -1   1  -1   1     -1   1  -1   1  -1   1  -1  1  -1
  1  -1  -1  -1  -1  1   1   1   1  -1      1   1   1  -1  -1  -1  -1  1   1
  1   1  -1  -1   1  1  -1  -1   1   1     -1  -1   1   1  -1  -1   1  1  -1
 -1  -1  -1   1   1  1   1  -1  -1  -1  …   1  -1  -1  -1  -1   1   1  1   1
 -1  -1   1  -1   1  1  -1   1  -1  -1     -1   1  -1  -1   1  -1   1  1  -1
  1  -1   1   1  -1  1  -1  -1   1  -1     -1  -1   1  -1   1   1  -1  1  -1
X1 * X1'
8×8 Matrix{Int64}:
 1789    -1    -1     3    -3     1     1     3
   -1  1789    -3     1    -1     3     3     1
   -1    -3  1789     1    -1     3     3     1
    3     1     1  1789     3    -1    -1    -3
   -3    -1    -1     3  1789     1     1     3
    1     3     3    -1     1  1789    -3    -1
    1     3     3    -1     1    -3  1789    -1
    3     1     1    -3     3    -1    -1  1789
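The diagonal elements of this cross-product equal the number of observations (1789) and the off-diagonal elements are at most 3 in absolute value; that is, the ±1 Helmert-coded fixed-effects design is very nearly orthogonal. A one-line check (a sketch using I from the already-loaded LinearAlgebra):

# largest absolute off-diagonal element of X1 * X1'; compare to 1789 on the diagonal
maximum(abs, X1 * X1' - 1789I)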

3 How to interpret a shrinkage plot

  • Extreme shrinkage (shrunk to a line or to a point) is easy to interpret – the term is not providing benefit and can be removed.
  • When the range of the blue dots (shrunken values) is comparable to that of the red dots (unshrunk values), the term retains about as much variability after shrinkage as before.
  • By itself, this doesn’t mean that the term is important. In some ways you need to get a feeling for the absolute magnitude of the random effects in addition to the relative magnitude (see the sketch after this list).
  • Small absolute magnitude combined with small relative magnitude indicates that you can drop that term.
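One way to get a feeling for the absolute magnitude is to look at the spread of the conditional means (the shrunken values plotted in blue) directly. A minimal sketch, assuming that ranef(m1) follows the same ordering as m1.λ above (subj first, item last) and loading Statistics for std:

using Statistics  # for std; not in the preamble above

b_item = last(ranef(m1))   # 4×32 matrix of conditional means for item, one row per term
map(std, eachrow(b_item))  # spread across items of each term, for comparison with σ_item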

4 Conclusions from these plots

  • Only the intercept for the subj appears to be contributing explanatory power.
  • For the item both the intercept and the prec appear to be contributing.
m2 = let
  form = @formula(
    rt_trunc ~
      1 + prec * spkr * load + (1 | subj) + (1 + prec | item)
  )
  fit(MixedModel, form, kb07; contrasts, progress)
end
                                             Est.       SE      z       p    σ_item    σ_subj
(Intercept)                             2181.7582  77.4710  28.16  <1e-99  364.7293  298.1107
prec: maintain                          -333.8582  47.4631  -7.03  <1e-11  252.6694
spkr: old                                 67.8114  16.0526   4.22  <1e-04
load: yes                                 78.6849  16.0525   4.90  <1e-06
prec: maintain & spkr: old               -21.8802  16.0525  -1.36  0.1729
prec: maintain & load: yes                 4.4710  16.0526   0.28  0.7806
spkr: old & load: yes                     18.3214  16.0526   1.14  0.2537
prec: maintain & spkr: old & load: yes    23.5219  16.0525   1.47  0.1428
Residual                                 678.9319
VarCorr(m2)
            Column          Variance  Std.Dev  Corr.
item     (Intercept)      133027.439  364.729
         prec: maintain    63841.834  252.669  -0.70
subj     (Intercept)       88870.013  298.111
Residual                  460948.573  678.932
shrinkageplot(m2)
Figure 2: Shrinkage plot of model m2
m3 = let
  form = @formula(
    rt_trunc ~
      1 + prec + spkr + load + (1 | subj) + (1 + prec | item)
  )
  fit(MixedModel, form, kb07; contrasts, progress)
end
                      Est.       SE      z       p    σ_item    σ_subj
(Intercept)      2181.8526  77.4680  28.16  <1e-99  364.7121  298.0257
prec: maintain   -333.7906  47.4476  -7.03  <1e-11  252.5236
spkr: old          67.8790  16.0785   4.22  <1e-04
load: yes          78.5904  16.0785   4.89  <1e-05
Residual          680.0318
VarCorr(m3)
            Column          Variance  Std.Dev  Corr.
item     (Intercept)      133014.901  364.712
         prec: maintain    63768.182  252.524  -0.70
subj     (Intercept)       88819.305  298.026
Residual                  462443.261  680.032
rng = Random.seed!(1234321);
m3btstrp = parametricbootstrap(rng, 2000, m3);
DataFrame(shortestcovint(m3btstrp))
9×5 DataFrame
 Row │ type    group     names                        lower        upper
     │ String  String?   String?                      Float64      Float64
─────┼──────────────────────────────────────────────────────────────────────
   1 │ β       missing   (Intercept)                  2022.91      2334.0
   2 │ β       missing   prec: maintain               -430.239     -239.802
   3 │ β       missing   spkr: old                       34.0592     96.72
   4 │ β       missing   load: yes                       46.5349    109.526
   5 │ σ       item      (Intercept)                    270.065     451.957
   6 │ σ       item      prec: maintain                 181.74      325.125
   7 │ ρ       item      (Intercept), prec: maintain     -0.907255    -0.490079
   8 │ σ       subj      (Intercept)                    233.835     364.504
   9 │ σ       residual  missing                        657.341     702.655
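The bootstrap replicates themselves are also available, so other summaries are easy to compute. As a small sketch, assuming the allpars property of the bootstrap object (a table with columns iter, type, group, names, value) and loading Statistics for quantile, we can form an equal-tail 95% interval for the prec coefficient to compare with the shortest coverage interval in row 2 above:

using Statistics  # for quantile; not in the preamble above

allpars = DataFrame(m3btstrp.allpars)   # one row per parameter per bootstrap replicate
prec_b = filter(r -> r.type == "β" && isequal(r.names, "prec: maintain"), allpars)
quantile(prec_b.value, [0.025, 0.975])  # equal-tail interval for the prec: maintain coefficient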
ridgeplot(m3btstrp)
Figure 3: Ridge plot of the fixed-effects coefficients from the bootstrap sample
ridgeplot(m3btstrp; show_intercept=false)
Figure 4: Ridge plot of the fixed-effects coefficients from the bootstrap sample (without the intercept)
m4 = let
  form = @formula(
    rt_trunc ~
      1 + prec + spkr + load + (1 + prec | item) + (1 | subj)
  )
  fit(MixedModel, form, kb07; contrasts, progress)
end
                      Est.       SE      z       p    σ_item    σ_subj
(Intercept)      2181.8526  77.4680  28.16  <1e-99  364.7121  298.0257
prec: maintain   -333.7906  47.4476  -7.03  <1e-11  252.5236
spkr: old          67.8790  16.0785   4.22  <1e-04
load: yes          78.5904  16.0785   4.89  <1e-05
Residual          680.0318
m4bstrp = parametricbootstrap(rng, 2000, m4);
ridgeplot(m4bstrp; show_intercept=false)
DataFrame(shortestcovint(m4bstrp))
9×5 DataFrame
 Row │ type    group     names                        lower        upper
     │ String  String?   String?                      Float64      Float64
─────┼──────────────────────────────────────────────────────────────────────
   1 │ β       missing   (Intercept)                  2034.15      2335.45
   2 │ β       missing   prec: maintain               -427.769     -248.337
   3 │ β       missing   spkr: old                       35.6938     97.8081
   4 │ β       missing   load: yes                       45.3095    107.368
   5 │ σ       item      (Intercept)                    260.498     451.649
   6 │ σ       item      prec: maintain                 179.477     315.415
   7 │ ρ       item      (Intercept), prec: maintain     -0.901827    -0.471356
   8 │ σ       subj      (Intercept)                    236.684     360.363
   9 │ σ       residual  missing                        657.178     702.739
VarCorr(m4)
            Column          Variance  Std.Dev  Corr.
item     (Intercept)      133014.901  364.712
         prec: maintain    63768.182  252.524  -0.70
subj     (Intercept)       88819.305  298.026
Residual                  462443.261  680.032
let mods = [m1, m2, m4]
  DataFrame(;
    geomdof=(sum ∘ leverage).(mods),
    npar=dof.(mods),
    deviance=deviance.(mods),
    AIC=aic.(mods),
    BIC=bic.(mods),
    AICc=aicc.(mods),
  )
end
3×6 DataFrame
 Row │ geomdof  npar   deviance  AIC      BIC      AICc
     │ Float64  Int64  Float64   Float64  Float64  Float64
─────┼─────────────────────────────────────────────────────
   1 │ 131.555     29   28637.1  28695.1  28854.3  28696.1
   2 │ 107.543     13   28658.5  28684.5  28755.8  28684.7
   3 │ 103.478      9   28663.9  28681.9  28731.3  28682.0
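Because m4 is nested within m2, and m2 within m1, the models can also be compared with a likelihood-ratio test. A sketch (not part of the original comparison) using likelihoodratiotest from MixedModels:

# likelihood-ratio tests for the nested sequence of models, smallest first
MixedModels.likelihoodratiotest(m4, m2, m1)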
scatter(fitted(m4), residuals(m4))
Figure 5: Residuals versus fitted values for model m4

This page was rendered from git revision 60d6e27 using Quarto 1.7.33.
