More on shrinkage plots

Author

Phillip Alday, Douglas Bates, and Reinhold Kliegl

Published

2024-09-11

using CairoMakie
using DataFrames
using LinearAlgebra
using MixedModels
using MixedModelsMakie
using Random
using ProgressMeter

const progress = false
false

Load the kb07 data set (don’t tell Reinhold that I used these data).

kb07 = MixedModels.dataset(:kb07)
Arrow.Table with 1789 rows, 7 columns, and schema:
 :subj      String
 :item      String
 :spkr      String
 :prec      String
 :load      String
 :rt_trunc  Int16
 :rt_raw    Int16
contrasts = Dict(
  :spkr => HelmertCoding(),
  :prec => HelmertCoding(),
  :load => HelmertCoding(),
)
m1 = let
  form = @formula(
    rt_trunc ~
      1 +
      spkr * prec * load +
      (1 + spkr + prec + load | subj) +
      (1 + spkr + prec + load | item)
  )
  fit(MixedModel, form, kb07; contrasts, progress)
end
                                             Est.       SE      z       p    σ_subj    σ_item
(Intercept)                              2181.6729  77.3131  28.22  <1e-99  301.6736  362.3182
spkr: old                                  67.7490  18.2747   3.71  0.0002   42.5441   40.7073
prec: maintain                           -333.9206  47.1388  -7.08  <1e-11   61.9978  246.8078
load: yes                                  78.7702  19.5387   4.03  <1e-04   64.9874   42.5139
spkr: old & prec: maintain                -21.9655  15.8070  -1.39  0.1646
spkr: old & load: yes                      18.3838  15.8070   1.16  0.2448
prec: maintain & load: yes                  4.5333  15.8070   0.29  0.7743
spkr: old & prec: maintain & load: yes     23.6072  15.8070   1.49  0.1353
Residual                                  668.5371
VarCorr(m1)
             Column         Variance   Std.Dev.  Corr.
subj     (Intercept)      91006.9887  301.6736
         spkr: old         1810.0012   42.5441  +0.79
         prec: maintain    3843.7268   61.9978  -0.59 +0.02
         load: yes         4223.3647   64.9874  +0.36 +0.85 +0.54
item     (Intercept)     131274.4558  362.3182
         spkr: old         1657.0830   40.7073  +0.44
         prec: maintain   60914.1141  246.8078  -0.69 +0.35
         load: yes         1807.4303   42.5139  +0.32 +0.15 -0.14
Residual                 446941.7901  668.5371
issingular(m1)
true
print(m1)
Linear mixed model fit by maximum likelihood
 rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load | subj) + (1 + spkr + prec + load | item)
    logLik   -2 logLik      AIC         AICc        BIC     
 -14318.5619  28637.1239  28695.1239  28696.1131  28854.3168

Variance components:
             Column       Variance  Std.Dev.   Corr.
subj     (Intercept)      91006.9887 301.6736
         spkr: old         1810.0012  42.5441 +0.79
         prec: maintain    3843.7268  61.9978 -0.59 +0.02
         load: yes         4223.3647  64.9874 +0.36 +0.85 +0.54
item     (Intercept)     131274.4558 362.3182
         spkr: old         1657.0830  40.7073 +0.44
         prec: maintain   60914.1141 246.8078 -0.69 +0.35
         load: yes         1807.4303  42.5139 +0.32 +0.15 -0.14
Residual                 446941.7901 668.5371
 Number of obs: 1789; levels of grouping factors: 56, 32

  Fixed-effects parameters:
───────────────────────────────────────────────────────────────────────────────
                                             Coef.  Std. Error      z  Pr(>|z|)
───────────────────────────────────────────────────────────────────────────────
(Intercept)                             2181.67        77.3131  28.22    <1e-99
spkr: old                                 67.749       18.2747   3.71    0.0002
prec: maintain                          -333.921       47.1388  -7.08    <1e-11
load: yes                                 78.7702      19.5387   4.03    <1e-04
spkr: old & prec: maintain               -21.9655      15.807   -1.39    0.1646
spkr: old & load: yes                     18.3838      15.807    1.16    0.2448
prec: maintain & load: yes                 4.53334     15.807    0.29    0.7743
spkr: old & prec: maintain & load: yes    23.6072      15.807    1.49    0.1353
───────────────────────────────────────────────────────────────────────────────

1 Expressing the covariance of random effects

Earlier today we mentioned that the parameters being optimized are from a “matrix square root” of the covariance matrix for the random effects. There is one such lower triangular matrix for each grouping factor.
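As a reminder of what such a factor looks like, here is a small self-contained sketch (using a made-up 2×2 covariance matrix, not values from the model) showing that a lower Cholesky factor reproduces the covariance:

```julia
using LinearAlgebra

# Hypothetical 2×2 covariance matrix (illustration only, not from the model)
Σ = [4.0 2.0;
     2.0 3.0]

# Lower-triangular "matrix square root": L * L' reproduces Σ
L = cholesky(Σ).L
L * L' ≈ Σ    # true
```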

l1 = first(m1.λ)   # Cholesky factor of relative covariance for subj
4×4 LowerTriangular{Float64, Matrix{Float64}}:
  0.451244    ⋅          ⋅           ⋅ 
  0.0502516  0.0390451   ⋅           ⋅ 
 -0.0549019  0.074379   0.00732171   ⋅ 
  0.0352204  0.0894228  0.0145792   0.0

Notice the zero on the diagonal. A triangular matrix with zeros on the diagonal is singular.
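To see why, recall that the determinant of a triangular matrix is the product of its diagonal entries, so a single zero there forces singularity. A quick check with a made-up matrix of the same shape (values are not from the model):

```julia
using LinearAlgebra

# Lower-triangular matrix with a zero in the last diagonal position,
# mimicking the structure of l1 above (hypothetical values)
L = LowerTriangular([0.45 0.0  0.0;
                     0.05 0.04 0.0;
                     0.03 0.09 0.0])

det(L)    # product of the diagonal: 0.0
rank(L)   # 2, i.e. rank-deficient
```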

l2 = last(m1.λ)    # this one is also singular
4×4 LowerTriangular{Float64, Matrix{Float64}}:
  0.541957    ⋅           ⋅          ⋅ 
  0.0268894  0.0546312    ⋅          ⋅ 
 -0.252892   0.267823    0.024643    ⋅ 
  0.0200364  0.00105896  0.0595843  0.0095462

To regenerate the covariance matrix we need to know that the covariance is not the square of l1; it is l1 * l1' (so that the result is symmetric), scaled by the estimated residual variance σ̂².

Σ₁ = varest(m1) .* (l1 * l1')
4×4 Matrix{Float64}:
  91007.0   10134.8     -11072.6     7103.25
  10134.8    1810.0         64.9058  2351.54
 -11072.6      64.9058    3843.73    2156.16
   7103.25   2351.54      2156.16    4223.36
diag(Σ₁)  # compare to the variance column in the VarCorr output
4-element Vector{Float64}:
 91006.98870968402
  1810.0012414879016
  3843.7267826016114
  4223.364731548873
sqrt.(diag(Σ₁))
4-element Vector{Float64}:
 301.6736460310778
  42.54410936296471
  61.99779659473078
  64.98741979451772
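The correlations shown in the VarCorr output can be recovered from any covariance matrix by dividing by the outer product of the standard deviations. A minimal sketch with a made-up 2×2 covariance (not the model's Σ₁):

```julia
using LinearAlgebra

# Hypothetical covariance matrix (illustration only)
Σ = [4.0 1.2;
     1.2 9.0]

sd = sqrt.(diag(Σ))    # standard deviations: [2.0, 3.0]
ρ = Σ ./ (sd * sd')    # correlation matrix; off-diagonal is 1.2 / 6.0 = 0.2
```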

2 Shrinkage plots

shrinkageplot(m1)
Figure 1: Shrinkage plot of model m1

The upper left panel shows the perfect negative correlation for those two components of the random effects.

shrinkageplot(m1, :item)

X1 = Int.(m1.X')
8×1789 Matrix{Int64}:
  1   1   1   1   1  1   1   1   1   1  …   1   1   1   1   1   1   1  1   1
 -1   1   1  -1  -1  1   1  -1  -1   1      1  -1  -1   1   1  -1  -1  1   1
 -1   1  -1   1  -1  1  -1   1  -1   1     -1   1  -1   1  -1   1  -1  1  -1
  1  -1  -1  -1  -1  1   1   1   1  -1      1   1   1  -1  -1  -1  -1  1   1
  1   1  -1  -1   1  1  -1  -1   1   1     -1  -1   1   1  -1  -1   1  1  -1
 -1  -1  -1   1   1  1   1  -1  -1  -1  …   1  -1  -1  -1  -1   1   1  1   1
 -1  -1   1  -1   1  1  -1   1  -1  -1     -1   1  -1  -1   1  -1   1  1  -1
  1  -1   1   1  -1  1  -1  -1   1  -1     -1  -1   1  -1   1   1  -1  1  -1
X1 * X1'
8×8 Matrix{Int64}:
 1789    -1    -1     3    -3     1     1     3
   -1  1789    -3     1    -1     3     3     1
   -1    -3  1789     1    -1     3     3     1
    3     1     1  1789     3    -1    -1    -3
   -3    -1    -1     3  1789     1     1     3
    1     3     3    -1     1  1789    -3    -1
    1     3     3    -1     1    -3  1789    -1
    3     1     1    -3     3    -1    -1  1789
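The dominant diagonal of 1789 (the number of observations) with small off-diagonal entries shows that the ±1 contrast columns are nearly orthogonal. In a perfectly balanced design they would be exactly orthogonal, as a toy 2×2 factorial example (hypothetical, not the kb07 design) shows:

```julia
# Model matrix for a balanced 2×2 design with ±1 contrasts:
# columns are intercept, factor A, factor B
X = [1 -1 -1;
     1 -1  1;
     1  1 -1;
     1  1  1]

X' * X    # 4I: the columns are exactly orthogonal, diagonal is n = 4
```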

3 How to interpret a shrinkage plot

  • Extreme shrinkage (shrunk to a line or to a point) is easy to interpret: the term is not providing benefit and can be removed.
  • When the range of the blue dots (shrunk values) is comparable to that of the red dots (unshrunk values), the term retains about as much strength after shrinkage as before.
  • By itself this doesn’t mean the term is important; you also need a feeling for the absolute magnitude of the random effects, not just their relative magnitude.
  • Small absolute magnitude combined with small relative magnitude indicates that you can drop the term.
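One way to put a number on “relative magnitude” is the ratio of the spread of the shrunk values to the spread of the unshrunk values. A hypothetical helper (not part of MixedModelsMakie) illustrating the idea:

```julia
using Statistics

# Hypothetical shrinkage ratio: spread of the shrunk random effects
# relative to the spread of the unshrunk (within-group) estimates.
# Values near 0 indicate extreme shrinkage; values near 1, little shrinkage.
shrinkage_ratio(shrunk, unshrunk) = std(shrunk) / std(unshrunk)

shrinkage_ratio([-1.0, 0.0, 1.0], [-10.0, 0.0, 10.0])    # 0.1
```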

4 Conclusions from these plots

  • Only the intercept for subj appears to be contributing explanatory power.
  • For item both the intercept and prec appear to be contributing.
m2 = let
  form = @formula(
    rt_trunc ~
      1 + prec * spkr * load + (1 | subj) + (1 + prec | item)
  )
  fit(MixedModel, form, kb07; contrasts, progress)
end
                                             Est.       SE      z       p    σ_item    σ_subj
(Intercept)                              2181.7582  77.4709  28.16  <1e-99  364.7286  298.1109
prec: maintain                           -333.8582  47.4629  -7.03  <1e-11  252.6687
spkr: old                                  67.8114  16.0526   4.22  <1e-04
load: yes                                  78.6849  16.0525   4.90  <1e-06
prec: maintain & spkr: old                -21.8802  16.0525  -1.36  0.1729
prec: maintain & load: yes                  4.4710  16.0526   0.28  0.7806
spkr: old & load: yes                      18.3214  16.0526   1.14  0.2537
prec: maintain & spkr: old & load: yes     23.5219  16.0525   1.47  0.1428
Residual                                  678.9318
VarCorr(m2)
             Column        Variance  Std.Dev.  Corr.
item     (Intercept)     133026.918   364.729
         prec: maintain   63841.496   252.669  -0.70
subj     (Intercept)      88870.080   298.111
Residual                 460948.432   678.932
shrinkageplot(m2)
Figure 2: Shrinkage plot of model m2
m3 = let
  form = @formula(
    rt_trunc ~
      1 + prec + spkr + load + (1 | subj) + (1 + prec | item)
  )
  fit(MixedModel, form, kb07; contrasts, progress)
end
                    Est.       SE      z       p    σ_item    σ_subj
(Intercept)     2181.8526  77.4681  28.16  <1e-99  364.7125  298.0259
prec: maintain  -333.7906  47.4472  -7.03  <1e-11  252.5212
spkr: old         67.8790  16.0785   4.22  <1e-04
load: yes         78.5904  16.0785   4.89  <1e-05
Residual         680.0319
VarCorr(m3)
             Column        Variance  Std.Dev.  Corr.
item     (Intercept)     133015.242   364.713
         prec: maintain   63766.935   252.521  -0.70
subj     (Intercept)      88819.438   298.026
Residual                 462443.388   680.032
rng = Random.seed!(1234321);
m3btstrp = parametricbootstrap(rng, 2000, m3);
DataFrame(shortestcovint(m3btstrp))
9×5 DataFrame
 Row │ type    group     names                        lower        upper
     │ String  String?   String?                      Float64      Float64
─────┼─────────────────────────────────────────────────────────────────────
   1 │ β       missing   (Intercept)                  2035.14      2326.99
   2 │ β       missing   prec: maintain               -426.345     -243.517
   3 │ β       missing   spkr: old                      36.6549      98.5666
   4 │ β       missing   load: yes                      44.5338     108.327
   5 │ σ       item      (Intercept)                   260.63       453.848
   6 │ σ       item      prec: maintain                181.992      318.503
   7 │ ρ       item      (Intercept), prec: maintain    -0.888974    -0.439872
   8 │ σ       subj      (Intercept)                   232.996      359.237
   9 │ σ       residual  missing                       656.681      702.679
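shortestcovint returns, for each parameter, the narrowest interval containing 95% of the bootstrap replicates, rather than the equal-tailed quantile interval. A minimal stdlib-only sketch of that idea (not the MixedModels implementation):

```julia
# Shortest interval covering `level` of the sample: slide a window of
# k = ceil(level * n) points over the sorted values, keep the narrowest.
function shortest_interval(x; level=0.95)
    s = sort(x)
    n = length(s)
    k = ceil(Int, level * n)
    spans = [s[i + k - 1] - s[i] for i in 1:(n - k + 1)]
    i = argmin(spans)
    (s[i], s[i + k - 1])
end

shortest_interval(collect(1.0:100.0); level=0.9)    # (1.0, 90.0)
```

For a skewed bootstrap distribution the shortest interval is asymmetric around the point estimate, which is why it is preferred here over equal-tailed quantiles.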
ridgeplot(m3btstrp)
Figure 3: Ridge plot of the fixed-effects coefficients from the bootstrap sample
ridgeplot(m3btstrp; show_intercept=false)
Figure 4: Ridge plot of the fixed-effects coefficients from the bootstrap sample (without the intercept)
m4 = let
  form = @formula(
    rt_trunc ~
      1 + prec + spkr + load + (1 + prec | item) + (1 | subj)
  )
  fit(MixedModel, form, kb07; contrasts, progress)
end
                    Est.       SE      z       p    σ_item    σ_subj
(Intercept)     2181.8526  77.4681  28.16  <1e-99  364.7125  298.0259
prec: maintain  -333.7906  47.4472  -7.03  <1e-11  252.5212
spkr: old         67.8790  16.0785   4.22  <1e-04
load: yes         78.5904  16.0785   4.89  <1e-05
Residual         680.0319
m4bstrp = parametricbootstrap(rng, 2000, m4);
ridgeplot(m4bstrp; show_intercept=false)

DataFrame(shortestcovint(m4bstrp))
9×5 DataFrame
 Row │ type    group     names                        lower        upper
     │ String  String?   String?                      Float64      Float64
─────┼─────────────────────────────────────────────────────────────────────
   1 │ β       missing   (Intercept)                  2036.61      2326.9
   2 │ β       missing   prec: maintain               -428.062     -248.716
   3 │ β       missing   spkr: old                      33.4999      95.5996
   4 │ β       missing   load: yes                      47.776      109.14
   5 │ σ       item      (Intercept)                   257.632      446.425
   6 │ σ       item      prec: maintain                177.382      318.478
   7 │ ρ       item      (Intercept), prec: maintain    -0.903413    -0.46731
   8 │ σ       subj      (Intercept)                   226.312      359.338
   9 │ σ       residual  missing                       655.65       700.819
VarCorr(m4)
             Column        Variance  Std.Dev.  Corr.
item     (Intercept)     133015.242   364.713
         prec: maintain   63766.935   252.521  -0.70
subj     (Intercept)      88819.438   298.026
Residual                 462443.388   680.032
let mods = [m1, m2, m4]
  DataFrame(;
    geomdof=(sum ∘ leverage).(mods),
    npar=dof.(mods),
    deviance=deviance.(mods),
    AIC=aic.(mods),
    BIC=bic.(mods),
    AICc=aicc.(mods),
  )
end
3×6 DataFrame
 Row │ geomdof  npar   deviance  AIC      BIC      AICc
     │ Float64  Int64  Float64   Float64  Float64  Float64
─────┼──────────────────────────────────────────────────────
   1 │ 131.391     29   28637.1  28695.1  28854.3  28696.1
   2 │ 107.543     13   28658.5  28684.5  28755.8  28684.7
   3 │ 103.478      9   28663.9  28681.9  28731.3  28682.0
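The geomdof column is computed with Julia's function-composition operator ∘: sum ∘ leverage applies leverage to each model first and then sums the result. A tiny stdlib example of the same pattern, with diag standing in for leverage:

```julia
using LinearAlgebra

# (f ∘ g)(x) is f(g(x)): compose first, apply later
trace = sum ∘ diag
trace(Matrix(3.0I, 4, 4))    # 12.0: sum of a diagonal of four 3.0s
```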
scatter(fitted(m4), residuals(m4))
Figure 5: Residuals versus fitted values for model m4