More on shrinkage plots

Author

Phillip Alday, Douglas Bates, and Reinhold Kliegl

Published

2022-09-27

Code
using CairoMakie
using DataFrames
using LinearAlgebra
using MixedModels
using MixedModelsMakie
using Random
using ProgressMeter

ProgressMeter.ijulia_behavior(:clear);

Load the kb07 data set (don’t tell Reinhold that I used these data).

kb07 = MixedModels.dataset(:kb07)
Arrow.Table with 1789 rows, 7 columns, and schema:
 :subj      String
 :item      String
 :spkr      String
 :prec      String
 :load      String
 :rt_trunc  Int16
 :rt_raw    Int16
contrasts = Dict(
  :subj => Grouping(),
  :item => Grouping(),
  :spkr => HelmertCoding(),
  :prec => HelmertCoding(),
  :load => HelmertCoding(),
)
m1 = let
  form = @formula(
    rt_trunc ~
      1 +
      spkr * prec * load +
      (1 + spkr + prec + load | subj) +
      (1 + spkr + prec + load | item)
  )
  fit(MixedModel, form, kb07; contrasts)
end
Minimizing 934   Time: 0:00:01 ( 1.52 ms/it)
  objective:  28637.97101024215
Est. SE z p σ_subj σ_item
(Intercept) 2181.6424 77.3512 28.20 <1e-99 301.8676 362.4694
spkr: old 67.7496 17.9607 3.77 0.0002 33.0524 41.1185
prec: maintain -333.9200 47.1564 -7.08 <1e-11 58.8607 247.3292
load: yes 78.8007 19.7260 3.99 <1e-04 66.9505 43.3885
spkr: old & prec: maintain -21.9960 15.8191 -1.39 0.1644
spkr: old & load: yes 18.3832 15.8191 1.16 0.2452
prec: maintain & load: yes 4.5327 15.8191 0.29 0.7745
spkr: old & prec: maintain & load: yes 23.6377 15.8191 1.49 0.1351
Residual 669.0516
VarCorr(m1)
Column Variance Std.Dev Corr.
subj (Intercept) 91124.0779 301.8676
spkr: old 1092.4628 33.0524 +1.00
prec: maintain 3464.5848 58.8607 -0.62 -0.62
load: yes 4482.3662 66.9505 +0.36 +0.36 +0.51
item (Intercept) 131384.0733 362.4694
spkr: old 1690.7347 41.1185 +0.42
prec: maintain 61171.7458 247.3292 -0.69 +0.37
load: yes 1882.5594 43.3885 +0.29 +0.14 -0.13
Residual 447630.0111 669.0516
issingular(m1)
true
print(m1)
Linear mixed model fit by maximum likelihood
 rt_trunc ~ 1 + spkr + prec + load + spkr & prec + spkr & load + prec & load + spkr & prec & load + (1 + spkr + prec + load | subj) + (1 + spkr + prec + load | item)
    logLik   -2 logLik      AIC         AICc        BIC     
 -14318.9855  28637.9710  28695.9710  28696.9602  28855.1640

Variance components:
             Column       Variance  Std.Dev.   Corr.
subj     (Intercept)      91124.0779 301.8676
         spkr: old         1092.4628  33.0524 +1.00
         prec: maintain    3464.5848  58.8607 -0.62 -0.62
         load: yes         4482.3662  66.9505 +0.36 +0.36 +0.51
item     (Intercept)     131384.0733 362.4694
         spkr: old         1690.7347  41.1185 +0.42
         prec: maintain   61171.7458 247.3292 -0.69 +0.37
         load: yes         1882.5594  43.3885 +0.29 +0.14 -0.13
Residual                 447630.0111 669.0516
 Number of obs: 1789; levels of grouping factors: 56, 32

  Fixed-effects parameters:
───────────────────────────────────────────────────────────────────────────────

                                             Coef.  Std. Error      z  Pr(>|z|)
───────────────────────────────────────────────────────────────────────────────
(Intercept)                             2181.64        77.3512  28.20    <1e-99
spkr: old                                 67.7496      17.9607   3.77    0.0002
prec: maintain                          -333.92        47.1564  -7.08    <1e-11
load: yes                                 78.8007      19.726    3.99    <1e-04
spkr: old & prec: maintain               -21.996       15.8191  -1.39    0.1644
spkr: old & load: yes                     18.3832      15.8191   1.16    0.2452
prec: maintain & load: yes                 4.53274     15.8191   0.29    0.7745
spkr: old & prec: maintain & load: yes    23.6377      15.8191   1.49    0.1351
───────────────────────────────────────────────────────────────────────────────

Expressing the covariance of random effects

Earlier today we mentioned that the parameters being optimized are from a “matrix square root” of the covariance matrix for the random effects. There is one such lower triangular matrix for each grouping factor.

l1 = first(m1.λ)   # Cholesky factor of relative covariance for subj
4×4 LowerTriangular{Float64, Matrix{Float64}}:
  0.451187     ⋅           ⋅          ⋅ 
  0.0494019   0.0          ⋅          ⋅ 
 -0.0547554  -0.00478924  0.0686931   ⋅ 
  0.0359018  -0.00650109  0.0931791  0.0

Notice the zero on the diagonal. A triangular matrix with zeros on the diagonal is singular.

l2 = last(m1.λ)    # this one is not singular
4×4 LowerTriangular{Float64, Matrix{Float64}}:
  0.541766    ⋅            ⋅          ⋅ 
  0.0259253  0.0557222     ⋅          ⋅ 
 -0.253781   0.267843     0.0226331   ⋅ 
  0.0189249  0.000871109  0.0620218  0.0

To regenerate the covariance matrix we need to know that the covariance is not the square of l1, it is l1 * l1' (so that the result is symmetric) and multiplied by σ̂²

Σ₁ = varest(m1) .* (l1 * l1')
4×4 Matrix{Float64}:
  91124.1    9977.46   -11058.7   7250.91
   9977.46   1092.46    -1210.85   793.924
 -11058.7   -1210.85     3464.58  1999.15
   7250.91    793.924    1999.15  4482.37
diag(Σ₁)  # compare to the variance column in the VarCorr output
4-element Vector{Float64}:
 91124.07791836817
  1092.4628467755617
  3464.584800597221
  4482.366224216353
sqrt.(diag(Σ₁))
4-element Vector{Float64}:
 301.86764967178607
  33.05242573209358
  58.86072375189776
  66.95047590731788

Shrinkage plots

Code
shrinkageplot(m1)

Figure 1: Shrinkage plot of model m1

The upper left panel shows the perfect negative correlation for those two components of the random effects.

shrinkageplot(m1, :item)

X1 = Int.(m1.X')
8×1789 Matrix{Int64}:
  1   1   1   1   1  1   1   1   1   1  …   1   1   1   1   1   1   1  1   1
 -1   1   1  -1  -1  1   1  -1  -1   1      1  -1  -1   1   1  -1  -1  1   1
 -1   1  -1   1  -1  1  -1   1  -1   1     -1   1  -1   1  -1   1  -1  1  -1
  1  -1  -1  -1  -1  1   1   1   1  -1      1   1   1  -1  -1  -1  -1  1   1
  1   1  -1  -1   1  1  -1  -1   1   1     -1  -1   1   1  -1  -1   1  1  -1
 -1  -1  -1   1   1  1   1  -1  -1  -1  …   1  -1  -1  -1  -1   1   1  1   1
 -1  -1   1  -1   1  1  -1   1  -1  -1     -1   1  -1  -1   1  -1   1  1  -1
  1  -1   1   1  -1  1  -1  -1   1  -1     -1  -1   1  -1   1   1  -1  1  -1
X1 * X1'
8×8 Matrix{Int64}:
 1789    -1    -1     3    -3     1     1     3
   -1  1789    -3     1    -1     3     3     1
   -1    -3  1789     1    -1     3     3     1
    3     1     1  1789     3    -1    -1    -3
   -3    -1    -1     3  1789     1     1     3
    1     3     3    -1     1  1789    -3    -1
    1     3     3    -1     1    -3  1789    -1
    3     1     1    -3     3    -1    -1  1789

How to interpret a shrinkage plot

  • Extreme shrinkage (shrunk to a line or to a point) is easy to interpret - the term is not providing benefit and can be removed.
  • When the range of the blue dots (shrunk values) is comparable to those of the red dots (unshrunk) it indicates that the term after shrinkage is about as strong as without shrinkage.
  • By itself, this doesn’t mean that the term is important. In some ways you need to get a feeling for the absolute magnitude of the random effects in addition to the relative magnitude.
  • Small magnitude and small relative magnitude indicate you can drop that term

Conclusions from these plots

  • Only the intercept for the subj appears to be contributing explanatory power
  • For the item both the intercept and the spkr appear to be contributing
m2 = let
  form = @formula(
    rt_trunc ~
      1 + prec * spkr * load + (1 | subj) + (1 + prec | item)
  )
  fit(MixedModel, form, kb07; contrasts)
end
Est. SE z p σ_item σ_subj
(Intercept) 2181.7582 77.4709 28.16 <1e-99 364.7286 298.1109
prec: maintain -333.8582 47.4629 -7.03 <1e-11 252.6687
spkr: old 67.8114 16.0526 4.22 <1e-04
load: yes 78.6849 16.0525 4.90 <1e-06
prec: maintain & spkr: old -21.8802 16.0525 -1.36 0.1729
prec: maintain & load: yes 4.4710 16.0526 0.28 0.7806
spkr: old & load: yes 18.3214 16.0526 1.14 0.2537
prec: maintain & spkr: old & load: yes 23.5219 16.0525 1.47 0.1428
Residual 678.9318
VarCorr(m2)
Column Variance Std.Dev Corr.
item (Intercept) 133026.916 364.729
prec: maintain 63841.497 252.669 -0.70
subj (Intercept) 88870.081 298.111
Residual 460948.432 678.932
Code
shrinkageplot(m2)

Figure 2: Shrinkage plot of model m2

m3 = let
  form = @formula(
    rt_trunc ~
      1 + prec + spkr + load + (1 | subj) + (1 + prec | item)
  )
  fit(MixedModel, form, kb07; contrasts)
end
Est. SE z p σ_item σ_subj
(Intercept) 2181.8526 77.4681 28.16 <1e-99 364.7125 298.0259
prec: maintain -333.7906 47.4472 -7.03 <1e-11 252.5212
spkr: old 67.8790 16.0785 4.22 <1e-04
load: yes 78.5904 16.0785 4.89 <1e-05
Residual 680.0319
VarCorr(m3)
Column Variance Std.Dev Corr.
item (Intercept) 133015.243 364.713
prec: maintain 63766.936 252.521 -0.70
subj (Intercept) 88819.436 298.026
Residual 462443.388 680.032
rng = Random.seed!(1234321);
m3btstrp = parametricbootstrap(rng, 2000, m3);
DataFrame(shortestcovint(m3btstrp))

9 rows × 5 columns

typegroupnameslowerupper
StringString?String?Float64Float64
1βmissing(Intercept)2013.952319.53
2βmissingprec: maintain-429.807-241.429
3βmissingspkr: old35.333995.7272
4βmissingload: yes47.067111.04
5σitem(Intercept)267.788452.9
6σitemprec: maintain171.547314.702
7ρitem(Intercept), prec: maintain-0.893081-0.457084
8σsubj(Intercept)235.922364.717
9σresidualmissing657.736703.054
ridgeplot(m3btstrp)

Figure 3: Ridge plot of the fixed-effects coefficients from the bootstrap sample

ridgeplot(m3btstrp; show_intercept=false)

Figure 4: ?(caption)

m4 = let
  form = @formula(
    rt_trunc ~
      1 + prec + spkr + load + (1 + prec | item) + (1 | subj)
  )
  fit(MixedModel, form, kb07; contrasts)
end
Est. SE z p σ_item σ_subj
(Intercept) 2181.8526 77.4681 28.16 <1e-99 364.7125 298.0259
prec: maintain -333.7906 47.4472 -7.03 <1e-11 252.5212
spkr: old 67.8790 16.0785 4.22 <1e-04
load: yes 78.5904 16.0785 4.89 <1e-05
Residual 680.0319
m4bstrp = parametricbootstrap(rng, 2000, m4);
ridgeplot(m4bstrp; show_intercept=false)

DataFrame(shortestcovint(m4bstrp))

9 rows × 5 columns

typegroupnameslowerupper
StringString?String?Float64Float64
1βmissing(Intercept)2008.722319.8
2βmissingprec: maintain-433.808-248.858
3βmissingspkr: old35.472997.9576
4βmissingload: yes47.0078108.437
5σitem(Intercept)261.52444.426
6σitemprec: maintain177.438318.854
7ρitem(Intercept), prec: maintain-0.898522-0.477346
8σsubj(Intercept)229.031356.407
9σresidualmissing656.918701.946
VarCorr(m4)
Column Variance Std.Dev Corr.
item (Intercept) 133015.243 364.713
prec: maintain 63766.936 252.521 -0.70
subj (Intercept) 88819.436 298.026
Residual 462443.388 680.032
Code
let mods = [m1, m2, m4]
  DataFrame(;
    geomdof=(sum  leverage).(mods),
    npar=dof.(mods),
    deviance=deviance.(mods),
    AIC=aic.(mods),
    BIC=bic.(mods),
    AICc=aicc.(mods),
  )
end

3 rows × 6 columns

geomdofnpardevianceAICBICAICc
Float64Int64Float64Float64Float64Float64
1130.1252928638.028696.028855.228697.0
2107.5431328658.528684.528755.828684.7
3103.478928663.928681.928731.328682.0
scatter(fitted(m4), residuals(m4))

Figure 5: Residuals versus fitted values for model m4