Douglas Bates
2024-06-27

Add the packages to be used:

```julia
using CairoMakie
using DataFrames
using LinearAlgebra
using MixedModels
using MixedModelsMakie
using ProgressMeter
ProgressMeter.ijulia_behavior(:clear)
CairoMakie.activate!(; type="svg")
```
Fit a model for reaction time in the sleepstudy example, preserving information on the estimation progress (the `thin=1` optional argument):

```julia
m01 = let f = @formula reaction ~ 1 + days + (1 + days | subj)
  fit(MixedModel, f, MixedModels.dataset(:sleepstudy); thin=1)
end
print(m01)
```
```
Linear mixed model fit by maximum likelihood
 reaction ~ 1 + days + (1 + days | subj)
   logLik   -2 logLik     AIC       AICc        BIC
  -875.9697  1751.9393  1763.9393  1764.4249  1783.0971

Variance components:
            Column    Variance Std.Dev.   Corr.
subj     (Intercept)  565.51067 23.78047
         days          32.68212  5.71683 +0.08
Residual              654.94145 25.59182
 Number of obs: 180; levels of grouping factors: 18

  Fixed-effects parameters:
──────────────────────────────────────────────────
                Coef.  Std. Error      z  Pr(>|z|)
──────────────────────────────────────────────────
(Intercept)  251.405      6.63226  37.91    <1e-99
days          10.4673     1.50224   6.97    <1e-11
──────────────────────────────────────────────────
```
The covariance matrix for each subject’s random effects is evaluated from its “matrix square root”, a lower triangular matrix \(\lambda\) called the Cholesky factor.
The transpose of \(\lambda\), written \(\lambda'\), is the upper triangular matrix generated by “flipping” \(\lambda\) about the main diagonal.
The product \(\lambda * \lambda'\) is symmetric. The covariance matrix of the random effects, \(\Sigma\), is this symmetric matrix scaled by \(\sigma^2\).
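As a quick check of this relationship, the sketch below reconstructs \(\Sigma\) from the `λ` and `σ` properties of the fitted model; its diagonal should reproduce the variance components shown above:

```julia
λ = only(m01.λ)         # lower-triangular Cholesky factor for subj
Σ = m01.σ^2 * (λ * λ')  # covariance matrix of the subj random effects
```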
The estimated variances of the random effects, which are the diagonal elements of \(\Sigma\), correspond to the values shown in the variance-components table. To evaluate the covariance, first isolate the correlation:
```julia
# m01.σρs extracts the `σρs` property of the model.
# This property is a NamedTuple where the names
# correspond to grouping factors - in this case, `subj`.
# So `m01.σρs.subj.ρ` is the estimated correlation(s) for
# this grouping factor. Because there is only one such
# correlation, we can extract it with `only()`, which also
# verifies that there is exactly one.
ρ = only(m01.σρs.subj.ρ)
```

```
0.08133219589180817
```
and then multiply by the two standard deviations.
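A minimal sketch of that step, assuming the `σ` field of `m01.σρs.subj` holds the two standard deviations; the result should equal the off-diagonal element of \(\Sigma\):

```julia
σs = values(m01.σρs.subj.σ)     # (SD of the intercept, SD of days)
covariance = ρ * σs[1] * σs[2]  # ≈ 0.0813 * 23.78 * 5.72 ≈ 11.06
```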
In practice we optimize the log-likelihood with respect to a parameter vector called \(\theta\) that generates \(\lambda\).
The elements of this parameter vector are subject to constraints. In particular, two of the three elements have a lower bound of zero.
That is, the first and third elements of \(\theta\), corresponding to diagonal elements of \(\lambda\), must be non-negative, whereas the second component is unconstrained (has a lower bound of \(-\infty\)).
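These bounds can be inspected on the fitted model. A quick check, assuming the `optsum` property of the model; the expected result here is `[0.0, -Inf, 0.0]`:

```julia
m01.optsum.lowerbd  # lower bounds on the elements of θ
```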
The `optsum.fitlog` property of the model is a vector of tuples, where each tuple contains the value of the \(\theta\) vector and the value of the objective at that \(\theta\). The `fitlog` always contains the first and the last evaluation. When the `thin` named argument is set, it has an entry for every `thin`'th evaluation.
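Evaluating this property for the fitted model produces the display below:

```julia
m01.optsum.fitlog
```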
```
57-element Vector{Tuple{Vector{Float64}, Float64}}:
 ([1.0, 0.0, 1.0], 1784.642296192456)
 ([1.75, 0.0, 1.0], 1790.1256369894486)
 ([1.0, 1.0, 1.0], 1798.9996244965819)
 ([1.0, 0.0, 1.75], 1803.8532002844051)
 ([0.25, 0.0, 1.0], 1800.6139807455452)
 ([1.0, -1.0, 1.0], 1798.6046308389245)
 ([1.0, 0.0, 0.25], 1752.2607369909213)
 ([1.1832612965368465, -0.008661887958066326, 0.0], 1797.5876920198457)
 ([1.075, 0.0, 0.32499999999999996], 1754.9541095798602)
 ([0.8166315695343028, 0.01116725445704456, 0.28823768689703677], 1753.695681656805)
 ([1.0, -0.07071067811865475, 0.19696699141100893], 1754.8169985163813)
 ([0.9436827046395643, 0.06383542916407543, 0.2626963029646229], 1753.1067335474727)
 ([0.9801419885633994, -0.026656844944244907, 0.27474275609535526], 1752.939376719002)
 ⋮
 ([0.9289855999666455, 0.01823660250666579, 0.22248440076567516], 1751.93935485824)
 ([0.9286970797848914, 0.018293694810667324, 0.2231753585201247], 1751.939489774018)
 ([0.9282426145734711, 0.01826952230111312, 0.22258371399259966], 1751.9393630914042)
 ([0.9291127489262951, 0.018179125424596244, 0.22262388984119313], 1751.9393448296826)
 ([0.9291905988237738, 0.01816575408997575, 0.22264320543969665], 1751.9393444889993)
 ([0.9292543047658205, 0.018209270421704643, 0.2226208143044214], 1751.9393450338166)
 ([0.929189229695291, 0.01812979185998945, 0.22257323648679392], 1751.9393475283666)
 ([0.9292535809765234, 0.018167625515921226, 0.22264990504608415], 1751.939344490362)
 ([0.9292145006688363, 0.01817173915920612, 0.22264674299586876], 1751.93934447441)
 ([0.9292083468967692, 0.018171477280421077, 0.22264619293728768], 1751.939344475018)
 ([0.9292092940780088, 0.018172966348127092, 0.22265206223369866], 1751.939344506563)
 ([0.9292213124664976, 0.01816838673115054, 0.22264486480463275], 1751.9393444646873)
```
There were 57 evaluations of the objective before convergence was declared, according to rather stringent convergence criteria. We can see that the last 10 evaluations only produced changes in the fourth decimal place of the objective or even smaller. That is, effective convergence occurred after about 40 or 45 evaluations.
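One way to look at the convergence, sketched here with plotting choices that are my own rather than from the original, is to extract the objective values from the `fitlog` and plot them against the evaluation number:

```julia
# Objective value at each recorded evaluation of the fitlog
objs = last.(m01.optsum.fitlog)
lines(eachindex(objs), objs;
  axis=(; xlabel="Evaluation number", ylabel="Objective (-2 log-likelihood)"))
```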
When the iterations converge to a value of \(\theta\) that is on the boundary, which means that one of the diagonal elements of \(\lambda\) is exactly zero, we say that \(\lambda\) is singular, in the sense that its inverse does not exist.
Another, more evocative, characterization is that the distribution of the random effects is a degenerate distribution, in the sense that the values of the random effects are constrained to a plane (technically, a linear subspace) in the space of all possible random effects.
This is shown in Section 3.5.3 of *Embrace Uncertainty*, especially Figure 3.13.
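For this fit the estimate is in the interior of the parameter space. A quick check, assuming the `issingular` function exported by MixedModels (it should return `false` here):

```julia
issingular(m01)  # true when some element of θ is at its lower bound
```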
These are some appendices, mostly so that I can keep track of them.
There is a short-cut for evaluating the correlation, which is to “normalize” the second row of \(\lambda\), in the sense that the row is scaled so that it has unit length, providing the correlation as
\[ \rho = \frac{\theta_2}{\sqrt{\theta_2^2 + \theta_3^2}} . \]
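A minimal numerical check of this short-cut, assuming the `θ` property of the fitted model (the result should match `ρ` above):

```julia
θ = m01.θ                            # optimal θ from the fit
ρ_from_θ = θ[2] / hypot(θ[2], θ[3])  # normalize the second row of λ
```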
To profile the correlation we need to optimize the objective while fixing a value of the correlation. The way we will do this is to determine \(\theta_2\) as a function of the fixed \(\rho\) and \(\theta_3\).
We need to solve \[ \rho = \frac{\theta_2}{\sqrt{\theta_2^2 + \theta_3^2}} \]
for \(\theta_2\) as a function of \(\rho\) and \(\theta_3\).
Notice that \(\theta_2\) and \(\rho\) have the same sign, so it is sufficient to determine the absolute value of \(\theta_2\) and then transfer the sign from \(\rho\). Squaring both sides gives
\[ \theta_2^2 = \rho^2(\theta_2^2 + \theta_3^2) , \]
which implies
\[ \theta_2^2\,(1 - \rho^2) = \rho^2\theta_3^2 \quad\text{and thus}\quad \theta_2^2 = \frac{\rho^2}{1-\rho^2}\,\theta_3^2, \quad \theta_3 \ge 0 , \]
so that
\[ \theta_2 = \frac{\rho}{\sqrt{1-\rho^2}}\,\theta_3 . \]
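As a sketch of how this relationship could be used, the hypothetical helper below constructs a \(\theta\) with the correlation fixed at \(\rho\) and evaluates the objective there, assuming the `setθ!`, `updateL!`, and `objective` functions from MixedModels; a full profile would additionally re-optimize the remaining elements of \(\theta\):

```julia
# Hypothetical helper: θ₂ determined by a fixed ρ and θ₃ ≥ 0,
# following the derivation above.
theta2(ρ, θ₃) = (ρ / sqrt(1 - abs2(ρ))) * θ₃

# Evaluate the objective at a θ with the correlation fixed at ρ;
# θ₁ and θ₃ are taken from the original fit, for illustration only.
function fixed_rho_objective(m, ρ)
  θ = copy(m.θ)
  θ[2] = theta2(ρ, θ[3])
  objective(updateL!(setθ!(m, θ)))
end
```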