using AlgebraOfGraphics
using CairoMakie
using DataFrames
using Distributions
using MixedModels
using MixedModelsMakie
using MixedModelsSim
using StatsBase
using SMLP2026: dataset
using Random
const progress = isinteractive()Using simulation to estimate uncertainty and power
Or how I learned how to stop worrying and love the bootstrap
After working through this page you will be able to:
- use simulation and the parametric bootstrap to estimate power and uncertainty;
- simulate data from scratch for a planned design;
- interpret the simulated distributions to inform design decisions.
Prerequisites: Parametric bootstrap for mixed-effects models.
Datasets used: kb07 and simulated data (see the dataset catalog).
1 The Parametric Bootstrap
Let us consider the kb07 dataset.
kb07 = dataset(:kb07)
contrasts = Dict(:spkr => EffectsCoding(),
:prec => EffectsCoding(),
:load => EffectsCoding())
fm1 = fit(MixedModel,
@formula(rt_trunc ~ 1 * spkr * prec * load +
(1 | subj) +
(1 | item)),
kb07; contrasts, progress)We can perform a parametric bootstrap on the model to get estimates of our uncertainty. In the parametric bootstrap, we use the parameters we estimated to simulate new data. If we repeat this process many times, we are able to “pick ourselves up by our bootstraps” and examine the variability we would expect to see based purely on chance if the ground truth exactly matched our estimates. In this way, we are able to estimate our uncertainty – we cannot be more certain than the ‘natural’ variability we would have for a given parameter value.
pb1 = parametricbootstrap(MersenneTwister(42), 1000, fm1;
optsum_overrides=(;ftol_rel=1e-8))MixedModelBootstrap with 1000 samples
parameter min q25 median mean q75 max
┌──────────────────────────────────────────────────────────────────────
1 │ β1 1900.3 2135.55 2185.24 2185.58 2239.79 2450.02
2 │ β2 18.158 57.4185 67.9409 68.2766 79.3631 122.066
3 │ β3 -395.408 -344.343 -331.891 -332.839 -321.136 -287.525
4 │ β4 27.2793 66.6404 78.395 78.2168 89.6324 130.452
5 │ β5 -72.9725 -33.1367 -21.8153 -21.7127 -9.80705 37.0716
6 │ β6 -33.2009 6.93196 18.0705 18.0002 29.3706 63.2961
7 │ β7 -54.4851 -8.0108 4.59346 4.4105 16.6667 57.3347
8 │ β8 -28.4451 11.1012 22.1103 22.6787 34.6194 76.9896
9 │ σ 682.112 710.281 718.865 718.961 726.745 759.56
10 │ σ1 208.795 290.502 314.746 316.101 340.264 423.95
11 │ σ2 181.808 324.209 354.19 356.639 389.676 504.023
12 │ θ1 0.276611 0.403103 0.436941 0.439833 0.47219 0.588657
13 │ θ2 0.251186 0.452144 0.494152 0.49617 0.54042 0.691018
optsum_overrides
The option optsum_overrides allows us to pass additional arguments for controlling the model fitting of each simulation replicate. ftol_rel=1e-8 lowers the threshold for changes in the objective – more directly, the optimizer considers the model converged if the change in the deviance is less than \(10^{-8}\), which is a very small change, but larger than the default \(10^{-12}\). Because the majority of the optimization time is spent in final fine-tuning, changing this threshold can greatly speed up the fitting time at the cost of a small loss of quality in fit. For a stochastic process like the bootstrap, that change in quality just adds to the general noise, but that’s acceptable tradeoff in order to get many more replicates. The MixedModels.jl documentation also includes a little note on the reduced precision bootstrap
Now, if we look at the docstring for parametricbootstrap, we see that there are keyword-arguments for the various model parameters:
Code
@doc parametricbootstrapparametricbootstrap([rng::AbstractRNG], nsamp::Integer, m::MixedModel{T}, ftype=T;
β = fixef(m), σ = m.σ, θ = m.θ, progress=true, optsum_overrides=(;))
Perform nsamp parametric bootstrap replication fits of m, returning a MixedModelBootstrap.
The default random number generator is Random.GLOBAL_RNG.
ftype can be used to store the computed bootstrap values in a lower precision. ftype is not a named argument because named arguments are not used in method dispatch and thus specialization. In other words, having ftype as a positional argument has some potential performance benefits.
Keyword Arguments
β,σ, andθare the values ofm's parameters for simulating the responses.σis only valid forLinearMixedModelandGeneralizedLinearMixedModelfor
families with a dispersion parameter.
progresscontrols whether the progress bar is shown. Note that the progress
bar is automatically disabled for non-interactive (i.e. logging) contexts.
optsum_overridesis used to override values of OptSummary in the models
fit during the bootstrapping process. For example, optsum_overrides=(;ftol_rel=1e-08) reduces the convergence criterion, which can greatly speed up the bootstrap fits. Taking advantage of this speed up to increase n can often lead to better estimates of coverage intervals.
Note
All coefficients are bootstrapped. In the rank deficient case, the inestimatable coefficients are treated as -0.0 in the simulations underlying the bootstrap, which will generally result in their estimate from the simulated data also being being inestimable and thus set to -0.0. However this behavior may change in future releases to explicitly drop the extraneous columns before simulation and thus not include their estimates in the bootstrap result.
These keyword arguments are forward on to the simulate! function, which simulates a new dataset based on model matrices and parameter values. The model matrices are simply taken from the model at hand. By default, the parameter values are the estimated parameter values from a fitted model.
Code
@doc simulate!simulate!(rng::AbstractRNG, m::MixedModel{T}; β=fixef(m), σ=m.σ, θ=T[])
simulate!(m::MixedModel; β=fixef(m), σ=m.σ, θ=m.θ)
Overwrite the response (i.e. m.trms[end]) with a simulated response vector from model m.
This simulation includes sampling new values for the random effects.
β can be specified either as a pivoted, full rank coefficient vector (cf. fixef) or as an unpivoted full dimension coefficient vector (cf. coef), where the entries corresponding to redundant columns will be ignored.
Note
Note that simulate! methods with a y::AbstractVector as the first argument (besides the RNG) and simulate methods return the simulated response. This is in contrast to simulate! methods with a m::MixedModel as the first argument, which modify the model's response and return the entire modified model.
simulate!([rng::AbstractRNG,] y::AbstractVector, m::MixedModel{T}[, newdata];
β = coef(m), σ = m.σ, θ = T[], wts=m.wts)
simulate([rng::AbstractRNG,] m::MixedModel{T}[, newdata];
β = coef(m), σ = m.σ, θ = T[], wts=m.wts)
Simulate a new response vector, optionally overwriting a pre-allocated vector.
New data can be optionally provided in tabular format.
This simulation includes sampling new values for the random effects. Thus in contrast to predict, there is no distinction in between "new" and "old" / previously observed random-effects levels.
Unlike predict, there is no type parameter for GeneralizedLinearMixedModel because the noise term in the model and simulation is always on the response scale.
The wts argument is currently ignored except for GeneralizedLinearMixedModel models with a Binomial distribution.
Note
Note that simulate! methods with a y::AbstractVector as the first argument (besides the RNG) and simulate methods return the simulated response. This is in contrast to simulate! methods with a m::MixedModel as the first argument, which modify the model's response and return the entire modified model.
So now we have a way to simulate new data with new parameter values once we have a model. We just need a way to create a model with our preferred design. We’ll use the MixedModelsSim package for that.
1.1 Simulating data from scratch.
The MixedModelsSim package provides a function simdat_crossed for simulating effects from a crossed design:
Code
@doc simdat_crossedsimdat_crossed([RNG], subj_n, item_n;
subj_btwn=nothing, item_btwn=nothing, both_win=nothing,
subj_prefix="S", item_prefix="I")
Return a row table with a design specified by the:
number of subjects (
subj_n),number of items (
item_n)between-subject factors (
subj_btwn)between-item factors (
item_btwn)within-subject/item factors (
both_win)
If a factor is both between-subject and between-item, put it in both subj_btwn and item_btwn with the same keys and the same levels.
Factors should be specified as dictionaries in the following format:
Dict(
:factor1_name => ["F1_level1", "F1_level2"],
:factor2_name => ["F2_level1", "F2_level2", "F2_level3"]
)
In addition to design, the rowtable contains a field dv pre-populated with N(0,1) noise as a basis for further simulating a design.
Note
The number of subjects/items must divide the number of combinations of between subject/item factor levels. In other words, this function assumes a balanced design and will throw an error if that is not possible.
Let’s see what that looks like in practice. We’ll look at a simple 2 x 2 design with 20 subjects and 20 items. Our first factor age will vary between subjects and have the levels old and young. Our second factor frequency will vary between items and have the levels low and high. Finally, we also need to specify a random number generator to use for seeding the data simulation.
subj_n = 20
item_n = 20
subj_btwn = Dict(:age => ["old", "young"])
item_btwn = Dict(:frequency => ["low", "high"])
const RNG = MersenneTwister(42)
dat = simdat_crossed(RNG, subj_n, item_n;
subj_btwn, item_btwn)
Table(dat)Table with 5 columns and 400 rows:
subj age item frequency dv
┌─────────────────────────────────────────
1 │ S01 old I01 low 1.21028
2 │ S02 young I01 low -0.0789986
3 │ S03 old I01 low 0.403512
4 │ S04 young I01 low 0.289951
5 │ S05 old I01 low -0.0670422
6 │ S06 young I01 low 0.61398
7 │ S07 old I01 low 1.01694
8 │ S08 young I01 low -0.053864
9 │ S09 old I01 low -0.570387
10 │ S10 young I01 low -0.857737
11 │ S11 old I01 low -0.211452
12 │ S12 young I01 low -0.853673
13 │ S13 old I01 low 0.439568
14 │ S14 young I01 low -1.31491
15 │ S15 old I01 low 0.176901
16 │ S16 young I01 low -0.254805
17 │ S17 old I01 low -1.77737
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
We have 400 rows – 20 subjects x 20 items. Similarly, the experimental factors are expanded out to be fully crossed. Finally, we have a dependent variable dv initialized to be draws from the standard normal distribution \(N(0,1)\).
simdat_crossed is designed to simulate a fully crossed factorial design. If you have a partially crossed or Latin squaresdesign, then you could delete the “extra” cells to reduce the fully crossed data here to be partially crossed. For continuous covariates, we need to separately construct the covariates and then use a tabular join to create the design. We’ll examine an example of this later.
simmod = fit(MixedModel,
@formula(dv ~ 1 + age * frequency +
(1 + frequency | subj) +
(1 + age | item)), dat; progress)
println(simmod)Linear mixed model fit by maximum likelihood
dv ~ 1 + age + frequency + age & frequency + (1 + frequency | subj) + (1 + age | item)
logLik -2 logLik AIC AICc BIC
-591.8575 1183.7149 1205.7149 1206.3953 1249.6210
Variance components:
Column Variance Std.Dev. Corr.
subj (Intercept) 0.0733656 0.2708608
frequency: low 0.0335313 0.1831157 -1.00
item (Intercept) 0.0132766 0.1152241
age: young 0.0005200 0.0228031 -1.00
Residual 1.0883091 1.0432205
Number of obs: 400; levels of grouping factors: 20, 20
Fixed-effects parameters:
─────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
─────────────────────────────────────────────────────────────────────
(Intercept) 0.0816111 0.139812 0.58 0.5594
age: young -0.0349801 0.191027 -0.18 0.8547
frequency: low -0.0638793 0.166657 -0.38 0.7015
age: young & frequency: low 0.00879841 0.224372 0.04 0.9687
─────────────────────────────────────────────────────────────────────
make sure to discuss contrasts here
β = [250.0, -25.0, 10, 0.0]
simulate!(RNG, simmod; β)| Est. | SE | z | p | σ_subj | σ_item | |
|---|---|---|---|---|---|---|
| (Intercept) | 0.0816 | 0.1398 | 0.58 | 0.5594 | 0.2709 | 0.1152 |
| age: young | -0.0350 | 0.1910 | -0.18 | 0.8547 | 0.0228 | |
| frequency: low | -0.0639 | 0.1667 | -0.38 | 0.7015 | 0.1831 | |
| age: young & frequency: low | 0.0088 | 0.2244 | 0.04 | 0.9687 | ||
| Residual | 1.0432 |
fit!(simmod)| Est. | SE | z | p | σ_subj | σ_item | |
|---|---|---|---|---|---|---|
| (Intercept) | 249.9796 | 0.1267 | 1973.16 | <1e-99 | 0.1641 | 0.1703 |
| age: young | -25.0472 | 0.1655 | -151.36 | <1e-99 | 0.1041 | |
| frequency: low | 9.7387 | 0.1640 | 59.39 | <1e-99 | 0.0422 | |
| age: young & frequency: low | 0.1164 | 0.2106 | 0.55 | 0.5804 | ||
| Residual | 1.0225 |
σ = 25.0
fit!(simulate!(RNG, simmod; β, σ))| Est. | SE | z | p | σ_subj | σ_item | |
|---|---|---|---|---|---|---|
| (Intercept) | 256.2394 | 3.9124 | 65.49 | <1e-99 | 4.5684 | 8.6530 |
| age: young | -33.8588 | 4.6177 | -7.33 | <1e-12 | 7.5397 | |
| frequency: low | 5.5490 | 5.2227 | 1.06 | 0.2880 | 2.8944 | |
| age: young & frequency: low | 5.1727 | 5.9978 | 0.86 | 0.3884 | ||
| Residual | 23.9420 |
# relative to σ!
subj_re = create_re(2.0, 1.3)2×2 LinearAlgebra.LowerTriangular{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}}:
2.0 ⋅
0.0 1.3
item_re = create_re(1.3, 2.0)2×2 LinearAlgebra.LowerTriangular{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}}:
1.3 ⋅
0.0 2.0
θ = createθ(simmod; subj=subj_re, item=item_re)6-element Vector{Float64}:
2.0
0.0
1.3
1.3
0.0
2.0
fit!(simulate!(RNG, simmod; β, σ, θ))| Est. | SE | z | p | σ_subj | σ_item | |
|---|---|---|---|---|---|---|
| (Intercept) | 238.6765 | 15.4590 | 15.44 | <1e-53 | 39.9114 | 27.0075 |
| age: young | -21.2541 | 21.2976 | -1.00 | 0.3183 | 34.8575 | |
| frequency: low | -0.3036 | 16.0265 | -0.02 | 0.9849 | 31.2202 | |
| age: young & frequency: low | 16.0378 | 21.5626 | 0.74 | 0.4570 | ||
| Residual | 25.9787 |
samp = parametricbootstrap(RNG, 1000, simmod; β, σ, θ, progress)MixedModelBootstrap with 1000 samples
parameter min q25 median mean q75 ⋯
┌─────────────────────────────────────────────────────────────────────
1 │ β1 187.504 236.563 250.196 249.917 263.367 ⋯
2 │ β2 -118.074 -45.386 -24.2367 -25.4062 -6.48009 ⋯
3 │ β3 -57.7974 -2.0594 8.93123 9.78849 22.555 ⋯
4 │ β4 -77.082 -17.5927 0.253178 -0.0776182 17.2086 ⋯
5 │ σ 22.204 24.3262 24.9756 24.9547 25.5221 ⋯
6 │ σ1 25.6549 42.027 47.4457 47.7116 53.3682 ⋯
7 │ σ2 13.7514 26.8112 31.2347 30.9942 34.9932 ⋯
8 │ ρ1 -0.808928 -0.18075 -0.0242876 -0.0201442 0.151477 ⋯
9 │ σ3 14.4239 27.508 31.2277 31.2821 35.1992 ⋯
10 │ σ4 17.287 42.1595 47.4245 47.5599 52.6321 ⋯
11 │ ρ2 -0.696028 -0.203852 -0.0184107 -0.0147826 0.156928 ⋯
12 │ θ1 1.01087 1.68976 1.90887 1.91418 2.13819 ⋯
13 │ θ2 -1.19827 -0.227775 -0.0266265 -0.0274248 0.185927 ⋯
14 │ θ3 0.369048 1.03083 1.20271 1.20325 1.37222 ⋯
15 │ θ4 0.570767 1.09377 1.24564 1.25551 1.40709 ⋯
16 │ θ5 -1.32032 -0.386843 -0.0346073 -0.0343756 0.302657 ⋯
17 │ θ6 0.683923 1.62089 1.84137 1.84508 2.0708 ⋯
ridgeplot(samp)let f = Figure()
ax = Axis(f[1, 1])
coefplot!(ax, samp;
conf_level=0.8,
vline_at_zero=true,
show_intercept=true)
ridgeplot!(ax, samp;
conf_level=0.8,
vline_at_zero=true,
show_intercept=true,
# xlabel="Normalized density and 80% range")
)
scatter!(ax, β, length(β):-1:1;
marker=:x,
markersize=20,
color=:red)
f
endcoefpvalues = DataFrame()
# @showprogress
for subj_n in [20, 40, 60, 80, 100, 120, 140], item_n in [40, 60, 80, 100, 120, 140]
dat = simdat_crossed(RNG, subj_n, item_n;
subj_btwn, item_btwn)
simmod = MixedModel(@formula(dv ~ 1 + age * frequency +
(1 + frequency | subj) +
(1 + age | item)),
dat)
θ = createθ(simmod; subj=subj_re, item=item_re)
simboot = parametricbootstrap(RNG, 100, simmod;
β, σ, θ,
optsum_overrides=(;ftol_rel=1e-8),
progress)
df = DataFrame(simboot.coefpvalues)
df[!, :subj_n] .= subj_n
df[!, :item_n] .= item_n
append!(coefpvalues, df)
endpower = combine(groupby(coefpvalues, [:coefname, :subj_n, :item_n]),
:p => (p -> mean(p .< 0.05)) => :power)| Row | coefname | subj_n | item_n | power |
|---|---|---|---|---|
| Symbol | Int64 | Int64 | Float64 | |
| 1 | (Intercept) | 20 | 40 | 1.0 |
| 2 | age: young | 20 | 40 | 0.29 |
| 3 | frequency: low | 20 | 40 | 0.07 |
| 4 | age: young & frequency: low | 20 | 40 | 0.06 |
| 5 | (Intercept) | 20 | 60 | 1.0 |
| 6 | age: young | 20 | 60 | 0.21 |
| 7 | frequency: low | 20 | 60 | 0.17 |
| 8 | age: young & frequency: low | 20 | 60 | 0.04 |
| 9 | (Intercept) | 20 | 80 | 1.0 |
| 10 | age: young | 20 | 80 | 0.15 |
| 11 | frequency: low | 20 | 80 | 0.17 |
| 12 | age: young & frequency: low | 20 | 80 | 0.07 |
| 13 | (Intercept) | 20 | 100 | 1.0 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 157 | (Intercept) | 140 | 100 | 1.0 |
| 158 | age: young | 140 | 100 | 0.59 |
| 159 | frequency: low | 140 | 100 | 0.21 |
| 160 | age: young & frequency: low | 140 | 100 | 0.04 |
| 161 | (Intercept) | 140 | 120 | 1.0 |
| 162 | age: young | 140 | 120 | 0.71 |
| 163 | frequency: low | 140 | 120 | 0.35 |
| 164 | age: young & frequency: low | 140 | 120 | 0.05 |
| 165 | (Intercept) | 140 | 140 | 1.0 |
| 166 | age: young | 140 | 140 | 0.62 |
| 167 | frequency: low | 140 | 140 | 0.3 |
| 168 | age: young & frequency: low | 140 | 140 | 0.03 |
power = combine(groupby(coefpvalues,
[:coefname, :subj_n, :item_n]),
:p => (p -> mean(p .< 0.05)) => :power,
:p => (p -> sem(p .< 0.05)) => :power_se)| Row | coefname | subj_n | item_n | power | power_se |
|---|---|---|---|---|---|
| Symbol | Int64 | Int64 | Float64 | Float64 | |
| 1 | (Intercept) | 20 | 40 | 1.0 | 0.0 |
| 2 | age: young | 20 | 40 | 0.29 | 0.0456048 |
| 3 | frequency: low | 20 | 40 | 0.07 | 0.0256432 |
| 4 | age: young & frequency: low | 20 | 40 | 0.06 | 0.0238683 |
| 5 | (Intercept) | 20 | 60 | 1.0 | 0.0 |
| 6 | age: young | 20 | 60 | 0.21 | 0.040936 |
| 7 | frequency: low | 20 | 60 | 0.17 | 0.0377525 |
| 8 | age: young & frequency: low | 20 | 60 | 0.04 | 0.0196946 |
| 9 | (Intercept) | 20 | 80 | 1.0 | 0.0 |
| 10 | age: young | 20 | 80 | 0.15 | 0.035887 |
| 11 | frequency: low | 20 | 80 | 0.17 | 0.0377525 |
| 12 | age: young & frequency: low | 20 | 80 | 0.07 | 0.0256432 |
| 13 | (Intercept) | 20 | 100 | 1.0 | 0.0 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 157 | (Intercept) | 140 | 100 | 1.0 | 0.0 |
| 158 | age: young | 140 | 100 | 0.59 | 0.0494311 |
| 159 | frequency: low | 140 | 100 | 0.21 | 0.040936 |
| 160 | age: young & frequency: low | 140 | 100 | 0.04 | 0.0196946 |
| 161 | (Intercept) | 140 | 120 | 1.0 | 0.0 |
| 162 | age: young | 140 | 120 | 0.71 | 0.0456048 |
| 163 | frequency: low | 140 | 120 | 0.35 | 0.0479372 |
| 164 | age: young & frequency: low | 140 | 120 | 0.05 | 0.0219043 |
| 165 | (Intercept) | 140 | 140 | 1.0 | 0.0 |
| 166 | age: young | 140 | 140 | 0.62 | 0.0487832 |
| 167 | frequency: low | 140 | 140 | 0.3 | 0.0460566 |
| 168 | age: young & frequency: low | 140 | 140 | 0.03 | 0.0171447 |
select!(power, :coefname, :subj_n, :item_n, :power,
[:power, :power_se] => ByRow((p, se) -> [p - 1.96*se, p + 1.96*se]) => [:lower, :upper])| Row | coefname | subj_n | item_n | power | lower | upper |
|---|---|---|---|---|---|---|
| Symbol | Int64 | Int64 | Float64 | Float64 | Float64 | |
| 1 | (Intercept) | 20 | 40 | 1.0 | 1.0 | 1.0 |
| 2 | age: young | 20 | 40 | 0.29 | 0.200615 | 0.379385 |
| 3 | frequency: low | 20 | 40 | 0.07 | 0.0197392 | 0.120261 |
| 4 | age: young & frequency: low | 20 | 40 | 0.06 | 0.0132181 | 0.106782 |
| 5 | (Intercept) | 20 | 60 | 1.0 | 1.0 | 1.0 |
| 6 | age: young | 20 | 60 | 0.21 | 0.129765 | 0.290235 |
| 7 | frequency: low | 20 | 60 | 0.17 | 0.0960051 | 0.243995 |
| 8 | age: young & frequency: low | 20 | 60 | 0.04 | 0.00139851 | 0.0786015 |
| 9 | (Intercept) | 20 | 80 | 1.0 | 1.0 | 1.0 |
| 10 | age: young | 20 | 80 | 0.15 | 0.0796614 | 0.220339 |
| 11 | frequency: low | 20 | 80 | 0.17 | 0.0960051 | 0.243995 |
| 12 | age: young & frequency: low | 20 | 80 | 0.07 | 0.0197392 | 0.120261 |
| 13 | (Intercept) | 20 | 100 | 1.0 | 1.0 | 1.0 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 157 | (Intercept) | 140 | 100 | 1.0 | 1.0 | 1.0 |
| 158 | age: young | 140 | 100 | 0.59 | 0.493115 | 0.686885 |
| 159 | frequency: low | 140 | 100 | 0.21 | 0.129765 | 0.290235 |
| 160 | age: young & frequency: low | 140 | 100 | 0.04 | 0.00139851 | 0.0786015 |
| 161 | (Intercept) | 140 | 120 | 1.0 | 1.0 | 1.0 |
| 162 | age: young | 140 | 120 | 0.71 | 0.620615 | 0.799385 |
| 163 | frequency: low | 140 | 120 | 0.35 | 0.256043 | 0.443957 |
| 164 | age: young & frequency: low | 140 | 120 | 0.05 | 0.00706759 | 0.0929324 |
| 165 | (Intercept) | 140 | 140 | 1.0 | 1.0 | 1.0 |
| 166 | age: young | 140 | 140 | 0.62 | 0.524385 | 0.715615 |
| 167 | frequency: low | 140 | 140 | 0.3 | 0.209729 | 0.390271 |
| 168 | age: young & frequency: low | 140 | 140 | 0.03 | -0.00360354 | 0.0636035 |
data(power) * mapping(:subj_n, :item_n, :power; layout=:coefname) * visual(Heatmap) |> drawdat = simdat_crossed(RNG, subj_n, item_n;
subj_btwn, item_btwn)
dat = DataFrame(dat)| Row | subj | age | item | frequency | dv |
|---|---|---|---|---|---|
| String | String | String | String | Float64 | |
| 1 | S01 | old | I01 | low | -0.882085 |
| 2 | S02 | young | I01 | low | -0.0201484 |
| 3 | S03 | old | I01 | low | 0.945484 |
| 4 | S04 | young | I01 | low | -0.247553 |
| 5 | S05 | old | I01 | low | 0.453218 |
| 6 | S06 | young | I01 | low | 0.924281 |
| 7 | S07 | old | I01 | low | 0.220375 |
| 8 | S08 | young | I01 | low | 1.85978 |
| 9 | S09 | old | I01 | low | -0.44618 |
| 10 | S10 | young | I01 | low | -0.754939 |
| 11 | S11 | old | I01 | low | 0.0105376 |
| 12 | S12 | young | I01 | low | 0.929521 |
| 13 | S13 | old | I01 | low | -0.314341 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 389 | S09 | old | I20 | high | 2.2703 |
| 390 | S10 | young | I20 | high | -0.00613232 |
| 391 | S11 | old | I20 | high | -0.923031 |
| 392 | S12 | young | I20 | high | 0.539303 |
| 393 | S13 | old | I20 | high | -0.868366 |
| 394 | S14 | young | I20 | high | 2.10268 |
| 395 | S15 | old | I20 | high | -1.23199 |
| 396 | S16 | young | I20 | high | -1.02512 |
| 397 | S17 | old | I20 | high | 0.818806 |
| 398 | S18 | young | I20 | high | 1.7468 |
| 399 | S19 | old | I20 | high | -0.327621 |
| 400 | S20 | young | I20 | high | -0.841135 |
item_covariates = unique!(select(dat, :item))| Row | item |
|---|---|
| String | |
| 1 | I01 |
| 2 | I02 |
| 3 | I03 |
| 4 | I04 |
| 5 | I05 |
| 6 | I06 |
| 7 | I07 |
| 8 | I08 |
| 9 | I09 |
| 10 | I10 |
| 11 | I11 |
| 12 | I12 |
| 13 | I13 |
| 14 | I14 |
| 15 | I15 |
| 16 | I16 |
| 17 | I17 |
| 18 | I18 |
| 19 | I19 |
| 20 | I20 |
item_covariates[!, :chaos] = rand(RNG,
Normal(5, 2),
nrow(item_covariates))20-element Vector{Float64}:
4.283869606409811
1.766354546694887
3.5819077117990936
4.426082794175044
3.979161685431401
4.150057335954468
5.209549095470591
5.3931183588041325
5.836973833266723
6.042800312594512
3.1607596007727814
6.519844952694776
9.286581793894484
7.995006071550052
4.189786264631877
2.4169512074675787
3.445736901440534
5.123574599692554
4.422113183811228
1.9739443958799194
leftjoin!(dat, item_covariates; on=:item)| Row | subj | age | item | frequency | dv | chaos |
|---|---|---|---|---|---|---|
| String | String | String | String | Float64 | Float64? | |
| 1 | S01 | old | I01 | low | -0.882085 | 4.28387 |
| 2 | S02 | young | I01 | low | -0.0201484 | 4.28387 |
| 3 | S03 | old | I01 | low | 0.945484 | 4.28387 |
| 4 | S04 | young | I01 | low | -0.247553 | 4.28387 |
| 5 | S05 | old | I01 | low | 0.453218 | 4.28387 |
| 6 | S06 | young | I01 | low | 0.924281 | 4.28387 |
| 7 | S07 | old | I01 | low | 0.220375 | 4.28387 |
| 8 | S08 | young | I01 | low | 1.85978 | 4.28387 |
| 9 | S09 | old | I01 | low | -0.44618 | 4.28387 |
| 10 | S10 | young | I01 | low | -0.754939 | 4.28387 |
| 11 | S11 | old | I01 | low | 0.0105376 | 4.28387 |
| 12 | S12 | young | I01 | low | 0.929521 | 4.28387 |
| 13 | S13 | old | I01 | low | -0.314341 | 4.28387 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 389 | S09 | old | I20 | high | 2.2703 | 1.97394 |
| 390 | S10 | young | I20 | high | -0.00613232 | 1.97394 |
| 391 | S11 | old | I20 | high | -0.923031 | 1.97394 |
| 392 | S12 | young | I20 | high | 0.539303 | 1.97394 |
| 393 | S13 | old | I20 | high | -0.868366 | 1.97394 |
| 394 | S14 | young | I20 | high | 2.10268 | 1.97394 |
| 395 | S15 | old | I20 | high | -1.23199 | 1.97394 |
| 396 | S16 | young | I20 | high | -1.02512 | 1.97394 |
| 397 | S17 | old | I20 | high | 0.818806 | 1.97394 |
| 398 | S18 | young | I20 | high | 1.7468 | 1.97394 |
| 399 | S19 | old | I20 | high | -0.327621 | 1.97394 |
| 400 | S20 | young | I20 | high | -0.841135 | 1.97394 |
simmod = fit(MixedModel,
@formula(dv ~ 1 + age * frequency * chaos +
(1 + frequency | subj) +
(1 + age | item)), dat; contrasts, progress)| Est. | SE | z | p | σ_subj | σ_item | |
|---|---|---|---|---|---|---|
| (Intercept) | 0.0560 | 0.2412 | 0.23 | 0.8164 | 0.0000 | 0.0000 |
| age: young | -0.3565 | 0.3411 | -1.05 | 0.2959 | 0.0000 | |
| frequency: low | 0.0548 | 0.3699 | 0.15 | 0.8822 | 0.0000 | |
| chaos | -0.0135 | 0.0484 | -0.28 | 0.7809 | ||
| age: young & frequency: low | 0.1030 | 0.5232 | 0.20 | 0.8439 | ||
| age: young & chaos | 0.1036 | 0.0685 | 1.51 | 0.1305 | ||
| frequency: low & chaos | -0.0421 | 0.0738 | -0.57 | 0.5683 | ||
| age: young & frequency: low & chaos | -0.0317 | 0.1044 | -0.30 | 0.7617 | ||
| Residual | 0.9456 |
TODO: continuous covariate TODO: bernoulli response TODO: savereplicates
dat[!, :dv] = rand(RNG, Bernoulli(), nrow(dat))400-element Vector{Bool}:
0
1
0
0
1
1
1
0
0
0
⋮
1
1
1
0
1
1
0
0
1
dat[!, :dv] = rand(RNG, Poisson(), nrow(dat))400-element Vector{Int64}:
2
1
2
0
0
1
0
2
0
1
⋮
0
2
0
1
1
3
0
1
1
dat[!, :n] = rand(RNG, Poisson(), nrow(dat)) .+ 3400-element Vector{Int64}:
5
3
3
7
4
3
5
3
3
4
⋮
4
3
3
4
4
3
6
4
3
dat[!, :dv] = rand.(RNG, Binomial.(dat[!, :n])) ./ dat[!, :n]400-element Vector{Float64}:
0.6
0.3333333333333333
0.6666666666666666
0.42857142857142855
0.5
0.6666666666666666
0.2
0.3333333333333333
0.6666666666666666
0.5
⋮
0.5
0.6666666666666666
0.3333333333333333
0.25
1.0
1.0
0.5
0.75
1.0
1.2 Exercises
- Effect size and power. Re-run the simulation across a range of assumed effect sizes, holding the sample size fixed. Plot estimated power against effect size. What shape do you expect?
Power increases monotonically with effect size, tracing an S-shaped curve from the nominal false-positive rate (when the true effect is zero) toward 1 (for large effects). The interesting region is the middle, where small design changes move power the most.
- Sample size. Now hold the effect size fixed and vary the number of subjects. How does required sample size trade off against effect size for a target power (say 0.8)?
For a fixed target power, required sample size grows rapidly as the effect size shrinks (roughly inversely with the square of the effect size). Simulating a grid over both lets you read off feasible design points rather than relying on a single closed-form approximation.
This page was rendered from git revision e563d55 using Quarto 1.9.38 and Julia 1.12.6.