Using simulation to estimate uncertainty and power

Or how I learned how to stop worrying and love the bootstrap

Authors

Phillip Alday

Reinhold Kliegl

Published

2026-06-21

After working through this page you will be able to:

  • use simulation and the parametric bootstrap to estimate power and uncertainty;
  • simulate data from scratch for a planned design;
  • interpret the simulated distributions to inform design decisions.
NoteBefore you start

Prerequisites: Parametric bootstrap for mixed-effects models.

Datasets used: kb07 and simulated data (see the dataset catalog).

using AlgebraOfGraphics
using CairoMakie
using DataFrames
using Distributions
using MixedModels
using MixedModelsMakie
using MixedModelsSim
using StatsBase
using SMLP2026: dataset
using Random

const progress = isinteractive()

1 The Parametric Bootstrap

Let us consider the kb07 dataset.

kb07 = dataset(:kb07)
contrasts = Dict(:spkr => EffectsCoding(),
                 :prec => EffectsCoding(),
                 :load => EffectsCoding())
fm1 = fit(MixedModel,
          @formula(rt_trunc ~ 1 * spkr * prec * load +
                             (1 | subj) +
                             (1 | item)),
          kb07; contrasts, progress)

We can perform a parametric bootstrap on the model to get estimates of our uncertainty. In the parametric bootstrap, we use the parameters we estimated to simulate new data. If we repeat this process many times, we are able to “pick ourselves up by our bootstraps” and examine the variability we would expect to see based purely on chance if the ground truth exactly matched our estimates. In this way, we are able to estimate our uncertainty – we cannot be more certain than the ‘natural’ variability we would have for a given parameter value.

pb1 = parametricbootstrap(MersenneTwister(42), 1000, fm1;
                          optsum_overrides=(;ftol_rel=1e-8))
MixedModelBootstrap with 1000 samples
      parameter  min       q25       median    mean      q75       max
    ┌──────────────────────────────────────────────────────────────────────
 1  │ β1         1900.3    2135.55   2185.24   2185.58   2239.79   2450.02
 2  │ β2         18.158    57.4185   67.9409   68.2766   79.3631   122.066
 3  │ β3         -395.408  -344.343  -331.891  -332.839  -321.136  -287.525
 4  │ β4         27.2793   66.6404   78.395    78.2168   89.6324   130.452
 5  │ β5         -72.9725  -33.1367  -21.8153  -21.7127  -9.80705  37.0716
 6  │ β6         -33.2009  6.93196   18.0705   18.0002   29.3706   63.2961
 7  │ β7         -54.4851  -8.0108   4.59346   4.4105    16.6667   57.3347
 8  │ β8         -28.4451  11.1012   22.1103   22.6787   34.6194   76.9896
 9  │ σ          682.112   710.281   718.865   718.961   726.745   759.56
 10 │ σ1         208.795   290.502   314.746   316.101   340.264   423.95
 11 │ σ2         181.808   324.209   354.19    356.639   389.676   504.023
 12 │ θ1         0.276611  0.403103  0.436941  0.439833  0.47219   0.588657
 13 │ θ2         0.251186  0.452144  0.494152  0.49617   0.54042   0.691018
Tipoptsum_overrides

The option optsum_overrides allows us to pass additional arguments for controlling the model fitting of each simulation replicate. ftol_rel=1e-8 lowers the threshold for changes in the objective – more directly, the optimizer considers the model converged if the change in the deviance is less than \(10^{-8}\), which is a very small change, but larger than the default \(10^{-12}\). Because the majority of the optimization time is spent in final fine-tuning, changing this threshold can greatly speed up the fitting time at the cost of a small loss of quality in fit. For a stochastic process like the bootstrap, that change in quality just adds to the general noise, but that’s acceptable tradeoff in order to get many more replicates. The MixedModels.jl documentation also includes a little note on the reduced precision bootstrap

Now, if we look at the docstring for parametricbootstrap, we see that there are keyword-arguments for the various model parameters:

Code
@doc parametricbootstrap
parametricbootstrap([rng::AbstractRNG], nsamp::Integer, m::MixedModel{T}, ftype=T;
    β = fixef(m), σ = m.σ, θ = m.θ, progress=true, optsum_overrides=(;))

Perform nsamp parametric bootstrap replication fits of m, returning a MixedModelBootstrap.

The default random number generator is Random.GLOBAL_RNG.

ftype can be used to store the computed bootstrap values in a lower precision. ftype is not a named argument because named arguments are not used in method dispatch and thus specialization. In other words, having ftype as a positional argument has some potential performance benefits.

Keyword Arguments

  • β, σ, and θ are the values of m's parameters for simulating the responses.

  • σ is only valid for LinearMixedModel and GeneralizedLinearMixedModel for

families with a dispersion parameter.

  • progress controls whether the progress bar is shown. Note that the progress

bar is automatically disabled for non-interactive (i.e. logging) contexts.

  • optsum_overrides is used to override values of OptSummary in the models

fit during the bootstrapping process. For example, optsum_overrides=(;ftol_rel=1e-08) reduces the convergence criterion, which can greatly speed up the bootstrap fits. Taking advantage of this speed up to increase n can often lead to better estimates of coverage intervals.

Note

All coefficients are bootstrapped. In the rank deficient case, the inestimatable coefficients are treated as -0.0 in the simulations underlying the bootstrap, which will generally result in their estimate from the simulated data also being being inestimable and thus set to -0.0. However this behavior may change in future releases to explicitly drop the extraneous columns before simulation and thus not include their estimates in the bootstrap result.

These keyword arguments are forward on to the simulate! function, which simulates a new dataset based on model matrices and parameter values. The model matrices are simply taken from the model at hand. By default, the parameter values are the estimated parameter values from a fitted model.

Code
@doc simulate!
simulate!(rng::AbstractRNG, m::MixedModel{T}; β=fixef(m), σ=m.σ, θ=T[])
simulate!(m::MixedModel; β=fixef(m), σ=m.σ, θ=m.θ)

Overwrite the response (i.e. m.trms[end]) with a simulated response vector from model m.

This simulation includes sampling new values for the random effects.

β can be specified either as a pivoted, full rank coefficient vector (cf. fixef) or as an unpivoted full dimension coefficient vector (cf. coef), where the entries corresponding to redundant columns will be ignored.

Note

Note that simulate! methods with a y::AbstractVector as the first argument (besides the RNG) and simulate methods return the simulated response. This is in contrast to simulate! methods with a m::MixedModel as the first argument, which modify the model's response and return the entire modified model.

simulate!([rng::AbstractRNG,] y::AbstractVector, m::MixedModel{T}[, newdata];
                β = coef(m), σ = m.σ, θ = T[], wts=m.wts)
simulate([rng::AbstractRNG,] m::MixedModel{T}[, newdata];
                β = coef(m), σ = m.σ, θ = T[], wts=m.wts)

Simulate a new response vector, optionally overwriting a pre-allocated vector.

New data can be optionally provided in tabular format.

This simulation includes sampling new values for the random effects. Thus in contrast to predict, there is no distinction in between "new" and "old" / previously observed random-effects levels.

Unlike predict, there is no type parameter for GeneralizedLinearMixedModel because the noise term in the model and simulation is always on the response scale.

The wts argument is currently ignored except for GeneralizedLinearMixedModel models with a Binomial distribution.

Note

Note that simulate! methods with a y::AbstractVector as the first argument (besides the RNG) and simulate methods return the simulated response. This is in contrast to simulate! methods with a m::MixedModel as the first argument, which modify the model's response and return the entire modified model.

So now we have a way to simulate new data with new parameter values once we have a model. We just need a way to create a model with our preferred design. We’ll use the MixedModelsSim package for that.

1.1 Simulating data from scratch.

The MixedModelsSim package provides a function simdat_crossed for simulating effects from a crossed design:

Code
@doc simdat_crossed
simdat_crossed([RNG], subj_n, item_n;
               subj_btwn=nothing, item_btwn=nothing, both_win=nothing,
               subj_prefix="S", item_prefix="I")

Return a row table with a design specified by the:

  • number of subjects (subj_n),

  • number of items (item_n)

  • between-subject factors (subj_btwn)

  • between-item factors (item_btwn)

  • within-subject/item factors (both_win)

If a factor is both between-subject and between-item, put it in both subj_btwn and item_btwn with the same keys and the same levels.

Factors should be specified as dictionaries in the following format:

Dict(
    :factor1_name => ["F1_level1", "F1_level2"],
    :factor2_name => ["F2_level1", "F2_level2", "F2_level3"]
)

In addition to design, the rowtable contains a field dv pre-populated with N(0,1) noise as a basis for further simulating a design.

Note

The number of subjects/items must divide the number of combinations of between subject/item factor levels. In other words, this function assumes a balanced design and will throw an error if that is not possible.

Let’s see what that looks like in practice. We’ll look at a simple 2 x 2 design with 20 subjects and 20 items. Our first factor age will vary between subjects and have the levels old and young. Our second factor frequency will vary between items and have the levels low and high. Finally, we also need to specify a random number generator to use for seeding the data simulation.

subj_n = 20
item_n = 20
subj_btwn = Dict(:age => ["old", "young"])
item_btwn = Dict(:frequency => ["low", "high"])
const RNG = MersenneTwister(42)
dat = simdat_crossed(RNG, subj_n, item_n;
                     subj_btwn, item_btwn)
Table(dat)
Table with 5 columns and 400 rows:
      subj  age    item  frequency  dv
    ┌─────────────────────────────────────────
 1  │ S01   old    I01   low        1.21028
 2  │ S02   young  I01   low        -0.0789986
 3  │ S03   old    I01   low        0.403512
 4  │ S04   young  I01   low        0.289951
 5  │ S05   old    I01   low        -0.0670422
 6  │ S06   young  I01   low        0.61398
 7  │ S07   old    I01   low        1.01694
 8  │ S08   young  I01   low        -0.053864
 9  │ S09   old    I01   low        -0.570387
 10 │ S10   young  I01   low        -0.857737
 11 │ S11   old    I01   low        -0.211452
 12 │ S12   young  I01   low        -0.853673
 13 │ S13   old    I01   low        0.439568
 14 │ S14   young  I01   low        -1.31491
 15 │ S15   old    I01   low        0.176901
 16 │ S16   young  I01   low        -0.254805
 17 │ S17   old    I01   low        -1.77737
 ⋮  │  ⋮      ⋮     ⋮        ⋮          ⋮

We have 400 rows – 20 subjects x 20 items. Similarly, the experimental factors are expanded out to be fully crossed. Finally, we have a dependent variable dv initialized to be draws from the standard normal distribution \(N(0,1)\).

NoteLatin squares, partial crossing, and continuous covariates

simdat_crossed is designed to simulate a fully crossed factorial design. If you have a partially crossed or Latin squaresdesign, then you could delete the “extra” cells to reduce the fully crossed data here to be partially crossed. For continuous covariates, we need to separately construct the covariates and then use a tabular join to create the design. We’ll examine an example of this later.

simmod = fit(MixedModel,
             @formula(dv ~ 1 + age * frequency +
                          (1 + frequency | subj) +
                          (1 + age | item)), dat; progress)
println(simmod)
Linear mixed model fit by maximum likelihood
 dv ~ 1 + age + frequency + age & frequency + (1 + frequency | subj) + (1 + age | item)
   logLik   -2 logLik     AIC       AICc        BIC    
  -591.8575  1183.7149  1205.7149  1206.3953  1249.6210

Variance components:
             Column      Variance  Std.Dev.   Corr.
subj     (Intercept)     0.0733656 0.2708608
         frequency: low  0.0335313 0.1831157 -1.00
item     (Intercept)     0.0132766 0.1152241
         age: young      0.0005200 0.0228031 -1.00
Residual                 1.0883091 1.0432205
 Number of obs: 400; levels of grouping factors: 20, 20

  Fixed-effects parameters:
─────────────────────────────────────────────────────────────────────
                                   Coef.  Std. Error      z  Pr(>|z|)
─────────────────────────────────────────────────────────────────────
(Intercept)                   0.0816111     0.139812   0.58    0.5594
age: young                   -0.0349801     0.191027  -0.18    0.8547
frequency: low               -0.0638793     0.166657  -0.38    0.7015
age: young & frequency: low   0.00879841    0.224372   0.04    0.9687
─────────────────────────────────────────────────────────────────────

make sure to discuss contrasts here

β = [250.0, -25.0, 10, 0.0]
simulate!(RNG, simmod; β)
Est. SE z p σ_subj σ_item
(Intercept) 0.0816 0.1398 0.58 0.5594 0.2709 0.1152
age: young -0.0350 0.1910 -0.18 0.8547 0.0228
frequency: low -0.0639 0.1667 -0.38 0.7015 0.1831
age: young & frequency: low 0.0088 0.2244 0.04 0.9687
Residual 1.0432
fit!(simmod)
Est. SE z p σ_subj σ_item
(Intercept) 249.9796 0.1267 1973.16 <1e-99 0.1641 0.1703
age: young -25.0472 0.1655 -151.36 <1e-99 0.1041
frequency: low 9.7387 0.1640 59.39 <1e-99 0.0422
age: young & frequency: low 0.1164 0.2106 0.55 0.5804
Residual 1.0225
σ = 25.0
fit!(simulate!(RNG, simmod; β, σ))
Est. SE z p σ_subj σ_item
(Intercept) 256.2394 3.9124 65.49 <1e-99 4.5684 8.6530
age: young -33.8588 4.6177 -7.33 <1e-12 7.5397
frequency: low 5.5490 5.2227 1.06 0.2880 2.8944
age: young & frequency: low 5.1727 5.9978 0.86 0.3884
Residual 23.9420
# relative to σ!
subj_re = create_re(2.0, 1.3)
2×2 LinearAlgebra.LowerTriangular{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}}:
 2.0   ⋅ 
 0.0  1.3
item_re = create_re(1.3, 2.0)
2×2 LinearAlgebra.LowerTriangular{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}}:
 1.3   ⋅ 
 0.0  2.0
θ = createθ(simmod; subj=subj_re, item=item_re)
6-element Vector{Float64}:
 2.0
 0.0
 1.3
 1.3
 0.0
 2.0
fit!(simulate!(RNG, simmod; β, σ, θ))
Est. SE z p σ_subj σ_item
(Intercept) 238.6765 15.4590 15.44 <1e-53 39.9114 27.0075
age: young -21.2541 21.2976 -1.00 0.3183 34.8575
frequency: low -0.3036 16.0265 -0.02 0.9849 31.2202
age: young & frequency: low 16.0378 21.5626 0.74 0.4570
Residual 25.9787
samp = parametricbootstrap(RNG, 1000, simmod; β, σ, θ, progress)
MixedModelBootstrap with 1000 samples
      parameter  min        q25        median      mean        q75       ⋯
    ┌─────────────────────────────────────────────────────────────────────
 1  │ β1         187.504    236.563    250.196     249.917     263.367   ⋯
 2  │ β2         -118.074   -45.386    -24.2367    -25.4062    -6.48009  ⋯
 3  │ β3         -57.7974   -2.0594    8.93123     9.78849     22.555    ⋯
 4  │ β4         -77.082    -17.5927   0.253178    -0.0776182  17.2086   ⋯
 5  │ σ          22.204     24.3262    24.9756     24.9547     25.5221   ⋯
 6  │ σ1         25.6549    42.027     47.4457     47.7116     53.3682   ⋯
 7  │ σ2         13.7514    26.8112    31.2347     30.9942     34.9932   ⋯
 8  │ ρ1         -0.808928  -0.18075   -0.0242876  -0.0201442  0.151477  ⋯
 9  │ σ3         14.4239    27.508     31.2277     31.2821     35.1992   ⋯
 10 │ σ4         17.287     42.1595    47.4245     47.5599     52.6321   ⋯
 11 │ ρ2         -0.696028  -0.203852  -0.0184107  -0.0147826  0.156928  ⋯
 12 │ θ1         1.01087    1.68976    1.90887     1.91418     2.13819   ⋯
 13 │ θ2         -1.19827   -0.227775  -0.0266265  -0.0274248  0.185927  ⋯
 14 │ θ3         0.369048   1.03083    1.20271     1.20325     1.37222   ⋯
 15 │ θ4         0.570767   1.09377    1.24564     1.25551     1.40709   ⋯
 16 │ θ5         -1.32032   -0.386843  -0.0346073  -0.0343756  0.302657  ⋯
 17 │ θ6         0.683923   1.62089    1.84137     1.84508     2.0708    ⋯
ridgeplot(samp)
let f = Figure()
    ax = Axis(f[1, 1])
    coefplot!(ax, samp;
              conf_level=0.8,
              vline_at_zero=true,
              show_intercept=true)
    ridgeplot!(ax, samp;
               conf_level=0.8,
               vline_at_zero=true,
               show_intercept=true,
#               xlabel="Normalized density and 80% range")
    )
    scatter!(ax, β, length(β):-1:1;
             marker=:x,
             markersize=20,
             color=:red)
    f
end
coefpvalues = DataFrame()
# @showprogress 
for subj_n in [20, 40, 60, 80, 100, 120, 140],  item_n in [40, 60, 80, 100, 120, 140]
    dat = simdat_crossed(RNG, subj_n, item_n;
                         subj_btwn, item_btwn)
    simmod = MixedModel(@formula(dv ~ 1 + age * frequency +
                                     (1 + frequency | subj) +
                                     (1 + age | item)),
                        dat)

    θ = createθ(simmod; subj=subj_re, item=item_re)
    simboot = parametricbootstrap(RNG, 100, simmod;
                                  β, σ, θ,
                                  optsum_overrides=(;ftol_rel=1e-8),
                                  progress)
    df = DataFrame(simboot.coefpvalues)
    df[!, :subj_n] .= subj_n
    df[!, :item_n] .= item_n
    append!(coefpvalues, df)
end
power = combine(groupby(coefpvalues, [:coefname, :subj_n, :item_n]),
                :p => (p -> mean(p .< 0.05)) => :power)
168×4 DataFrame
143 rows omitted
Row coefname subj_n item_n power
Symbol Int64 Int64 Float64
1 (Intercept) 20 40 1.0
2 age: young 20 40 0.29
3 frequency: low 20 40 0.07
4 age: young & frequency: low 20 40 0.06
5 (Intercept) 20 60 1.0
6 age: young 20 60 0.21
7 frequency: low 20 60 0.17
8 age: young & frequency: low 20 60 0.04
9 (Intercept) 20 80 1.0
10 age: young 20 80 0.15
11 frequency: low 20 80 0.17
12 age: young & frequency: low 20 80 0.07
13 (Intercept) 20 100 1.0
157 (Intercept) 140 100 1.0
158 age: young 140 100 0.59
159 frequency: low 140 100 0.21
160 age: young & frequency: low 140 100 0.04
161 (Intercept) 140 120 1.0
162 age: young 140 120 0.71
163 frequency: low 140 120 0.35
164 age: young & frequency: low 140 120 0.05
165 (Intercept) 140 140 1.0
166 age: young 140 140 0.62
167 frequency: low 140 140 0.3
168 age: young & frequency: low 140 140 0.03
power = combine(groupby(coefpvalues,
                        [:coefname, :subj_n, :item_n]),
                :p => (p -> mean(p .< 0.05)) => :power,
                :p => (p -> sem(p .< 0.05)) => :power_se)
168×5 DataFrame
143 rows omitted
Row coefname subj_n item_n power power_se
Symbol Int64 Int64 Float64 Float64
1 (Intercept) 20 40 1.0 0.0
2 age: young 20 40 0.29 0.0456048
3 frequency: low 20 40 0.07 0.0256432
4 age: young & frequency: low 20 40 0.06 0.0238683
5 (Intercept) 20 60 1.0 0.0
6 age: young 20 60 0.21 0.040936
7 frequency: low 20 60 0.17 0.0377525
8 age: young & frequency: low 20 60 0.04 0.0196946
9 (Intercept) 20 80 1.0 0.0
10 age: young 20 80 0.15 0.035887
11 frequency: low 20 80 0.17 0.0377525
12 age: young & frequency: low 20 80 0.07 0.0256432
13 (Intercept) 20 100 1.0 0.0
157 (Intercept) 140 100 1.0 0.0
158 age: young 140 100 0.59 0.0494311
159 frequency: low 140 100 0.21 0.040936
160 age: young & frequency: low 140 100 0.04 0.0196946
161 (Intercept) 140 120 1.0 0.0
162 age: young 140 120 0.71 0.0456048
163 frequency: low 140 120 0.35 0.0479372
164 age: young & frequency: low 140 120 0.05 0.0219043
165 (Intercept) 140 140 1.0 0.0
166 age: young 140 140 0.62 0.0487832
167 frequency: low 140 140 0.3 0.0460566
168 age: young & frequency: low 140 140 0.03 0.0171447
select!(power, :coefname, :subj_n, :item_n, :power,
        [:power, :power_se] => ByRow((p, se) -> [p - 1.96*se, p + 1.96*se]) => [:lower, :upper])
168×6 DataFrame
143 rows omitted
Row coefname subj_n item_n power lower upper
Symbol Int64 Int64 Float64 Float64 Float64
1 (Intercept) 20 40 1.0 1.0 1.0
2 age: young 20 40 0.29 0.200615 0.379385
3 frequency: low 20 40 0.07 0.0197392 0.120261
4 age: young & frequency: low 20 40 0.06 0.0132181 0.106782
5 (Intercept) 20 60 1.0 1.0 1.0
6 age: young 20 60 0.21 0.129765 0.290235
7 frequency: low 20 60 0.17 0.0960051 0.243995
8 age: young & frequency: low 20 60 0.04 0.00139851 0.0786015
9 (Intercept) 20 80 1.0 1.0 1.0
10 age: young 20 80 0.15 0.0796614 0.220339
11 frequency: low 20 80 0.17 0.0960051 0.243995
12 age: young & frequency: low 20 80 0.07 0.0197392 0.120261
13 (Intercept) 20 100 1.0 1.0 1.0
157 (Intercept) 140 100 1.0 1.0 1.0
158 age: young 140 100 0.59 0.493115 0.686885
159 frequency: low 140 100 0.21 0.129765 0.290235
160 age: young & frequency: low 140 100 0.04 0.00139851 0.0786015
161 (Intercept) 140 120 1.0 1.0 1.0
162 age: young 140 120 0.71 0.620615 0.799385
163 frequency: low 140 120 0.35 0.256043 0.443957
164 age: young & frequency: low 140 120 0.05 0.00706759 0.0929324
165 (Intercept) 140 140 1.0 1.0 1.0
166 age: young 140 140 0.62 0.524385 0.715615
167 frequency: low 140 140 0.3 0.209729 0.390271
168 age: young & frequency: low 140 140 0.03 -0.00360354 0.0636035
data(power) * mapping(:subj_n, :item_n, :power; layout=:coefname) * visual(Heatmap) |> draw
dat = simdat_crossed(RNG, subj_n, item_n;
                     subj_btwn, item_btwn)
dat = DataFrame(dat)
400×5 DataFrame
375 rows omitted
Row subj age item frequency dv
String String String String Float64
1 S01 old I01 low -0.882085
2 S02 young I01 low -0.0201484
3 S03 old I01 low 0.945484
4 S04 young I01 low -0.247553
5 S05 old I01 low 0.453218
6 S06 young I01 low 0.924281
7 S07 old I01 low 0.220375
8 S08 young I01 low 1.85978
9 S09 old I01 low -0.44618
10 S10 young I01 low -0.754939
11 S11 old I01 low 0.0105376
12 S12 young I01 low 0.929521
13 S13 old I01 low -0.314341
389 S09 old I20 high 2.2703
390 S10 young I20 high -0.00613232
391 S11 old I20 high -0.923031
392 S12 young I20 high 0.539303
393 S13 old I20 high -0.868366
394 S14 young I20 high 2.10268
395 S15 old I20 high -1.23199
396 S16 young I20 high -1.02512
397 S17 old I20 high 0.818806
398 S18 young I20 high 1.7468
399 S19 old I20 high -0.327621
400 S20 young I20 high -0.841135
item_covariates = unique!(select(dat, :item))
20×1 DataFrame
Row item
String
1 I01
2 I02
3 I03
4 I04
5 I05
6 I06
7 I07
8 I08
9 I09
10 I10
11 I11
12 I12
13 I13
14 I14
15 I15
16 I16
17 I17
18 I18
19 I19
20 I20
item_covariates[!, :chaos] = rand(RNG,
                                  Normal(5, 2),
                                  nrow(item_covariates))
20-element Vector{Float64}:
 4.283869606409811
 1.766354546694887
 3.5819077117990936
 4.426082794175044
 3.979161685431401
 4.150057335954468
 5.209549095470591
 5.3931183588041325
 5.836973833266723
 6.042800312594512
 3.1607596007727814
 6.519844952694776
 9.286581793894484
 7.995006071550052
 4.189786264631877
 2.4169512074675787
 3.445736901440534
 5.123574599692554
 4.422113183811228
 1.9739443958799194
leftjoin!(dat, item_covariates; on=:item)
400×6 DataFrame
375 rows omitted
Row subj age item frequency dv chaos
String String String String Float64 Float64?
1 S01 old I01 low -0.882085 4.28387
2 S02 young I01 low -0.0201484 4.28387
3 S03 old I01 low 0.945484 4.28387
4 S04 young I01 low -0.247553 4.28387
5 S05 old I01 low 0.453218 4.28387
6 S06 young I01 low 0.924281 4.28387
7 S07 old I01 low 0.220375 4.28387
8 S08 young I01 low 1.85978 4.28387
9 S09 old I01 low -0.44618 4.28387
10 S10 young I01 low -0.754939 4.28387
11 S11 old I01 low 0.0105376 4.28387
12 S12 young I01 low 0.929521 4.28387
13 S13 old I01 low -0.314341 4.28387
389 S09 old I20 high 2.2703 1.97394
390 S10 young I20 high -0.00613232 1.97394
391 S11 old I20 high -0.923031 1.97394
392 S12 young I20 high 0.539303 1.97394
393 S13 old I20 high -0.868366 1.97394
394 S14 young I20 high 2.10268 1.97394
395 S15 old I20 high -1.23199 1.97394
396 S16 young I20 high -1.02512 1.97394
397 S17 old I20 high 0.818806 1.97394
398 S18 young I20 high 1.7468 1.97394
399 S19 old I20 high -0.327621 1.97394
400 S20 young I20 high -0.841135 1.97394
simmod = fit(MixedModel,
             @formula(dv ~ 1 + age * frequency * chaos +
                          (1 + frequency | subj) +
                          (1 + age | item)), dat; contrasts, progress)
Est. SE z p σ_subj σ_item
(Intercept) 0.0560 0.2412 0.23 0.8164 0.0000 0.0000
age: young -0.3565 0.3411 -1.05 0.2959 0.0000
frequency: low 0.0548 0.3699 0.15 0.8822 0.0000
chaos -0.0135 0.0484 -0.28 0.7809
age: young & frequency: low 0.1030 0.5232 0.20 0.8439
age: young & chaos 0.1036 0.0685 1.51 0.1305
frequency: low & chaos -0.0421 0.0738 -0.57 0.5683
age: young & frequency: low & chaos -0.0317 0.1044 -0.30 0.7617
Residual 0.9456

TODO: continuous covariate TODO: bernoulli response TODO: savereplicates

dat[!, :dv] = rand(RNG, Bernoulli(), nrow(dat))
400-element Vector{Bool}:
 0
 1
 0
 0
 1
 1
 1
 0
 0
 0
 ⋮
 1
 1
 1
 0
 1
 1
 0
 0
 1
dat[!, :dv] = rand(RNG, Poisson(), nrow(dat))
400-element Vector{Int64}:
 2
 1
 2
 0
 0
 1
 0
 2
 0
 1
 ⋮
 0
 2
 0
 1
 1
 3
 0
 1
 1
dat[!, :n] = rand(RNG, Poisson(), nrow(dat)) .+ 3
400-element Vector{Int64}:
 5
 3
 3
 7
 4
 3
 5
 3
 3
 4
 ⋮
 4
 3
 3
 4
 4
 3
 6
 4
 3
dat[!, :dv] = rand.(RNG, Binomial.(dat[!, :n])) ./ dat[!, :n]
400-element Vector{Float64}:
 0.6
 0.3333333333333333
 0.6666666666666666
 0.42857142857142855
 0.5
 0.6666666666666666
 0.2
 0.3333333333333333
 0.6666666666666666
 0.5
 ⋮
 0.5
 0.6666666666666666
 0.3333333333333333
 0.25
 1.0
 1.0
 0.5
 0.75
 1.0

1.2 Exercises

  1. Effect size and power. Re-run the simulation across a range of assumed effect sizes, holding the sample size fixed. Plot estimated power against effect size. What shape do you expect?

Power increases monotonically with effect size, tracing an S-shaped curve from the nominal false-positive rate (when the true effect is zero) toward 1 (for large effects). The interesting region is the middle, where small design changes move power the most.

  1. Sample size. Now hold the effect size fixed and vary the number of subjects. How does required sample size trade off against effect size for a target power (say 0.8)?

For a fixed target power, required sample size grows rapidly as the effect size shrinks (roughly inversely with the square of the effect size). Simulating a grid over both lets you read off feasible design points rather than relying on a single closed-form approximation.


This page was rendered from git revision e563d55 using Quarto 1.9.38 and Julia 1.12.6.

Back to top