Mixed Models Tutorial: Contrast Coding

Author

Reinhold Kliegl

Published

2024-06-27

This script uses a subset of data reported in Fühner et al. (2021).

To circumvent delays associated with model fitting we work with models that are less complex than those in the reference publication. All the data to reproduce the models in the publication are used here, too; the script requires only a few changes to specify the more complex models in the paper.

All children were between 6.0 and 6.99 years at legal keydate (30 September) of school enrollment, that is in their ninth year of life in the third grade. To avoid delays associated with model fitting we work with a reduced data set and less complex models than those in the reference publication. The script requires only a few changes to specify the more complex models in the paper.

The script is structured in three main sections:

  1. Setup with reading and examining the data

  2. Contrasts coding

  1. Other topics

1 Setup

1.1 Packages and functions

Code
using AlgebraOfGraphics
using CairoMakie
using Chain
using CategoricalArrays
using DataFrames
using DataFrameMacros
using MixedModels
using ProgressMeter
using SMLP2023: dataset
using Statistics
using StatsBase

ProgressMeter.ijulia_behavior(:clear);

1.2 Readme for dataset("fggk21")

Number of scores: 525126

  1. Cohort: 9 levels; 2011-2019

  2. School: 515 levels

  3. Child: 108295 levels; all children are between 8.0 and 8.99 years old

  4. Sex: “Girls” (n=55,086), “Boys” (n= 53,209)

  5. age: testdate - middle of month of birthdate

  6. Test: 5 levels

    • Endurance (Run): 6 minute endurance run [m]; to nearest 9m in 9x18m field
    • Coordination (Star_r): star coordination run [m/s]; 9x9m field, 4 x diagonal = 50.912 m
    • Speed(S20_r): 20-meters sprint [m/s]
    • Muscle power low (SLJ): standing long jump [cm]
    • Muscle power up (BPT): 1-kg medicine ball push test [m]
  7. score - see units

1.3 Preprocessing

1.3.1 Read data

tbl = dataset(:fggk21)
Arrow.Table with 525126 rows, 7 columns, and schema:
 :Cohort  String
 :School  String
 :Child   String
 :Sex     String
 :age     Float64
 :Test    String
 :score   Float64
df = DataFrame(tbl)
describe(df)
7×7 DataFrame
Row variable mean min median max nmissing eltype
Symbol Union… Any Union… Any Int64 DataType
1 Cohort 2011 2019 0 String
2 School S100043 S800200 0 String
3 Child C002352 C117966 0 String
4 Sex female male 0 String
5 age 8.56073 7.99452 8.55852 9.10609 0 Float64
6 Test BPT Star_r 0 String
7 score 226.141 1.14152 4.65116 1530.0 0 Float64

1.3.2 Extract a stratified subsample

We extract a random sample of 500 children from the Sex (2) x Test (5) cells of the design. Cohort and School are random.

dat = @chain df begin
  @transform(:Sex = :Sex == "female" ? "Girls" : "Boys")
  @groupby(:Test, :Sex)
  combine(x -> x[sample(1:nrow(x), 500), :])
end
5000×7 DataFrame
4975 rows omitted
Row Test Sex Cohort School Child age score
String String String String String Float64 Float64
1 S20_r Boys 2014 S110905 C026446 8.29021 3.92157
2 S20_r Boys 2012 S103081 C079758 8.75017 5.0
3 S20_r Boys 2018 S104760 C072504 8.69268 5.12821
4 S20_r Boys 2014 S100250 C035297 8.37509 4.25532
5 S20_r Boys 2018 S103561 C090685 8.85421 4.25532
6 S20_r Boys 2012 S112458 C032270 8.33402 4.65116
7 S20_r Boys 2017 S112525 C074350 8.70363 4.54545
8 S20_r Boys 2011 S101187 C014723 8.16975 4.0
9 S20_r Boys 2011 S103408 C023307 8.25188 3.7037
10 S20_r Boys 2013 S100948 C021065 8.24641 4.65116
11 S20_r Boys 2016 S106719 C015767 8.18617 4.0
12 S20_r Boys 2013 S113244 C013538 8.16427 4.34783
13 S20_r Boys 2017 S104980 C079138 8.74743 4.7619
4989 Run Girls 2016 S106434 C042585 8.4271 700.0
4990 Run Girls 2019 S101485 C105087 8.9692 1188.0
4991 Run Girls 2012 S102350 C031604 8.33402 1218.0
4992 Run Girls 2018 S101862 C072002 8.69268 927.0
4993 Run Girls 2018 S111200 C117776 9.10609 1044.0
4994 Run Girls 2011 S102260 C008136 8.08487 900.0
4995 Run Girls 2017 S100213 C096675 8.8898 980.0
4996 Run Girls 2018 S101710 C052735 8.52567 1287.0
4997 Run Girls 2019 S106355 C038195 8.38877 1098.0
4998 Run Girls 2016 S104176 C065900 8.63244 995.0
4999 Run Girls 2017 S106331 C086277 8.80767 1143.0
5000 Run Girls 2015 S102799 C082605 8.77207 864.0

1.3.3 Transformations

transform!(dat, :age, :age => (x -> x .- 8.5) => :a1) # centered age (linear)
select!(groupby(dat, :Test), :, :score => zscore => :zScore) # z-score
5000×9 DataFrame
4975 rows omitted
Row Test Sex Cohort School Child age score a1 zScore
String String String String String Float64 Float64 Float64 Float64
1 S20_r Boys 2014 S110905 C026446 8.29021 3.92157 -0.209788 -1.43408
2 S20_r Boys 2012 S103081 C079758 8.75017 5.0 0.250171 1.1729
3 S20_r Boys 2018 S104760 C072504 8.69268 5.12821 0.192676 1.48282
4 S20_r Boys 2014 S100250 C035297 8.37509 4.25532 -0.124914 -0.627277
5 S20_r Boys 2018 S103561 C090685 8.85421 4.25532 0.354209 -0.627277
6 S20_r Boys 2012 S112458 C032270 8.33402 4.65116 -0.165982 0.32963
7 S20_r Boys 2017 S112525 C074350 8.70363 4.54545 0.203628 0.0740921
8 S20_r Boys 2011 S101187 C014723 8.16975 4.0 -0.330253 -1.24448
9 S20_r Boys 2011 S103408 C023307 8.25188 3.7037 -0.248118 -1.96074
10 S20_r Boys 2013 S100948 C021065 8.24641 4.65116 -0.253593 0.32963
11 S20_r Boys 2016 S106719 C015767 8.18617 4.0 -0.313826 -1.24448
12 S20_r Boys 2013 S113244 C013538 8.16427 4.34783 -0.335729 -0.403652
13 S20_r Boys 2017 S104980 C079138 8.74743 4.7619 0.247433 0.597336
4989 Run Girls 2016 S106434 C042585 8.4271 700.0 -0.0728953 -2.06289
4990 Run Girls 2019 S101485 C105087 8.9692 1188.0 0.469199 1.16633
4991 Run Girls 2012 S102350 C031604 8.33402 1218.0 -0.165982 1.36485
4992 Run Girls 2018 S101862 C072002 8.69268 927.0 0.192676 -0.560773
4993 Run Girls 2018 S111200 C117776 9.10609 1044.0 0.606092 0.213446
4994 Run Girls 2011 S102260 C008136 8.08487 900.0 -0.415127 -0.739439
4995 Run Girls 2017 S100213 C096675 8.8898 980.0 0.389802 -0.210058
4996 Run Girls 2018 S101710 C052735 8.52567 1287.0 0.0256674 1.82144
4997 Run Girls 2019 S106355 C038195 8.38877 1098.0 -0.111225 0.570778
4998 Run Girls 2016 S104176 C065900 8.63244 995.0 0.132444 -0.110799
4999 Run Girls 2017 S106331 C086277 8.80767 1143.0 0.307666 0.868555
5000 Run Girls 2015 S102799 C082605 8.77207 864.0 0.272074 -0.97766
dat2 = combine(
  groupby(dat, [:Test, :Sex]),
  :score => mean,
  :score => std,
  :zScore => mean,
  :zScore => std,
)
10×6 DataFrame
Row Test Sex score_mean score_std zScore_mean zScore_std
String String Float64 Float64 Float64 Float64
1 S20_r Boys 4.58728 0.413559 0.175204 0.999732
2 BPT Boys 3.9588 0.689444 0.305114 0.988995
3 SLJ Boys 129.6 19.861 0.150547 1.05208
4 Star_r Boys 2.08849 0.307899 0.131982 1.08002
5 Run Boys 1052.44 149.696 0.269296 0.990575
6 S20_r Girls 4.44233 0.401287 -0.175204 0.970064
7 BPT Girls 3.5334 0.637901 -0.305114 0.915058
8 SLJ Girls 123.916 17.4015 -0.150547 0.921794
9 Star_r Girls 2.01324 0.255091 -0.131982 0.894789
10 Run Girls 971.048 141.395 -0.269296 0.935646

1.3.4 Figure of age x Sex x Test interactions

The main results of relevance here are shown in Figure 2 of Scientific Reports 11:17566.

2 Contrast coding

Contrast coding is part of StatsModels.jl. Here is the primary author’s (i.e., Dave Kleinschmidt’s) documentation of Modeling Categorical Data.

The random factors Child, School, and Cohort are assigned a Grouping contrast. This contrast is needed when the number of groups (i.e., units, levels) is very large. This is the case for Child (i.e., the 108,925 children in the full and probably also the 11,566 children in the reduced data set). The assignment is not necessary for the typical sample size of experiments. However, we use this coding of random factors irrespective of the number of units associated with them to be transparent about the distinction between random and fixed factors.

A couple of general remarks about the following examples. First, all contrasts defined in this tutorial return an estimate of the Grand Mean (GM) in the intercept, that is they are so-called sum-to-zero contrasts. In both Julia and R the default contrast is Dummy coding which is not a sum-to-zero contrast, but returns the mean of the reference (control) group - unfortunately for (quasi-)experimentally minded scientists.

Second, The factor Sex has only two levels. We use EffectCoding (also known as Sum coding in R) to estimate the difference of the levels from the Grand Mean. Unlike in R, the default sign of the effect is for the second level (base is the first, not the last level), but this can be changed with the base kwarg in the command. Effect coding is a sum-to-zero contrast, but when applied to factors with more than two levels does not yield orthogonal contrasts.

Finally, contrasts for the five levels of the fixed factor Test represent the hypotheses about differences between them. In this tutorial, we use this factor to illustrate various options.

We (initially) include only Test as fixed factor and Child as random factor. More complex LMMs can be specified by simply adding other fixed or random factors to the formula.

2.1 SeqDiffCoding: contr1

SeqDiffCoding was used in the publication. This specification tests pairwise differences between the five neighboring levels of Test, that is:

  • SDC1: 2-1
  • SDC2: 3-2
  • SDC3: 4-3
  • SDC4: 5-4

The levels were sorted such that these contrasts map onto four a priori hypotheses; in other words, they are theoretically motivated pairwise comparisons. The motivation also encompasses theoretically motivated interactions with Sex. The order of levels can also be explicitly specified during contrast construction. This is very useful if levels are in a different order in the dataframe. We recommend the explicit specification to increase transparency of the code.

The statistical disadvantage of SeqDiffCoding is that the contrasts are not orthogonal, that is the contrasts are correlated. This is obvious from the fact that levels 2, 3, and 4 are all used in two contrasts. One consequence of this is that correlation parameters estimated between neighboring contrasts (e.g., 2-1 and 3-2) are difficult to interpret. Usually, they will be negative because assuming some practical limitation on the overall range (e.g., between levels 1 and 3), a small “2-1” effect “correlates” negatively with a larger “3-2” effect for mathematical reasons.

Obviously, the tradeoff between theoretical motivation and statistical purity is something that must be considered carefully when planning the analysis.

contr1 = merge(
  Dict(nm => Grouping() for nm in (:School, :Child, :Cohort)),
  Dict(
    :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
    :Test => SeqDiffCoding(;
      levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"]
    ),
  ),
)
Dict{Symbol, StatsModels.AbstractContrasts} with 5 entries:
  :Child  => Grouping()
  :School => Grouping()
  :Test   => SeqDiffCoding(["Run", "Star_r", "S20_r", "SLJ", "BPT"])
  :Cohort => Grouping()
  :Sex    => EffectsCoding(nothing, ["Girls", "Boys"])
f_ovi_1 = @formula zScore ~ 1 + Test + (1 | Child);
m_ovi_SeqDiff_1 = fit(MixedModel, f_ovi_1, dat; contrasts=contr1)
Minimizing 14    Time: 0:00:00 (18.64 ms/it)
Est. SE z p σ_Child
(Intercept) 0.0022 0.0142 0.16 0.8750 0.6422
Test: Star_r -0.0034 0.0444 -0.08 0.9384
Test: S20_r 0.0058 0.0445 0.13 0.8965
Test: SLJ -0.0042 0.0445 -0.09 0.9252
Test: BPT 0.0016 0.0445 0.04 0.9721
Residual 0.7655

In this case, any differences between tests identified by the contrasts would be spurious because each test was standardized (i.e., M=0, \(SD\)=1). The differences could also be due to an imbalance in the number of boys and girls or in the number of missing observations for each test.

The primary interest in this study related to interactions of the test contrasts with and age and Sex. We start with age (linear) and its interaction with the four test contrasts.

m_ovi_SeqDiff_2 = let
  form = @formula zScore ~ 1 + Test * a1 + (1 | Child)
  fit(MixedModel, form, dat; contrasts=contr1)
end
Est. SE z p σ_Child
(Intercept) -0.0117 0.0145 -0.81 0.4179 0.6372
Test: Star_r -0.0043 0.0452 -0.09 0.9250
Test: S20_r -0.0035 0.0456 -0.08 0.9394
Test: SLJ 0.0103 0.0455 0.23 0.8213
Test: BPT -0.0201 0.0450 -0.45 0.6546
a1 0.2451 0.0490 5.01 <1e-06
Test: Star_r & a1 0.0074 0.1528 0.05 0.9613
Test: S20_r & a1 0.1039 0.1527 0.68 0.4963
Test: SLJ & a1 -0.1772 0.1523 -1.16 0.2445
Test: BPT & a1 0.4825 0.1529 3.15 0.0016
Residual 0.7648

The difference between older and younger childrend is larger for Star_r than for Run (0.2473). S20_r did not differ significantly from Star_r (-0.0377) and SLJ (-0.0113) The largest difference in developmental gain was between BPT and SLJ (0.3355).

Please note that standard errors of this LMM are anti-conservative because the LMM is missing a lot of information in the RES (e..g., contrast-related VCs snd CPs for Child, School, and Cohort.

Next we add the main effect of Sex and its interaction with the four test contrasts.

m_ovi_SeqDiff_3 = let
  form = @formula zScore ~ 1 + Test * (a1 + Sex) + (1 | Child)
  fit(MixedModel, form, dat; contrasts=contr1)
end
Est. SE z p σ_Child
(Intercept) -0.0109 0.0141 -0.77 0.4413 0.6079
Test: Star_r -0.0036 0.0442 -0.08 0.9356
Test: S20_r -0.0077 0.0445 -0.17 0.8633
Test: SLJ 0.0128 0.0444 0.29 0.7735
Test: BPT -0.0187 0.0440 -0.43 0.6701
a1 0.2324 0.0478 4.86 <1e-05
Sex: Boys 0.2065 0.0138 14.91 <1e-49
Test: Star_r & a1 0.0168 0.1493 0.11 0.9106
Test: S20_r & a1 0.1391 0.1492 0.93 0.3513
Test: SLJ & a1 -0.1989 0.1487 -1.34 0.1811
Test: BPT & a1 0.4527 0.1494 3.03 0.0024
Test: Star_r & Sex: Boys -0.1280 0.0432 -2.96 0.0031
Test: S20_r & Sex: Boys 0.0435 0.0433 1.00 0.3159
Test: SLJ & Sex: Boys -0.0276 0.0433 -0.64 0.5243
Test: BPT & Sex: Boys 0.1511 0.0433 3.49 0.0005
Residual 0.7581

The significant interactions with Sex reflect mostly differences related to muscle power, where the physiological constitution gives boys an advantage. The sex difference is smaller when coordination and cognition play a role – as in the Star_r test. (Caveat: SEs are estimated with an underspecified RES.)

The final step in this first series is to add the interactions between the three covariates. A significant interaction between any of the four Test contrasts and age (linear) x Sex was hypothesized to reflect a prepubertal signal (i.e., hormones start to rise in girls’ ninth year of life). However, this hypothesis is linked to a specific shape of the interaction: Girls would need to gain more than boys in tests of muscular power.

f_ovi = @formula zScore ~ 1 + Test * a1 * Sex + (1 | Child)
m_ovi_SeqDiff = fit(MixedModel, f_ovi, dat; contrasts=contr1)
Est. SE z p σ_Child
(Intercept) -0.0107 0.0141 -0.76 0.4488 0.6087
Test: Star_r -0.0029 0.0442 -0.07 0.9468
Test: S20_r -0.0054 0.0445 -0.12 0.9029
Test: SLJ 0.0111 0.0444 0.25 0.8018
Test: BPT -0.0171 0.0440 -0.39 0.6969
a1 0.2304 0.0478 4.82 <1e-05
Sex: Boys 0.2030 0.0141 14.35 <1e-45
Test: Star_r & a1 0.0215 0.1493 0.14 0.8854
Test: S20_r & a1 0.1325 0.1492 0.89 0.3748
Test: SLJ & a1 -0.1964 0.1487 -1.32 0.1866
Test: BPT & a1 0.4483 0.1495 3.00 0.0027
Test: Star_r & Sex: Boys -0.1207 0.0442 -2.73 0.0063
Test: S20_r & Sex: Boys 0.0365 0.0445 0.82 0.4129
Test: SLJ & Sex: Boys -0.0193 0.0444 -0.43 0.6647
Test: BPT & Sex: Boys 0.1586 0.0440 3.61 0.0003
a1 & Sex: Boys 0.0464 0.0478 0.97 0.3321
Test: Star_r & a1 & Sex: Boys -0.1271 0.1493 -0.85 0.3946
Test: S20_r & a1 & Sex: Boys 0.0913 0.1492 0.61 0.5406
Test: SLJ & a1 & Sex: Boys -0.1081 0.1487 -0.73 0.4672
Test: BPT & a1 & Sex: Boys -0.1543 0.1495 -1.03 0.3018
Residual 0.7568

The results are very clear: Despite an abundance of statistical power there is no evidence for the differences between boys and girls in how much they gain in the ninth year of life in these five tests. The authors argue that, in this case, absence of evidence looks very much like evidence of absence of a hypothesized interaction.

In the next two sections we use different contrasts. Does this have a bearing on this result? We still ignore for now that we are looking at anti-conservative test statistics.

2.2 HelmertCoding: contr2

The second set of contrasts uses HelmertCoding. Helmert coding codes each level as the difference from the average of the lower levels. With the default order of Test levels we get the following test statistics which we describe in reverse order of appearance in model output

  • HeC4: 5 - mean(1,2,3,4)
  • HeC3: 4 - mean(1,2,3)
  • HeC2: 3 - mean(1,2)
  • HeC1: 2 - 1

In the model output, HeC1 will be reported first and HeC4 last.

There is some justification for the HeC4 specification in a post-hoc manner because the fifth test (BPT) turned out to be different from the other four tests in that high performance is most likely not only related to physical fitness, but also to overweight/obesity, that is for a subset of children high scores on this test might be indicative of physical unfitness. A priori the SDC4 contrast 5-4 between BPT (5) and SLJ (4) was motivated because conceptually both are tests of the physical fitness component Muscular Power, BPT for upper limbs and SLJ for lower limbs, respectively.

One could argue that there is justification for HeC3 because Run (1), Star_r (2), and S20 (3) involve running but SLJ (4) does not. Sports scientists, however, recoil. For them it does not make much sense to average the different running tests, because they draw on completely different physiological resources; it is a variant of the old apples-and-oranges problem.

The justification for HeC3 is thatRun (1) and Star_r (2) draw more strongly on cardiosrespiratory Endurance than S20 (3) due to the longer duration of the runs compared to sprinting for 20 m which is a pure measure of the physical-fitness component Speed. Again, sports scientists are not very happy with this proposal.

Finally, HeC1 contrasts the fitness components Endurance, indicated best by Run (1), and Coordination, indicated by Star_r (2). Endurance (i.e., running for 6 minutes) is considered to be the best indicator of health-related status among the five tests because it is a rather pure measure of cardiorespiratory fitness. The Star_r test requires execution of a pre-instructed sequence of forward, sideways, and backward runs. This coordination of body movements implies a demand on working memory (i.e., remembering the order of these subruns) and executive control processes, but performats also depends on endurance. HeC1 yields a measure of Coordination “corrected” for the contribution of Endurance.

The statistical advantage of HelmertCoding is that the resulting contrasts are orthogonal (uncorrelated). This allows for optimal partitioning of variance and statistical power. It is also more efficient to estimate “orthogonal” than “non-orthogonal” random-effect structures.

contr2 = Dict(
  :School => Grouping(),
  :Child => Grouping(),
  :Cohort => Grouping(),
  :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
  :Test => HelmertCoding(;
    levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"],
  ),
);
m_ovi_Helmert = fit(MixedModel, f_ovi, dat; contrasts=contr2)
Est. SE z p σ_Child
(Intercept) -0.0107 0.0141 -0.76 0.4488 0.6087
Test: Star_r -0.0015 0.0221 -0.07 0.9468
Test: S20_r -0.0023 0.0129 -0.18 0.8581
Test: SLJ 0.0016 0.0090 0.18 0.8558
Test: BPT -0.0024 0.0069 -0.35 0.7251
a1 0.2304 0.0478 4.82 <1e-05
Sex: Boys 0.2030 0.0141 14.35 <1e-45
Test: Star_r & a1 0.0108 0.0747 0.14 0.8854
Test: S20_r & a1 0.0477 0.0431 1.11 0.2684
Test: SLJ & a1 -0.0252 0.0303 -0.83 0.4047
Test: BPT & a1 0.0745 0.0238 3.13 0.0017
Test: Star_r & Sex: Boys -0.0604 0.0221 -2.73 0.0063
Test: S20_r & Sex: Boys -0.0080 0.0129 -0.62 0.5361
Test: SLJ & Sex: Boys -0.0088 0.0090 -0.98 0.3287
Test: BPT & Sex: Boys 0.0264 0.0069 3.81 0.0001
a1 & Sex: Boys 0.0464 0.0478 0.97 0.3321
Test: Star_r & a1 & Sex: Boys -0.0635 0.0747 -0.85 0.3946
Test: S20_r & a1 & Sex: Boys 0.0093 0.0431 0.21 0.8301
Test: SLJ & a1 & Sex: Boys -0.0224 0.0303 -0.74 0.4594
Test: BPT & a1 & Sex: Boys -0.0443 0.0238 -1.86 0.0625
Residual 0.7568

We forego a detailed discussion of the effects, but note that again none of the interactions between age x Sex with the four test contrasts was significant.

The default labeling of Helmert contrasts may lead to confusions with other contrasts. Therefore, we could provide our own labels:

labels=["c2.1", "c3.12", "c4.123", "c5.1234"]

Once the order of levels is memorized the proposed labelling is very transparent.

2.3 HypothesisCoding: contr3

The third set of contrasts uses HypothesisCoding. Hypothesis coding allows the user to specify their own a priori contrast matrix, subject to the mathematical constraint that the matrix has full rank. For example, sport scientists agree that the first four tests can be contrasted with BPT, because the difference is akin to a correction of overall physical fitness. However, they want to keep the pairwise comparisons for the first four tests.

  • HyC1: BPT - mean(1,2,3,4)
  • HyC2: Star_r - Run_r
  • HyC3: Run_r - S20_r
  • HyC4: S20_r - SLJ
contr3 = Dict(
  :School => Grouping(),
  :Child => Grouping(),
  :Cohort => Grouping(),
  :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
  :Test => HypothesisCoding(
    [
      -1 -1 -1 -1 +4
      -1 +1 0 0 0
       0 -1 +1 0 0
       0 0 -1 +1 0
    ];
    levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"],
    labels=["BPT-other", "Star-End", "S20-Star", "SLJ-S20"],
  ),
);
m_ovi_Hypo = fit(MixedModel, f_ovi, dat; contrasts=contr3)
Est. SE z p σ_Child
(Intercept) -0.0107 0.0141 -0.76 0.4488 0.6087
Test: BPT-other -0.0488 0.1389 -0.35 0.7251
Test: Star-End -0.0029 0.0442 -0.07 0.9468
Test: S20-Star -0.0054 0.0445 -0.12 0.9029
Test: SLJ-S20 0.0111 0.0444 0.25 0.8018
a1 0.2304 0.0478 4.82 <1e-05
Sex: Boys 0.2030 0.0141 14.35 <1e-45
Test: BPT-other & a1 1.4902 0.4757 3.13 0.0017
Test: Star-End & a1 0.0215 0.1493 0.14 0.8854
Test: S20-Star & a1 0.1325 0.1492 0.89 0.3748
Test: SLJ-S20 & a1 -0.1964 0.1487 -1.32 0.1866
Test: BPT-other & Sex: Boys 0.5287 0.1389 3.81 0.0001
Test: Star-End & Sex: Boys -0.1207 0.0442 -2.73 0.0063
Test: S20-Star & Sex: Boys 0.0365 0.0445 0.82 0.4129
Test: SLJ-S20 & Sex: Boys -0.0193 0.0444 -0.43 0.6647
a1 & Sex: Boys 0.0464 0.0478 0.97 0.3321
Test: BPT-other & a1 & Sex: Boys -0.8862 0.4757 -1.86 0.0625
Test: Star-End & a1 & Sex: Boys -0.1271 0.1493 -0.85 0.3946
Test: S20-Star & a1 & Sex: Boys 0.0913 0.1492 0.61 0.5406
Test: SLJ-S20 & a1 & Sex: Boys -0.1081 0.1487 -0.73 0.4672
Residual 0.7568

With HypothesisCoding we must generate our own labels for the contrasts. The default labeling of contrasts is usually not interpretable. Therefore, we provide our own.

Anyway, none of the interactions between age x Sex with the four Test contrasts was significant for these contrasts.

contr1b = Dict(
  :School => Grouping(),
  :Child => Grouping(),
  :Cohort => Grouping(),
  :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
  :Test => HypothesisCoding(
    [
      -1 +1 0 0 0
      0 -1 +1 0 0
      0 0 -1 +1 0
      0 0 0 -1 +1
    ];
    levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"],
    labels=["Star-Run", "S20-Star", "SLJ-S20", "BPT-SLJ"],
  ),
);
m_ovi_SeqDiff_v2 = fit(MixedModel, f_ovi, dat; contrasts=contr1b)
Est. SE z p σ_Child
(Intercept) -0.0107 0.0141 -0.76 0.4488 0.6087
Test: Star-Run -0.0029 0.0442 -0.07 0.9468
Test: S20-Star -0.0054 0.0445 -0.12 0.9029
Test: SLJ-S20 0.0111 0.0444 0.25 0.8018
Test: BPT-SLJ -0.0171 0.0440 -0.39 0.6969
a1 0.2304 0.0478 4.82 <1e-05
Sex: Boys 0.2030 0.0141 14.35 <1e-45
Test: Star-Run & a1 0.0215 0.1493 0.14 0.8854
Test: S20-Star & a1 0.1325 0.1492 0.89 0.3748
Test: SLJ-S20 & a1 -0.1964 0.1487 -1.32 0.1866
Test: BPT-SLJ & a1 0.4483 0.1495 3.00 0.0027
Test: Star-Run & Sex: Boys -0.1207 0.0442 -2.73 0.0063
Test: S20-Star & Sex: Boys 0.0365 0.0445 0.82 0.4129
Test: SLJ-S20 & Sex: Boys -0.0193 0.0444 -0.43 0.6647
Test: BPT-SLJ & Sex: Boys 0.1586 0.0440 3.61 0.0003
a1 & Sex: Boys 0.0464 0.0478 0.97 0.3321
Test: Star-Run & a1 & Sex: Boys -0.1271 0.1493 -0.85 0.3946
Test: S20-Star & a1 & Sex: Boys 0.0913 0.1492 0.61 0.5406
Test: SLJ-S20 & a1 & Sex: Boys -0.1081 0.1487 -0.73 0.4672
Test: BPT-SLJ & a1 & Sex: Boys -0.1543 0.1495 -1.03 0.3018
Residual 0.7568
m_zcp_SeqD = let
  form = @formula(
    zScore ~ 1 + Test * a1 * Sex + zerocorr(1 + Test | Child)
  )
  fit(MixedModel, form, dat; contrasts=contr1b)
end
Minimizing 116    Time: 0:00:00 ( 6.30 ms/it)
  objective:  13873.949546391092
Est. SE z p σ_Child
(Intercept) -0.0110 0.0141 -0.78 0.4376 0.6111
Test: Star-Run -0.0025 0.0445 -0.06 0.9556 0.0000
Test: S20-Star -0.0059 0.0444 -0.13 0.8940 0.4768
Test: SLJ-S20 0.0121 0.0441 0.27 0.7838 0.3355
Test: BPT-SLJ -0.0186 0.0438 -0.42 0.6712 0.0000
a1 0.2312 0.0478 4.84 <1e-05
Sex: Boys 0.2032 0.0141 14.37 <1e-46
Test: Star-Run & a1 0.0310 0.1506 0.21 0.8368
Test: S20-Star & a1 0.1248 0.1490 0.84 0.4020
Test: SLJ-S20 & a1 -0.1969 0.1475 -1.33 0.1819
Test: BPT-SLJ & a1 0.4525 0.1491 3.04 0.0024
Test: Star-Run & Sex: Boys -0.1195 0.0445 -2.68 0.0073
Test: S20-Star & Sex: Boys 0.0351 0.0444 0.79 0.4295
Test: SLJ-S20 & Sex: Boys -0.0197 0.0441 -0.45 0.6545
Test: BPT-SLJ & Sex: Boys 0.1589 0.0438 3.62 0.0003
a1 & Sex: Boys 0.0463 0.0478 0.97 0.3330
Test: Star-Run & a1 & Sex: Boys -0.1286 0.1506 -0.85 0.3930
Test: S20-Star & a1 & Sex: Boys 0.0901 0.1490 0.60 0.5455
Test: SLJ-S20 & a1 & Sex: Boys -0.1082 0.1475 -0.73 0.4633
Test: BPT-SLJ & a1 & Sex: Boys -0.1591 0.1491 -1.07 0.2858
Residual 0.6988
m_zcp_SeqD_2 = let
  form = @formula(
    zScore ~ 1 + Test * a1 * Sex + (0 + Test | Child)
  )
  fit(MixedModel, form, dat; contrasts=contr1b)
end
Minimizing 3140    Time: 0:00:25 ( 8.04 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0127 0.0141 -0.90 0.3665
Test: Star-Run -0.0024 0.0441 -0.06 0.9558
Test: S20-Star -0.0123 0.0448 -0.28 0.7827
Test: SLJ-S20 0.0187 0.0453 0.41 0.6792
Test: BPT-SLJ -0.0126 0.0439 -0.29 0.7738
a1 0.2310 0.0476 4.85 <1e-05
Sex: Boys 0.2051 0.0141 14.57 <1e-47
Test: Star-Run & a1 0.0401 0.1490 0.27 0.7880
Test: S20-Star & a1 0.1273 0.1501 0.85 0.3962
Test: SLJ-S20 & a1 -0.1968 0.1517 -1.30 0.1945
Test: BPT-SLJ & a1 0.4479 0.1493 3.00 0.0027
Test: Star-Run & Sex: Boys -0.1245 0.0441 -2.83 0.0047
Test: S20-Star & Sex: Boys 0.0373 0.0448 0.83 0.4052
Test: SLJ-S20 & Sex: Boys -0.0199 0.0453 -0.44 0.6598
Test: BPT-SLJ & Sex: Boys 0.1559 0.0439 3.55 0.0004
a1 & Sex: Boys 0.0510 0.0476 1.07 0.2841
Test: Star-Run & a1 & Sex: Boys -0.1409 0.1490 -0.95 0.3445
Test: S20-Star & a1 & Sex: Boys 0.0953 0.1501 0.63 0.5255
Test: SLJ-S20 & a1 & Sex: Boys -0.1127 0.1517 -0.74 0.4575
Test: BPT-SLJ & a1 & Sex: Boys -0.1457 0.1493 -0.98 0.3290
Test: BPT 0.9343
Test: SLJ 1.0006
Test: Star_r 0.9722
Test: Run 0.9611
Test: S20_r 0.9785
Residual 0.0000
m_cpx_0_SeqDiff = let
  f_cpx_0 = @formula(
    zScore ~ 1 + Test * a1 * Sex + (0 + Test | Child)
  )
  fit(MixedModel, f_cpx_0, dat; contrasts=contr1b)
end
Minimizing 3140    Time: 0:00:26 ( 8.44 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0127 0.0141 -0.90 0.3665
Test: Star-Run -0.0024 0.0441 -0.06 0.9558
Test: S20-Star -0.0123 0.0448 -0.28 0.7827
Test: SLJ-S20 0.0187 0.0453 0.41 0.6792
Test: BPT-SLJ -0.0126 0.0439 -0.29 0.7738
a1 0.2310 0.0476 4.85 <1e-05
Sex: Boys 0.2051 0.0141 14.57 <1e-47
Test: Star-Run & a1 0.0401 0.1490 0.27 0.7880
Test: S20-Star & a1 0.1273 0.1501 0.85 0.3962
Test: SLJ-S20 & a1 -0.1968 0.1517 -1.30 0.1945
Test: BPT-SLJ & a1 0.4479 0.1493 3.00 0.0027
Test: Star-Run & Sex: Boys -0.1245 0.0441 -2.83 0.0047
Test: S20-Star & Sex: Boys 0.0373 0.0448 0.83 0.4052
Test: SLJ-S20 & Sex: Boys -0.0199 0.0453 -0.44 0.6598
Test: BPT-SLJ & Sex: Boys 0.1559 0.0439 3.55 0.0004
a1 & Sex: Boys 0.0510 0.0476 1.07 0.2841
Test: Star-Run & a1 & Sex: Boys -0.1409 0.1490 -0.95 0.3445
Test: S20-Star & a1 & Sex: Boys 0.0953 0.1501 0.63 0.5255
Test: SLJ-S20 & a1 & Sex: Boys -0.1127 0.1517 -0.74 0.4575
Test: BPT-SLJ & a1 & Sex: Boys -0.1457 0.1493 -0.98 0.3290
Test: BPT 0.9343
Test: SLJ 1.0006
Test: Star_r 0.9722
Test: Run 0.9611
Test: S20_r 0.9785
Residual 0.0000
VarCorr(m_cpx_0_SeqDiff)
Column Variance Std.Dev Corr.
Child Test: Run 0.92380604 0.96114829
Test: Star_r 0.94508363 0.97215412 +0.30
Test: S20_r 0.95749430 0.97851638 -0.63 +0.33
Test: SLJ 1.00116952 1.00058459 +0.35 +0.34 +0.41
Test: BPT 0.87300628 0.93434805 +0.02 +0.59 +0.38 +0.18
Residual 0.00000000 0.00000482
m_cpx_0_SeqDiff.PCA
(Child = 
Principal components based on correlation matrix
 Test: Run      1.0     .      .      .      .
 Test: Star_r   0.3    1.0     .      .      .
 Test: S20_r   -0.63   0.33   1.0     .      .
 Test: SLJ      0.35   0.34   0.41   1.0     .
 Test: BPT      0.02   0.59   0.38   0.18   1.0

Normalized cumulative variances:
[0.4251, 0.7483, 0.9265, 0.9999, 1.0]

Component loadings
                 PC1    PC2    PC3    PC4    PC5
 Test: Run     -0.02   0.79  -0.04   0.06  -0.62
 Test: Star_r  -0.55   0.24   0.29  -0.7    0.24
 Test: S20_r   -0.48  -0.52  -0.25  -0.12  -0.64
 Test: SLJ     -0.43   0.23  -0.74   0.24   0.38
 Test: BPT     -0.52   0.0    0.54   0.66   0.05,)
f_cpx_1 = @formula(
  zScore ~ 1 + Test * a1 * Sex + (1 + Test | Child)
)
m_cpx_1_SeqDiff =
fit(MixedModel, f_cpx_1, dat; contrasts=contr1b)
Minimizing 3296    Time: 0:00:19 ( 5.89 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0112 0.0141 -0.79 0.4290 0.6566
Test: Star-Run -0.0054 0.0440 -0.12 0.9021 1.0016
Test: S20-Star -0.0083 0.0448 -0.18 0.8538 0.8706
Test: SLJ-S20 0.0137 0.0452 0.30 0.7624 0.8767
Test: BPT-SLJ -0.0142 0.0439 -0.32 0.7468 1.2302
a1 0.2325 0.0477 4.87 <1e-05
Sex: Boys 0.2049 0.0141 14.50 <1e-46
Test: Star-Run & a1 0.0338 0.1488 0.23 0.8206
Test: S20-Star & a1 0.1387 0.1502 0.92 0.3557
Test: SLJ-S20 & a1 -0.2013 0.1514 -1.33 0.1838
Test: BPT-SLJ & a1 0.4480 0.1493 3.00 0.0027
Test: Star-Run & Sex: Boys -0.1221 0.0440 -2.77 0.0055
Test: S20-Star & Sex: Boys 0.0338 0.0448 0.75 0.4504
Test: SLJ-S20 & Sex: Boys -0.0179 0.0452 -0.40 0.6917
Test: BPT-SLJ & Sex: Boys 0.1560 0.0439 3.55 0.0004
a1 & Sex: Boys 0.0497 0.0477 1.04 0.2978
Test: Star-Run & a1 & Sex: Boys -0.1332 0.1488 -0.89 0.3709
Test: S20-Star & a1 & Sex: Boys 0.0909 0.1502 0.61 0.5449
Test: SLJ-S20 & a1 & Sex: Boys -0.1048 0.1514 -0.69 0.4887
Test: BPT-SLJ & a1 & Sex: Boys -0.1511 0.1493 -1.01 0.3116
Residual 0.0000
m_cpx_1_SeqDiff.PCA
(Child = 
Principal components based on correlation matrix
 (Intercept)      1.0     .      .      .      .
 Test: Star-Run   0.45   1.0     .      .      .
 Test: S20-Star  -0.21   0.21   1.0     .      .
 Test: SLJ-S20    0.14  -0.71  -0.36   1.0     .
 Test: BPT-SLJ   -0.29   0.48  -0.23  -0.47   1.0

Normalized cumulative variances:
[0.4323, 0.7042, 0.9473, 0.9991, 1.0]

Component loadings
                   PC1    PC2    PC3    PC4    PC5
 (Intercept)     -0.01   0.85  -0.14  -0.07   0.51
 Test: Star-Run  -0.61   0.37  -0.07  -0.2   -0.67
 Test: S20-Star  -0.21  -0.31  -0.77  -0.46   0.24
 Test: SLJ-S20    0.62   0.14   0.11  -0.71  -0.28
 Test: BPT-SLJ   -0.45  -0.18   0.61  -0.49   0.39,)

2.4 PCA-based HypothesisCoding: contr4

The fourth set of contrasts uses HypothesisCoding to specify the set of contrasts implementing the loadings of the four principle components of the published LMM based on test scores, not test effects (contrasts) - coarse-grained, that is roughly according to their signs. This is actually a very interesting and plausible solution nobody had proposed a priori.

  • PC1: BPT - Run_r
  • PC2: (Star_r + S20_r + SLJ) - (BPT + Run_r)
  • PC3: Star_r - (S20_r + SLJ)
  • PC4: S20_r - SLJ

PC1 contrasts the worst and the best indicator of physical health; PC2 contrasts these two against the core indicators of physical fitness; PC3 contrasts the cognitive and the physical tests within the narrow set of physical fitness components; and PC4, finally, contrasts two types of lower muscular fitness differing in speed and power.

contr4 = Dict(
  :School => Grouping(),
  :Child => Grouping(),
  :Cohort => Grouping(),
  :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
  :Test => HypothesisCoding(
    [
      -1 0 0 0 +1
      -3 +2 +2 +2 -3
      0 +2 -1 -1 0
      0 0 +1 -1 0
    ];
    levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"],
    labels=["c5.1", "c234.15", "c2.34", "c3.4"],
  ),
);
m_cpx_1_PC = fit(MixedModel, f_cpx_1, dat; contrasts=contr4)
Minimizing 1534    Time: 0:00:12 ( 8.30 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0105 0.0141 -0.74 0.4594 0.6430
Test: c5.1 -0.0113 0.0429 -0.26 0.7920 1.4300
Test: c234.15 0.0073 0.1683 0.04 0.9656 0.9880
Test: c2.34 -0.0117 0.0781 -0.15 0.8813 1.3538
Test: c3.4 -0.0113 0.0452 -0.25 0.8033 1.6103
a1 0.2378 0.0478 4.97 <1e-06
Sex: Boys 0.2043 0.0141 14.44 <1e-46
Test: c5.1 & a1 0.4108 0.1465 2.80 0.0050
Test: c234.15 & a1 -0.8395 0.5708 -1.47 0.1414
Test: c2.34 & a1 -0.0799 0.2623 -0.30 0.7607
Test: c3.4 & a1 0.2216 0.1515 1.46 0.1437
Test: c5.1 & Sex: Boys 0.0542 0.0429 1.27 0.2059
Test: c234.15 & Sex: Boys -0.7856 0.1683 -4.67 <1e-05
Test: c2.34 & Sex: Boys -0.0471 0.0781 -0.60 0.5465
Test: c3.4 & Sex: Boys 0.0161 0.0452 0.36 0.7224
a1 & Sex: Boys 0.0487 0.0478 1.02 0.3081
Test: c5.1 & a1 & Sex: Boys -0.2955 0.1465 -2.02 0.0436
Test: c234.15 & a1 & Sex: Boys 0.3089 0.5708 0.54 0.5884
Test: c2.34 & a1 & Sex: Boys -0.0625 0.2623 -0.24 0.8117
Test: c3.4 & a1 & Sex: Boys 0.0929 0.1515 0.61 0.5400
Residual 0.0000
VarCorr(m_cpx_1_PC)
Column Variance Std.Dev Corr.
Child (Intercept) 0.41338814 0.64295267
Test: c5.1 2.04484693 1.42998144 -0.14
Test: c234.15 0.97608251 0.98796888 +0.33 -0.72
Test: c2.34 1.83288851 1.35384213 +0.60 +0.50 -0.12
Test: c3.4 2.59301071 1.61028281 -0.11 -0.16 +0.70 -0.10
Residual 0.00000000 0.00001636
m_cpx_1_PC.PCA
(Child = 
Principal components based on correlation matrix
 (Intercept)     1.0     .      .      .     .
 Test: c5.1     -0.14   1.0     .      .     .
 Test: c234.15   0.33  -0.72   1.0     .     .
 Test: c2.34     0.6    0.5   -0.12   1.0    .
 Test: c3.4     -0.11  -0.16   0.7   -0.1   1.0

Normalized cumulative variances:
[0.4454, 0.7738, 0.9693, 0.9991, 1.0]

Component loadings
                  PC1   PC2    PC3    PC4    PC5
 (Intercept)    -0.05  0.71  -0.33   0.6    0.16
 Test: c5.1      0.56  0.05   0.54   0.41  -0.48
 Test: c234.15  -0.63  0.26   0.08  -0.12  -0.72
 Test: c2.34     0.3   0.65   0.23  -0.66   0.1
 Test: c3.4     -0.45  0.06   0.74   0.18   0.46,)

There is a numerical interaction with a z-value > 2.0 for the first PCA (i.e., BPT - Run_r). This interaction would really need to be replicated to be taken seriously. It is probably due to larger “unfitness” gains in boys than girls (i.e., in BPT) relative to the slightly larger health-related “fitness” gains of girls than boys (i.e., in Run_r).

contr4b = merge(
  Dict(nm => Grouping() for nm in (:School, :Child, :Cohort)),
  Dict(
    :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
    :Test => HypothesisCoding(
      [
        0.49 -0.04 0.20 0.03 -0.85
        0.70 -0.56 -0.21 -0.13 0.37
        0.31 0.68 -0.56 -0.35 0.00
        0.04 0.08 0.61 -0.78 0.13
      ];
      levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"],
      labels=["c5.1", "c234.15", "c12.34", "c3.4"],
    ),
  ),
);
m_cpx_1_PC_2 = fit(MixedModel, f_cpx_1, dat; contrasts=contr4b)
Minimizing 2384    Time: 0:00:16 ( 7.05 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0124 0.0143 -0.87 0.3861 0.5488
Test: c5.1 0.0063 0.0302 0.21 0.8361 0.6452
Test: c234.15 0.0026 0.0316 0.08 0.9332 0.8329
Test: c12.34 0.0041 0.0320 0.13 0.8983 1.1700
Test: c3.4 -0.0136 0.0316 -0.43 0.6666 0.8093
a1 0.2188 0.0481 4.55 <1e-05
Sex: Boys 0.1967 0.0143 13.79 <1e-42
Test: c5.1 & a1 -0.2874 0.1033 -2.78 0.0054
Test: c234.15 & a1 0.0745 0.1068 0.70 0.4858
Test: c12.34 & a1 -0.0613 0.1075 -0.57 0.5686
Test: c3.4 & a1 0.1842 0.1061 1.74 0.0826
Test: c5.1 & Sex: Boys -0.0730 0.0302 -2.41 0.0158
Test: c234.15 & Sex: Boys 0.1316 0.0316 4.16 <1e-04
Test: c12.34 & Sex: Boys 0.0064 0.0320 0.20 0.8416
Test: c3.4 & Sex: Boys 0.0305 0.0316 0.96 0.3353
a1 & Sex: Boys 0.0502 0.0481 1.04 0.2963
Test: c5.1 & a1 & Sex: Boys 0.2192 0.1033 2.12 0.0338
Test: c234.15 & a1 & Sex: Boys 0.0228 0.1068 0.21 0.8309
Test: c12.34 & a1 & Sex: Boys -0.0071 0.1075 -0.07 0.9471
Test: c3.4 & a1 & Sex: Boys 0.0462 0.1061 0.44 0.6635
Residual 0.0000
VarCorr(m_cpx_1_PC_2)
Column Variance Std.Dev Corr.
Child (Intercept) 0.30118044 0.54879909
Test: c5.1 0.41627796 0.64519606 -0.53
Test: c234.15 0.69367490 0.83287148 +0.15 +0.54
Test: c12.34 1.36897619 1.17003256 +0.02 -0.33 +0.12
Test: c3.4 0.65494448 0.80928640 -0.37 -0.17 -0.23 -0.22
Residual 0.00000000 0.00000798
m_cpx_1_PC_2.PCA
(Child = 
Principal components based on correlation matrix
 (Intercept)     1.0     .      .      .      .
 Test: c5.1     -0.53   1.0     .      .      .
 Test: c234.15   0.15   0.54   1.0     .      .
 Test: c12.34    0.02  -0.33   0.12   1.0     .
 Test: c3.4     -0.37  -0.17  -0.23  -0.22   1.0

Normalized cumulative variances:
[0.3479, 0.6644, 0.8616, 0.9971, 1.0]

Component loadings
                  PC1    PC2    PC3    PC4    PC5
 (Intercept)    -0.47  -0.43   0.52  -0.25  -0.5
 Test: c5.1      0.74  -0.14   0.02   0.16  -0.64
 Test: c234.15   0.38  -0.55  -0.02  -0.62   0.4
 Test: c12.34   -0.29  -0.29  -0.85  -0.1   -0.31
 Test: c3.4      0.04   0.64  -0.05  -0.72  -0.27,)
f_zcp_1 = @formula(zScore ~ 1 + Test*a1*Sex + zerocorr(1 + Test | Child))
m_zcp_1_PC_2 = fit(MixedModel, f_zcp_1, dat; contrasts=contr4b)
Minimizing 495    Time: 0:00:02 ( 4.39 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0106 0.0144 -0.73 0.4624 0.6597
Test: c5.1 0.0086 0.0292 0.29 0.7683 0.5881
Test: c234.15 -0.0017 0.0315 -0.05 0.9566 0.7884
Test: c12.34 0.0021 0.0327 0.06 0.9495 0.9596
Test: c3.4 -0.0093 0.0322 -0.29 0.7718 0.8222
a1 0.2167 0.0486 4.46 <1e-05
Sex: Boys 0.1963 0.0144 13.61 <1e-41
Test: c5.1 & a1 -0.2989 0.1000 -2.99 0.0028
Test: c234.15 & a1 0.0897 0.1071 0.84 0.4021
Test: c12.34 & a1 -0.0531 0.1100 -0.48 0.6295
Test: c3.4 & a1 0.1735 0.1078 1.61 0.1075
Test: c5.1 & Sex: Boys -0.0725 0.0292 -2.48 0.0131
Test: c234.15 & Sex: Boys 0.1256 0.0315 3.99 <1e-04
Test: c12.34 & Sex: Boys 0.0083 0.0327 0.25 0.7999
Test: c3.4 & Sex: Boys 0.0311 0.0322 0.97 0.3331
a1 & Sex: Boys 0.0534 0.0486 1.10 0.2721
Test: c5.1 & a1 & Sex: Boys 0.2220 0.1000 2.22 0.0264
Test: c234.15 & a1 & Sex: Boys 0.0085 0.1071 0.08 0.9368
Test: c12.34 & a1 & Sex: Boys -0.0072 0.1100 -0.07 0.9479
Test: c3.4 & a1 & Sex: Boys 0.0529 0.1078 0.49 0.6233
Residual 0.0000
VarCorr(m_zcp_1_PC_2)
Column Variance Std.Dev Corr.
Child (Intercept) 0.43514940 0.65965855
Test: c5.1 0.34583510 0.58807746 .
Test: c234.15 0.62155002 0.78838444 . .
Test: c12.34 0.92087872 0.95962426 . . .
Test: c3.4 0.67599521 0.82218928 . . . .
Residual 0.00000000 0.00000070
MixedModels.likelihoodratiotest(m_zcp_1_PC_2, m_cpx_1_PC_2)
model-dof deviance χ² χ²-dof P(>χ²)
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + zerocorr(1 + Test | Child) 26 13134
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + (1 + Test | Child) 36 13249 -115 10 NaN

3 Other topics

3.1 Contrasts are re-parameterizations of the same model

The choice of contrast does not affect the model objective, in other words, they all yield the same goodness of fit. It does not matter whether a contrast is orthogonal or not.

[
  objective(m_ovi_SeqDiff),
  objective(m_ovi_Helmert),
  objective(m_ovi_Hypo),
]
3-element Vector{Float64}:
 13876.09548624337
 13876.09548624337
 13876.095486243365

3.2 VCs and CPs depend on contrast coding

Trivially, the meaning of a contrast depends on its definition. Consequently, the contrast specification has a big effect on the random-effect structure. As an illustration, we refit the LMMs with variance components (VCs) and correlation parameters (CPs) for Child-related contrasts of Test. Unfortunately, it is not easy, actually rather quite difficult, to grasp the meaning of correlations of contrast-based effects; they represent two-way interactions.

begin
  f_Child = @formula zScore ~
    1 + Test * a1 * Sex + (1 + Test | Child)
  m_Child_SDC = fit(MixedModel, f_Child, dat; contrasts=contr1)
  m_Child_HeC = fit(MixedModel, f_Child, dat; contrasts=contr2)
  m_Child_HyC = fit(MixedModel, f_Child, dat; contrasts=contr3)
  m_Child_PCA = fit(MixedModel, f_Child, dat; contrasts=contr4)
end
Minimizing 1534    Time: 0:00:10 ( 6.68 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0105 0.0141 -0.74 0.4594 0.6430
Test: c5.1 -0.0113 0.0429 -0.26 0.7920 1.4300
Test: c234.15 0.0073 0.1683 0.04 0.9656 0.9880
Test: c2.34 -0.0117 0.0781 -0.15 0.8813 1.3538
Test: c3.4 -0.0113 0.0452 -0.25 0.8033 1.6103
a1 0.2378 0.0478 4.97 <1e-06
Sex: Boys 0.2043 0.0141 14.44 <1e-46
Test: c5.1 & a1 0.4108 0.1465 2.80 0.0050
Test: c234.15 & a1 -0.8395 0.5708 -1.47 0.1414
Test: c2.34 & a1 -0.0799 0.2623 -0.30 0.7607
Test: c3.4 & a1 0.2216 0.1515 1.46 0.1437
Test: c5.1 & Sex: Boys 0.0542 0.0429 1.27 0.2059
Test: c234.15 & Sex: Boys -0.7856 0.1683 -4.67 <1e-05
Test: c2.34 & Sex: Boys -0.0471 0.0781 -0.60 0.5465
Test: c3.4 & Sex: Boys 0.0161 0.0452 0.36 0.7224
a1 & Sex: Boys 0.0487 0.0478 1.02 0.3081
Test: c5.1 & a1 & Sex: Boys -0.2955 0.1465 -2.02 0.0436
Test: c234.15 & a1 & Sex: Boys 0.3089 0.5708 0.54 0.5884
Test: c2.34 & a1 & Sex: Boys -0.0625 0.2623 -0.24 0.8117
Test: c3.4 & a1 & Sex: Boys 0.0929 0.1515 0.61 0.5400
Residual 0.0000
VarCorr(m_Child_SDC)
Column Variance Std.Dev Corr.
Child (Intercept) 0.407724 0.638533
Test: Star_r 1.056832 1.028024 +0.43
Test: S20_r 0.813550 0.901970 -0.10 +0.39
Test: SLJ 0.467706 0.683890 +0.28 -0.61 -0.47
Test: BPT 1.144422 1.069777 -0.39 -0.47 +0.23 -0.25
Residual 0.000000 0.000011
VarCorr(m_Child_HeC)
Column Variance Std.Dev Corr.
Child (Intercept) 0.35474562 0.59560526
Test: Star_r 0.29928970 0.54707376 +0.42
Test: S20_r 0.22589565 0.47528481 -0.12 +0.46
Test: SLJ 0.03500164 0.18708724 +0.30 -0.39 +0.27
Test: BPT 0.02756931 0.16604008 +0.11 +0.52 +0.03 -0.48
Residual 0.00000000 0.00002613
VarCorr(m_Child_HyC)
Column Variance Std.Dev Corr.
Child (Intercept) 0.40400839 0.63561654
Test: BPT-other 2.22593133 1.49195554 +0.98
Test: Star-End 1.23204816 1.10997665 +0.01 +0.23
Test: S20-Star 1.75345888 1.32418235 -0.16 -0.17 -0.06
Test: SLJ-S20 1.42782262 1.19491532 +0.25 +0.06 -0.84 -0.32
Residual 0.00000000 0.00000437
VarCorr(m_Child_PCA)
Column Variance Std.Dev Corr.
Child (Intercept) 0.41338814 0.64295267
Test: c5.1 2.04484693 1.42998144 -0.14
Test: c234.15 0.97608251 0.98796888 +0.33 -0.72
Test: c2.34 1.83288851 1.35384213 +0.60 +0.50 -0.12
Test: c3.4 2.59301071 1.61028281 -0.11 -0.16 +0.70 -0.10
Residual 0.00000000 0.00001636

The CPs for the various contrasts are in line with expectations. For the SDC we observe substantial negative CPs between neighboring contrasts. For the orthogonal HeC, all CPs are small; they are uncorrelated. HyC contains some of the SDC contrasts and we observe again the negative CPs. The (roughly) PCA-based contrasts are small with one exception; there is a sizeable CP of +.41 between GM and the core of adjusted physical fitness (c234.15).

Do these differences in CPs imply that we can move to zcpLMMs when we have orthogonal contrasts? We pursue this question with by refitting the four LMMs with zerocorr() and compare the goodness of fit.

begin
  f_Child0 = @formula zScore ~
    1 + Test * a1 * Sex + zerocorr(1 + Test | Child)
  m_Child_SDC0 = fit(MixedModel, f_Child0, dat; contrasts=contr1)
  m_Child_HeC0 = fit(MixedModel, f_Child0, dat; contrasts=contr2)
  m_Child_HyC0 = fit(MixedModel, f_Child0, dat; contrasts=contr3)
  m_Child_PCA0 = fit(MixedModel, f_Child0, dat; contrasts=contr4)
end
Minimizing 547    Time: 0:00:01 ( 3.47 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0105 0.0141 -0.74 0.4593 0.6726
Test: c5.1 -0.0135 0.0434 -0.31 0.7563 1.3181
Test: c234.15 0.0266 0.1691 0.16 0.8753 1.5594
Test: c2.34 -0.0038 0.0782 -0.05 0.9615 2.1713
Test: c3.4 -0.0094 0.0446 -0.21 0.8327 1.1917
a1 0.2323 0.0478 4.86 <1e-05
Sex: Boys 0.2043 0.0141 14.45 <1e-46
Test: c5.1 & a1 0.4159 0.1481 2.81 0.0050
Test: c234.15 & a1 -0.9476 0.5737 -1.65 0.0986
Test: c2.34 & a1 -0.0278 0.2631 -0.11 0.9157
Test: c3.4 & a1 0.1935 0.1494 1.30 0.1952
Test: c5.1 & Sex: Boys 0.0529 0.0434 1.22 0.2220
Test: c234.15 & Sex: Boys -0.7771 0.1691 -4.59 <1e-05
Test: c2.34 & Sex: Boys -0.0490 0.0782 -0.63 0.5314
Test: c3.4 & Sex: Boys 0.0177 0.0446 0.40 0.6915
a1 & Sex: Boys 0.0468 0.0478 0.98 0.3276
Test: c5.1 & a1 & Sex: Boys -0.3017 0.1481 -2.04 0.0417
Test: c234.15 & a1 & Sex: Boys 0.3045 0.5737 0.53 0.5956
Test: c2.34 & a1 & Sex: Boys -0.0782 0.2631 -0.30 0.7663
Test: c3.4 & a1 & Sex: Boys 0.1068 0.1494 0.72 0.4746
Residual 0.0000
MixedModels.likelihoodratiotest(m_Child_SDC0, m_Child_SDC)
model-dof deviance χ² χ²-dof P(>χ²)
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + zerocorr(1 + Test | Child) 26 13874
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + (1 + Test | Child) 36 13276 597 10 <1e-99
MixedModels.likelihoodratiotest(m_Child_HeC0, m_Child_HeC)
model-dof deviance χ² χ²-dof P(>χ²)
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + zerocorr(1 + Test | Child) 26 13242
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + (1 + Test | Child) 36 13309 -67 10 NaN
MixedModels.likelihoodratiotest(m_Child_HyC0, m_Child_HyC)
model-dof deviance χ² χ²-dof P(>χ²)
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + zerocorr(1 + Test | Child) 26 13228
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + (1 + Test | Child) 36 13226 3 10 0.9903
MixedModels.likelihoodratiotest(m_Child_PCA0, m_Child_PCA)
model-dof deviance χ² χ²-dof P(>χ²)
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + zerocorr(1 + Test | Child) 26 13194
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + (1 + Test | Child) 36 13300 -106 10 NaN

Obviously, we can not drop CPs from any of the LMMs. The full LMMs all have the same objective, but we can compare the goodness-of-fit statistics of zcpLMMs more directly.

begin
  zcpLMM = ["SDC0", "HeC0", "HyC0", "PCA0"]
  mods = [m_Child_SDC0, m_Child_HeC0, m_Child_HyC0, m_Child_PCA0]
  gof_summary = sort!(
    DataFrame(;
      zcpLMM=zcpLMM,
      dof=dof.(mods),
      deviance=deviance.(mods),
      AIC=aic.(mods),
      BIC=bic.(mods),
    ),
    :deviance,
  )
end
4×5 DataFrame
Row zcpLMM dof deviance AIC BIC
String Int64 Float64 Float64 Float64
1 PCA0 26 13194.0 13246.0 13415.4
2 HyC0 26 13228.2 13280.2 13449.6
3 HeC0 26 13242.0 13294.0 13463.5
4 SDC0 26 13873.9 13925.9 14095.4

The best fit was obtained for the PCA-based zcpLMM. Somewhat surprisingly the second best fit was obtained for the SDC. The relatively poor performance of HeC-based zcpLMM is puzzling to me. I thought it might be related to imbalance in design in the present data, but this does not appear to be the case. The same comparison of SequentialDifferenceCoding and Helmert Coding also showed a worse fit for the zcp-HeC LMM than the zcp-SDC LMM.

3.3 VCs and CPs depend on random factor

VCs and CPs resulting from a set of test contrasts can also be estimated for the random factor School. Of course, these VCs and CPs may look different from the ones we just estimated for Child.

The effect of age (i.e., developmental gain) varies within School. Therefore, we also include its VCs and CPs in this model; the school-related VC for Sex was not significant.

f_School = @formula zScore ~
  1 + Test * a1 * Sex + (1 + Test + a1 | School);
m_School_SeqDiff = fit(MixedModel, f_School, dat; contrasts=contr1);
m_School_Helmert = fit(MixedModel, f_School, dat; contrasts=contr2);
m_School_Hypo = fit(MixedModel, f_School, dat; contrasts=contr3);
m_School_PCA = fit(MixedModel, f_School, dat; contrasts=contr4);
Minimizing 1240    Time: 0:00:00 ( 0.69 ms/it)
VarCorr(m_School_SeqDiff)
Column Variance Std.Dev Corr.
School (Intercept) 0.037707 0.194184
Test: Star_r 0.106569 0.326449 +0.35
Test: S20_r 0.063627 0.252244 +0.21 -0.07
Test: SLJ 0.129961 0.360501 -0.08 -0.71 -0.27
Test: BPT 0.149083 0.386113 -0.48 +0.54 +0.03 -0.81
a1 0.025147 0.158577 -0.44 -0.19 -0.93 +0.35 +0.01
Residual 0.862747 0.928842
VarCorr(m_School_Helmert)
Column Variance Std.Dev Corr.
School (Intercept) 0.0377016 0.1941691
Test: Star_r 0.0266388 0.1632140 +0.35
Test: S20_r 0.0093783 0.0968416 +0.37 +0.50
Test: SLJ 0.0049383 0.0702729 +0.15 -0.57 -0.12
Test: BPT 0.0024522 0.0495201 -0.63 +0.35 +0.40 -0.41
a1 0.0249275 0.1578846 -0.45 -0.19 -0.92 -0.18 -0.14
Residual 0.8628151 0.9288784
VarCorr(m_School_Hypo)
Column Variance Std.Dev Corr.
School (Intercept) 0.037710 0.194191
Test: BPT-other 0.984103 0.992019 -0.62
Test: Star-End 0.106587 0.326477 +0.35 +0.35
Test: S20-Star 0.063625 0.252240 +0.21 +0.24 -0.07
Test: SLJ-S20 0.129977 0.360523 -0.08 -0.54 -0.71 -0.27
a1 0.025203 0.158753 -0.44 -0.13 -0.19 -0.93 +0.35
Residual 0.862733 0.928834
VarCorr(m_School_PCA)
Column Variance Std.Dev Corr.
School (Intercept) 0.037703 0.194174
Test: c5.1 0.136693 0.369720 -0.14
Test: c234.15 1.104159 1.050790 +0.94 +0.15
Test: c2.34 0.287138 0.535853 -0.14 +0.09 -0.10
Test: c3.4 0.129866 0.360369 +0.08 +0.68 +0.18 +0.42
a1 0.025136 0.158543 -0.44 -0.45 -0.53 +0.64 -0.35
Residual 0.862773 0.928856

We compare again how much of the fit resides in the CPs.

begin
  f_School0 = @formula zScore ~
    1 + Test * a1 * Sex + zerocorr(1 + Test + a1 | School)
  m_School_SDC0 = fit(MixedModel, f_School0, dat; contrasts=contr1)
  m_School_HeC0 = fit(MixedModel, f_School0, dat; contrasts=contr2)
  m_School_HyC0 = fit(MixedModel, f_School0, dat; contrasts=contr3)
  m_School_PCA0 = fit(MixedModel, f_School0, dat; contrasts=contr4)
  #
  zcpLMM2 = ["SDC0", "HeC0", "HyC0", "PCA0"]
  mods2 = [
    m_School_SDC0, m_School_HeC0, m_School_HyC0, m_School_PCA0
  ]
  gof_summary2 = sort!(
    DataFrame(;
      zcpLMM=zcpLMM2,
      dof=dof.(mods2),
      deviance=deviance.(mods2),
      AIC=aic.(mods2),
      BIC=bic.(mods2),
    ),
    :deviance,
  )
end
Minimizing 223    Time: 0:00:00 ( 0.46 ms/it)
  objective:  13848.77736950177
4×5 DataFrame
Row zcpLMM dof deviance AIC BIC
String Int64 Float64 Float64 Float64
1 PCA0 27 13848.8 13902.8 14078.7
2 HeC0 27 13849.6 13903.6 14079.6
3 HyC0 27 13853.4 13907.4 14083.3
4 SDC0 27 13857.1 13911.1 14087.1

For the random factor School the Helmert contrast, followed by PCA-based contrasts have least information in the CPs; SDC has the largest contribution from CPs. Interesting.

4 That’s it

That’s it for this tutorial. It is time to try your own contrast coding. You can use these data; there are many alternatives to set up hypotheses for the five tests. Of course and even better, code up some contrasts for data of your own.

Have fun!

Fühner, T., Granacher, U., Golle, K., & Kliegl, R. (2021). Age and sex effects in physical fitness components of 108,295 third graders including 515 primary schools and 9 cohorts. Scientific Reports, 11(1). https://doi.org/10.1038/s41598-021-97000-4
Back to top