Mixed Models Tutorial: Contrast Coding

Author

Phillip Alday, Douglas Bates, and Reinhold Kliegl

Published

2022-09-27

Ths script uses a subset of data reported in Fühner, Golle, Granacher, & Kliegl (2021). Age and sex effects in physical fitness components of 108,295 third graders including 515 primary schools and 9 cohorts. (Fuenher2021?)

To circumvent delays associated with model fitting we work with models that are less complex than those in the reference publication. All the data to reproduce the models in the publication are used here, too; the script requires only a few changes to specify the more complex models in the paper.

All children were between 6.0 and 6.99 years at legal keydate (30 September) of school enrollement, that is in their ninth year of life in the third grade. To avoid delays associated with model fitting we work with a reduced data set and less complex models than those in the reference publication. The script requires only a few changes to specify the more complex models in the paper.

The script is structured in three main sections:

  1. Setup with reading and examing the data

  2. Contrasts coding

  1. Other topics

1. Setup

1.0 Packages and functions

Code
using AlgebraOfGraphics
using AlgebraOfGraphics: linear
using Arrow
using CairoMakie
using Chain
using CategoricalArrays
using DataFrames
using DataFrameMacros
using MixedModels
using ProgressMeter
using Statistics
using StatsBase

ProgressMeter.ijulia_behavior(:clear);

1.1 Readme for ‘./data/fggk21.rds’

Number of scores: 525126

  1. Cohort: 9 levels; 2011-2019

  2. School: 515 levels

  3. Child: 108295 levels; all children are between 8.0 and 8.99 years old

  4. Sex: “Girls” (n=55,086), “Boys” (n= 53,209)

  5. age: testdate - middle of month of birthdate

  6. Test: 5 levels

    • Endurance (Run): 6 minute endurance run [m]; to nearest 9m in 9x18m field
    • Coordination (Star_r): star coordination run [m/s]; 9x9m field, 4 x diagonal = 50.912 m
    • Speed(S20_r): 20-meters sprint [m/s]
    • Muscle power low (SLJ): standing long jump [cm]
    • Muscle power up (BPT): 1-kg medicine ball push test [m]
  7. score - see units

1.2 Preprocessing

1.2.1 Read data

tbl = Arrow.Table("./data/fggk21.arrow")
Arrow.Table with 525126 rows, 7 columns, and schema:
 :Cohort  String
 :School  String
 :Child   String
 :Sex     String
 :age     Float64
 :Test    String
 :score   Float64
df = DataFrame(tbl)
describe(df)

7 rows × 7 columns

variablemeanminmedianmaxnmissingeltype
SymbolUnion…AnyUnion…AnyInt64DataType
1Cohort201120190String
2SchoolS100043S8002000String
3ChildC002352C1179660String
4Sexfemalemale0String
5age8.560737.994528.558529.106090Float64
6TestBPTStar_r0String
7score226.1411.141524.651161530.00Float64

1.2.3 Extract a stratified subsample

We extract a random sample of 500 children from the Sex (2) x Test (5) cells of the design. Cohort and School are random.

begin
  dat = @chain df begin
    @transform(:Sex = :Sex == "female" ? "Girls" : "Boys")
    @groupby(:Test, :Sex)
    combine(x -> x[sample(1:nrow(x), 500), :])
  end
end

5,000 rows × 7 columns

TestSexCohortSchoolChildagescore
StringStringStringStringStringFloat64Float64
1S20_rBoys2012S105843C0892228.832314.65116
2S20_rBoys2011S103226C0149248.169754.54545
3S20_rBoys2015S105363C0842348.796714.87805
4S20_rBoys2018S112057C0730548.692684.87805
5S20_rBoys2017S105983C0152958.169754.34783
6S20_rBoys2015S103317C1003858.922665.0
7S20_rBoys2012S105909C0509338.501034.44444
8S20_rBoys2015S103380C0275748.301164.54545
9S20_rBoys2011S100584C0323688.336764.16667
10S20_rBoys2013S101618C0493228.498295.55556
11S20_rBoys2015S103068C1092899.018484.7619
12S20_rBoys2018S111314C1020408.939084.0
13S20_rBoys2017S104954C0634178.610544.65116
14S20_rBoys2012S104644C0318798.334024.54545
15S20_rBoys2015S100973C0945578.881594.7619
16S20_rBoys2019S104929C0765228.717324.87805
17S20_rBoys2014S104929C0097728.12324.16667
18S20_rBoys2015S100146C1006938.936344.44444
19S20_rBoys2016S130370C1042098.960995.55556
20S20_rBoys2018S111211C0164398.188914.44444
21S20_rBoys2011S101825C0795268.750174.08163
22S20_rBoys2018S111302C0347718.358664.54545
23S20_rBoys2014S112057C0362318.375094.65116
24S20_rBoys2019S102295C0282988.30394.54545
25S20_rBoys2014S103718C0547298.542094.16667
26S20_rBoys2017S100171C0294698.312115.12821
27S20_rBoys2017S103664C0120978.142375.40541
28S20_rBoys2019S104899C0852678.802195.26316
29S20_rBoys2015S101059C0391558.40525.26316
30S20_rBoys2014S101291C0447318.457225.26316

1.2.2 Transformations

begin
  transform!(dat, :age, :age => (x -> x .- 8.5) => :a1) # centered age (linear)
  select!(groupby(dat, :Test), :, :score => zscore => :zScore) # z-score
end

5,000 rows × 9 columns

TestSexCohortSchoolChildagescorea1zScore
StringStringStringStringStringFloat64Float64Float64Float64
1S20_rBoys2012S105843C0892228.832314.651160.3323070.271572
2S20_rBoys2011S103226C0149248.169754.54545-0.3302530.00985277
3S20_rBoys2015S105363C0842348.796714.878050.2967150.833312
4S20_rBoys2018S112057C0730548.692684.878050.1926760.833312
5S20_rBoys2017S105983C0152958.169754.34783-0.330253-0.479449
6S20_rBoys2015S103317C1003858.922665.00.4226561.13525
7S20_rBoys2012S105909C0509338.501034.444440.00102669-0.240235
8S20_rBoys2015S103380C0275748.301164.54545-0.1988360.00985277
9S20_rBoys2011S100584C0323688.336764.16667-0.163244-0.927976
10S20_rBoys2013S101618C0493228.498295.55556-0.001711162.51073
11S20_rBoys2015S103068C1092899.018484.76190.518480.545755
12S20_rBoys2018S111314C1020408.939084.00.439083-1.34062
13S20_rBoys2017S104954C0634178.610544.651160.1105410.271572
14S20_rBoys2012S104644C0318798.334024.54545-0.1659820.00985277
15S20_rBoys2015S100973C0945578.881594.76190.3815880.545755
16S20_rBoys2019S104929C0765228.717324.878050.2173170.833312
17S20_rBoys2014S104929C0097728.12324.16667-0.376797-0.927976
18S20_rBoys2015S100146C1006938.936344.444440.436345-0.240235
19S20_rBoys2016S130370C1042098.960995.555560.4609862.51073
20S20_rBoys2018S111211C0164398.188914.44444-0.311088-0.240235
21S20_rBoys2011S101825C0795268.750174.081630.250171-1.13851
22S20_rBoys2018S111302C0347718.358664.54545-0.1413420.00985277
23S20_rBoys2014S112057C0362318.375094.65116-0.1249140.271572
24S20_rBoys2019S102295C0282988.30394.54545-0.1960990.00985277
25S20_rBoys2014S103718C0547298.542094.166670.0420945-0.927976
26S20_rBoys2017S100171C0294698.312115.12821-0.1878851.45267
27S20_rBoys2017S103664C0120978.142375.40541-0.3576322.13898
28S20_rBoys2019S104899C0852678.802195.263160.302191.78679
29S20_rBoys2015S101059C0391558.40525.26316-0.09479811.78679
30S20_rBoys2014S101291C0447318.457225.26316-0.04277891.78679
begin
  dat2 = combine(
    groupby(dat, [:Test, :Sex]),
    :score => mean,
    :score => std,
    :zScore => mean,
    :zScore => std,
  )
end

10 rows × 6 columns

TestSexscore_meanscore_stdzScore_meanzScore_std
StringStringFloat64Float64Float64Float64
1S20_rBoys4.630430.4056150.2202391.00425
2BPTBoys3.9620.6918040.3277810.971136
3SLJBoys129.29219.01290.1796170.993033
4Star_rBoys2.075350.3085390.1153741.04791
5RunBoys1044.73150.9890.2990591.04055
6S20_rGirls4.452520.382379-0.2202390.946721
7BPTGirls3.4950.654287-0.3277810.91847
8SLJGirls122.41418.6736-0.1796170.975313
9Star_rGirls2.007420.275763-0.1153740.936594
10RunGirls957.938124.827-0.2990590.860253

1.2.3 Figure of age x Sex x Test interactions

The main results of relevance here are shown in Figure 2 of Scientific Reports 11:17566.

2. Contrast coding

Contrast coding is part of StatsModels.jl. Here is the primary author’s (i.e., Dave Kleinschmidt’s documentation of Modeling Categorical Data.

The random factors Child, School, and Cohort are assigned a Grouping contrast. This contrast is needed when the number of groups (i.e., units, levels) is very large. This is the case for Child (i.e., the 108,925 children in the full and probably also the 11,566 children in the reduced data set). The assignment is not necessary for the typical sample size of experiments. However, we use this coding of random factors irrespective of the number of units associated with them to be transparent about the distinction between random and fixed factors.

A couple of general remarks about the following examples. First, all contrasts defined in this tutorial return an estimate of the Grand Mean (GM) in the intercept, that is they are so-called sum-to-zero contrasts. In both Julia and R the default contrast is Dummy coding which is not a sum-to-zero contrast, but returns the mean of the reference (control) group - unfortunately for (quasi-)experimentally minded scientists.

Second, The factor Sex has only two levels. We use EffectCoding (also known as Sum coding in R) to estimate the difference of the levels from the Grand Mean. Unlike in R, the default sign of the effect is for the second level (base is the first, not the last level), but this can be changed with the base kwarg in the command. Effect coding is a sum-to-zero contrast, but when applied to factors with more than two levels does not yield orthogonal contrasts.

Finally, contrasts for the five levels of the fixed factor Test represent the hypotheses about differences between them. In this tutorial, we use this factor to illustrate various options.

We (initially) include only Test as fixed factor and Child as random factor. More complex LMMs can be specified by simply adding other fixed or random factors to the formula.

2.1 SeqDiffCoding: contr1

SeqDiffCoding was used in the publication. This specification tests pairwise differences between the five neighboring levels of Test, that is:

  • SDC1: 2-1
  • SDC2: 3-2
  • SDC3: 4-3
  • SDC4: 5-4

The levels were sorted such that these contrasts map onto four a priori hypotheses; in other words, they are theoretically motivated pairwise comparisons. The motivation also encompasses theoretically motivated interactions with Sex. The order of levels can also be explicitly specified during contrast construction. This is very useful if levels are in a different order in the dataframe. We recommend the explicit specification to increase transparency of the code.

The statistical disadvantage of SeqDiffCoding is that the contrasts are not orthogonal, that is the contrasts are correlated. This is obvious from the fact that levels 2, 3, and 4 are all used in two contrasts. One consequence of this is that correlation parameters estimated between neighboring contrasts (e.g., 2-1 and 3-2) are difficult to interpret. Usually, they will be negative because assuming some practical limitation on the overall range (e.g., between levels 1 and 3), a small “2-1” effect “correlates” negatively with a larger “3-2” effect for mathematical reasons.

Obviously, the tradeoff between theoretical motivation and statistical purity is something that must be considered carefully when planning the analysis.

contr1 = merge(
  Dict(nm => Grouping() for nm in (:School, :Child, :Cohort)),
  Dict(
    :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
    :Test => SeqDiffCoding(;
      levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"]
    ),
  ),
)
Dict{Symbol, StatsModels.AbstractContrasts} with 5 entries:
  :Child  => Grouping()
  :School => Grouping()
  :Test   => SeqDiffCoding(["Run", "Star_r", "S20_r", "SLJ", "BPT"])
  :Cohort => Grouping()
  :Sex    => EffectsCoding(nothing, ["Girls", "Boys"])
f_ovi_1 = @formula zScore ~ 1 + Test + (1 | Child);
m_ovi_SeqDiff_1 = fit(MixedModel, f_ovi_1, dat; contrasts=contr1)
Minimizing 13    Time: 0:00:00 (19.66 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0012 0.0143 -0.08 0.9356 0.7639
Test: Star_r -0.0069 0.0440 -0.16 0.8748
Test: S20_r -0.0087 0.0440 -0.20 0.8427
Test: SLJ 0.0091 0.0439 0.21 0.8352
Test: BPT -0.0032 0.0438 -0.07 0.9411
Residual 0.6426

In this case, any differences between tests identified by the contrasts would be spurious because each test was standardized (i.e., M=0, \(SD\)=1). The differences could also be due to an imbalance in the number of boys and girls or in the number of missing observations for each test.

The primary interest in this study related to interactions of the test contrasts with and age and Sex. We start with age (linear) and its interaction with the four test contrasts.

m_ovi_SeqDiff_2 = let
  form = @formula zScore ~ 1 + Test * a1 + (1 | Child)
  fit(MixedModel, form, dat; contrasts=contr1)
end
Est. SE z p σ_Child
(Intercept) -0.0184 0.0145 -1.27 0.2047 0.7593
Test: Star_r -0.0056 0.0447 -0.13 0.8997
Test: S20_r -0.0143 0.0447 -0.32 0.7484
Test: SLJ 0.0133 0.0446 0.30 0.7663
Test: BPT -0.0161 0.0445 -0.36 0.7168
a1 0.3062 0.0499 6.14 <1e-09
Test: Star_r & a1 0.0392 0.1570 0.25 0.8027
Test: S20_r & a1 0.0528 0.1568 0.34 0.7363
Test: SLJ & a1 -0.0355 0.1526 -0.23 0.8159
Test: BPT & a1 0.1896 0.1510 1.26 0.2092
Residual 0.6416

The difference between older and younger childrend is larger for Star_r than for Run (0.2473). S20_r did not differ significantly from Star_r (-0.0377) and SLJ (-0.0113) The largest difference in developmental gain was between BPT and SLJ (0.3355).

Please note that standard errors of this LMM are anti-conservative because the LMM is missing a lot of information in the RES (e..g., contrast-related VCs snd CPs for Child, School, and Cohort.

Next we add the main effect of Sex and its interaction with the four test contrasts.

m_ovi_SeqDiff_3 = let
  form = @formula zScore ~ 1 + Test * (a1 + Sex) + (1 | Child)
  fit(MixedModel, form, dat; contrasts=contr1)
end
Est. SE z p σ_Child
(Intercept) -0.0184 0.0140 -1.31 0.1908 0.7244
Test: Star_r -0.0059 0.0435 -0.14 0.8925
Test: S20_r -0.0125 0.0434 -0.29 0.7737
Test: SLJ 0.0132 0.0434 0.30 0.7617
Test: BPT -0.0164 0.0433 -0.38 0.7049
a1 0.2973 0.0484 6.14 <1e-09
Sex: Boys 0.2248 0.0138 16.31 <1e-59
Test: Star_r & a1 0.0390 0.1527 0.26 0.7983
Test: S20_r & a1 0.0354 0.1525 0.23 0.8165
Test: SLJ & a1 -0.0333 0.1485 -0.22 0.8227
Test: BPT & a1 0.2010 0.1469 1.37 0.1711
Test: Star_r & Sex: Boys -0.1901 0.0426 -4.46 <1e-05
Test: S20_r & Sex: Boys 0.1033 0.0426 2.43 0.0153
Test: SLJ & Sex: Boys -0.0322 0.0425 -0.76 0.4487
Test: BPT & Sex: Boys 0.1436 0.0425 3.38 0.0007
Residual 0.6376

The significant interactions with Sex reflect mostly differences related to muscle power, where the physiological constitution gives boys an advantage. The sex difference is smaller when coordination and cognition play a role – as in the Star_r test. (Caveat: SEs are estimated with an underspecified RES.)

The final step in this first series is to add the interactions between the three covariates. A significant interaction between any of the four Test contrasts and age (linear) x Sex was hypothesized to reflect a prepubertal signal (i.e., hormones start to rise in girls’ ninth year of life). However, this hypothesis is linked to a specific shape of the interaction: Girls would need to gain more than boys in tests of muscular power.

f_ovi = @formula zScore ~ 1 + Test * a1 * Sex + (1 | Child)
m_ovi_SeqDiff = fit(MixedModel, f_ovi, dat; contrasts=contr1)
Est. SE z p σ_Child
(Intercept) -0.0184 0.0140 -1.31 0.1899 0.7248
Test: Star_r -0.0054 0.0435 -0.12 0.9015
Test: S20_r -0.0122 0.0434 -0.28 0.7784
Test: SLJ 0.0119 0.0434 0.27 0.7840
Test: BPT -0.0149 0.0433 -0.34 0.7301
a1 0.2972 0.0484 6.14 <1e-09
Sex: Boys 0.2262 0.0140 16.11 <1e-57
Test: Star_r & a1 0.0370 0.1526 0.24 0.8083
Test: S20_r & a1 0.0344 0.1524 0.23 0.8216
Test: SLJ & a1 -0.0389 0.1485 -0.26 0.7935
Test: BPT & a1 0.2018 0.1469 1.37 0.1696
Test: Star_r & Sex: Boys -0.1762 0.0435 -4.05 <1e-04
Test: S20_r & Sex: Boys 0.0988 0.0434 2.27 0.0230
Test: SLJ & Sex: Boys -0.0416 0.0434 -0.96 0.3374
Test: BPT & Sex: Boys 0.1545 0.0433 3.57 0.0004
a1 & Sex: Boys -0.0279 0.0484 -0.58 0.5637
Test: Star_r & a1 & Sex: Boys -0.2517 0.1526 -1.65 0.0992
Test: S20_r & a1 & Sex: Boys 0.0997 0.1524 0.65 0.5129
Test: SLJ & a1 & Sex: Boys 0.1646 0.1485 1.11 0.2674
Test: BPT & a1 & Sex: Boys -0.1940 0.1469 -1.32 0.1865
Residual 0.6364

The results are very clear: Despite an abundance of statistical power there is no evidence for the differences between boys and girls in how much they gain in the ninth year of life in these five tests. The authors argue that, in this case, absence of evidence looks very much like evidence of absence of a hypothesized interaction.

In the next two sections we use different contrasts. Does this have a bearing on this result? We still ignore for now that we are looking at anti-conservative test statistics.

2.2 HelmertCoding: contr2

The second set of contrasts uses HelmertCoding. Helmert coding codes each level as the difference from the average of the lower levels. With the default order of Test levels we get the following test statistics which we describe in reverse order of appearance in model output

  • HeC4: 5 - mean(1,2,3,4)
  • HeC3: 4 - mean(1,2,3)
  • HeC2: 3 - mean(1,2)
  • HeC1: 2 - 1

In the model output, HeC1 will be reported first and HeC4 last.

There is some justification for the HeC4 specification in a post-hoc manner because the fifth test (BPT) turned out to be different from the other four tests in that high performance is most likely not only related to physical fitness, but also to overweight/obesity, that is for a subset of children high scores on this test might be indicative of physical unfitness. A priori the SDC4 contrast 5-4 between BPT (5) and SLJ (4) was motivated because conceptually both are tests of the physical fitness component Muscular Power, BPT for upper limbs and SLJ for lower limbs, respectively.

One could argue that there is justification for HeC3 because Run (1), Star_r (2), and S20 (3) involve running but SLJ (4) does not. Sports scientists, however, recoil. For them it does not make much sense to average the different running tests, because they draw on completely different physiological resources; it is a variant of the old apples-and-oranges problem.

The justification for HeC3 is thatRun (1) and Star_r (2) draw more strongly on cardiosrespiratory Endurance than S20 (3) due to the longer duration of the runs compared to sprinting for 20 m which is a pure measure of the physical-fitness component Speed. Again, sports scientists are not very happy with this proposal.

Finally, HeC1 contrasts the fitness components Endurance, indicated best by Run (1), and Coordination, indicated by Star_r (2). Endurance (i.e., running for 6 minutes) is considered to be the best indicator of health-related status among the five tests because it is a rather pure measure of cardiorespiratory fitness. The Star_r test requires execution of a pre-instructed sequence of forward, sideways, and backward runs. This coordination of body movements implies a demand on working memory (i.e., remembering the order of these subruns) and executive control processes, but performats also depends on endurance. HeC1 yields a measure of Coordination “corrected” for the contribution of Endurance.

The statistical advantage of HelmertCoding is that the resulting contrasts are orthogonal (uncorrelated). This allows for optimal partitioning of variance and statistical power. It is also more efficient to estimate “orthogonal” than “non-orthogonal” random-effect structures.

contr2 = Dict(
  :School => Grouping(),
  :Child => Grouping(),
  :Cohort => Grouping(),
  :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
  :Test => HelmertCoding(;
    levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"],
  ),
);
m_ovi_Helmert = fit(MixedModel, f_ovi, dat; contrasts=contr2)
Est. SE z p σ_Child
(Intercept) -0.0184 0.0140 -1.31 0.1899 0.7248
Test: Star_r -0.0027 0.0217 -0.12 0.9015
Test: S20_r -0.0050 0.0126 -0.40 0.6925
Test: SLJ 0.0005 0.0088 0.06 0.9560
Test: BPT -0.0027 0.0069 -0.39 0.6952
a1 0.2972 0.0484 6.14 <1e-09
Sex: Boys 0.2262 0.0140 16.11 <1e-57
Test: Star_r & a1 0.0185 0.0763 0.24 0.8083
Test: S20_r & a1 0.0176 0.0437 0.40 0.6868
Test: SLJ & a1 -0.0009 0.0302 -0.03 0.9762
Test: BPT & a1 0.0398 0.0234 1.70 0.0890
Test: Star_r & Sex: Boys -0.0881 0.0217 -4.05 <1e-04
Test: S20_r & Sex: Boys 0.0036 0.0126 0.28 0.7773
Test: SLJ & Sex: Boys -0.0086 0.0088 -0.98 0.3290
Test: BPT & Sex: Boys 0.0257 0.0069 3.74 0.0002
a1 & Sex: Boys -0.0279 0.0484 -0.58 0.5637
Test: Star_r & a1 & Sex: Boys -0.1258 0.0763 -1.65 0.0992
Test: S20_r & a1 & Sex: Boys -0.0087 0.0437 -0.20 0.8421
Test: SLJ & a1 & Sex: Boys 0.0368 0.0302 1.22 0.2227
Test: BPT & a1 & Sex: Boys -0.0167 0.0234 -0.71 0.4749
Residual 0.6364

We forego a detailed discussion of the effects, but note that again none of the interactions between age x Sex with the four test contrasts was significant.

The default labeling of Helmert contrasts may lead to confusions with other contrasts. Therefore, we could provide our own labels:

labels=["c2.1", "c3.12", "c4.123", "c5.1234"]

Once the order of levels is memorized the proposed labelling is very transparent.

2.3 HypothesisCoding: contr3

The third set of contrasts uses HypothesisCoding. Hypothesis coding allows the user to specify their own a priori contrast matrix, subject to the mathematical constraint that the matrix has full rank. For example, sport scientists agree that the first four tests can be contrasted with BPT, because the difference is akin to a correction of overall physical fitness. However, they want to keep the pairwise comparisons for the first four tests.

  • HyC1: BPT - mean(1,2,3,4)
  • HyC2: Star_r - Run_r
  • HyC3: Run_r - S20_r
  • HyC4: S20_r - SLJ
contr3 = Dict(
  :School => Grouping(),
  :Child => Grouping(),
  :Cohort => Grouping(),
  :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
  :Test => HypothesisCoding(
    [
      -1 -1 -1 -1 +4
      -1 +1 0 0 0
       0 -1 +1 0 0
       0 0 -1 +1 0
    ];
    levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"],
    labels=["BPT-other", "Star-End", "S20-Star", "SLJ-S20"],
  ),
);
m_ovi_Hypo = fit(MixedModel, f_ovi, dat; contrasts=contr3)
Est. SE z p σ_Child
(Intercept) -0.0184 0.0140 -1.31 0.1899 0.7248
Test: BPT-other -0.0539 0.1375 -0.39 0.6952
Test: Star-End -0.0054 0.0435 -0.12 0.9015
Test: S20-Star -0.0122 0.0434 -0.28 0.7784
Test: SLJ-S20 0.0119 0.0434 0.27 0.7840
a1 0.2972 0.0484 6.14 <1e-09
Sex: Boys 0.2262 0.0140 16.11 <1e-57
Test: BPT-other & a1 0.7962 0.4681 1.70 0.0890
Test: Star-End & a1 0.0370 0.1526 0.24 0.8083
Test: S20-Star & a1 0.0344 0.1524 0.23 0.8216
Test: SLJ-S20 & a1 -0.0389 0.1485 -0.26 0.7935
Test: BPT-other & Sex: Boys 0.5144 0.1375 3.74 0.0002
Test: Star-End & Sex: Boys -0.1762 0.0435 -4.05 <1e-04
Test: S20-Star & Sex: Boys 0.0988 0.0434 2.27 0.0230
Test: SLJ-S20 & Sex: Boys -0.0416 0.0434 -0.96 0.3374
a1 & Sex: Boys -0.0279 0.0484 -0.58 0.5637
Test: BPT-other & a1 & Sex: Boys -0.3345 0.4681 -0.71 0.4749
Test: Star-End & a1 & Sex: Boys -0.2517 0.1526 -1.65 0.0992
Test: S20-Star & a1 & Sex: Boys 0.0997 0.1524 0.65 0.5129
Test: SLJ-S20 & a1 & Sex: Boys 0.1646 0.1485 1.11 0.2674
Residual 0.6364

With HypothesisCoding we must generate our own labels for the contrasts. The default labeling of contrasts is usually not interpretable. Therefore, we provide our own.

Anyway, none of the interactions between age x Sex with the four Test contrasts was significant for these contrasts.

contr1b = Dict(
  :School => Grouping(),
  :Child => Grouping(),
  :Cohort => Grouping(),
  :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
  :Test => HypothesisCoding(
    [
      -1 +1 0 0 0
      0 -1 +1 0 0
      0 0 -1 +1 0
      0 0 0 -1 +1
    ];
    levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"],
    labels=["Star-Run", "S20-Star", "SLJ-S20", "BPT-SLJ"],
  ),
);
m_ovi_SeqDiff_v2 = fit(MixedModel, f_ovi, dat; contrasts=contr1b)
Est. SE z p σ_Child
(Intercept) -0.0184 0.0140 -1.31 0.1899 0.7248
Test: Star-Run -0.0054 0.0435 -0.12 0.9015
Test: S20-Star -0.0122 0.0434 -0.28 0.7784
Test: SLJ-S20 0.0119 0.0434 0.27 0.7840
Test: BPT-SLJ -0.0149 0.0433 -0.34 0.7301
a1 0.2972 0.0484 6.14 <1e-09
Sex: Boys 0.2262 0.0140 16.11 <1e-57
Test: Star-Run & a1 0.0370 0.1526 0.24 0.8083
Test: S20-Star & a1 0.0344 0.1524 0.23 0.8216
Test: SLJ-S20 & a1 -0.0389 0.1485 -0.26 0.7935
Test: BPT-SLJ & a1 0.2018 0.1469 1.37 0.1696
Test: Star-Run & Sex: Boys -0.1762 0.0435 -4.05 <1e-04
Test: S20-Star & Sex: Boys 0.0988 0.0434 2.27 0.0230
Test: SLJ-S20 & Sex: Boys -0.0416 0.0434 -0.96 0.3374
Test: BPT-SLJ & Sex: Boys 0.1545 0.0433 3.57 0.0004
a1 & Sex: Boys -0.0279 0.0484 -0.58 0.5637
Test: Star-Run & a1 & Sex: Boys -0.2517 0.1526 -1.65 0.0992
Test: S20-Star & a1 & Sex: Boys 0.0997 0.1524 0.65 0.5129
Test: SLJ-S20 & a1 & Sex: Boys 0.1646 0.1485 1.11 0.2674
Test: BPT-SLJ & a1 & Sex: Boys -0.1940 0.1469 -1.32 0.1865
Residual 0.6364
m_zcp_SeqD = let
  form = @formula(
    zScore ~ 1 + Test * a1 * Sex + zerocorr(1 + Test | Child)
  )
  fit(MixedModel, form, dat; contrasts=contr1b)
end
Minimizing 119   Time: 0:00:00 ( 5.75 ms/it)
  objective:  13765.072908090397
Est. SE z p σ_Child
(Intercept) -0.0184 0.0141 -1.31 0.1912 0.7435
Test: Star-Run -0.0070 0.0439 -0.16 0.8737 0.0000
Test: S20-Star -0.0068 0.0431 -0.16 0.8746 0.6872
Test: SLJ-S20 0.0117 0.0424 0.28 0.7820 0.4721
Test: BPT-SLJ -0.0202 0.0427 -0.47 0.6354 0.1769
a1 0.2973 0.0485 6.14 <1e-09
Sex: Boys 0.2260 0.0141 16.08 <1e-57
Test: Star-Run & a1 0.0309 0.1537 0.20 0.8409
Test: S20-Star & a1 0.0345 0.1513 0.23 0.8195
Test: SLJ-S20 & a1 -0.0255 0.1450 -0.18 0.8604
Test: BPT-SLJ & a1 0.1765 0.1456 1.21 0.2255
Test: Star-Run & Sex: Boys -0.1718 0.0439 -3.91 <1e-04
Test: S20-Star & Sex: Boys 0.0965 0.0431 2.24 0.0252
Test: SLJ-S20 & Sex: Boys -0.0377 0.0424 -0.89 0.3737
Test: BPT-SLJ & Sex: Boys 0.1496 0.0427 3.50 0.0005
a1 & Sex: Boys -0.0273 0.0485 -0.56 0.5729
Test: Star-Run & a1 & Sex: Boys -0.2833 0.1537 -1.84 0.0653
Test: S20-Star & a1 & Sex: Boys 0.1226 0.1513 0.81 0.4175
Test: SLJ-S20 & a1 & Sex: Boys 0.1645 0.1450 1.13 0.2568
Test: BPT-SLJ & a1 & Sex: Boys -0.2100 0.1456 -1.44 0.1492
Residual 0.4553
m_zcp_SeqD_2 = let
  form = @formula(
    zScore ~ 1 + Test * a1 * Sex + (0 + Test | Child)
  )
  fit(MixedModel, form, dat; contrasts=contr1b)
end
Minimizing 3065      Time: 0:00:19 ( 6.52 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0185 0.0140 -1.32 0.1867
Test: Star-Run -0.0112 0.0429 -0.26 0.7949
Test: S20-Star -0.0012 0.0442 -0.03 0.9787
Test: SLJ-S20 0.0101 0.0437 0.23 0.8177
Test: BPT-SLJ -0.0110 0.0429 -0.26 0.7979
a1 0.2985 0.0483 6.18 <1e-09
Sex: Boys 0.2263 0.0140 16.16 <1e-57
Test: Star-Run & a1 0.0282 0.1503 0.19 0.8510
Test: S20-Star & a1 0.0286 0.1550 0.18 0.8534
Test: SLJ-S20 & a1 -0.0303 0.1494 -0.20 0.8391
Test: BPT-SLJ & a1 0.1899 0.1455 1.30 0.1919
Test: Star-Run & Sex: Boys -0.1733 0.0429 -4.04 <1e-04
Test: S20-Star & Sex: Boys 0.0991 0.0442 2.24 0.0250
Test: SLJ-S20 & Sex: Boys -0.0413 0.0437 -0.95 0.3442
Test: BPT-SLJ & Sex: Boys 0.1555 0.0429 3.63 0.0003
a1 & Sex: Boys -0.0255 0.0483 -0.53 0.5979
Test: Star-Run & a1 & Sex: Boys -0.2945 0.1503 -1.96 0.0500
Test: S20-Star & a1 & Sex: Boys 0.1227 0.1550 0.79 0.4285
Test: SLJ-S20 & a1 & Sex: Boys 0.1608 0.1494 1.08 0.2818
Test: BPT-SLJ & a1 & Sex: Boys -0.1935 0.1455 -1.33 0.1836
Test: BPT 0.9400
Test: SLJ 0.9683
Test: Star_r 0.9867
Test: Run 0.9408
Test: S20_r 0.9752
Residual 0.0000
m_cpx_0_SeqDiff = let
  f_cpx_0 = @formula(
    zScore ~ 1 + Test * a1 * Sex + (0 + Test | Child)
  )
  fit(MixedModel, f_cpx_0, dat; contrasts=contr1b)
end
Minimizing 3065      Time: 0:00:20 ( 6.58 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0185 0.0140 -1.32 0.1867
Test: Star-Run -0.0112 0.0429 -0.26 0.7949
Test: S20-Star -0.0012 0.0442 -0.03 0.9787
Test: SLJ-S20 0.0101 0.0437 0.23 0.8177
Test: BPT-SLJ -0.0110 0.0429 -0.26 0.7979
a1 0.2985 0.0483 6.18 <1e-09
Sex: Boys 0.2263 0.0140 16.16 <1e-57
Test: Star-Run & a1 0.0282 0.1503 0.19 0.8510
Test: S20-Star & a1 0.0286 0.1550 0.18 0.8534
Test: SLJ-S20 & a1 -0.0303 0.1494 -0.20 0.8391
Test: BPT-SLJ & a1 0.1899 0.1455 1.30 0.1919
Test: Star-Run & Sex: Boys -0.1733 0.0429 -4.04 <1e-04
Test: S20-Star & Sex: Boys 0.0991 0.0442 2.24 0.0250
Test: SLJ-S20 & Sex: Boys -0.0413 0.0437 -0.95 0.3442
Test: BPT-SLJ & Sex: Boys 0.1555 0.0429 3.63 0.0003
a1 & Sex: Boys -0.0255 0.0483 -0.53 0.5979
Test: Star-Run & a1 & Sex: Boys -0.2945 0.1503 -1.96 0.0500
Test: S20-Star & a1 & Sex: Boys 0.1227 0.1550 0.79 0.4285
Test: SLJ-S20 & a1 & Sex: Boys 0.1608 0.1494 1.08 0.2818
Test: BPT-SLJ & a1 & Sex: Boys -0.1935 0.1455 -1.33 0.1836
Test: BPT 0.9400
Test: SLJ 0.9683
Test: Star_r 0.9867
Test: Run 0.9408
Test: S20_r 0.9752
Residual 0.0000
VarCorr(m_cpx_0_SeqDiff)
Column Variance Std.Dev Corr.
Child Test: Run 0.88515452 0.94082651
Test: Star_r 0.97357048 0.98669675 +0.81
Test: S20_r 0.95106568 0.97522597 +0.37 +0.39
Test: SLJ 0.93766328 0.96833015 +0.45 +0.52 +0.67
Test: BPT 0.88365342 0.94002842 +0.08 +0.57 +0.32 +0.57
Residual 0.00000000 0.00002194
m_cpx_0_SeqDiff.PCA
(Child = 
Principal components based on correlation matrix
 Test: Run     1.0    .     .     .     .
 Test: Star_r  0.81  1.0    .     .     .
 Test: S20_r   0.37  0.39  1.0    .     .
 Test: SLJ     0.45  0.52  0.67  1.0    .
 Test: BPT     0.08  0.57  0.32  0.57  1.0

Normalized cumulative variances:
[0.5857, 0.7855, 0.9391, 0.995, 1.0]

Component loadings
                 PC1    PC2    PC3    PC4    PC5
 Test: Run     -0.42   0.68  -0.03   0.13  -0.59
 Test: Star_r  -0.51   0.31   0.4   -0.26   0.65
 Test: S20_r   -0.42  -0.22  -0.68  -0.56  -0.04
 Test: SLJ     -0.49  -0.28  -0.25   0.76   0.21
 Test: BPT     -0.38  -0.56   0.57  -0.15  -0.44,)
f_cpx_1 = @formula(
  zScore ~ 1 + Test * a1 * Sex + (1 + Test | Child)
)
m_cpx_1_SeqDiff =
fit(MixedModel, f_cpx_1, dat; contrasts=contr1b)
Minimizing 1957      Time: 0:00:12 ( 6.55 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0180 0.0140 -1.29 0.1981 0.7462
Test: Star-Run -0.0141 0.0427 -0.33 0.7404 0.5259
Test: S20-Star -0.0032 0.0436 -0.07 0.9422 0.9352
Test: SLJ-S20 0.0163 0.0435 0.38 0.7069 0.7005
Test: BPT-SLJ -0.0120 0.0428 -0.28 0.7801 0.8674
a1 0.2997 0.0483 6.20 <1e-09
Sex: Boys 0.2264 0.0140 16.15 <1e-57
Test: Star-Run & a1 0.0333 0.1492 0.22 0.8235
Test: S20-Star & a1 0.0280 0.1530 0.18 0.8547
Test: SLJ-S20 & a1 -0.0171 0.1487 -0.11 0.9086
Test: BPT-SLJ & a1 0.1836 0.1454 1.26 0.2067
Test: Star-Run & Sex: Boys -0.1743 0.0427 -4.09 <1e-04
Test: S20-Star & Sex: Boys 0.1021 0.0436 2.34 0.0193
Test: SLJ-S20 & Sex: Boys -0.0403 0.0435 -0.93 0.3535
Test: BPT-SLJ & Sex: Boys 0.1527 0.0428 3.57 0.0004
a1 & Sex: Boys -0.0240 0.0483 -0.50 0.6193
Test: Star-Run & a1 & Sex: Boys -0.3039 0.1492 -2.04 0.0417
Test: S20-Star & a1 & Sex: Boys 0.1262 0.1530 0.82 0.4095
Test: SLJ-S20 & a1 & Sex: Boys 0.1656 0.1487 1.11 0.2657
Test: BPT-SLJ & a1 & Sex: Boys -0.1951 0.1454 -1.34 0.1797
Residual 0.0000
m_cpx_1_SeqDiff.PCA
(Child = 
Principal components based on correlation matrix
 (Intercept)      1.0     .      .      .      .
 Test: Star-Run   0.51   1.0     .      .      .
 Test: S20-Star  -0.29   0.08   1.0     .      .
 Test: SLJ-S20    0.18   0.67  -0.51   1.0     .
 Test: BPT-SLJ   -0.17  -0.33  -0.86   0.42   1.0

Normalized cumulative variances:
[0.4556, 0.8259, 0.9916, 0.9982, 1.0]

Component loadings
                   PC1    PC2    PC3    PC4    PC5
 (Intercept)     -0.23   0.46   0.77  -0.18  -0.34
 Test: Star-Run  -0.21   0.67  -0.27  -0.29   0.59
 Test: S20-Star   0.6    0.24  -0.3   -0.53  -0.46
 Test: SLJ-S20   -0.55   0.24  -0.5    0.27  -0.57
 Test: BPT-SLJ   -0.5   -0.47  -0.05  -0.72  -0.02,)

2.4 PCA-based HypothesisCoding: contr4

The fourth set of contrasts uses HypothesisCoding to specify the set of contrasts implementing the loadings of the four principle components of the published LMM based on test scores, not test effects (contrasts) - coarse-grained, that is roughly according to their signs. This is actually a very interesting and plausible solution nobody had proposed a priori.

  • PC1: BPT - Run_r
  • PC2: (Star_r + S20_r + SLJ) - (BPT + Run_r)
  • PC3: Star_r - (S20_r + SLJ)
  • PC4: S20_r - SLJ

PC1 contrasts the worst and the best indicator of physical health; PC2 contrasts these two against the core indicators of physical fitness; PC3 contrasts the cognitive and the physical tests within the narrow set of physical fitness components; and PC4, finally, contrasts two types of lower muscular fitness differing in speed and power.

contr4 = Dict(
  :School => Grouping(),
  :Child => Grouping(),
  :Cohort => Grouping(),
  :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
  :Test => HypothesisCoding(
    [
      -1 0 0 0 +1
      -3 +2 +2 +2 -3
      0 +2 -1 -1 0
      0 0 +1 -1 0
    ];
    levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"],
    labels=["c5.1", "c234.15", "c2.34", "c3.4"],
  ),
);
m_cpx_1_PC = fit(MixedModel, f_cpx_1, dat; contrasts=contr4)
Minimizing 1870      Time: 0:00:12 ( 6.57 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0188 0.0140 -1.34 0.1789 0.7269
Test: c5.1 -0.0110 0.0426 -0.26 0.7955 1.0970
Test: c234.15 -0.0327 0.1658 -0.20 0.8437 0.4344
Test: c2.34 -0.0149 0.0757 -0.20 0.8436 1.1739
Test: c3.4 -0.0126 0.0442 -0.28 0.7765 1.5239
a1 0.3038 0.0483 6.29 <1e-09
Sex: Boys 0.2262 0.0140 16.15 <1e-57
Test: c5.1 & a1 0.2190 0.1463 1.50 0.1343
Test: c234.15 & a1 -0.3363 0.5723 -0.59 0.5568
Test: c2.34 & a1 -0.0496 0.2658 -0.19 0.8520
Test: c3.4 & a1 0.0400 0.1515 0.26 0.7918
Test: c5.1 & Sex: Boys 0.0407 0.0426 0.95 0.3400
Test: c234.15 & Sex: Boys -0.8463 0.1658 -5.11 <1e-06
Test: c2.34 & Sex: Boys -0.1674 0.0757 -2.21 0.0271
Test: c3.4 & Sex: Boys 0.0367 0.0442 0.83 0.4070
a1 & Sex: Boys -0.0249 0.0483 -0.52 0.6058
Test: c5.1 & a1 & Sex: Boys -0.2049 0.1463 -1.40 0.1612
Test: c234.15 & a1 & Sex: Boys -0.2557 0.5723 -0.45 0.6550
Test: c2.34 & a1 & Sex: Boys -0.3822 0.2658 -1.44 0.1505
Test: c3.4 & a1 & Sex: Boys -0.1708 0.1515 -1.13 0.2595
Residual 0.0000
VarCorr(m_cpx_1_PC)
Column Variance Std.Dev Corr.
Child (Intercept) 0.52833959 0.72686972
Test: c5.1 1.20332652 1.09696241 -0.00
Test: c234.15 0.18866200 0.43435240 -0.92 +0.25
Test: c2.34 1.37797055 1.17386990 +0.59 -0.04 -0.63
Test: c3.4 2.32213690 1.52385593 -0.08 -0.22 -0.20 -0.30
Residual 0.00000000 0.00001896
m_cpx_1_PC.PCA
(Child = 
Principal components based on correlation matrix
 (Intercept)     1.0     .      .      .     .
 Test: c5.1     -0.0    1.0     .      .     .
 Test: c234.15  -0.92   0.25   1.0     .     .
 Test: c2.34     0.59  -0.04  -0.63   1.0    .
 Test: c3.4     -0.08  -0.22  -0.2   -0.3   1.0

Normalized cumulative variances:
[0.4921, 0.7619, 0.9259, 0.9983, 1.0]

Component loadings
                  PC1    PC2    PC3    PC4    PC5
 (Intercept)    -0.59  -0.05  -0.23  -0.49   0.6
 Test: c5.1      0.11  -0.58  -0.78   0.14  -0.12
 Test: c234.15   0.61  -0.21   0.15   0.12   0.74
 Test: c2.34    -0.51  -0.27   0.23   0.76   0.18
 Test: c3.4      0.03   0.74  -0.51   0.38   0.22,)

There is a numerical interaction with a z-value > 2.0 for the first PCA (i.e., BPT - Run_r). This interaction would really need to be replicated to be taken seriously. It is probably due to larger “unfitness” gains in boys than girls (i.e., in BPT) relative to the slightly larger health-related “fitness” gains of girls than boys (i.e., in Run_r).

contr4b = merge(
  Dict(nm => Grouping() for nm in (:School, :Child, :Cohort)),
  Dict(
    :Sex => EffectsCoding(; levels=["Girls", "Boys"]),
    :Test => HypothesisCoding(
      [
        0.49 -0.04 0.20 0.03 -0.85
        0.70 -0.56 -0.21 -0.13 0.37
        0.31 0.68 -0.56 -0.35 0.00
        0.04 0.08 0.61 -0.78 0.13
      ];
      levels=["Run", "Star_r", "S20_r", "SLJ", "BPT"],
      labels=["c5.1", "c234.15", "c12.34", "c3.4"],
    ),
  ),
);
m_cpx_1_PC_2 = fit(MixedModel, f_cpx_1, dat; contrasts=contr4b)
Minimizing 1944      Time: 0:00:12 ( 6.65 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0188 0.0142 -1.32 0.1852 0.7639
Test: c5.1 0.0079 0.0302 0.26 0.7932 0.8044
Test: c234.15 0.0126 0.0302 0.42 0.6764 0.2195
Test: c12.34 0.0040 0.0310 0.13 0.8965 0.7552
Test: c3.4 -0.0114 0.0312 -0.36 0.7153 0.7013
a1 0.2918 0.0489 5.96 <1e-08
Sex: Boys 0.2184 0.0142 15.40 <1e-52
Test: c5.1 & a1 -0.1667 0.1031 -1.62 0.1058
Test: c234.15 & a1 0.0363 0.1049 0.35 0.7296
Test: c12.34 & a1 -0.0300 0.1084 -0.28 0.7818
Test: c3.4 & a1 0.0446 0.1066 0.42 0.6754
Test: c5.1 & Sex: Boys -0.0596 0.0302 -1.97 0.0483
Test: c234.15 & Sex: Boys 0.1534 0.0302 5.09 <1e-06
Test: c12.34 & Sex: Boys -0.0316 0.0310 -1.02 0.3080
Test: c3.4 & Sex: Boys 0.0417 0.0312 1.34 0.1817
a1 & Sex: Boys -0.0221 0.0489 -0.45 0.6523
Test: c5.1 & a1 & Sex: Boys 0.1324 0.1031 1.28 0.1990
Test: c234.15 & a1 & Sex: Boys 0.1531 0.1049 1.46 0.1446
Test: c12.34 & a1 & Sex: Boys -0.0897 0.1084 -0.83 0.4083
Test: c3.4 & a1 & Sex: Boys -0.1393 0.1066 -1.31 0.1913
Residual 0.0000
VarCorr(m_cpx_1_PC_2)
Column Variance Std.Dev Corr.
Child (Intercept) 0.58359539 0.76393415
Test: c5.1 0.64704635 0.80439191 +0.12
Test: c234.15 0.04816176 0.21945789 -0.33 +0.15
Test: c12.34 0.57028932 0.75517502 +0.12 -0.07 +0.60
Test: c3.4 0.49178378 0.70127297 +0.06 +0.30 +0.75 +0.17
Residual 0.00000000 0.00000966
m_cpx_1_PC_2.PCA
(Child = 
Principal components based on correlation matrix
 (Intercept)     1.0     .      .      .      .
 Test: c5.1      0.12   1.0     .      .      .
 Test: c234.15  -0.33   0.15   1.0     .      .
 Test: c12.34    0.12  -0.07   0.6    1.0     .
 Test: c3.4      0.06   0.3    0.75   0.17   1.0

Normalized cumulative variances:
[0.4222, 0.6648, 0.8771, 0.999, 1.0]

Component loadings
                  PC1    PC2    PC3    PC4    PC5
 (Intercept)    -0.1    0.65   0.63   0.25  -0.32
 Test: c5.1      0.21   0.66  -0.38  -0.62   0.03
 Test: c234.15   0.67  -0.2   -0.06   0.02  -0.71
 Test: c12.34    0.42  -0.2    0.65  -0.46   0.39
 Test: c3.4      0.57   0.26  -0.16   0.59   0.49,)
f_zcp_1 = @formula(zScore ~ 1 + Test*a1*Sex + zerocorr(1 + Test | Child))
m_zcp_1_PC_2 = fit(MixedModel, f_zcp_1, dat; contrasts=contr4b)
Minimizing 706   Time: 0:00:03 ( 4.27 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0192 0.0142 -1.35 0.1781 0.7735
Test: c5.1 0.0127 0.0305 0.42 0.6766 0.6827
Test: c234.15 0.0108 0.0294 0.37 0.7125 0.1493
Test: c12.34 0.0016 0.0312 0.05 0.9602 0.8373
Test: c3.4 -0.0065 0.0319 -0.20 0.8392 0.7042
a1 0.2909 0.0491 5.93 <1e-08
Sex: Boys 0.2184 0.0142 15.35 <1e-52
Test: c5.1 & a1 -0.1680 0.1040 -1.61 0.1063
Test: c234.15 & a1 0.0412 0.1023 0.40 0.6873
Test: c12.34 & a1 -0.0260 0.1090 -0.24 0.8114
Test: c3.4 & a1 0.0514 0.1090 0.47 0.6370
Test: c5.1 & Sex: Boys -0.0572 0.0305 -1.87 0.0608
Test: c234.15 & Sex: Boys 0.1556 0.0294 5.29 <1e-06
Test: c12.34 & Sex: Boys -0.0274 0.0312 -0.88 0.3788
Test: c3.4 & Sex: Boys 0.0420 0.0319 1.32 0.1877
a1 & Sex: Boys -0.0239 0.0491 -0.49 0.6265
Test: c5.1 & a1 & Sex: Boys 0.1355 0.1040 1.30 0.1928
Test: c234.15 & a1 & Sex: Boys 0.1592 0.1023 1.56 0.1197
Test: c12.34 & a1 & Sex: Boys -0.0943 0.1090 -0.87 0.3868
Test: c3.4 & a1 & Sex: Boys -0.1326 0.1090 -1.22 0.2238
Residual 0.0000
VarCorr(m_zcp_1_PC_2)
Column Variance Std.Dev Corr.
Child (Intercept) 0.59835584 0.77353464
Test: c5.1 0.46605890 0.68268506 .
Test: c234.15 0.02230000 0.14933184 . .
Test: c12.34 0.70113074 0.83733550 . . .
Test: c3.4 0.49587481 0.70418379 . . . .
Residual 0.00000000 0.00000246
MixedModels.likelihoodratiotest(m_zcp_1_PC_2, m_cpx_1_PC_2)
model-dof deviance χ² χ²-dof P(>χ²)
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + MixedModels.ZeroCorr((1 + Test | Child)) 26 13310
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + (1 + Test | Child) 36 13341 -31 10 NaN

3. Other topics

3.1 Contrasts are re-parameterizations of the same model

The choice of contrast does not affect the model objective, in other words, they all yield the same goodness of fit. It does not matter whether a contrast is orthogonal or not.

[
  objective(m_ovi_SeqDiff),
  objective(m_ovi_Helmert),
  objective(m_ovi_Hypo),
]
3-element Vector{Float64}:
 13778.065241357293
 13778.065241357299
 13778.065241357299

3.2 VCs and CPs depend on contrast coding

Trivially, the meaning of a contrast depends on its definition. Consequently, the contrast specification has a big effect on the random-effect structure. As an illustration, we refit the LMMs with variance components (VCs) and correlation parameters (CPs) for Child-related contrasts of Test. Unfortunately, it is not easy, actually rather quite difficult, to grasp the meaning of correlations of contrast-based effects; they represent two-way interactions.

begin
  f_Child = @formula zScore ~
    1 + Test * a1 * Sex + (1 + Test | Child)
  m_Child_SDC = fit(MixedModel, f_Child, dat; contrasts=contr1)
  m_Child_HeC = fit(MixedModel, f_Child, dat; contrasts=contr2)
  m_Child_HyC = fit(MixedModel, f_Child, dat; contrasts=contr3)
  m_Child_PCA = fit(MixedModel, f_Child, dat; contrasts=contr4)
end
Minimizing 1870      Time: 0:00:12 ( 6.65 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0188 0.0140 -1.34 0.1789 0.7269
Test: c5.1 -0.0110 0.0426 -0.26 0.7955 1.0970
Test: c234.15 -0.0327 0.1658 -0.20 0.8437 0.4344
Test: c2.34 -0.0149 0.0757 -0.20 0.8436 1.1739
Test: c3.4 -0.0126 0.0442 -0.28 0.7765 1.5239
a1 0.3038 0.0483 6.29 <1e-09
Sex: Boys 0.2262 0.0140 16.15 <1e-57
Test: c5.1 & a1 0.2190 0.1463 1.50 0.1343
Test: c234.15 & a1 -0.3363 0.5723 -0.59 0.5568
Test: c2.34 & a1 -0.0496 0.2658 -0.19 0.8520
Test: c3.4 & a1 0.0400 0.1515 0.26 0.7918
Test: c5.1 & Sex: Boys 0.0407 0.0426 0.95 0.3400
Test: c234.15 & Sex: Boys -0.8463 0.1658 -5.11 <1e-06
Test: c2.34 & Sex: Boys -0.1674 0.0757 -2.21 0.0271
Test: c3.4 & Sex: Boys 0.0367 0.0442 0.83 0.4070
a1 & Sex: Boys -0.0249 0.0483 -0.52 0.6058
Test: c5.1 & a1 & Sex: Boys -0.2049 0.1463 -1.40 0.1612
Test: c234.15 & a1 & Sex: Boys -0.2557 0.5723 -0.45 0.6550
Test: c2.34 & a1 & Sex: Boys -0.3822 0.2658 -1.44 0.1505
Test: c3.4 & a1 & Sex: Boys -0.1708 0.1515 -1.13 0.2595
Residual 0.0000
VarCorr(m_Child_SDC)
Column Variance Std.Dev Corr.
Child (Intercept) 0.5663537 0.7525647
Test: Star_r 0.2621396 0.5119957 +0.40
Test: S20_r 0.8578761 0.9262160 -0.07 -0.07
Test: SLJ 0.5492260 0.7410978 -0.06 +0.76 -0.55
Test: BPT 0.9810827 0.9904962 -0.14 -0.54 +0.16 -0.53
Residual 0.0000000 0.0000321
VarCorr(m_Child_HeC)
Column Variance Std.Dev Corr.
Child (Intercept) 0.54474446 0.73806806
Test: Star_r 0.07683435 0.27719010 +0.20
Test: S20_r 0.12889480 0.35901922 -0.08 -0.24
Test: SLJ 0.03889375 0.19721499 +0.04 +0.52 +0.52
Test: BPT 0.02561024 0.16003200 -0.09 +0.51 -0.13 +0.11
Residual 0.00000000 0.00006133
VarCorr(m_Child_HyC)
Column Variance Std.Dev Corr.
Child (Intercept) 0.58573612 0.76533400
Test: BPT-other 0.69040770 0.83090776 +0.98
Test: Star-End 0.37368594 0.61129857 +0.06 +0.22
Test: S20-Star 1.58357772 1.25840284 -0.04 -0.17 -0.54
Test: SLJ-S20 0.67426163 0.82113436 -0.03 +0.13 +0.81 -0.45
Residual 0.00000000 0.00001723
VarCorr(m_Child_PCA)
Column Variance Std.Dev Corr.
Child (Intercept) 0.52833959 0.72686972
Test: c5.1 1.20332652 1.09696241 -0.00
Test: c234.15 0.18866200 0.43435240 -0.92 +0.25
Test: c2.34 1.37797055 1.17386990 +0.59 -0.04 -0.63
Test: c3.4 2.32213690 1.52385593 -0.08 -0.22 -0.20 -0.30
Residual 0.00000000 0.00001896

The CPs for the various contrasts are in line with expectations. For the SDC we observe substantial negative CPs between neighboring contrasts. For the orthogonal HeC, all CPs are small; they are uncorrelated. HyC contains some of the SDC contrasts and we observe again the negative CPs. The (roughly) PCA-based contrasts are small with one exception; there is a sizeable CP of +.41 between GM and the core of adjusted physical fitness (c234.15).

Do these differences in CPs imply that we can move to zcpLMMs when we have orthogonal contrasts? We pursue this question with by refitting the four LMMs with zerocorr() and compare the goodness of fit.

begin
  f_Child0 = @formula zScore ~
    1 + Test * a1 * Sex + zerocorr(1 + Test | Child)
  m_Child_SDC0 = fit(MixedModel, f_Child0, dat; contrasts=contr1)
  m_Child_HeC0 = fit(MixedModel, f_Child0, dat; contrasts=contr2)
  m_Child_HyC0 = fit(MixedModel, f_Child0, dat; contrasts=contr3)
  m_Child_PCA0 = fit(MixedModel, f_Child0, dat; contrasts=contr4)
end
Minimizing 562   Time: 0:00:02 ( 4.42 ms/it)
Est. SE z p σ_Child
(Intercept) -0.0189 0.0140 -1.35 0.1776 0.7670
Test: c5.1 -0.0183 0.0429 -0.43 0.6696 1.0793
Test: c234.15 -0.0063 0.1665 -0.04 0.9697 1.3105
Test: c2.34 0.0022 0.0755 0.03 0.9769 1.7288
Test: c3.4 -0.0090 0.0442 -0.20 0.8391 1.0606
a1 0.2987 0.0484 6.18 <1e-09
Sex: Boys 0.2258 0.0140 16.10 <1e-57
Test: c5.1 & a1 0.2162 0.1469 1.47 0.1412
Test: c234.15 & a1 -0.4236 0.5742 -0.74 0.4607
Test: c2.34 & a1 -0.0184 0.2653 -0.07 0.9448
Test: c3.4 & a1 0.0461 0.1514 0.30 0.7608
Test: c5.1 & Sex: Boys 0.0384 0.0429 0.90 0.3707
Test: c234.15 & Sex: Boys -0.8584 0.1665 -5.16 <1e-06
Test: c2.34 & Sex: Boys -0.1552 0.0755 -2.05 0.0399
Test: c3.4 & Sex: Boys 0.0399 0.0442 0.90 0.3672
a1 & Sex: Boys -0.0256 0.0484 -0.53 0.5966
Test: c5.1 & a1 & Sex: Boys -0.1974 0.1469 -1.34 0.1791
Test: c234.15 & a1 & Sex: Boys -0.2249 0.5742 -0.39 0.6953
Test: c2.34 & a1 & Sex: Boys -0.3658 0.2653 -1.38 0.1679
Test: c3.4 & a1 & Sex: Boys -0.1569 0.1514 -1.04 0.2999
Residual 0.0000
MixedModels.likelihoodratiotest(m_Child_SDC0, m_Child_SDC)
model-dof deviance χ² χ²-dof P(>χ²)
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + MixedModels.ZeroCorr((1 + Test | Child)) 26 13765
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + (1 + Test | Child) 36 13385 380 10 <1e-74
MixedModels.likelihoodratiotest(m_Child_HeC0, m_Child_HeC)
model-dof deviance χ² χ²-dof P(>χ²)
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + MixedModels.ZeroCorr((1 + Test | Child)) 26 13329
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + (1 + Test | Child) 36 13405 -76 10 NaN
MixedModels.likelihoodratiotest(m_Child_HyC0, m_Child_HyC)
model-dof deviance χ² χ²-dof P(>χ²)
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + MixedModels.ZeroCorr((1 + Test | Child)) 26 13349
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + (1 + Test | Child) 36 13371 -22 10 NaN
MixedModels.likelihoodratiotest(m_Child_PCA0, m_Child_PCA)
model-dof deviance χ² χ²-dof P(>χ²)
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + MixedModels.ZeroCorr((1 + Test | Child)) 26 13359
zScore ~ 1 + Test + a1 + Sex + Test & a1 + Test & Sex + a1 & Sex + Test & a1 & Sex + (1 + Test | Child) 36 13369 -11 10 NaN

Obviously, we can not drop CPs from any of the LMMs. The full LMMs all have the same objective, but we can compare the goodness-of-fit statistics of zcpLMMs more directly.

begin
  zcpLMM = ["SDC0", "HeC0", "HyC0", "PCA0"]
  mods = [m_Child_SDC0, m_Child_HeC0, m_Child_HyC0, m_Child_PCA0]
  gof_summary = sort!(
    DataFrame(;
      zcpLMM=zcpLMM,
      dof=dof.(mods),
      deviance=deviance.(mods),
      AIC=aic.(mods),
      BIC=bic.(mods),
    ),
    :deviance,
  )
end

4 rows × 5 columns

zcpLMMdofdevianceAICBIC
StringInt64Float64Float64Float64
1HeC02613329.113381.113550.6
2HyC02613349.213401.213570.7
3PCA02613358.613410.613580.1
4SDC02613765.113817.113986.5

The best fit was obtained for the PCA-based zcpLMM. Somewhat surprisingly the second best fit was obtained for the SDC. The relatively poor performance of HeC-based zcpLMM is puzzling to me. I thought it might be related to imbalance in design in the present data, but this does not appear to be the case. The same comparison of SequentialDifferenceCoding and Helmert Coding also showed a worse fit for the zcp-HeC LMM than the zcp-SDC LMM.

3.3 VCs and CPs depend on random factor

VCs and CPs resulting from a set of test contrasts can also be estimated for the random factor School. Of course, these VCs and CPs may look different from the ones we just estimated for Child.

The effect of age (i.e., developmental gain) varies within School. Therefore, we also include its VCs and CPs in this model; the school-related VC for Sex was not significant.

f_School = @formula zScore ~
  1 + Test * a1 * Sex + (1 + Test + a1 | School);
m_School_SeqDiff = fit(MixedModel, f_School, dat; contrasts=contr1);
m_School_Helmert = fit(MixedModel, f_School, dat; contrasts=contr2);
m_School_Hypo = fit(MixedModel, f_School, dat; contrasts=contr3);
m_School_PCA = fit(MixedModel, f_School, dat; contrasts=contr4);
Minimizing 1077      Time: 0:00:00 ( 0.72 ms/it)
VarCorr(m_School_SeqDiff)
Column Variance Std.Dev Corr.
School (Intercept) 0.034491 0.185718
Test: Star_r 0.226045 0.475441 +0.19
Test: S20_r 0.141041 0.375554 +0.03 -0.72
Test: SLJ 0.084395 0.290507 -0.15 +0.24 -0.53
Test: BPT 0.094110 0.306773 -0.61 -0.24 +0.15 -0.34
a1 0.055720 0.236051 +0.13 +0.16 +0.12 -0.31 -0.56
Residual 0.837440 0.915118
VarCorr(m_School_Helmert)
Column Variance Std.Dev Corr.
School (Intercept) 0.0344864 0.1857052
Test: Star_r 0.0565120 0.2377225 +0.19
Test: S20_r 0.0076090 0.0872298 +0.21 -0.13
Test: SLJ 0.0037642 0.0613533 -0.03 +0.19 +0.07
Test: BPT 0.0032827 0.0572944 -0.67 -0.14 +0.04 +0.21
a1 0.0556419 0.2358853 +0.13 +0.16 +0.31 -0.14 -0.69
Residual 0.8374421 0.9151187
VarCorr(m_School_Hypo)
Column Variance Std.Dev Corr.
School (Intercept) 0.034500 0.185741
Test: BPT-other 1.313904 1.146256 -0.67
Test: Star-End 0.226131 0.475533 +0.19 -0.14
Test: S20-Star 0.141009 0.375512 +0.03 +0.11 -0.72
Test: SLJ-S20 0.084454 0.290609 -0.15 +0.15 +0.24 -0.53
a1 0.055614 0.235826 +0.13 -0.69 +0.16 +0.12 -0.31
Residual 0.837426 0.915110
VarCorr(m_School_PCA)
Column Variance Std.Dev Corr.
School (Intercept) 0.034489 0.185712
Test: c5.1 0.141115 0.375652 -0.35
Test: c234.15 2.605393 1.614123 +0.55 +0.15
Test: c2.34 0.419017 0.647315 +0.04 +0.11 +0.42
Test: c3.4 0.084451 0.290604 +0.15 -0.27 -0.10 -0.16
a1 0.055730 0.236072 +0.13 -0.37 +0.54 -0.00 +0.31
Residual 0.837441 0.915118

We compare again how much of the fit resides in the CPs.

begin
  f_School0 = @formula zScore ~
    1 + Test * a1 * Sex + zerocorr(1 + Test + a1 | School)
  m_School_SDC0 = fit(MixedModel, f_School0, dat; contrasts=contr1)
  m_School_HeC0 = fit(MixedModel, f_School0, dat; contrasts=contr2)
  m_School_HyC0 = fit(MixedModel, f_School0, dat; contrasts=contr3)
  m_School_PCA0 = fit(MixedModel, f_School0, dat; contrasts=contr4)
  #
  zcpLMM2 = ["SDC0", "HeC0", "HyC0", "PCA0"]
  mods2 = [
    m_School_SDC0, m_School_HeC0, m_School_HyC0, m_School_PCA0
  ]
  gof_summary2 = sort!(
    DataFrame(;
      zcpLMM=zcpLMM2,
      dof=dof.(mods2),
      deviance=deviance.(mods2),
      AIC=aic.(mods2),
      BIC=bic.(mods2),
    ),
    :deviance,
  )
end

4 rows × 5 columns

zcpLMMdofdevianceAICBIC
StringInt64Float64Float64Float64
1HeC02713765.313819.313995.2
2PCA02713767.213821.213997.2
3HyC02713777.613831.614007.6
4SDC02713780.613834.614010.6

For the random factor School the Helmert contrast, followed by PCA-based contrasts have least information in the CPs; SDC has the largest contribution from CPs. Interesting.

5. That’s it

That’s it for this tutorial. It is time to try your own contrast coding. You can use these data; there are many alternatives to set up hypotheses for the five tests. Of course and even better, code up some contrasts for data of your own.

Have fun!

References