A large-scale designed experiment

Load the packages to be used.

using AlgebraOfGraphics
using Arrow
using CairoMakie
using Chain
using DataFrameMacros
using DataFrames
using Effects
using MixedModels
using MixedModelsMakie
using ProgressMeter
using StandardizedPredictors
using StatsBase

datadir(paths::AbstractString...) = joinpath(@__DIR__, "data", paths...)
CairoMakie.activate!(; type="svg");
ProgressMeter.ijulia_behavior(:clear);

The English Lexicon Project (Balota et al., 2007) was a large-scale multicenter study to examine properties of English words. It incorporated both a lexical decision task and a word recognition task. Different groups of subjects participated in the different tasks.

Extracting data tables from the raw data

The raw data are available as an OSF project as Zip files for each of the tasks. These Zip files contain one data file for each participant, which has a mixture of demographic data, responses on some pre-tests, and the actual trial runs.

Parsing these data files is not fun – see this repository for some of the code used to untangle the data. (This repository is an unregistered Julia package.)

Some lessons from this:

  • When an identifier is described as a “unique subject id”, it probably isn’t.
  • In a multi-center trial, the coordinating center should assign the range of ids for each satellite site. Failure of a satellite site to stay within its range should result in banishment to a Siberian work camp.
  • As with all data cleaning, the prevailing attitude should be “trust, but verify”. Just because you are told that the file is in a certain format doesn’t mean it is. Just because you are told that the identifiers are unique doesn’t mean they are, etc.
  • It works best if each file has a well-defined, preferably simple, structure. These data files had two different formats mushed together.
  • This is the idea of “tidy data” - each file contains only one type of record along with well-defined rules of how you relate one file to another.
  • If one of the fields is a date, declare the only acceptable form of writing a date, preferably yyyy-mm-dd. Anyone getting creative about the format of the dates will be required to write the software to parse that form (and that is usually not an easy task).
  • As you make changes in a file, document them. If you look at the EnglishLexicon.jl repository you will see that it is in the form of scripts that take the original Zip files and produce the Arrow files. That way, if necessary, the changes can be undone or modified.
  • Remember, the data are only as useful as their provenance. If you invested a lot of time and money in gathering the data you should treat it as a valued resource and exercise great care with it.
  • The Arrow.jl package allows you to add metadata as key/value pairs, called a Dict (or dictionary). Use this capability. The name of the file is not a suitable location for metadata.
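
The first two lessons can be checked mechanically. The following is a minimal sketch in plain Julia, assuming hypothetical site labels, id ranges, and records (none of them come from the ELP files), that verifies the ids are unique and that each site stayed within its assigned range.

```julia
# Hypothetical (site, id) records; not taken from the ELP data
records = [
    (site="A", id=101),
    (site="A", id=102),
    (site="B", id=201),
    (site="B", id=102),  # collides with site A and is outside B's range
]

# Hypothetical id ranges assigned by the coordinating center
ranges = Dict("A" => 100:199, "B" => 200:299)

# Return the sorted values that occur more than once in v
function duplicated(v)
    seen = Set{eltype(v)}()
    dups = Set{eltype(v)}()
    for x in v
        if x in seen
            push!(dups, x)
        else
            push!(seen, x)
        end
    end
    return sort!(collect(dups))
end

ids = [r.id for r in records]
ids_unique = allunique(ids)    # "trust, but verify"
dup_ids = duplicated(ids)      # which ids are repeated
out_of_range = [r for r in records if r.id ∉ ranges[r.site]]
```

Running checks like these before any analysis makes a violated assumption visible early, rather than as a puzzling result later.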

Trial-level data from the LDT

In the lexical decision task the study participant is shown a character string, under carefully controlled conditions, and responds according to whether they identify the string as a word or not. Two responses are recorded: whether the choice of word/non-word is correct and the time that elapsed between exposure to the string and registering a decision.

Several covariates, some relating to the subject and some relating to the target, were recorded. Initially we consider only the trial-level data.

ldttrial = Arrow.Table(datadir("ELP_ldt_trial.arrow"))
Arrow.Table with 2745952 rows, 5 columns, and schema:
 :subj  Int16
 :seq   Int16
 :acc   Union{Missing, Bool}
 :rt    Int16
 :item  String

with metadata given by a Base.ImmutableDict{String, String} with 3 entries:
  "title"     => "Trial-level data from Lexical Discrimination Task in the Engl…
  "reference" => "Balota et al. (2007), The English Lexicon Project, Behavior R…
  "source"    => "https://osf.io/n63s2"

The two response variables are acc - the accuracy of the response - and rt, the response time in milliseconds. There is one trial-level covariate, seq, the sequence number of the trial within subj. Each subject participated in two sessions on different days, with 2000 trials recorded on the first day.

Notice the metadata with a citation and a URL for the OSF project.

We convert to a DataFrame and add a Boolean column s2 which is true for trials in the second session.

ldttrial = @transform!(DataFrame(ldttrial), :s2 = :seq > 2000)
describe(ldttrial)

6 rows × 7 columns

     variable  mean      min     median  max    nmissing  eltype
     Symbol    Union…    Any     Union…  Any    Int64     Type
  1  subj      409.31    1       409.0   816    0         Int16
  2  seq       1687.21   1       1687.0  3374   0         Int16
  3  acc       0.85604   0       1.0     1      1370      Union{Missing, Bool}
  4  rt        846.325   -16160  732.0   32061  0         Int16
  5  item                Aarod           zuss   0         String
  6  s2        0.407128  0       0.0     1      0         Bool

Initial data exploration

From the basic summary of ldttrial we can see that there are some questionable response times — negative values and values over 32 seconds.

Because of obvious outliers we will use the median response time, which is not strongly influenced by outliers, rather than the mean response time when summarizing by item or by subject.
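
As a small illustration of why the median is preferred here, the sketch below compares the two summaries on a handful of made-up response times. The values are hypothetical, except that the two extremes echo the minimum and maximum shown in the summary of ldttrial.

```julia
using Statistics

# Hypothetical response times (ms); the two extremes echo the minimum and
# maximum reported in the summary of ldttrial
rt = [650, 700, 720, 760, 810, -16160, 32061]

rt_mean = mean(rt)      # pulled far from the bulk by the two extreme values
rt_median = median(rt)  # stays in the middle of the typical values
```

Here the median is 720 ms, a typical value, whereas the mean is well over 2000 ms, larger than every plausible response time in the vector.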

Also, there are missing values of the accuracy. We should check if these are associated with particular subjects or particular items.

Summaries by item

To summarize by item we group the trials by item and use combine to produce the various summary statistics. As we will create similar summaries by subject, we incorporate an ‘i’ in the names of these summaries (and an ‘s’ in the name of the summaries by subject) to be able to identify the grouping used.

byitem = @chain ldttrial begin
  groupby(:item)
  @combine(
    :ni = length(:acc),               # no. of obs
    :imiss = count(ismissing, :acc),  # no. of missing acc
    :iacc = count(skipmissing(:acc)), # no. of accurate
    :imedianrt = median(:rt),
  )
  @transform!(
    :wrdlen = Int8(length(:item)),
    :ipropacc = :iacc / :ni
  )
end

80,962 rows × 7 columns

      item         ni     imiss  iacc   imedianrt  wrdlen  ipropacc
      String       Int64  Int64  Int64  Float64    Int8    Float64
   1  a            35     0      26     743.0      1       0.742857
   2  e            35     0      19     824.0      1       0.542857
   3  aah          34     0      21     770.5      3       0.617647
   4  aal          34     0      32     702.5      3       0.941176
   5  Aaron        33     0      31     625.0      5       0.939394
   6  Aarod        33     0      23     810.0      5       0.69697
   7  aback        34     0      15     710.0      5       0.441176
   8  ahack        34     0      34     662.0      5       1.0
   9  abacus       34     0      17     671.5      6       0.5
  10  alacus       34     0      29     640.0      6       0.852941
  11  abandon      34     0      32     641.0      7       0.941176
  12  acandon      34     0      33     725.5      7       0.970588
  13  abandoned    34     0      31     667.5      9       0.911765
  14  adandoned    34     0      11     760.5      9       0.323529
  15  abandoning   34     0      34     662.0      10      1.0
  16  abantoning   34     0      30     848.5      10      0.882353
  17  abandonment  35     0      35     734.0      11      1.0
  18  apandonment  35     0      30     817.0      11      0.857143
  19  abase        34     1      23     750.5      5       0.676471
  20  abose        34     0      23     805.5      5       0.676471
  21  abasement    33     0      17     850.0      9       0.515152
  22  afasement    33     0      30     649.0      9       0.909091
  23  abash        32     0      22     727.5      5       0.6875
  24  adash        32     0      25     784.5      5       0.78125
  25  abate        34     0      24     687.0      5       0.705882
  26  abape        34     0      32     675.0      5       0.941176
  27  abated       34     0      23     775.0      6       0.676471
  28  agated       34     0      14     897.5      6       0.411765
  29  abbess       34     0      7      837.5      6       0.205882
  30  abbass       34     0      28     788.0      6       0.823529

It can be seen that the items occur in word/nonword pairs and the pairs are sorted alphabetically by the word in the pair (ignoring case). We can add the word/nonword status for the items as

byitem.isword = isodd.(eachindex(byitem.item))
describe(byitem)

8 rows × 7 columns

     variable   mean       min    median    max     nmissing  eltype
     Symbol     Union…     Any    Union…    Any     Int64     DataType
  1  item                  Aarod            zuss    0         String
  2  ni         33.9166    30     34.0      37      0         Int64
  3  imiss      0.0169215  0      0.0       2       0         Int64
  4  iacc       29.0194    0      31.0      37      0         Int64
  5  imedianrt  753.069    458.0  737.5     1691.0  0         Float64
  6  wrdlen     7.9988     1      8.0       21      0         Int8
  7  ipropacc   0.855616   0.0    0.911765  1.0     0         Float64
  8  isword     0.5        0      0.5       1       0         Bool

This table shows that some of the items were never identified correctly. These are

filter(:iacc => iszero, byitem)

9 rows × 8 columns

     item       ni     imiss  iacc   imedianrt  wrdlen  ipropacc  isword
     String     Int64  Int64  Int64  Float64    Int8    Float64   Bool
  1  baobab     34     0      0      616.5      6       0.0       1
  2  haulage    34     0      0      708.5      7       0.0       1
  3  leitmotif  35     0      0      688.0      9       0.0       1
  4  miasmal    35     0      0      774.0      7       0.0       1
  5  peahen     34     0      0      684.0      6       0.0       1
  6  plosive    34     0      0      663.0      7       0.0       1
  7  plugugly   33     0      0      709.0      8       0.0       1
  8  poshest    34     0      0      740.0      7       0.0       1
  9  servo      33     0      0      697.0      5       0.0       1

Notice that these are all words, but words obscure enough that none of the subjects exposed to them identified them correctly.

We can incorporate characteristics like wrdlen and isword back into the original trial table with a “left join”. This operation joins two tables by values in a common column. It is called a left join because the left (or first) table takes precedence, in the sense that every row in the left table is present in the result. If there is no matching row in the second table then missing values are inserted for the columns from the right table in the result.

describe(
  leftjoin!(
    ldttrial,
    select(byitem, :item, :wrdlen, :isword);
    on=:item,
  ),
)

8 rows × 7 columns

     variable  mean      min     median  max    nmissing  eltype
     Symbol    Union…    Any     Union…  Any    Int64     Type
  1  subj      409.31    1       409.0   816    0         Int16
  2  seq       1687.21   1       1687.0  3374   0         Int16
  3  acc       0.85604   0       1.0     1      1370      Union{Missing, Bool}
  4  rt        846.325   -16160  732.0   32061  0         Int16
  5  item                Aarod           zuss   0         String
  6  s2        0.407128  0       0.0     1      0         Bool
  7  wrdlen    7.99835   1       8.0     21     0         Union{Missing, Int8}
  8  isword    0.499995  0       0.0     1      0         Union{Missing, Bool}

Notice that the wrdlen and isword variables in this table allow for missing values, because they are derived from the second argument, but there are no missing values for these variables. If there is no need to allow for missing values, there is a slight advantage in disallowing them in the element type, because the code to check for and handle missing values is not needed.

This could be done separately for each column or for the whole data frame, as in

describe(disallowmissing!(ldttrial; error=false))

8 rows × 7 columns

     variable  mean      min     median  max    nmissing  eltype
     Symbol    Union…    Any     Union…  Any    Int64     Type
  1  subj      409.31    1       409.0   816    0         Int16
  2  seq       1687.21   1       1687.0  3374   0         Int16
  3  acc       0.85604   0       1.0     1      1370      Union{Missing, Bool}
  4  rt        846.325   -16160  732.0   32061  0         Int16
  5  item                Aarod           zuss   0         String
  6  s2        0.407128  0       0.0     1      0         Bool
  7  wrdlen    7.99835   1       8.0     21     0         Int8
  8  isword    0.499995  0       0.0     1      0         Bool

The keyword argument error=false is required because one column, acc, does contain missing values. If error=false were not given, the error thrown when trying to disallow missing values in the acc column would propagate and the top-level call would fail.
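
The distinction between an element type that merely allows missing values and a column that actually contains them can be seen with plain vectors. This is a sketch of the idea in base Julia, using made-up vectors; it is an analogue of the behavior, not the DataFrames implementation.

```julia
# A column that allows missing values but happens to contain none
wrdlen = Union{Missing, Int8}[1, 3, 5]
narrowed = convert(Vector{Int8}, wrdlen)   # succeeds; eltype is now Int8

# A column that really does contain a missing value
acc = Union{Missing, Bool}[true, missing, false]
narrowing_failed = try
    convert(Vector{Bool}, acc)
    false
catch
    true   # conversion throws because `missing` cannot become a Bool
end
```

The first conversion narrows the element type without complaint; the second throws, which is the kind of failure that error=false tells disallowmissing! to skip over.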

A barchart of the word length counts, Figure 1, shows that the majority of the items are between 3 and 14 characters.

let
  wlen = 1:21
  draw(
    data((; wrdlen=wlen, count=counts(byitem.wrdlen, wlen))) *
    mapping(:wrdlen => "Length of word", :count) *
    visual(BarPlot),
  )
end

Figure 1: Histogram of word lengths in the items used in the lexical decision task.

To examine trends in accuracy by word length we plot the response versus word length using just a scatterplot smoother. It would not be meaningful to plot the raw data because the accuracy is recorded as true or false, so the points would just form horizontal lines at 0 and 1. Instead we show only the smoother, which displays the trend, and omit the raw data points.

The resulting plot, Figure 2, shows the accuracy of identifying words is more-or-less constant at around 84%, but accuracy decreases with increasing word length for the nonwords.

draw(
  data(@subset(ldttrial, !ismissing(:acc))) *
  mapping(
    :wrdlen => "Word length",
    :acc => "Accuracy";
    color=:isword,
  ) *
  smooth();
  figure=(; resolution=(800, 450)),
)

Figure 2: Smoothed curves of accuracy versus word length in the lexical decision task.

Figure 2 may be a bit misleading because the largest discrepancies in the proportion of accurate identifications of words and nonwords occur for the longest items, of which there are few. Over 95% of the items are between 4 and 13 characters in length:

count(x -> 4 ≤ x ≤ 13, byitem.wrdlen) / nrow(byitem)
0.9654899829549666

If we restrict the smoother curves to this range, as in Figure 3,

draw(
  data(@subset(ldttrial, !ismissing(:acc), 4 ≤ :wrdlen ≤ 13)) *
  mapping(
    :wrdlen => "Word length",
    :acc => "Accuracy";
    color=:isword,
  ) *
  smooth();
  figure=(; resolution=(800, 450)),
)

Figure 3: Smoothed curves of accuracy versus word length in the range 4 to 13 characters in the lexical decision task.

the differences are less dramatic.

Another way to visualize these results is by plotting the proportion accurate versus word length separately for words and non-words, with the area of each marker proportional to the number of observations for that combination (Figure 4).

let
  itemsummry = combine(
    groupby(byitem, [:wrdlen, :isword]),
    :ni => sum,
    :imiss => sum,
    :iacc => sum,
  )
  @transform!(
    itemsummry,
    :iacc_mean = :iacc_sum / (:ni_sum - :imiss_sum)
  )
  @transform!(itemsummry, :msz = sqrt((:ni_sum - :imiss_sum) / 800))
  draw(
    data(itemsummry) * mapping(
      :wrdlen => "Word length",
      :iacc_mean => "Proportion accurate";
      color=:isword,
      markersize=:msz,
    );
    figure=(; resolution=(800, 450)),
  )
end

Figure 4: Proportion of accurate trials in the LDT versus word length separately for words and non-words. The area of the marker is proportional to the number of observations represented.

The pattern in the range of word lengths with non-negligible counts (there are points in the plot down to word lengths of 1 and up to word lengths of 21 but these points are very small) is that the accuracy for words is nearly constant at about 84% and the accuracy for nonwords is slightly higher until lengths of 13, at which point it falls off a bit.

Summaries by subject

A summary of accuracy and median response time by subject

bysubj = @chain ldttrial begin
  groupby(:subj)
  @combine(
    :ns = length(:acc),               # no. of obs
    :smiss = count(ismissing, :acc),  # no. of missing acc
    :sacc = count(skipmissing(:acc)), # no. of accurate
    :smedianrt = median(:rt),
  )
  @transform!(:spropacc = :sacc / :ns)
end

814 rows × 6 columns

      subj   ns     smiss  sacc   smedianrt  spropacc
      Int16  Int64  Int64  Int64  Float64    Float64
   1  1      3374   0      3158   554.0      0.935981
   2  2      3372   1      3031   960.0      0.898873
   3  3      3372   3      3006   813.0      0.891459
   4  4      3374   1      3062   619.0      0.907528
   5  5      3374   0      2574   677.0      0.762893
   6  6      3374   0      2927   855.0      0.867516
   7  7      3374   4      2877   918.5      0.852697
   8  8      3372   1      2731   1310.0     0.809905
   9  9      3374   13     2669   657.0      0.791049
  10  10     3374   0      2722   757.0      0.806758
  11  11     3374   0      2894   632.0      0.857736
  12  12     3374   4      2979   692.0      0.882928
  13  13     3374   2      2980   1114.0     0.883225
  14  14     3374   1      2697   603.0      0.799348
  15  15     3372   0      2957   729.0      0.876928
  16  16     3374   0      2924   710.0      0.866627
  17  17     3374   1      2947   755.0      0.873444
  18  18     3374   0      2851   617.0      0.844991
  19  19     3374   0      2890   724.0      0.85655
  20  20     3372   0      2905   858.0      0.861507
  21  21     3372   0      3051   1041.0     0.904804
  22  22     3372   2      2756   972.5      0.817319
  23  23     3374   3      2543   629.5      0.753705
  24  24     3374   0      2995   644.0      0.88767
  25  25     3372   0      2988   732.5      0.886121
  26  26     3374   0      3024   830.0      0.896266
  27  27     3374   1      2774   1099.5     0.82217
  28  28     3372   1      2898   823.5      0.859431
  29  29     3372   0      3022   1052.5     0.896204
  30  30     3374   0      2946   680.0      0.873148

shows some anomalies

describe(bysubj)

6 rows × 7 columns

     variable   mean      min       median    max       nmissing  eltype
     Symbol     Float64   Real      Float64   Real      Int64     DataType
  1  subj       409.311   1         409.5     816       0         Int16
  2  ns         3373.41   3370      3374.0    3374      0         Int64
  3  smiss      1.68305   0         1.0       22        0         Int64
  4  sacc       2886.33   1727      2928.0    3286      0         Int64
  5  smedianrt  760.992   205.0     735.0     1804.0    0         Float64
  6  spropacc   0.855613  0.511855  0.868031  0.973918  0         Float64

First, some subjects are accurate on only about half of their trials, which is the proportion that would be expected from random guessing. A plot of the median response time versus proportion accurate, Figure 5, shows that the subjects with lower accuracy are some of the fastest responders, further indicating that these subjects are sacrificing accuracy for speed.

draw(
  data(bysubj) *
  mapping(
    :spropacc => "Proportion accurate",
    :smedianrt => "Median response time (ms)",
  ) *
  (smooth() + visual(Scatter));
)

Figure 5: Median response time versus proportion accurate by subject in the LDT.

As described in Balota et al. (2007), the participants performed the trials in blocks of 250 followed by a short break. During the break they were given feedback concerning accuracy and response latency in the previous block of trials. If the accuracy was less than 80% the participant was encouraged to improve their accuracy. Similarly, if the mean response latency was greater than 1000 ms, the participant was encouraged to decrease their response time. During the trials immediate feedback was given if the response was incorrect.

Nevertheless, approximately 15% of the subjects were unable to maintain 80% accuracy on their trials

count(<(0.8), bysubj.spropacc) / nrow(bysubj)
0.15233415233415235

and there is some association of faster response times with low accuracy. The majority of the subjects whose median response time is less than 500 ms. are accurate on less than 75% of their trials. Another way of characterizing the relationship is that none of the subjects with 90% accuracy or greater had a median response time less than 500 ms.

minimum(@subset(bysubj, :spropacc > 0.9).smedianrt)
505.0

It is common in analyses of response latency in a lexical discrimination task to consider only the latencies on correct identifications and to trim outliers. In Balota et al. (2007) a two-stage outlier removal strategy was used; first removing responses less than 200 ms or greater than 3000 ms then removing responses more than three standard deviations from the participant’s mean response.
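
The two-stage strategy can be sketched for a single participant as follows. The response times are hypothetical, and computing the mean and standard deviation from the responses that survive the first stage is an assumption of this sketch; the original paper should be consulted for the exact procedure.

```julia
using Statistics

# Hypothetical response times (ms) for one participant
rts = vcat(collect(600:10:790), [2900, 150, 4000])

# Stage 1: drop latencies outside 200 to 3000 ms
stage1 = filter(rt -> 200 ≤ rt ≤ 3000, rts)

# Stage 2: drop latencies more than 3 SD from the participant's mean
# (assumption: mean and SD are computed from the stage-1 survivors)
m, s = mean(stage1), std(stage1)
stage2 = filter(rt -> abs(rt - m) ≤ 3s, stage1)
```

In this example the first stage removes the 150 ms and 4000 ms responses, and the second stage removes the 2900 ms response, which passed the fixed limits but lies more than three standard deviations above this participant's mean.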

As described in Section 2.2.3 we will analyze these data on a speed scale (the inverse of response time) using only the first-stage outlier removal of response latencies less than 200 ms or greater than 3000 ms. On the speed scale the limits are 0.333 per second up to 5 per second.

To examine the effects of the fast but inaccurate responders we will fit models to the data from all the participants and to the data from the 85% of participants who maintained an overall accuracy of 80% or greater.

pruned = @chain ldttrial begin
  @subset(!ismissing(:acc), 200 ≤ :rt ≤ 3000)
  leftjoin!(select(bysubj, :subj, :spropacc); on=:subj)
  dropmissing!
end
size(pruned)
(2714311, 9)
describe(pruned)

9 rows × 7 columns

     variable  mean      min       median    max       nmissing  eltype
     Symbol    Union…    Any       Union…    Any       Int64     DataType
  1  subj      409.802   1         410.0     816       0         Int16
  2  seq       1684.56   1         1684.0    3374      0         Int16
  3  acc       0.859884  0         1.0       1         0         Bool
  4  rt        838.712   200       733.0     3000      0         Int16
  5  item                Aarod               zuss      0         String
  6  s2        0.40663   0         0.0       1         0         Bool
  7  wrdlen    7.99244   1         8.0       21        0         Int8
  8  isword    0.500126  0         1.0       1         0         Bool
  9  spropacc  0.857169  0.511855  0.869295  0.973918  0         Float64

Choice of response scale

As we have indicated, generally the response times are analyzed for the correct identifications only. Furthermore, unrealistically large or small response times are eliminated. For this example we only use the responses between 200 and 3000 ms.

A density plot of the pruned response times, Figure 6, shows they are skewed to the right.

draw(
  data(pruned) *
  mapping(:rt => "Response time (ms.) for correct responses") *
  AlgebraOfGraphics.density();
  figure=(; resolution=(800, 450)),
)

Figure 6: Kernel density plot of the pruned response times (ms.) in the LDT.

In such cases it is common to transform the response to a scale such as the logarithm of the response time or to the speed of the response, which is the inverse of the response time.

The density of the response speed, in responses per second, is shown in Figure 7.

draw(
  data(pruned) *
  mapping(
    :rt => (x -> 1000 / x) => "Response speed (s⁻¹) for correct responses") *
  AlgebraOfGraphics.density();
  figure=(; resolution=(800, 450)),
)

Figure 7: Kernel density plot of the pruned response speed in the LDT.

Figure 6 and Figure 7 indicate that it may be more reasonable to establish a lower bound of 1/3 second (333 ms) on the response latency, corresponding to an upper bound of 3 per second on the response speed. However, only about one half of one percent of the correct responses have latencies in the range of 200 ms. to 333 ms.

count(
  r -> !ismissing(r.acc) && 200 < r.rt < 333,
  eachrow(ldttrial),
) / count(!ismissing, ldttrial.acc)
0.005867195806137328

so the exact position of the lower cut-off point on the response latencies is unlikely to be very important.

If you examine the code for Figure 7, you will see that the conversion from rt to speed is done inline rather than creating and storing a new variable in the DataFrame.

I prefer to keep the DataFrame simple with the integer variables (e.g. :rt) if possible.

I recommend using the StandardizedPredictors.jl capabilities to center numeric variables or convert them to z-scores.
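
For readers unfamiliar with these transformations, the arithmetic involved is shown below in plain Julia on a hypothetical vector of word lengths. This is only a sketch of what centering and z-scoring compute; the package applies the transformation within the model formula, so the stored data stay on their original scale.

```julia
using Statistics

wrdlen = [3, 5, 8, 8, 10, 14]    # hypothetical word lengths

# Centering at 8 characters: a value of 0 now means "8 characters long"
centered = wrdlen .- 8

# z-scores: subtract the sample mean and divide by the sample standard deviation
z = (wrdlen .- mean(wrdlen)) ./ std(wrdlen)
```

After centering, the intercept of a model refers to an item of 8 characters rather than to a hypothetical item of length zero.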

Transformation of response and the form of the model

As noted in Box & Cox (1964), a transformation of the response that produces a more Gaussian distribution often will also produce a simpler model structure. For example, Figure 8 shows the smoothed relationship between word length and response time for words and non-words separately,

draw(
  data(pruned) *
  mapping(
    :wrdlen => "Word length",
    :rt => "Response time (ms)";
    :color => :isword,
  ) *
  smooth();
)

Figure 8: Scatterplot smooths of response time versus word length in the LDT.

and Figure 9 shows the similar relationships for speed

draw(
  data(pruned) *
  mapping(
    :wrdlen => "Word length",
    :rt => (x -> 1000/x) => "Speed of response (s⁻¹)";
    :color => :isword,
  ) *
  smooth();
)

Figure 9: Scatterplot smooths of response speed versus word length in the LDT.

For the most part the smoother lines in Figure 9 are reasonably straight. The small amount of curvature is associated with short word lengths, say less than 4 characters, of which there are comparatively few in the study.

Figure 10 shows a “violin plot” - the empirical density of the response speed by word length separately for words and nonwords. The lines on the plot are fit by linear regression.

let
  plt = data(@subset(pruned, :wrdlen > 3, :wrdlen < 14))
  plt *= mapping(
    :wrdlen => "Word length",
    :rt => (x -> 1000/x) => "Speed of response (s⁻¹)",
    color=:isword,
    side=:isword,
  )
  plt *= (visual(Violin) + linear(; interval=:confidence))
  draw(plt, axis=(; limits=(nothing, (0.0, 2.8))))
end

Figure 10: Empirical density of response speed versus word length by word/non-word status, with lines fit by linear regression to each group.

Models with scalar random effects

A major purpose of the English Lexicon Project is to characterize the items (words or nonwords) according to the observed accuracy of identification and to response latency, taking into account subject-to-subject variability, and to relate these to lexical characteristics of the items.

In Balota et al. (2007) the item response latency is characterized by the average response latency from the correct trials after outlier removal.

Mixed-effects models allow us greater flexibility and, we hope, precision in characterizing the items by controlling for subject-to-subject variability and for item characteristics such as word/nonword and item length.

We begin with a model that has scalar random effects for item and for subject and incorporates fixed-effects for word/nonword and for item length and for the interaction of these terms.

Establish the contrasts

Because there are a large number of items in the data set it is important to assign a Grouping() contrast to item (and, less importantly, to subj). For the isword factor we will use an EffectsCoding contrast with the base level as false. The non-words are assigned -1 in this contrast and the words are assigned +1. The wrdlen covariate is on its original scale but centered at 8 characters.

Thus the (Intercept) coefficient is the predicted speed of response for a typical subject and typical item (without regard to word/non-word status) of 8 characters.

Set these contrasts

contrasts = Dict(
  :subj => Grouping(),
  :item => Grouping(),
  :isword => EffectsCoding(; base=false),
  :wrdlen => Center(8),
)
Dict{Symbol, Any} with 4 entries:
  :item   => Grouping()
  :wrdlen => Center(8)
  :isword => EffectsCoding(false, nothing)
  :subj   => Grouping()

and fit a first model with simple, scalar, random effects for subj and item.

elm01 = let
  form = @formula(
    1000 / rt ~ 1 + isword * wrdlen + (1 | item) + (1 | subj)
  )
  fit(MixedModel, form, pruned; contrasts)
end
                                       Est.      SE        z       p  σ_item  σ_subj
(Intercept)                          1.3758  0.0090   153.69  <1e-99  0.1185  0.2550
isword: true                         0.0625  0.0005   131.35  <1e-99
wrdlen(centered: 8)                 -0.0436  0.0002  -225.38  <1e-99
isword: true & wrdlen(centered: 8)  -0.0056  0.0002   -28.83  <1e-99
Residual                             0.3781

The predicted response speed by word length and word/nonword status can be summarized as

effects(Dict(:isword => [false, true], :wrdlen => 4:2:12), elm01)

10 rows × 6 columns

      wrdlen  isword  1000 / rt  err         lower    upper
      Int64   Bool    Float64    Float64     Float64  Float64
   1  4       0       1.46555    0.00903111  1.45652  1.47458
   2  6       0       1.38947    0.00898124  1.38049  1.39845
   3  8       0       1.31338    0.00896459  1.30442  1.32235
   4  10      0       1.2373     0.00898134  1.22832  1.24628
   5  12      0       1.16121    0.00903129  1.15218  1.17025
   6  4       1       1.6351     0.0090311   1.62607  1.64413
   7  6       1       1.5367     0.00898124  1.52772  1.54569
   8  8       1       1.43831    0.00896459  1.42934  1.44727
   9  10      1       1.33991    0.00898133  1.33092  1.34889
  10  12      1       1.24151    0.00903128  1.23248  1.25054
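
The predicted speeds in this table can be reproduced, up to rounding, directly from the fixed-effects estimates and the contrast coding. The sketch below copies the rounded estimates from the elm01 output; the function name predicted_speed is introduced here only for illustration.

```julia
# Rounded fixed-effects estimates copied from the elm01 output
β0, β_word, β_len, β_int = 1.3758, 0.0625, -0.0436, -0.0056

# EffectsCoding(; base=false) codes nonwords as -1 and words as +1;
# Center(8) replaces wrdlen by wrdlen - 8
function predicted_speed(isword::Bool, wrdlen::Integer)
    c = isword ? 1 : -1
    d = wrdlen - 8
    return β0 + β_word * c + β_len * d + β_int * c * d
end

predicted_speed(true, 8)    # 1.3758 + 0.0625 = 1.4383 responses per second
predicted_speed(false, 4)   # compare with the nonword row for wrdlen 4
predicted_speed(true, 4)    # compare with the word row for wrdlen 4
```

Because of the ±1 coding, the (Intercept) estimate is the average of the word and nonword predictions at 8 characters, and the isword coefficient is half the difference between them.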

If we restrict to only those subjects with 80% accuracy or greater the model becomes

elm02 = let
  form = @formula(
    1000 / rt ~ 1 + isword * wrdlen + (1 | item) + (1 | subj)
  )
  dat = @subset(pruned, :spropacc > 0.8)
  fit(MixedModel, form, dat; contrasts)
end
                                       Est.      SE        z       p  σ_item  σ_subj
(Intercept)                          1.3611  0.0088   153.98  <1e-99  0.1247  0.2318
isword: true                         0.0656  0.0005   133.73  <1e-99
wrdlen(centered: 8)                 -0.0444  0.0002  -222.65  <1e-99
isword: true & wrdlen(centered: 8)  -0.0057  0.0002   -28.73  <1e-99
Residual                             0.3342
effects(Dict(:isword => [false, true], :wrdlen => 4:2:12), elm02)

10 rows × 6 columns

      wrdlen  isword  1000 / rt  err         lower    upper
      Int64   Bool    Float64    Float64     Float64  Float64
   1  4       0       1.45036    0.00892458  1.44144  1.45929
   2  6       0       1.37297    0.00887092  1.3641   1.38184
   3  8       0       1.29557    0.008853    1.28672  1.30443
   4  10      0       1.21818    0.00887102  1.20931  1.22705
   5  12      0       1.14078    0.00892477  1.13186  1.14971
   6  4       1       1.62735    0.00892457  1.61842  1.63627
   7  6       1       1.52702    0.00887092  1.51815  1.53589
   8  8       1       1.4267     0.00885299  1.41784  1.43555
   9  10      1       1.32637    0.00887101  1.3175   1.33524
  10  12      1       1.22605    0.00892475  1.21712  1.23497

The differences in the fixed-effects parameter estimates between a model fit to the full data set and one fit to the data from accurate responders only are small.

However, the random effects for the item, while highly correlated, are not perfectly correlated.

CairoMakie.activate!(; type="png")
disallowmissing!(
  leftjoin!(
    byitem,
    leftjoin!(
      rename!(DataFrame(raneftables(elm01)[:item]), [:item, :elm01]),
      rename!(DataFrame(raneftables(elm02)[:item]), [:item, :elm02]);
      on=:item,
    ),
    on=:item,
  ),
)
disallowmissing!(
  leftjoin!(
    bysubj,
    leftjoin!(
      rename!(DataFrame(raneftables(elm01)[:subj]), [:subj, :elm01]),
      rename!(DataFrame(raneftables(elm02)[:subj]), [:subj, :elm02]);
      on=:subj,
    ),
    on=:subj,
  ); error=false,
)
draw(
  data(byitem) * mapping(
    :elm01 => "Conditional means of item random effects for model elm01",
    :elm02 => "Conditional means of item random effects for model elm02";
    color=:isword,
  );
  axis=(; width=600, height=600),
)

Figure 11: Conditional means of scalar random effects for item in model elm01, fit to the pruned data, versus those for model elm02, fit to the pruned data with inaccurate subjects removed.

Note

Adjust the alpha on Figure 11.

Figure 11 is exactly of the form that would be expected in a sample from a correlated multivariate Gaussian distribution. The correlation of the two sets of conditional means is about 96%.

cor(Matrix(select(byitem, :elm01, :elm02)))
2×2 Matrix{Float64}:
 1.0       0.958655
 0.958655  1.0

These models take only a few seconds to fit on a modern laptop computer, which is quite remarkable given the size of the data set and the number of random effects.

The amount of time to fit more complex models will be much greater so we may want to move those fits to more powerful server computers. We can split the tasks of fitting and analyzing a model between computers by saving the optimization summary after the model fit and later creating the MixedModel object followed by restoring the optsum object.

saveoptsum("./fits/elm01.json", elm01);
elm01a = restoreoptsum!(
  let
    form = @formula(
      1000 / rt ~ 1 + isword * wrdlen + (1 | item) + (1 | subj)
    )
    MixedModel(form, pruned; contrasts)
  end,
  "./fits/elm01.json",
)
                                       Est.      SE        z       p  σ_item  σ_subj
(Intercept)                          1.3758  0.0090   153.69  <1e-99  0.1185  0.2550
isword: true                         0.0625  0.0005   131.35  <1e-99
wrdlen(centered: 8)                 -0.0436  0.0002  -225.38  <1e-99
isword: true & wrdlen(centered: 8)  -0.0056  0.0002   -28.83  <1e-99
Residual                             0.3781

Other covariates associated with the item are available as

elpldtitem = DataFrame(Arrow.Table(datadir("ELP_ldt_item.arrow")))
describe(elpldtitem)

9 rows × 7 columns

     variable        mean     min    median   max     nmissing  eltype
     Symbol          Union…   Any    Union…   Any     Int64     Type
  1  item                     Aarod           zuss    0         String
  2  Ortho_N         1.53309  0      1.0      25      0         Int8
  3  BG_Sum          13938.4  11     13026.0  59803   177       Union{Missing, Int32}
  4  BG_Mean         1921.25  5.5    1907.0   6910.0  177       Union{Missing, Float32}
  5  BG_Freq_By_Pos  2043.08  0      1928.0   6985    4         Union{Missing, Int16}
  6  itemno          40481.5  1      40481.5  80962   0         Int32
  7  isword          0.5      0      0.5      1       0         Bool
  8  wrdlen          7.9988   1      8.0      21      0         Int8
  9  pairno          20241.0  1      20241.0  40481   0         Int32

and those associated with the subject are

elpldtsubj = DataFrame(Arrow.Table(datadir("ELP_ldt_subj.arrow")))
describe(elpldtsubj)

20 rows × 7 columns (omitted printing of 2 columns)

      variable  mean     min                  median                   max
      Symbol    Union…   Any                  Any                      Any
   1  subj      409.311  1                    409.5                    816
   2  univ               Kansas                                        Wayne State
   3  sex                f                                             m
   4  DOB                1938-06-07                                    1984-11-14
   5  MEQ       44.4932  19.0                 44.0                     75.0
   6  vision    5.51169  0                    6.0                      7
   7  hearing   5.86101  0                    6.0                      7
   8  educatn   8.89681  1                    12.0                     28
   9  ncorrct   29.8505  5                    30.0                     40
  10  rawscor   31.9925  13                   32.0                     40
  11  vocabAge  17.8123  10.3                 17.8                     21.0
  12  shipTime  3.0861   0                    3.0                      9
  13  readTime  2.50215  0.0                  2.0                      15.0
  14  preshlth  5.48708  0                    6.0                      7
  15  pasthlth  4.92989  0                    5.0                      7
  16  S1start            2001-03-16T13:49:27  2001-10-16T11:38:28.500  2003-07-29T18:48:44
  17  S2start            2001-03-19T10:00:35  2001-10-19T14:24:19.500  2003-07-30T13:07:45
  18  MEQstrt            2001-03-22T18:32:00  2001-10-23T11:26:13      2003-07-30T14:30:49
  19  filename           101DATA.LDT                                   Data998.LDT
  20  frstLang           English                                       other

For the simple model elm01 the estimated standard deviation of the random effects for subject is greater than that of the random effects for item, a common occurrence. A caterpillar plot, Figure 12,

qqcaterpillar!(
  Figure(resolution=(800, 650)),
  ranefinfo(elm01, :subj),
)

Figure 12: Conditional means and 95% prediction intervals for subject random effects in elm01.

shows definite distinctions between subjects because the widths of the prediction intervals are small compared to the range of the conditional modes. Also, there is at least one outlier with a conditional mode over 1.0.

Figure 13 is the corresponding caterpillar plot for model elm02 fit to the data with inaccurate responders eliminated.

qqcaterpillar!(
  Figure(resolution=(800, 650)),
  ranefinfo(elm02, :subj),
)

Figure 13: Conditional means and 95% prediction intervals for subject random effects in elm02.

References

Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445–459. https://doi.org/10.3758/bf03193014
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological), 26(2), 211–243. https://doi.org/10.1111/j.2517-6161.1964.tb00553.x