Reaction Times

Margherita Calderan

“First, RT distributions are decidedly non-normal — they are almost always skewed to the right.

Second, this skew increases with task difficulty.

Third, the spread of the distribution increases with the mean.”

(Wagenmakers & Brown, 2007)

Reaction time distributions

  • Shifted-LogNormal
  • Inverse Gaussian (Wald)
  • Wiener (diffusion)
  • Gamma
  • Ex-Gaussian
  • Weibull

Code
library(brms)     # rshifted_lnorm(), rexgaussian(), rinv_gaussian(), rwiener()
library(ggplot2)

N = 10000
dat = data.frame(rt = c(rgamma(N, shape = 2.2, rate = 4.8) + .12,
                        rshifted_lnorm(N, meanlog = -.8, sdlog = .75, shift = .15),
                        rexgaussian(N, mu = 0.7, sigma = .07, beta = 0.4),
                        rinv_gaussian(N, mu = .56, shape = 1.1) + .12,
                        rwiener(N, alpha = 2, tau = .15, beta = 0.5, delta = 1.5, types = "q")),
                 Dist = factor(rep(c("ShiftGamma", "ShiftLogNorm", "ExGau", "ShiftInvGau", "Wiener"),
                                   each = N)))

ggplot(dat, aes(x = rt, fill = Dist)) +
  geom_density(alpha = .4, lwd = 1, adjust = 2) +
  scale_x_continuous(breaks = seq(0, 3, by = .5), limits = c(-0.5, 3.5)) +
  facet_wrap(~Dist)

Which one to choose?

  • Interpretability
  • Expertise
  • Feasibility (e.g., number of parameters)

Interpretability

Types of parameters - Jonas Kristoffer Lindeløv

  • Difficulty-like: influences both the mean and the SD. Ideally a value that reflects the center (the median?).
  • Onset/Shift: the earliest possible reaction time.
  • Scale: stretches or squeezes the distribution over and above difficulty, without (severely) changing the location (onset and center).

Difficulty-like

Influences both the mean and SD.

Code
N = 10000
dat = data.frame(rt = c(rshifted_lnorm(N, meanlog = -.85, sdlog = .7, shift = .2),
                        rshifted_lnorm(N, meanlog = -.5, sdlog = .7, shift = .2)),
                 Group = factor(rep(c("Easy","Hard"), each = N)))

ggplot(dat, aes(x = rt, fill = Group)) +
  geom_density(alpha = .4, lwd = 1, adjust = 2) +
  scale_x_continuous(breaks = seq(0, 3, by = .3), limits = c(0, 3))


Median (Easy): 0.631
Median (Hard): 0.804
Mean (Easy): 0.749
Mean (Hard): 0.972
SD (Easy): 0.430
SD (Hard): 0.623

Onset/Shift

Influences the earliest possible reaction time.

Code
N = 10000
dat = data.frame(rt = c(rshifted_lnorm(N, meanlog = -.85, sdlog = .7, shift = .2),
                        rshifted_lnorm(N, meanlog = -.85, sdlog = .7, shift = .9)),
                 Group = factor(rep(c("valid cue","invalid cue"), each = N)))

ggplot(dat, aes(x = rt, fill = Group)) +
  geom_density(alpha = .4, lwd = 1, adjust = 2) +
  scale_x_continuous(breaks = seq(0, 3, by = .3), limits = c(0, 3))


Median (val): 0.625
Median (inv): 1.324
SD (val): 0.421
SD (inv): 0.417

Scale

Stretches or squeezes the distribution, without (severely) changing the location (onset and center).

Code
N = 10000
dat = data.frame(rt = c(rshifted_lnorm(N, meanlog = -.85, sdlog = .7, shift = .2),
                        rshifted_lnorm(N, meanlog = -.85, sdlog = 2, shift = .2)),
                 Group = factor(rep(c("Control","Patient"), each = N)))

ggplot(dat, aes(x = rt, fill = Group)) +
  geom_density(alpha = .4, lwd = 1, adjust = 2) +
  scale_x_continuous(breaks = seq(0, 3, by = .3), limits = c(0.1, 3))


Median (Cont): 0.624
Median (Pat): 0.647
Mean (Cont): 0.750
Mean (Pat): 3.729
SD (Cont): 0.439
SD (Pat): 20.185

Ideally, we should use a distribution that can effectively separate these three key parameters: Difficulty, Onset, and Scale…

Expertise

lme4

Gamma vs. Inverse Gaussian

Gamma

  • α > 0 shape

  • λ > 0 rate

\begin{equation} f(x; \alpha, \lambda) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x} \end{equation}


Inverse Gaussian

  • μ > 0 mean

  • λ > 0 shape

\begin{equation} f(x; \mu, \lambda) = \sqrt{\tfrac{\lambda}{2\pi x^3}}\ \exp\left(-\tfrac{\lambda(x-\mu)^2}{2\mu^2 x}\right) \end{equation}


In the Gamma, α and λ are both difficulty-like parameters: $\mu = \alpha / \lambda$, $\sigma = \sqrt{\frac{\alpha}{\lambda^2}}$.

In the Inverse Gaussian, μ reflects difficulty (the mean is simply μ), while λ influences the scale. The larger μ and the smaller λ, the more variable the RTs: $\sigma = \sqrt{\frac{\mu^3}{\lambda}}$.
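
Both sets of formulas can be checked quickly by simulation (a minimal sketch; the parameter values are arbitrary, and rinv_gaussian() is the brms function used in the code above):

Code
library(brms)  # rinv_gaussian()

set.seed(123)
# Gamma: mean = alpha / lambda, SD = sqrt(alpha / lambda^2)
alpha = 2.2; lambda = 4.8
x_gam = rgamma(1e5, shape = alpha, rate = lambda)
round(c(emp_mean = mean(x_gam), theo_mean = alpha / lambda,
        emp_sd = sd(x_gam), theo_sd = sqrt(alpha / lambda^2)), 3)

# Inverse Gaussian: mean = mu, SD = sqrt(mu^3 / lambda)
mu = 0.56; lambda = 1.1
x_inv = rinv_gaussian(1e5, mu = mu, shape = lambda)
round(c(emp_mean = mean(x_inv), theo_mean = mu,
        emp_sd = sd(x_inv), theo_sd = sqrt(mu^3 / lambda)), 3)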

Gamma

Interpretability

  • α : the number of independent events.

  • λ : the rate of events per unit time.

The total time until all α events have occurred follows a Gamma distribution.
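
This waiting-time story is easy to verify by simulation (a sketch; α = 3 events at rate λ = 5 are arbitrary choices):

Code
set.seed(123)
alpha = 3; lambda = 5
# total time for alpha independent Exponential(lambda) events
total = rowSums(matrix(rexp(alpha * 1e5, rate = lambda), ncol = alpha))
gam   = rgamma(1e5, shape = alpha, rate = lambda)
round(c(mean(total), mean(gam)), 3)  # both close to alpha / lambda = 0.6
round(c(sd(total), sd(gam)), 3)      # both close to sqrt(alpha) / lambda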

Inverse Gaussian (Wald)

Interpretability

The name “inverse Gaussian” comes from its relation to Brownian motion (or Wiener process). It “describes the distribution of the time a Brownian motion with positive drift takes to reach a fixed positive level” [Wikipedia].

Code
set.seed(11)
rdm_path = function(drift, threshold, ndt, sp1=0,  noise_constant=1, dt=0.0001, max_rt=2) {
  max_tsteps = max_rt/dt
  
  # initialize the diffusion process
  tstep = 0
  x1 = c(sp1*threshold) # vector of accumulated evidence at t=tstep
  time = c(ndt)
  
  # start accumulating
  while (x1[tstep+1] < threshold  & tstep < max_tsteps) {
    x1 = c(x1, x1[tstep+1] + rnorm(mean=drift*dt, sd=noise_constant*sqrt(dt), n=1))
    time = c(time, dt*tstep + ndt)
    tstep = tstep + 1
  }
  return (data.frame(time=time, x=x1, accumulator=rep(1, length(x1))))
}

drift = 1; threshold = 2; ndt = .1

sim_path = rdm_path(drift = drift, threshold = threshold, ndt = ndt)

ggplot(data = sim_path, aes(x = time, y = x)) +
  geom_line(linewidth = 1, color = "blue4") +
  geom_hline(yintercept = threshold, linewidth = 1)
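
To connect the single path above to the Inverse Gaussian claim, we can simulate many first-passage times and compare their mean to rinv_gaussian() draws with mean = threshold/drift and shape = threshold²/noise². The vectorized helper sim_fpt() is not part of the original code, just a quick sketch on a coarse time grid:

Code
# Hypothetical helper (not from the slides): approximate first-passage times
# of a Brownian motion with drift, simulated on a dt grid for speed.
sim_fpt = function(n, drift, threshold, dt = 0.01, max_rt = 20) {
  steps = matrix(rnorm(n * (max_rt / dt), mean = drift * dt, sd = sqrt(dt)),
                 nrow = n)
  paths = t(apply(steps, 1, cumsum))
  # first time at which the accumulated evidence crosses the threshold
  dt * apply(paths >= threshold, 1, function(z) which(z)[1])
}

set.seed(11)
fpt = sim_fpt(2000, drift = 1, threshold = 2)
ig  = rinv_gaussian(2000, mu = 2 / 1, shape = 2^2 / 1^2)  # rinv_gaussian() from brms; mean = a/v, shape = a^2/s^2
round(c(mean(fpt, na.rm = TRUE), mean(ig)), 2)            # both should be near 2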

Reading time

High-frequency vs. low-frequency words, children 9-15 years old.

Inverse Gaussian vs. Gamma

Code
mInv = lme4::glmer(time ~ freq  + age + (1|ID), 
                    data = dat_e, family = inverse.gaussian(link = "log"))
sumInv = summary(mInv)

mGam = lme4::glmer(time ~ freq  + age + (1|ID), 
                    data = dat_e, family = Gamma(link = "log"))
sumGam = summary(mGam)

Inverse Gaussian

              Estimate Std. Error
(Intercept)  2.6436875 0.02579157
freq1       -0.3161442 0.01138232
age         -0.1090430 0.02600495

Gamma

              Estimate Std. Error
(Intercept)  2.6159104 0.02561623
freq1       -0.3286415 0.01210228
age         -0.1085667 0.02619190

Posterior predictions

Inverse Gaussian

Code
ppInv = performance::check_predictions(mInv, iterations = 100, 
                               type = "density")
time = rowMeans(ppInv)
dat_e_plot = rbind(dat_e[,c("freq","time")], cbind(dat_e[,"freq"], time))
dat_e_plot$type = as.factor(c(rep("real", nrow(dat_e)),rep("pred",length(time))))

ggplot(dat_e_plot, aes(x = time, fill = type))+
  geom_density(alpha = .5)+facet_wrap(~freq)+xlim(0,max(dat_e_plot$time))

Posterior predictions

Gamma

Code
ppGam = performance::check_predictions(mGam, iterations = 100, 
                               type = "density")
time = rowMeans(ppGam)
dat_e_plot = rbind(dat_e[,c("freq","time")], cbind(dat_e[,"freq"], time))
dat_e_plot$type = as.factor(c(rep("real", nrow(dat_e)),rep("pred",length(time))))

ggplot(dat_e_plot, aes(x = time, fill = type))+
  geom_density(alpha = .5)+facet_wrap(~freq)+xlim(0,max(dat_e_plot$time))

lme4

  • Inverse Gaussian: correlation between real and predicted = 0.771
  • Gamma: correlation between real and predicted = 0.773

Bayesian … :)

Inverse Gaussian (brms, non-informative priors)

Bayesian … :)

Gamma (brms, non-informative priors)
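
The calls behind these Bayesian fits could look like the sketch below (default brms priors; the object names fitInv and fitGam match the loo comparison later, everything else is assumed):

Code
library(brms)

# Bayesian counterparts of the lme4 models above, with brms default priors
fitInv = brm(time ~ freq + age + (1 | ID),
             data = dat_e, family = inverse.gaussian(link = "log"))

fitGam = brm(time ~ freq + age + (1 | ID),
             data = dat_e, family = Gamma(link = "log"))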

RSS (residual sum of squares)

Frequentist

Inverse Gaussian: 11772.59 
Gamma: 11063.1 
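
A sketch of how such RSS values can be computed from the glmer fits (assuming residuals on the response scale, conditional on the random effects):

Code
# residual sum of squares on the response scale
rss = function(m) sum((dat_e$time - fitted(m))^2)
c(InvGaussian = rss(mInv), Gamma = rss(mGam))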


Inverse Gaussian

              Estimate Std. Error
(Intercept)  2.6436875 0.02579157
freq1       -0.3161442 0.01138232
age         -0.1090430 0.02600495

Gamma

              Estimate Std. Error
(Intercept)  2.6159104 0.02561623
freq1       -0.3286415 0.01210228
age         -0.1085667 0.02619190

Bayesian

       elpd_diff se_diff
fitInv   0.0       0.0  
fitGam -33.8       8.1  
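
This table is the output of approximate leave-one-out cross-validation, e.g. (a sketch, given the fitInv and fitGam objects above):

Code
# compare expected log predictive density via PSIS-LOO
loo_compare(loo(fitInv), loo(fitGam))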


Inverse Gaussian

            Estimate Est.Error   Q2.5  Q97.5
b_Intercept    2.632     0.018  2.597  2.668
b_freq1       -0.318     0.015 -0.348 -0.290
b_age         -0.101     0.018 -0.137 -0.064

Gamma

            Estimate Est.Error   Q2.5  Q97.5
b_Intercept    2.633     0.018  2.597  2.669
b_freq1       -0.330     0.015 -0.360 -0.301
b_age         -0.111     0.019 -0.149 -0.074

Shifted LogNormal

Interpretability

If Y is log-normally distributed, then log(Y) follows a normal distribution. The distribution is defined directly for Y, not for a transformed version of Y!

  • $\mu \in (-\infty, +\infty)$ - influences difficulty. Median = $\theta + \exp(\mu)$.

  • $\sigma > 0$ scale - increases the mean $= \theta + \exp\left(\mu + \frac{\sigma^2}{2}\right)$ (but not the median).

  • $\theta > 0$ shift - the earliest possible response.

\begin{equation} f(x; \mu, \sigma, \theta) = \frac{1}{(x - \theta)\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln(x - \theta) - \mu}{\sigma}\right)^2\right] \end{equation}
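
The median and mean formulas can again be checked by simulation (a sketch, reusing rshifted_lnorm() from brms with the parameter values of the earlier examples):

Code
set.seed(123)
mu = -0.85; sigma = 0.7; theta = 0.2
x = rshifted_lnorm(1e5, meanlog = mu, sdlog = sigma, shift = theta)
round(c(median(x), theta + exp(mu)), 3)             # median = theta + exp(mu)
round(c(mean(x), theta + exp(mu + sigma^2 / 2)), 3) # mean also depends on sigma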

Shifted LogNormal (Race)

Shifted LogNormal

Shifted LogNormal

Median real RT: 868.212
Median predicted RT: 888.652
Error: -20.441

Median real RT: 868.212
Median predicted RT: 867.514
Error: 0.698

But Feasibility :(


\begin{aligned} \mu &\sim 1 + \text{age} + \text{freq} + (1 | \text{ID}) \\ \theta &\sim 1 + (1 | \text{ID}) \end{aligned}
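
A brms call matching this specification might look like the following sketch (the object name fitShift_reID is hypothetical; chains, iterations, and warmup are taken from the output below):

Code
library(brms)

fitShift_reID = brm(
  bf(time ~ freq + age + (1 | ID),   # mu
     ndt  ~ 1 + (1 | ID)),           # theta (shift), log link by default
  data = dat_e, family = shifted_lognormal(),
  chains = 4, iter = 6000, warmup = 3000
)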

Warning: There were 144 divergent transitions after warmup. Increasing
adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
 Family: shifted_lognormal 
  Links: mu = identity; sigma = identity; ndt = log 
Formula: time ~ freq + age + (1 | ID) 
         ndt ~ 1 + (1 | ID)
   Data: dat_e (Number of observations: 476) 
  Draws: 4 chains, each with iter = 6000; warmup = 3000; thin = 1;
         total post-warmup draws = 12000

Multilevel Hyperparameters:
~ID (Number of levels: 238) 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)         0.51      0.04     0.43     0.59 1.00     2310     5146
sd(ndt_Intercept)     0.16      0.03     0.09     0.22 1.01      539     1355

Regression Coefficients:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         1.76      0.10     1.55     1.95 1.02      675     1355
ndt_Intercept     1.95      0.07     1.81     2.07 1.01      617     1389
freq1            -0.74      0.07    -0.89    -0.62 1.01      796     1754
age              -0.22      0.04    -0.31    -0.14 1.01     1225     2948

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.31      0.03     0.26     0.37 1.01     1454     3114

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).



\begin{aligned} \mu &\sim 1 + \text{age} + \text{freq} + (1 | \text{ID}) \\ \theta &\sim 1 \end{aligned}
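
The simpler model differs only in the shift formula (a sketch under the same assumptions; fitShift matches the model-performance table below):

Code
fitShift = brm(
  bf(time ~ freq + age + (1 | ID),
     ndt  ~ 1),                      # a single shift for everyone
  data = dat_e, family = shifted_lognormal(),
  chains = 4, iter = 6000, warmup = 3000
)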

 Family: shifted_lognormal 
  Links: mu = identity; sigma = identity; ndt = log 
Formula: time ~ freq + age + (1 | ID) 
         ndt ~ 1
   Data: dat_e (Number of observations: 476) 
  Draws: 4 chains, each with iter = 6000; warmup = 3000; thin = 1;
         total post-warmup draws = 12000

Multilevel Hyperparameters:
~ID (Number of levels: 238) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.44      0.03     0.39     0.50 1.00     4352     6720

Regression Coefficients:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         2.01      0.05     1.92     2.12 1.00     5631     7417
ndt_Intercept     1.73      0.05     1.63     1.81 1.00     8241     7945
freq1            -0.59      0.03    -0.66    -0.52 1.00    10150    10023
age              -0.17      0.03    -0.23    -0.11 1.00     4218     6459

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.27      0.02     0.24     0.30 1.00     4938     9131

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Model performance

         elpd_diff se_diff
fitShift   0.0       0.0  
fitInv    -8.2       9.2  
fitGam   -42.0      13.0  

Correlation between real and predicted = 0.94

Posterior predictions

Shifted LogNormal

Feasibility (and theory ?)

Experiment: participants viewed a central prime (either a real word or a pseudo-word), followed by a spatial cue directing them to a target on the left or right, which they located by pressing a key.

label

cue validity

Shifted LogNormal


\begin{aligned} \mu &\sim 1 + \text{label} + (1 + \text{label} | \text{Exp\_Subject\_Id}) \\ \theta &\sim 1 + \text{cue} + (1 + \text{cue} | \text{Exp\_Subject\_Id}) \end{aligned}
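
A sketch of this specification in brms (the data name dat_prime and fit name fit_cue are hypothetical; priors and sampler settings omitted):

Code
fit_cue = brm(
  bf(time ~ label + (1 + label | Exp_Subject_Id),   # mu
     ndt  ~ cue + (1 + cue | Exp_Subject_Id)),      # theta (shift)
  data = dat_prime, family = shifted_lognormal()
)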


                Estimate Est.Error   Q2.5  Q97.5
b_Intercept        6.015     0.051  5.919  6.117
b_ndt_Intercept   41.017    18.883  2.156 75.566
b_label1           0.047     0.007  0.035  0.060
b_ndt_cue1        28.961     3.893 21.391 36.735

PP-Check

label

PP-Check

cue validity

Correlation check - be careful

Shiny app - Jonas Kristoffer Lindeløv