Outlier statistics work differently from measurements of means. The distribution of the largest draw from a collection of draws from a normal distribution depends heavily on how many draws there are, i.e. on the size of the sampled population.
Consider each runner's skill, training, etc. as a sampled variable. Then the top score in the sample depends heavily on the population size. Comparing the best draw from two otherwise equivalent groups of different sizes is thus going to favor the larger group, even though the group means are identical. And this sounds like what they observed in the study.
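To make the sample-size effect concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration: the group sizes (50 and 1000), the standard normal "ability" score, the trial count, and the function name `best_draws` are made up and have nothing to do with the actual study's data.

```python
import random
import statistics


def best_draws(group_size, trials, rng):
    """Return the largest draw from each of `trials` simulated groups.

    Each group member's score is a standard normal draw (an assumed model,
    not anything taken from the study).
    """
    return [
        max(rng.gauss(0.0, 1.0) for _ in range(group_size))
        for _ in range(trials)
    ]


rng = random.Random(0)  # fixed seed so reruns give the same numbers
TRIALS = 200

# Two groups drawn from the SAME distribution, differing only in size.
small_best = best_draws(50, TRIALS, rng)
large_best = best_draws(1000, TRIALS, rng)

small_mean = statistics.mean(small_best)
large_mean = statistics.mean(large_best)

# Fraction of trials in which the larger group's best beats the smaller group's best.
large_win_rate = sum(l > s for s, l in zip(small_best, large_best)) / TRIALS

print(f"mean best draw, group of 50:   {small_mean:.2f}")
print(f"mean best draw, group of 1000: {large_mean:.2f}")
print(f"larger group has the top score in {large_win_rate:.0%} of trials")
```

With identical distributions, the single best score among all 1050 people is equally likely to belong to any of them, so the larger group should hold the top spot in about 1000/1050 of trials, roughly 95%, without any difference in average ability.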