Empirical evaluations break down quickly as the complexity of what is being measured increases, so your best bet in doing a study is to focus on one or a few features.
PLs are of course evaluated over time in the marketplace, like any other designed artifact.
I love that report and find it very inspiring (and it gives me an intuition that Haskell has an edge over the other languages), but the methodology is very disappointing:
- The requirements were mostly left to the interpretation of the implementors, who more or less decided the scope of their own programs.
- All results were self-reported. Even ruling out dishonesty, there are a lot of ways uncontrolled experimenters can report incorrect results.
- Many implementations weren't even runnable.
- No code was run by the reviewers, at all.
I really would love to see a more serious attempt at this experiment (probably with more modern languages).
Why not? After all, "more likely to be correct" is an empirical claim.
> http://www.cs.yale.edu/publications/techreports/tr1049.pdf
Haskell vs Ada, C++, etc. in 1994.