This is a thoughtful and thought-provoking blog post. I think it's worth asking similar questions of computer science. I think you'll find some math-like patterns -- there's basically no chance Quicksort or any other fundamental algorithm paper will fail to replicate -- and some patterns which will fail to replicate, like in software engineering.
Some of the early results on pseudorandom generators and hash functions aren't holding up well, but I think that's just progress. We understand the problem a whole lot better than we did back then.
Perhaps more interesting is the literature on memory models. The original publications of the Java and C11 memory models had lots of flaws, which took many years to fix (and that process might not be totally done). I worry that there are a bunch of published results that are similarly flawed but just haven't gotten as much scrutiny.
There was that time when it was discovered that nearly all published binary searches and mergesorts were wrong. [1]
And yet, the concepts of binary search and merge sort are fine.
I think that's quite similar to the situation in math papers? Because math isn't executable, a math paper being "significantly" wrong would be like discovering that a program uses a fatally flawed algorithm and is trying to do the impossible. It can't be fixed.
Rather than "wrong", I would describe those implementations as "not fully general". They work perfectly when `n < some large number`, as opposed to `n < some large number * 2`. The latter is the best you can do with that function signature, but it's somewhat arbitrary: you could easily choose a 64-bit index and exceed all practical needs.
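For anyone who hasn't seen the bug being discussed: the overflow is in the midpoint computation. A minimal sketch in C (the function name and test values are my own, not from any of the papers in question):

```c
#include <assert.h>

/* Minimal sketch of the classic midpoint bug and its fix.
   The broken form, mid = (low + high) / 2, overflows a signed int
   once low + high exceeds INT_MAX -- which only happens for arrays
   of a billion-plus elements, hence "works perfectly when n is small".
   The form below is overflow-safe, since high - low never exceeds high. */
int binary_search(const int *a, int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* overflow-safe midpoint */
        if (a[mid] < key)
            low = mid + 1;
        else if (a[mid] > key)
            high = mid - 1;
        else
            return mid;
    }
    return -1;  /* key not present */
}
```

With a 32-bit `int`, the broken midpoint only misbehaves past roughly a billion elements, which is exactly the "not fully general" point: the code is correct on every input anyone actually fed it for decades.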
Many of these examples aren't really in C but in C-like pseudocode, and in that domain they are perfectly accurate. Even in C, the bit length of `int` is implementation-defined: it only needs to be 16 bits or more, and it could easily be wide enough to exceed the number of atoms in the known universe, in which case overflow is a practical impossibility.
I'd say that's an example of the difference between science (the authors don't need to show every detail of a practical implementation, and can assume an `int` with infinitely many bits) and engineering, where you do need to make such considerations.
From my experience in ML, I'd suspect that the "crisis" isn't that the research is false so much as that it's useless (algorithm x with parameter set w happens to work well on one particular dataset; conclusion: I have revolutionised the field).
This isn't unique to ML. A lot of research is about adding an epsilon to an existing paper, which probably doesn't interest anyone outside the small community working on that particular niche.
But does that mean there's a crisis? Maybe it's just a way to foster an environment that lets great ideas emerge.
I’d rather be in the world where we have too many papers tweaking the details of power posing and exactly measuring how much each contributes to the effect. At least we’d know the effect is real.
The parts of CS that are the most math-like (which include fundamental algorithms) don't have a replication crisis, but the ones that are the most social-science like probably do, or would. I would bet large sums of money that a lot of the literature on stuff like "does OOP lead to better code", "does IDE syntax highlighting lead to fewer bugs" etc. would fail to replicate if anyone bothered trying.
The thing is, the general sense I get is that people in CS already have so little confidence in these results that it's not even considered worth the time to try and refute them. Which doesn't exactly speak well of the field!
I worry about ML papers in particular. Models are closely guarded and often impractical to train independently, whether because of ownership of the training/test sets, the computing power required, or details left out of the paper. There's no way to mathematically prove any of it works, either. It's like social science done on organisms we've designed.
Some measurements are interesting and valuable without being replicable. For example, the number of online devices or the number of websites using wordpress. Take the same measurement at a later point in time and the results are different. Yet I wouldn't call those fields maths-like.
Research into this stuff is very young and so I think it's fair to be skeptical of the results. I'm hoping we'll eventually come up with more rigorous, reproducible results.
In competitive programming you could basically assume the pseudocode in a paper is not literally correct and requires some tweaking to work, despite a “proof” of its correctness. Particularly with string algorithms.
Rote-translating pseudocode into your target language isn't likely to pan out well.
So instead you run the pseudocode in your mind and develop an intuition for how it works, and that's the "replication" bit this post talks about with reviewing math papers.
But both the pseudocode and your code will likely have edge cases you didn't handle. That isn't a problem for math; it's the category of common, trivial, easily fixable proof errors that don't really affect the paper. It is a problem for machines that run the code literally.
Maybe a good compromise strategy for formal verification is to declare the insight of the algorithm (a recurrence relation or whatever) as an axiom, and then use the prover to whack the tricky edge cases.
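As a toy illustration of that strategy (all names here are hypothetical, sketched in Lean 4): take the algorithm's recurrence on faith as axioms over an opaque function, then make the prover grind out a boundary case mechanically.

```lean
-- Toy sketch of "axiomatize the insight, verify the edges".
-- `fastFib` stands in for some published algorithm; its recurrence is
-- assumed, and the prover is used only to check the edge cases.
opaque fastFib : Nat → Nat

axiom fastFib_zero : fastFib 0 = 0
axiom fastFib_one  : fastFib 1 = 1
axiom fastFib_rec  : ∀ n, fastFib (n + 2) = fastFib (n + 1) + fastFib n

-- An edge case discharged mechanically from the axioms:
example : fastFib 2 = 1 := by
  have h := fastFib_rec 0
  simp [fastFib_zero, fastFib_one] at h
  exact h
```

The trade-off is obvious: the recurrence itself goes unverified, but the fiddly index arithmetic, which is where the published pseudocode usually breaks, gets machine-checked.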
Yes, I'm convinced tons of published results are flawed! I've heard top researchers tell their students "don't spend too much time on the proofs, nobody reads them". And many CS papers don't get a lot of attention. But that's not necessarily bad: other researchers build on top of this work, and the results consolidate over time.
No, it's not. In that specific case, the supervisor thought the value of the paper didn't lie in the proofs, plus it was a rank-B conference. He would rather have his student working on a different paper than spending a week on the proofs.