> Which raises another point in relation to the 1500 MTU - all of the CRC checks in various protocols were designed around that number.
Hmm. Why is this? It seems if we have a CRC-32 in Ethernet (and most other layer 2 protocols), we'll have a guarantee to reject certain types of defects entirely... But mostly we're relying on the fact that we'll have a 1 in 4B chance of accepting each bad frame. Having a bigger MTU means fewer frames to pass the same data, so it would seem to me we have a lower chance of accepting a bad frame per amount of end-user data passed.
TCP itself has a weak checksum at any length. The real risk is of hosts corrupting the frame between the actual CRCs in the link layer protocols. E.g. you receive frame, NIC sees it is good in its memory, then when DMA'd to bad host memory it is corrupted. TCP's sum is not great protection against this at any frame length.
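To make "weak checksum" concrete: TCP uses the RFC 1071 Internet checksum, which is just a ones'-complement sum of 16-bit words. Because addition is commutative, reordering whole words (or any two flips that cancel out) is invisible to it. A minimal sketch — the helper name and sample bytes here are illustrative, not from any real stack:

```python
# RFC 1071-style Internet checksum (the one TCP uses), showing a classic
# weakness: it's a ones'-complement sum of 16-bit words, so swapping two
# words produces the exact same checksum.

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                           # pad to a 16-bit boundary
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF

good = bytes.fromhex("deadbeefcafef00d")
# Swap the first two 16-bit words: a corruption the checksum cannot see.
swapped = good[2:4] + good[0:2] + good[4:]

assert good != swapped
assert internet_checksum(good) == internet_checksum(swapped)
```

A DMA or RAM fault that shuffles or cancels bits this way sails straight through, which is exactly the host-side corruption scenario above.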
The risk is that multiple bits in the same packet are flipped, which the CRC can’t detect. If the bit error rate of the medium is constant, then the larger the frame, the more likely that is to occur. Also as Ethernet speeds increase, the underlying BER stays the same (or gets worse) so the chances of encountering errors in a specific time period go up. 100G Ethernet transmits a scary amount of bits so something that would have been rare in 10Base-T might happen every few minutes.
Your claim was it related to MTU, which you're now moving away from:
> Which raises another point in relation to the 1500 MTU - all of the CRC checks in various protocols were designed around that number.
Now we have a new claim:
> The risk is that multiple bits in the same packet are flipped, which the CRC can’t detect
Yes, that's always the risk. But it's not that the CRC can't detect it--it almost certainly does detect it. It's just that it's not guaranteed to.
It has nothing to do with MTU--even a 1500 MTU is much larger than the 4-octet error burst a CRC-32 is guaranteed to detect. On the other hand, an errored packet has only a 1 in 4 billion chance of getting through.
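That burst guarantee is easy to sanity-check empirically: any nonzero error burst confined to 32 bits (4 octets) is always caught, since it can't be a multiple of the CRC polynomial. A brute-force sketch using Python's `zlib.crc32` (which implements the same polynomial as Ethernet's FCS); the frame contents are arbitrary random bytes:

```python
# Empirical check of the CRC-32 burst guarantee: XOR a nonzero burst no
# wider than 32 bits into a frame at a random offset, and the CRC always
# changes. Wider damage is only caught probabilistically (~1 - 2**-32).

import random
import zlib

random.seed(1)
frame = bytes(random.randrange(256) for _ in range(1500))
good_crc = zlib.crc32(frame)

for _ in range(1000):
    start = random.randrange(len(frame) - 4)
    burst = random.randrange(1, 2**32).to_bytes(4, "big")  # nonzero, <= 32 bits
    bad = bytearray(frame)
    for i in range(4):
        bad[start + i] ^= burst[i]
    assert zlib.crc32(bytes(bad)) != good_crc  # guaranteed to differ
```

Errors spread wider than 32 bits fall back to the 1-in-4-billion acceptance odds discussed above.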
> 100G Ethernet transmits a scary amount of bits so something that would have been rare in 10Base-T might happen every few minutes.
The question is: what's the errored frame rate? 100G Ethernet links have far lower error rates (in CRC-errored packets per second) than the 10baseT networks I administered. I used to see a few errors per day. Now I see a dozen errors on a circuit that's been up for a year (and maybe some of those were from when I was plugging it in). Of those, 1 in 4 billion you're going to let through incorrectly.
Keep in mind that faster Ethernet standards have set tougher bit error rate requirements, and the mean time between undetected packet errors works out to something like the age of the universe if links are delivering the BER in the standard.
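The back-of-envelope arithmetic, for anyone curious. Even taking the spec floor (IEEE 802.3 requires on the order of 1e-12 BER for most 10G-and-up PHYs) you get millennia between undetected frames on a saturated link, and real links typically run orders of magnitude better than the floor, which is where the age-of-the-universe figures come from. All numbers here are rough assumptions:

```python
# Back-of-envelope: time between undetected frame errors on a saturated
# 100G link, assuming the worst-case (spec-floor) BER and ~2**-32 odds
# that an errored frame slips past the CRC-32.

link_bps   = 100e9       # 100G Ethernet, fully loaded
ber        = 1e-12       # IEEE 802.3 spec-floor BER; real links do far better
frame_bits = 1500 * 8    # full-size frames

# At this BER almost every errored frame has exactly one bad bit, so the
# errored-frame rate is approximately link_bps * ber.
errored_per_sec    = link_bps * ber
undetected_per_sec = errored_per_sec * 2**-32

seconds = 1 / undetected_per_sec
years = seconds / (365.25 * 24 * 3600)
print(f"one undetected frame every ~{years:.0f} years at the spec floor")
```

Drop the BER another three orders of magnitude (routinely achieved in practice) and you're into millions of years per link.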
(Of course, there's a good chance that even the frames that do get through cause no actual problem--and even though the TCP checksum is weak, it's still going to catch a big fraction of the remaining bad frames.)
The bigger issue is that if there's bad memory, a flaky bus, etc., there's no L2 CRC protecting the data most of the time. And a frame that gets garbled by some kind of DMA, bus, or RAM problem while not protected by the L2 CRC has a decent chance of getting past the weak TCP checksum.