> Which raises another point in relation to the 1500 MTU - all of the CRC checks in various protocols were designed around that number.
Hmm. Why is this? It seems if we have a CRC-32 in Ethernet (and most other layer 2 protocols), we'll have a guarantee to reject certain types of defects entirely... But mostly we're relying on the fact that we'll have a 1 in 4B chance of accepting each bad frame. Having a bigger MTU means fewer frames to pass the same data, so it would seem to me we have a lower chance of accepting a bad frame per amount of end-user data passed.
TCP itself has a weak checksum at any length. The real risk is of hosts corrupting the frame between the actual CRCs in the link layer protocols. E.g. you receive frame, NIC sees it is good in its memory, then when DMA'd to bad host memory it is corrupted. TCP's sum is not great protection against this at any frame length.
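To make "weak checksum" concrete: TCP uses the RFC 1071 Internet checksum, which is just a ones'-complement sum of 16-bit words. Because addition is commutative, reordering whole words (or any two flips that cancel out) is invisible to it. A minimal sketch — the helper name and sample bytes here are illustrative, not from any real stack:

```python
# RFC 1071-style Internet checksum (the one TCP uses), showing a classic
# weakness: it's a ones'-complement sum of 16-bit words, so swapping two
# words produces the exact same checksum.

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                           # pad to a 16-bit boundary
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF

good = bytes.fromhex("deadbeefcafef00d")
# Swap the first two 16-bit words: a corruption the checksum cannot see.
swapped = good[2:4] + good[0:2] + good[4:]

assert good != swapped
assert internet_checksum(good) == internet_checksum(swapped)
```

A DMA or RAM fault that shuffles or cancels bits this way sails straight through, which is exactly the host-side corruption scenario above.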
The risk is that multiple bits in the same packet are flipped, which the CRC can’t detect. If the bit error rate of the medium is constant, then the larger the frame, the more likely that is to occur. Also as Ethernet speeds increase, the underlying BER stays the same (or gets worse) so the chances of encountering errors in a specific time period go up. 100G Ethernet transmits a scary amount of bits so something that would have been rare in 10Base-T might happen every few minutes.
Your claim was it related to MTU, which you're now moving away from:
> Which raises another point in relation to the 1500 MTU - all of the CRC checks in various protocols were designed around that number.
Now we have a new claim:
> The risk is that multiple bits in the same packet are flipped, which the CRC can’t detect
Yes, that's always the risk. But it's not that the CRC can't detect it--it almost certainly does detect it. It's just that it's not guaranteed to.
It has nothing to do with MTU--even a 1500 MTU is much larger than the 4-octet error burst a CRC-32 is guaranteed to detect. On the other hand, an errored packet has only a 1 in 4 billion chance of getting through.
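That burst guarantee is easy to sanity-check empirically: any nonzero error burst confined to 32 bits (4 octets) is always caught, since it can't be a multiple of the CRC polynomial. A brute-force sketch using Python's `zlib.crc32` (which implements the same polynomial as Ethernet's FCS); the frame contents are arbitrary random bytes:

```python
# Empirical check of the CRC-32 burst guarantee: XOR a nonzero burst no
# wider than 32 bits into a frame at a random offset, and the CRC always
# changes. Wider damage is only caught probabilistically (~1 - 2**-32).

import random
import zlib

random.seed(1)
frame = bytes(random.randrange(256) for _ in range(1500))
good_crc = zlib.crc32(frame)

for _ in range(1000):
    start = random.randrange(len(frame) - 4)
    burst = random.randrange(1, 2**32).to_bytes(4, "big")  # nonzero, <= 32 bits
    bad = bytearray(frame)
    for i in range(4):
        bad[start + i] ^= burst[i]
    assert zlib.crc32(bytes(bad)) != good_crc  # guaranteed to differ
```

Errors spread wider than 32 bits fall back to the 1-in-4-billion acceptance odds discussed above.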
> 100G Ethernet transmits a scary amount of bits so something that would have been rare in 10Base-T might happen every few minutes.
The question is: what's the errored frame rate? 100G Ethernet links have far lower error rates (in CRC-errored packets per second) than the 10baseT networks I administered. I used to see a few errors per day. Now I see a dozen errors on a circuit that's been up for a year (and maybe some of those were from when I was plugging it in). Of those, 1 in 4 billion you're going to let through incorrectly.
Keep in mind that faster Ethernet standards have set tougher bit error rate requirements, and the mean time between undetected packet errors works out to something like the age of the universe if links are delivering the BER in the standard.
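The back-of-envelope arithmetic, for anyone curious. Even taking the spec floor (IEEE 802.3 requires on the order of 1e-12 BER for most 10G-and-up PHYs) you get millennia between undetected frames on a saturated link, and real links typically run orders of magnitude better than the floor, which is where the age-of-the-universe figures come from. All numbers here are rough assumptions:

```python
# Back-of-envelope: time between undetected frame errors on a saturated
# 100G link, assuming the worst-case (spec-floor) BER and ~2**-32 odds
# that an errored frame slips past the CRC-32.

link_bps   = 100e9       # 100G Ethernet, fully loaded
ber        = 1e-12       # IEEE 802.3 spec-floor BER; real links do far better
frame_bits = 1500 * 8    # full-size frames

# At this BER almost every errored frame has exactly one bad bit, so the
# errored-frame rate is approximately link_bps * ber.
errored_per_sec    = link_bps * ber
undetected_per_sec = errored_per_sec * 2**-32

seconds = 1 / undetected_per_sec
years = seconds / (365.25 * 24 * 3600)
print(f"one undetected frame every ~{years:.0f} years at the spec floor")
```

Drop the BER another three orders of magnitude (routinely achieved in practice) and you're into millions of years per link.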
(Of course, there's a good chance that even the frames that do get through cause no actual problem--and even though the TCP checksum is weak, it's still going to catch a big fraction of the remaining bad frames.)
The bigger issue is that if there's bad memory, a flaky bus, etc., there's no L2 CRC protecting the data most of the time. And a frame that gets garbled by some kind of DMA, bus, or RAM problem while not protected by the L2 CRC has a decent chance of getting past the weak TCP checksum.