The months of R&D to create a workaround could simply be because the subset of motherboards which trigger this issue are doing something borderline/unexpected with their voltage management, and finding a workaround for that behaviour in CPU microcode is non-trivial. Not all motherboard models appear to trigger the fault, which suggests that motherboard behaviour is at least a contributing factor to the problem.
Towards the middle of the video it brings up some very interesting evidence, from online game server farms that use 13900 and 14900 variants for their high single-core performance for the cost, but with server-grade motherboards and chipsets that do not do any overclocking, and would be considered "conservative". But these environments show a very high statistical failure rate for these particular CPU models. This suggests that some high percentage of CPUs produced are affected, and it's long run-time over which the problem can develop, not just enthusiast/gamer motherboards pushing high power levels.