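And since PDF grew out of PostScript, and PostScript is a Turing-complete language, this problem is actually uncomputable in general for PostScript input! (Strictly speaking, PDF keeps PostScript's imaging model but drops the general-purpose programming constructs, so the uncomputability argument applies to PostScript rather than to PDF itself, though the practical difficulty remains.) I guess it can be done well enough in practice by pattern-matching: if the document came from Word originally, it's probably possible to reverse the process. But that is one converter I would certainly not want to have to write!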
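
To give a rough idea of what I mean by pattern-matching, here is a minimal sketch in Python. It assumes some extractor has already handed you text fragments as `(x, y, text)` tuples in page coordinates (y growing upward). That input format, the function names, and the tolerance values are all made up for illustration; they do not come from any particular tool, and a real converter would need far more than this.

```python
from typing import List, Tuple

# Hypothetical extractor output: (x, y, text), with y growing upward on the page.
Fragment = Tuple[float, float, str]


def group_into_lines(fragments: List[Fragment],
                     y_tolerance: float = 2.0) -> List[Tuple[float, str]]:
    """Cluster fragments with nearly equal baselines into text lines."""
    lines = []  # each entry: (baseline_y, [(x, text), ...])
    # Sort top-to-bottom (descending y), then left-to-right.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append((y, [(x, text)]))
    # Within each line, order the pieces by x and join them with spaces.
    return [(y, " ".join(t for _, t in sorted(parts))) for y, parts in lines]


def group_into_paragraphs(lines: List[Tuple[float, str]],
                          max_gap: float = 16.0) -> List[str]:
    """Merge consecutive lines into a paragraph unless the vertical gap is large."""
    paragraphs: List[str] = []
    prev_y = None
    for y, text in lines:
        if prev_y is not None and prev_y - y <= max_gap:
            paragraphs[-1] += " " + text
        else:
            paragraphs.append(text)
        prev_y = y
    return paragraphs


fragments = [
    (72.0, 700.0, "Hello"),
    (110.0, 700.5, "world"),
    (72.0, 688.0, "again"),
    (72.0, 650.0, "New"),
    (100.0, 650.0, "para"),
]
paragraphs = group_into_paragraphs(group_into_lines(fragments))
print(paragraphs)  # ['Hello world again', 'New para']
```

Even this toy version has to guess at thresholds, and it knows nothing about columns, tables, headers, or hyphenation, which is exactly why the full job is so unappealing.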