A fair summary would be "often very well, but not always". A good exmaple would be the S2ORC dataset [0]: a dataset of full parses of scientific PDFs.
In their paper, the authors write about the difficulties of getting the parsers to work reliably, and how having multiple published versions of a PDF was helpful for when the PDF parser fails on the first one.
[0] https://allenai.org/data/s2orc