I prefer something I can install locally (doesn't need to be open source). I'm t...

severine · on Dec 20, 2019

I also have a separate goal: interpreting the structure of a PDF that has a visual structure (headers, sections, and subsections, all numbered). That seems to lend itself to some sort of text parsing.
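
A rough sketch of what that text parsing could look like, assuming the text has already been extracted from the PDF one line at a time. The pattern and the `parse_headings` name are just illustrative, not from any particular library, and a body line that happens to start with a number (e.g. "3 apples ...") would be picked up as a false positive, so real documents will probably need extra checks such as font size or line length.

```python
import re

# Matches lines such as "1 Introduction", "2. Methods", "4.1.2 Results".
# Assumption: headings start with a dotted number followed by whitespace and a title.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def parse_headings(text):
    """Return (depth, number, title) for each line that looks like a numbered heading."""
    headings = []
    for line in text.splitlines():
        match = HEADING_RE.match(line.strip())
        if match:
            number, title = match.groups()
            # "2" is depth 1, "2.1" is depth 2, "2.1.3" is depth 3, and so on.
            depth = number.count(".") + 1
            headings.append((depth, number, title.strip()))
    return headings


if __name__ == "__main__":
    sample = "1 Introduction\nSome body text.\n1.1 Background\n2. Methods"
    for depth, number, title in parse_headings(sample):
        print("  " * (depth - 1) + number + " " + title)
```

The depth value is enough to rebuild a nested outline, since each heading belongs to the most recent heading with a smaller depth.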

Some reading here: https://stackoverflow.com/questions/53219016/detecting-secti...

jjohansson · on Dec 20, 2019

PDFTron provides an SDK and isn't really meant as a plug-and-play end-user application. But it can accomplish what you're looking for.

Here's how to extract text from a PDF based on coordinates (this explains how to do it on the web, but it's also possible on other platforms):

https://groups.google.com/d/msg/pdfnet-webviewer/h2W3VksbQUI...
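
To illustrate the general idea of coordinate-based extraction without tying it to any SDK, here is a minimal sketch. It assumes you already have each word together with its bounding box; the `Word` type and `words_in_rect` function are hypothetical names for this example and are not part of the PDFTron API.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """A word plus its bounding box (hypothetical structure for this sketch)."""
    text: str
    x0: float  # left
    y0: float  # bottom
    x1: float  # right
    y1: float  # top


def words_in_rect(words: List[Word], x0: float, y0: float,
                  x1: float, y1: float) -> List[Word]:
    """Keep only the words whose box lies entirely inside the given rectangle."""
    return [
        w for w in words
        if w.x0 >= x0 and w.y0 >= y0 and w.x1 <= x1 and w.y1 <= y1
    ]


if __name__ == "__main__":
    page_words = [
        Word("Invoice", 50, 700, 110, 715),
        Word("Total:", 50, 100, 90, 112),
        Word("42.00", 95, 100, 130, 112),
    ]
    # Pull out everything in a band near the bottom of the page.
    selected = words_in_rect(page_words, 0, 90, 200, 120)
    print(" ".join(w.text for w in selected))
```

Requiring full containment is a design choice; testing for overlap instead would also catch words that straddle the edge of the rectangle.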

Here's how to extract a PDF's logical structure:

https://www.pdftron.com/documentation/samples/#logicalstruct...