EDIT: Oh it's actually reasonably simple, great use of CSS! https://github.com/desgeeko/pdfsyntax/blob/main/docs/simple_...
Depending on your transformation use case, you may write an incremental update with only a few bytes at the end of the original file instead of rewriting it entirely. To my knowledge this feature of the PDF specification is often overlooked and not a lot of libraries implements it.
It is a work in progress and I have not developed functions for images yet, though.
Here are some other similar(?) tools, for seeing the inner contents of a PDF file (the raw objects etc), but I haven't compared them to this tool here:
- https://github.com/itext/i7j-rups (java -jar ~/Downloads/itext-rups-7.2.5.jar)
- https://github.com/desgeeko/pdfsyntax (python3 -m pdfsyntax inspect foo.pdf > output.html)
- https://github.com/trailofbits/polyfile (polyfile --html output.html foo.pdf)
- https://www.reportmill.com/snaptea/PDFViewer/ = https://www.reportmill.com/snaptea/PDFViewer/pviewer.html (drag PDF onto it)
- https://sourceforge.net/projects/pdfinspector/ (an "example" of https://superficial.sourceforge.net/)
- https://www.o2sol.com/pdfxplorer/overview.htm
More?
The HTML output is like a pretty print where you can read view objects and follow links to other objects.
Since I have added a new command (disasm) that is CLI oriented and displays a greppable summary of the structure. Here is an explanation: https://github.com/desgeeko/pdfsyntax/blob/main/docs/disasse...