Stirling PDF: Self-hosted, web-based PDF manipulation tool

One thing I think is really lacking from the PDF ecosystem is good open-source tools around signing. PDF signatures are something that is legally important in a lot of the world with regulations like eIDAS. Unfortunately it's extremely difficult to cryptographically sign PDF documents with tools other than Adobe's and some other (often more sketchy) proprietary tools. Even if you figure out how to use stuff like LibreOffice or poppler to sign, you'll struggle to obtain certs that will validate without spending an arm and a leg.

I really hope that someone will decide to step in and become the Let's Encrypt of PDF and S/MIME certs, because that will improve public trust significantly.

JumpCrisscross · a year ago

> you'll struggle to obtain certs that will validate

You’ll be surprised how far you can go pasting a picture of your signature in Preview.

noodlesUK · a year ago

Absolutely, and that's what I tend to do with my documents IRL, but I think it would be really nice if we could move to a world where signing documents digitally actually meant something more than `signature.png`.

In the EU, in order to have a legal guarantee of being treated as the same as a handwritten signature in all member states, you have to meet "Qualified Electronic Signature" level, which means cryptographic signatures and the involvement of some kind of trust services provider who validates the certificate used to sign. In practice this is rare, and ordinary electronic signatures a la Preview work for most things.

wfn · a year ago

Completely agree. Source: I build things for a company[1][2] which is a TRA and a QTSP (eIDAS parlance).

Two references which I promise will be interesting (re: qcerts and QES tooling):

- excellent open source library for working with PDFs and digital signatures (incl. PDF ones): https://github.com/MatthiasValvekens/pyHanko

- European Commission's DSS Tool (you can submit one PDF only, don't need both original and signed one): https://ec.europa.eu/digital-building-blocks/DSS/webapp-demo...

[1]: https://www.zealid.com/en/ - you can onboard remotely for free, download your qualified certs at https://my.zealid.com/en - upload, QES sign, download PDFs (all of these free) - or use our APIs to integrate into us (get in touch with us if you'd like the latter).

[2]: opinions are my own.

noodlesUK · a year ago

I know this comment comes a bit late, but thanks so much! That library is exactly the kind of thing I have been looking for. I've just signed up using your app. I'm curious, my UK passport didn't scan correctly with NFC. Do you only support EU docs for NFC validation? I expected the NFC scanning to work with any ICAO 9303 document.

TheJoeMan · a year ago

You’re really right. I asked my IT guy, who’s a Windows Server wizard, what it would take to implement basic PKI for internal document signing, and he looked at me like I had 2 heads.

FredrikSE · a year ago

Setting up the PKI is not hard (the Windows one works well), but keeping it secure and safe could be more challenging.

perfmode · a year ago

does Preview in macOS not fit the bill?

noodlesUK · a year ago

As far as I know, Preview doesn’t support creating or even viewing cryptographic signatures at all.
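If a viewer won't show signatures, a rough way to tell whether a PDF carries a cryptographic signature at all is to look for the `/ByteRange` entry that signature dictionaries contain. The sketch below is only a heuristic under that assumption: the function names are made up for illustration, it does not validate anything, and it can miss signatures or report false positives on unusual files.

```python
import re

# A PDF signature dictionary records which byte spans of the file are
# covered by the signature, e.g. "/ByteRange [0 840 960 240]".
BYTE_RANGE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)


def find_signature_byte_ranges(pdf_bytes):
    """Return every /ByteRange found as a tuple of four integers."""
    return [
        tuple(int(group) for group in match.groups())
        for match in BYTE_RANGE.finditer(pdf_bytes)
    ]


def looks_signed(pdf_bytes):
    """Heuristic only: True if at least one /ByteRange entry is present."""
    return bool(find_signature_byte_ranges(pdf_bytes))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    print("signature byte ranges:", find_signature_byte_ranges(data))
```

Actually verifying the signature and its certificate chain needs a real validator, such as the tools linked upthread.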

From the README: “Stirling PDF does not initiate any outbound calls for record-keeping or tracking purposes”. Beyond auditing the code, how could a potential user verify this claim in advance, and how can a web-based app help support such a claim (in particular when the app does need to make some web requests to operate, but only to a restricted list of URLs that might be listed in a manifest along the lines of a Content-Security-Policy for instance)?

This is a concrete problem when deploying apps that need the user to “upload” some sensitive content.
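To make the "restricted list of URLs" idea concrete, here is a minimal sketch of the check that a proxy or browser-side policy could apply to each request. The manifest format (a plain list of origins) and the host names are assumptions for illustration, not something Stirling PDF or any browser defines.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return (scheme, host, port) for an http(s) URL, or None if unusable."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        return None
    try:
        port = parts.port or DEFAULT_PORTS[parts.scheme]
    except ValueError:  # malformed or out-of-range port
        return None
    return (parts.scheme, parts.hostname, port)


def is_allowed(url, allowed_origins):
    """True only if the URL's origin appears in the declared allowlist."""
    origin = origin_of(url)
    allowed = {origin_of(entry) for entry in allowed_origins}
    return origin is not None and origin in allowed


if __name__ == "__main__":
    # Hypothetical manifest: the app declares the only origin it talks to.
    manifest = ["https://pdf.example.internal"]
    for candidate in (
        "https://pdf.example.internal/api/merge",
        "https://tracker.example.com/log",
    ):
        print(candidate, "->", "allow" if is_allowed(candidate, manifest) else "block")
```

This only shows the matching logic; the hard part the question raises is who enforces it and how a user can trust that enforcement.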

huygens6363 · a year ago

Little snitch[1] can help you out when self-hosting. When not self-hosting, all bets are off and my default stance is "expect the worst".

[1] https://www.obdev.at/products/littlesnitch/index.html

Edit: LS is MacOS oriented. I'm sure there are others, but I'm not into it. I feel it should be an OS-level feature, but who am I.

vladvasiliu · a year ago

> LS is MacOS oriented

There's opensnitch on Linux. There's also something similar on Windows but I don't remember what it's called.

arcastroe · a year ago

If you're self-hosting on kubernetes, you can set up network policies with deny-all egress rule for this deployment/pod. This would block all outward network calls.
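A minimal sketch of such a policy, generated as JSON (which `kubectl apply -f` accepts alongside YAML). The policy name, namespace, and `app` label are placeholders; adjust them to whatever your deployment actually uses.

```python
import json


def deny_all_egress_policy(name, namespace, pod_labels):
    """Build a NetworkPolicy that selects pods by label and allows no egress.

    Listing "Egress" in policyTypes while giving no egress rules means
    every outbound connection from the selected pods is denied.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": dict(pod_labels)},
            "policyTypes": ["Egress"],
        },
    }


if __name__ == "__main__":
    # Placeholder names, not taken from any real deployment.
    policy = deny_all_egress_policy(
        name="deny-all-egress",
        namespace="pdf-tools",
        pod_labels={"app": "stirling-pdf"},
    )
    print(json.dumps(policy, indent=2))
```

Note that this also blocks DNS lookups from the pod, and it only has an effect if the cluster's network plugin enforces NetworkPolicy.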

justsomehnguy · a year ago

> in particular when the app does need to make some web requests to operate

A web app doesn't need to make any outbound web requests to operate. The user interacting with the web app is the one initiating the requests.

You can give access to the app through an HTTP proxy, and you can filter out any outbound requests from the web app, or even leave network routing unconfigured for the server hosting that app. That leaves you with only the JS-initiated requests in the rendered pages of the app.

TheCapeGreek · a year ago

That's a problem with just about any package, library or system you use in the end.

Open source runs on a large amount of trust, and we're all complicit.

emarsden · a year ago

Sure, but these types of applications are running in a web browser sandbox, which benefits from enormous engineering resources to protect the host computer from malicious actions by the remote code. I'm wondering whether this execution environment (augmented with some policy mechanism to allow apps to declare their URL access needs, a little like an AppArmor or network firewall policy) could also provide some guarantees concerning privacy or information security.

apexalpha · a year ago

Just put a sniffer or network capture tool like Wireshark in between. Additionally, you could restrict the app's network access entirely to just your local home network.

emarsden · a year ago

It seems that there is some missing tooling to make this convenient.

You can run a local bundle of HTML/JS/WASM in a web browser instance that you isolate (for example with firejail) to prevent network access. You distribute as a zip/tgz, but it's not obvious how to handle updates without a full redownload. Distributing with a full Electron-like interface is obviously overkill.

If you're running a web app that's hosted elsewhere (which will be much more convenient for most people), your web browser or the software isolation functionality (or firewall/proxy) needs to distinguish between the initial resource loads (approve) and later sneaky logging requests (ban).

There are Android applications such as TrackerControl that have related functionality (operates as a local VPN to filter all network requests and block tracking) but I don't know of convenient tools for the desktop (Linux, in particular).

maweki · a year ago

I host this at home. I don't use it myself, since I can use the Linux CLI tools that it's based on. But I prefer that my wife use this to convert/split/etc. her PDF files instead of using some random website or app (that uses the same CLI tool anyway).

She doesn't mind either way. Seems to work well enough for her use cases.

smartmic · a year ago

> … use the Linux CLI tools it is based on

I am interested in this part. Here is what I found: https://pdfbox.apache.org/2.0/commandline.html

Since PDFBox is a Java application, it should work cross-platform, not just Linux. Please correct me if you mean something else.

I am sorry. I was under the impression that, at least for some features, Stirling PDF also uses pdfjam or similar, instead of wholly delegating to PDFBox. I think the point still stands, that there are powerful cli tools and I wouldn't consider running this container just for myself.

Probably anybody who can get this docker container running can use the appropriate open source CLI tools. So one would wonder about the target audience. I don't. ;)

But I do indeed use LibreOffice's command line conversion features.
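For anyone who hasn't seen it, the headless conversion boils down to one command. A small sketch that builds (and optionally runs) it; the helper names are made up for illustration, and the binary may be called `soffice` or `libreoffice` depending on the install.

```python
import subprocess


def libreoffice_convert_command(source, out_dir, target_format="pdf", binary="soffice"):
    """Build the argv for a headless LibreOffice conversion.

    Equivalent shell command:
        soffice --headless --convert-to pdf --outdir OUT_DIR SOURCE
    """
    return [
        binary,
        "--headless",
        "--convert-to",
        target_format,
        "--outdir",
        str(out_dir),
        str(source),
    ]


def convert(source, out_dir, target_format="pdf", binary="soffice"):
    """Run the conversion; raises CalledProcessError if LibreOffice fails."""
    command = libreoffice_convert_command(source, out_dir, target_format, binary)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the command; call convert() to actually run it.
    print(" ".join(libreoffice_convert_command("report.docx", "out")))
```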

nashashmi · a year ago

You should just get a perpetual license to a PDF tool for $70. Best money ever spent. And get a portable exe version too.

ziddoap · a year ago

They seem to have a working solution that they, and their wife, are happy with. Why should they spend money on a different solution?

anotherhue · a year ago

> Originally developed entirely by ChatGPT, this locally hosted web application has evolved to encompass a comprehensive set of features, addressing all your PDF requirements.

Well that's that then.

frooodle · a year ago

Creator here. I think there is some confusion in my doc. It's not made fully with ChatGPT.

It was initially created as a 24-hour challenge to make a full app with ChatGPT 3.0 in a set time limit, to test what ChatGPT was like last year.

I posted it on Reddit, it got lots of demand, and I turned it into a full app. The only fully ChatGPT part was the first 24 hours; it's over a year later now.

neitsab · a year ago

Yes, it would be a great idea to update the wording as there is no way to derive that from the current one.

Even being sympathetic, my thought reading this was "probably bad code quality/rotten core despite the great feature set".

You can have a "History/Background/Origin" section where you put exactly what you wrote in your comment and it will be fine.

This notwithstanding, thank you very much for developing this app! I will look into deploying it on my server; it will be of great help to people around me who often need to manipulate PDFs but are not super technical!

jpnc · a year ago

I wonder what the point of that sentence is - to get picked up by HN? Kinda like how any product or service even tangentially related to data suddenly has 'by the way also AI and stuff' added somewhere on their landing page. You don't see 'Developed using intellisense' used in READMEs.

exe34 · a year ago

You do see "sent from my iPhone"

BonusPlay · a year ago

For me it means:

1) don't expose it to public internet

2) don't give it untrusted input

Which highly reduces the usability factor for me.

arthurcolle · a year ago

That's amazing. What a time to be alive.

nolongerthere · a year ago

so do we trust it?

wakawaka28 · a year ago

The claim, or the software? If the software was developed by ChatGPT it probably sucks. If it wasn't developed by ChatGPT, its author is a liar. So either way, fuck it.

atoav · a year ago

I once spent 3 hours trying to get ChatGPT to write a correct cubic interpolation function. Everything it wrote was either not cubic or not an interpolation function (the resulting curve didn't pass through the input points).

So yeah.
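For reference, one function that meets both criteria is a Catmull-Rom spline over evenly spaced samples: each segment is a cubic polynomial, and the curve passes through every input point. This is a sketch of that one technique, assuming uniform spacing and clamped ends; it is not necessarily the variant the comment had in mind.

```python
import math


def catmull_rom(p0, p1, p2, p3, t):
    """Cubic segment between p1 (t=0) and p2 (t=1), shaped by p0 and p3."""
    return 0.5 * (
        2.0 * p1
        + (-p0 + p2) * t
        + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
        + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t * t * t
    )


def interpolate(ys, x):
    """Interpolate samples ys[0..n-1], taken at x = 0, 1, ..., n-1.

    x is clamped to [0, n-1]; the end samples are repeated so the first
    and last segments still have four control values.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    x = min(max(float(x), 0.0), float(n - 1))
    i = min(int(math.floor(x)), n - 2)  # index of the segment's left sample
    t = x - i

    def sample(k):
        return ys[min(max(k, 0), n - 1)]

    return catmull_rom(sample(i - 1), sample(i), sample(i + 1), sample(i + 2), t)


if __name__ == "__main__":
    samples = [3.0, 1.0, 4.0, 1.0, 5.0]
    for step in range(0, 9):
        position = step / 2
        print(position, interpolate(samples, position))
```

The key property to check in any such function is that evaluating it at an input position returns the input value exactly, up to floating-point error.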

petepete · a year ago

I'd love it if this could be integrated into Paperless. Every now and then one of my scanned documents goes in upside down and I need to rescan. Clicking rotate, maybe reordering and letting it be rescanned would be great.

moritzruth · a year ago

The last minor release of paperless-ngx added merging, splitting and rotating.

Source: https://docs.paperless-ngx.com/changelog/#paperless-ngx-270

Ah thanks. I'd actually updated but didn't read the changelog and hadn't noticed the new functionality!

Odenwaelder · a year ago

The latest version of paperless-ngx can rotate documents. Check it out!

Perfect timing, thank you.

Ouch. Rotate the document in a pdf reader. And Print to pdf.

My Brother scanner SFTPs documents right to my server where they're ingested by Paperless-ngx. It's all really seamless other than occasionally when the document appears upside down for some reason.

babox · a year ago

me tooo

tacocataco · a year ago

One of firefox's more recent updates added a PDF editor.

I think people's perception of Firefox is from several versions ago. As a daily user throughout its history, I think Firefox has made a lot of progress over the years.

Give it another shot if it's been a while.

mstijak · a year ago

Working on a product that could be similarly described.

CxReports: Self-hosted, web-based PDF reporting tool.

https://www.cx-reports.com

llagerlof · a year ago

When I saw this project months ago I instantly loved it. I installed it and started using it for real and... I found many bugs. Some tools were unusable.

I really hope it's better now.