Quest for Permissively Licensed PDF Library in C#

> obviously needs to be a PDF

I've been making my reports in self-contained HTML files[0] and it works out so much better than PDF. It is not constrained by paper sizes, and it lets me add some nifty features. For example, I recently added support for hiding columns in a table using exclusively CSS. The only downside is browsers can render things slightly differently, but for my use cases I don't need pixel-perfect identical rendering.

[0] Images are inlined base64-encoded, CSS/JS embedded with style and script tags. No external assets / no http requests.
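For anyone curious what that looks like in practice, here is a minimal sketch in Python of building such a self-contained file. The function names (`data_uri`, `build_report`) and the single-image layout are my own illustration, not the poster's actual tooling.

```python
import base64
import html


def data_uri(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_report(title: str, css: str, image_bytes: bytes,
                 image_mime: str = "image/png") -> str:
    """Return one HTML document with the CSS and a single image embedded inline."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        # CSS goes into a style tag instead of a linked stylesheet.
        f"<style>{css}</style>\n"
        "</head><body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        # The image is carried in the src attribute, so no request is made.
        f'<img alt="chart" src="{data_uri(image_bytes, image_mime)}">\n'
        "</body></html>\n"
    )
```

Writing the returned string to a `.html` file gives a report that opens without any network access. Base64 grows the payload by roughly a third, which is worth keeping in mind for image-heavy reports.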

giancarlostoro · a month ago

You can also use media queries for print-specific styling, so you can remove things that a user maybe doesn't need to print out:

https://developer.mozilla.org/en-US/docs/Web/CSS/Guides/Medi...

dmboyd · a month ago

Being constrained by page sizes is “a feature, not a bug” in most contexts. If I’m calling out numbers on the 3rd line of page 38 of a report, it helps if that’s consistent.

kgwxd · a month ago

The only reason PDFs still have a job is: pixel perfect consistency; the built-in validity stuff (ensuring the document wasn't altered, etc.); or the customer doesn't need the other things, but isn't open to alternatives. Otherwise, PDF is just a major headache.

wongarsu · a month ago

Also page-level consistency, and generally laying things out in a printable format.

Even with the same Word document opened only in various MS Word versions (web, desktop, etc.) you won't get consistent page numbers. And HTML tables work great on screen but don't print very well if they span more than what fits on a single sheet of paper.

dwroberts · a month ago

Unless you can embed fonts [into the page itself] you aren’t beating PDF

giancarlostoro · a month ago

Not only can you embed the fonts, but you can make it interactive and output a PDF if you really wanted to. The HTML might grow if you embed enough JS, but on the other hand... some PDFs are insanely large.

fuzzy2 · a month ago

Not a problem with data: URIs. But then, a report may not need fancy fonts if HTML is acceptable.

gnomewascool · a month ago

You can embed fonts into an HTML page. For example, place an @font-face with the src:url being a base64-encoded blob, in a style element.
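A rough sketch of generating such a rule, in Python. The helper name `font_face_rule` is made up for illustration, and it assumes the font file is WOFF2 (hence the `font/woff2` MIME type and `format("woff2")` hint); other font formats would need different values.

```python
import base64


def font_face_rule(family: str, woff2_bytes: bytes) -> str:
    """Build an @font-face rule whose src is a base64 data: URI (WOFF2 assumed)."""
    encoded = base64.b64encode(woff2_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url(data:font/woff2;base64,{encoded}) format("woff2");\n'
        "}"
    )
```

The returned text goes inside a style element, and any rule that sets `font-family` to the same name will then pick up the embedded font.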

The wider .NET ecosystem is lacking when you try to step outside the mainline. I don't bother hunting for unused, partially implemented .NET libraries anymore and just call out to a process or an API when I need to get something done.

It's not ideal, but when a good option isn't available in .NET, it's usually available in Python/npm. Typically I'll use background jobs when calling out of process for added resiliency/replayability and observability.

cm2187 · a month ago

Not sure I agree. It also depends on the domain. The Python ecosystem is of course a lot richer for anything AI. But try to open, manipulate and export spreadsheets. In Python you pretty much need a different library for every Excel file format (xls, xlsx, etc.), and usually the more file formats a library can handle, the less capable it is (e.g. pandas). In .NET you have libraries like SpreadsheetGear that are super powerful, including their own Excel calculation engine. I see nothing remotely close in Python.

exyi · a month ago

The point is that a good library usually exists for some language, which is not necessarily the one you are currently using.

IMHO, we don't lack good libraries in XY, we are lacking good interop. Going through REST or stdio is quite painful just to render PDF (or export spreadsheet, ...)

pjmlp · a month ago

There is hardly anything that isn't available in .NET, the main problem is being willing to pay for tooling.

mythz · a month ago

I'm using a lot of ComfyUI Workflows, Custom Nodes, and Image and Audio classifiers relying on PyTorch, supervision, ultralytics, MediaPipe, OpenCV, onnxruntime, pandas and numpy that say otherwise. There are some equivalents, but the ecosystems aren't playing in the same ballpark.

eXpl0it3r · a month ago

Is this for your personal workflow, or for applications that you ship?

How do you handle deployment / packaging of multiple, different ecosystems?

thiago_fm · a month ago

This looks like ChatGPT. There are PLENTY of alternatives in the post.

Python and others have similar issues and limitations of their own.

mythz · a month ago

It wouldn't be a quest if there were lots of good options, a few good options is better than lots of unused/unmaintained ones.

tom_alexander · a month ago

smithkl42 · a month ago

We've been using Aspose.PDF for the last 10 years or so in our C# platform, and paying for the license. It's expensive and buggy and has shite support, so a year or so back I decided to see if there was some other library or combination of libraries that could meet our needs. Basically, we needed:

* HTML to PDF

* Compress PDF

* Manual PDF generation

* Text extraction

* No browser engine or other weird dependencies

I researched every library I could find, and downloaded, integrated and tested anything that looked remotely promising.

At the end of all that, I reluctantly handed my company credit card back to Aspose. There simply wasn't any open-source or even just cheaper PDF library that I could actually make work, and all the other paid ones that did work were even more expensive.

dfcab · a month ago

I am in the same boat. Aspose has been the go-to for Word and PDF documents. I will say, Adobe's PDF Services API offers a ton of interesting features, but it comes with a price tag and, in my scenario, it's not HIPAA compliant.

c0wb0yc0d3r · a month ago

Aspose is the library I’ve used commercially in the past, too. My experience was similar. The company I worked for at the time eventually charged more for PDF export as a paid add on. The software is very sticky so the people who truly needed pdf export directly paid, the rest relied on export to word then “printed” the pdf themselves.

Archelaos · a month ago

I create PDF files from C# using LaTeX as an intermediate format. This works very reliably but sometimes takes a bit of tinkering until everything fits.

People here on HN recently recommended Typst as a replacement for LaTeX, but I haven't tried it myself yet.

Just today I looked at LaTeX interop for C#, but it seems the TeX world is in its own bubble of commandline tools.

Do you use any library or are you just calling the standard TeX CLI tools?

actionfromafar · a month ago

In my eyes, PdfSharpCore¹ is now the "canonical" version of pdfcore.

IMHO the list is incomplete without it.

1: https://github.com/ststeiger/PdfSharpCore

It seems the PDFSharp rabbit hole goes even deeper than I've realized!

The latest MigraDoc & PDFSharp seem to have been updated and ported to .NET 6 after a lot of the forks happened, so it was unclear to me whether there's any merit in looking at other, mostly abandoned forks.

I might add PdfSharpCore, though the use of SixLabors.ImageSharp and SixLabors.Fonts leads to a disqualification from the "quest", given their custom split license [1]

Edit: Actually, the license seems to turn into an Apache 2.0 license, when used with an open source licensed project and also as transitive dependency. Certainly a confusing license.

[1] https://github.com/SixLabors/ImageSharp/blob/main/LICENSE

Edit: PSA - PdfSharpCore uses older releases of SixLabors.ImageSharp v1.0.4 and Fonts-1.0.0-beta17 which both were (and are still) distributed under plain Apache-2.0.

https://web.archive.org/web/20251104163604/https://codeload....

tonyedgecombe · a month ago

>Naturally, I first started looking for permissively licensed libraries, which could be used free of charge and without additional license requirements.

There is a lot of work in a good PDF library, expecting to get it for free feels unreasonable to me.

unethical_ban · a month ago

Given reality, it would be silly for a consumer not to look for the cheapest option available, that doesn't have vendor lock-in.

That said, for many niche products you are correct.

kappadi3 · a month ago

If you ever revisit alternatives, you might want to try YakPDF. It gives you:

- HTML → PDF without any browser engine

- PDF compression & optimization

- Simple API for manual PDF generation

- Text extraction

- No native dependencies and cheaper than Aspose

It’s not a full drop-in replacement for every Aspose feature, but it covers the core workflow you mentioned and is much lighter to integrate.

https://rapidapi.com/yakpdf-yakpdf/api/yakpdf (open via firefox)

GiorgioG · a month ago

YakPDF (as far as I can tell) is an API and not a library that generates a PDF. If you're going to go that route, host https://github.com/gotenberg/gotenberg yourself and call it a day.

edit: Stop spamming your own service.

flanbiscuit · a month ago

I needed this post a year ago when I was looking for this exact thing. I did end up going with Puppeteer because I needed it for something else that I couldn't avoid. I use a large list of flags with it to launch the most minimal version of headless Chrome that I can.

I am going to look into switching to MigraDoc and see if I can drop Puppeteer.

Thanks for this great research!

You're welcome!

Having played around with MigraDoc for the past few weeks, I do still recommend it, as long as you don't need more complex layouts. Here's a short and certainly incomplete list of limitations that I've run into so far:

- No tables within other tables

- No multi-column page layouts

- No multi-section on the same page (new section = new page)

- No letter spacing

- MigraDoc doesn't know about the final spacing, so you can't automatically adjust, say, the width of a table column. Either calculate an estimate based on the text/content or space the columns equally.

- Can't shade (background color) only a selection of words in a text

- Lists can only have up to three different symbols

- List indentation can behave quite strangely, due to tab stops

- No horizontal rule (can be emulated)

- There's a bug with bottom border of a paragraph

On the other hand, MigraDoc & PDFsharp are less than 1MB and plenty fast, so it's a great package, as long as you can build some workarounds to achieve the desired look.
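To make the column-width workaround from the list a bit more concrete, here is a language-neutral sketch in Python (not MigraDoc code). The function name `estimate_column_widths` is hypothetical, and it uses character counts as a crude stand-in for rendered text width, which is only an approximation with proportional fonts.

```python
def estimate_column_widths(rows, total_width_cm):
    """Split total_width_cm across columns in proportion to the longest cell text.

    Falls back to equal spacing when every cell is empty.
    """
    if not rows:
        return []
    column_count = max(len(row) for row in rows)
    longest = [0] * column_count
    for row in rows:
        for index, cell in enumerate(row):
            # Character count is only a rough proxy for the rendered width.
            longest[index] = max(longest[index], len(str(cell)))
    total_chars = sum(longest)
    if total_chars == 0:
        # Nothing to measure: space the columns equally.
        return [total_width_cm / column_count] * column_count
    return [total_width_cm * chars / total_chars for chars in longest]
```

The resulting values would then be assigned to the table's columns before rendering. Measuring the text with the actual font would be more accurate, at the cost of more code.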