fernly · 5 months ago
A bit of context regarding Project Gutenberg. Its intake process is far from casual. Take a look at Project Gutenberg Distributed Proofreaders (PGDP, [0],[1]), one of the oldest "crowd-sourcing" projects on the net (est. 2000). As you can see from [0], every book goes through three rounds of proofing, where volunteers read each page of text and compare it to the scanned image; then through two rounds of format review, where other volunteers insert or review format markup.

From that 5-pass process the marked-up text is handed to a volunteer "post-processor" who assembles the final HTML or e-book file; then the completed book gets one more "smooth reading" pass before it is posted to PG.

This is the process that produces the books input to Standard Ebooks. That they can still find scanner errors ("tne" for "the", a typical "scanno") demonstrates how difficult it is to see those. But their presence isn't from carelessness or disregard for the value of the books.

In the 20-teens I put in hundreds of volunteer hours at PGDP in all the above roles, and it was very satisfying work. I'd recommend it to anyone wanting an online hobby that feels constructive. Volunteering time to Standard Ebooks would probably feel good as well.

[0] https://www.pgdp.net/c/activity_hub.php

[1] https://en.wikipedia.org/wiki/Distributed_Proofreaders

contact9879 · 5 months ago
The work done by Distributed Proofreaders is pretty amazing. I try to contribute my 35 pages as often as I can. The backlog there is pretty insane even while finishing upwards of 150 ebooks per month

it truly is an "online hobby that feels constructive". you get these tiny glimpses into our shared literary/cultural history while knowing that the work you're doing is for the benefit of all (benefit of the public domain)

zozbot234 · 5 months ago
> The backlog there is pretty insane even while finishing upwards of 150 ebooks per month

Isn't the backlog there mostly in the post-processing step, though? To the point where they're taking finished texts and running them again through the page-by-page proofreading in hope of fishing out more OCR typos and improving the format markup?

You can also contribute at Wikisource if you prefer, that doesn't really have a post-processing step and has much less of a fixed pipeline. (There are explicit "proofreading" and "verification" steps per page, but not much beyond that.)

Arcorann · 5 months ago
In a similar vein, there is Wikisource.[0] Wikisource has the advantage of allowing for extensive formatting to closely match the source works due to its wiki-based format, but doesn't have quite as robust processes. Its flexibility is unparalleled though -- it covers virtually any form of scanned print work and even some old movies, and contributors can focus on whatever niches they're interested in if they want.

[0] https://en.wikisource.org/wiki/Main_Page

grues-dinner · 5 months ago
> doesn't have quite as robust processes

They do have a double-pass system for all works based on scanned pages, which is quite nifty. Green means two passes complete: https://en.m.wikisource.org/wiki/Index:Sophocles%27_King_Oed...

Plus you can just jump in to any work, in true wiki fashion.

brador · 5 months ago
The amount of this that could be trivially automated fills me with rage.

Even just automated flagging of common errors would save 1000s of volunteer hours.

BlackFly · 5 months ago
It's unclear that that would save time. If you put in enough hours to the project, you can get classified as one of those later-pass proofers. That is extremely taxing work because most of the scannos have already been found by the earlier proofers. You will "complete" multiple pages without ever finding a scanno. Doubt starts to set in about whether you are on autopilot or not.

Meanwhile, in that early stage, because of the stream of errors, it is easy to pay attention and feel like you are doing rewarding work. Moreover, if you are quite quick and diligent, you can basically just read a book as volunteer work.

Also, sometimes the error is in the source material. Different editors have different opinions about what should be done there. Sometimes I had to re-add mistakes that were "fixed" by early proofers trying to correct grammar, if I recall correctly... it was a while back that I volunteered.

executesorder66 · 5 months ago
> In the 20-teens

That being 2013 to 2019?

HexPhantom · 5 months ago
I think a lot of people (my past self included) underestimate how much meticulous, behind-the-scenes work goes into something like PGDP
zem · 5 months ago
out of curiosity, wouldn't an automated spell check pass help catch ocr errors? e.g. "tne" would be caught immediately.
generationP · 5 months ago
The most confusing errors are the ones spellcheck doesn't catch because they transform a word into another valid word. And those are the ones we want least.
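
To illustrate the distinction, here is a minimal sketch of a dictionary-based check. It assumes a tiny hypothetical word list (`WORDS`) standing in for a real dictionary, and the helper name `flag_unknown_words` is made up for this example; it is not the PGDP wordcheck tool.

```python
import re

# Hypothetical, deliberately tiny word list standing in for a real dictionary.
WORDS = {"the", "cat", "sat", "on", "mat", "and", "arid", "dog"}

def flag_unknown_words(text, words=WORDS):
    """Return the tokens that are not in the word list (candidate scannos)."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [t for t in tokens if t.lower() not in words]

# "tne" is not in the word list, so it is flagged.
print(flag_unknown_words("tne cat sat on the mat"))

# "arid" (here an illustrative misread of "and") is a valid word,
# so a plain dictionary check flags nothing.
print(flag_unknown_words("the cat arid the dog"))
```

The second call is the problem case: the error produces a real word, so catching it needs context rather than a word list.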
bluGill · 5 months ago
Unless tne is an abbreviation and so it should pass. Names are a common place where people make up weird spellings and so spell checkers are annoying. I have terrible spelling, and yet most of the time I run spellcheck it is tripping up on words that are spelled correctly but not in the dictionary (in large part because I run spell check after each revision, so the words spelled wrong have already been fixed). "Add to dictionary" means that my dictionary is polluted with words that only apply to one document and would be wrong in the next.
pulkitsh1234 · 5 months ago
An LLM-based spellchecker would've caught it for sure. I am working on one here: https://github.com/pulkitsharma07/spelltastic.io. If anyone has suggestions on how this can help in Project Gutenberg's or Standard Ebooks' workflows, please reach out to me or open an issue.

I have seen that LLMs are pretty good at understanding context/domain / theme-specific terms, so their spellchecking is pretty good.

fernly · 5 months ago
Running spellcheck is a standard step on every page of proofreading. There's a "wordcheck" button in the proofing UI.
contact9879 · 5 months ago
the distributed proofreaders process does include a mandatory spellcheck
acabal · 5 months ago
Editor-in-chief here, happy to answer any questions, as always. We also recently celebrated Public Domain Day with an especially notable crop of books, including The Sound and the Fury, All Quiet on the Western Front, John Steinbeck's first novel, some Hemingway, Gandhi, two Dashiell Hammett novels, and more: https://standardebooks.org/blog/public-domain-day-2025
frereubu · 5 months ago
Another question - in https://standardebooks.org/contribute/producing-an-ebook-ste... you talk about "modernising" spelling, e.g. changing "some one" to "someone". This may be against the implicit goal of making these accessible for a general reader, but I prefer to read what was originally written, and it feels like it crosses a line into editorialising rather than letting the original feel stand as-is. (Although of course these texts have already been "editorialised" by their original editors!) Totally your decision given the amount of effort that has clearly gone into this, but I'd be interested to read the rationale for that decision.
idoubtit · 5 months ago
I respect this choice of modernization, and I suppose some readers enjoy it, but it makes the publisher's whole work useless to me. When a text has been altered, I can't trust it respects the intent of the author, and any style inconsistency I find may be a by-product of the publisher's mangling.

So, when I care about a book, I never read Standard Ebooks' edition.

By the way, the modernization is more than joining a few words. Sometimes, Standard Ebooks replaces the word used at the time the book was written. For instance:

    This time, however, the mountain was going to [-Mahomet;-]{+Muhammad;+}
The previous quote is from Galsworthy's "Forsyte Saga". The author used many French words and French spellings – like "Tchekov" for the Russian playwright that was living in Paris. These subtleties are lost with the modernization.

I also think some alterations are plain mistakes. For instance in the same book:

    if she wanted a good book she should read [-“Job”-]{+Job+};
    his father was rather like Job while Job still had land.

acabal · 5 months ago
That's fine! Our editions didn't erase any of the other editions you can find online and in print. You're more than welcome to select any edition that fits your reading preferences.
sbarre · 5 months ago
What's the point of including books that aren't public domain yet in your collections?

It makes it hard to browse those collections to find actual books to read. The first 3 series I clicked on all said "not P.D." (at first I didn't know what "P.D." meant; remember that your audience does not have your level of familiarity with your context). Perhaps a tooltip on that badge would help.

Then I see "this book will enter public domain in 2050"..

I commend you for this project, it's really awesome work.. From a user's experience, it would be great to have a filter on your various lists that restricts only to books that are available, and excludes these books that are not yet in your collection.

acabal · 5 months ago
In addition to what Robin mentioned below, some of these placeholders are for books on our Wanted list. I also think it's useful to show readers that particular books are looking for volunteers to produce, and also to show that some books they might want are locked away by copyright for possibly decades. In that sense it's partly a political message.
robin_reala · 5 months ago
Whenever we add a collection, the books that are in that collection but not yet in PD in the US get placeholders. But a filter might not be a bad idea.
loloquwowndueo · 5 months ago
Which ebook reader works well with standard ebooks in 2025?

(More concretely my reader is a 2nd-gen kindle which is basically useless these days and I’d love an idea of something that can display standard ebooks with all their advanced formatting)

Thanks!

acabal · 5 months ago
I read on an old Kobo, using Kepub files. Their Kepub renderer is quite good.

I think Kindle's renderer hasn't changed significantly for many years, and it has always been pretty bad. I always say that Kindle seems to have been created by people who hate books.

The best renderer around is iBooks on an iPad, which as far as I can tell uses an up-to-date Webkit.

turrican · 5 months ago
A note for Kobo users: a lot of us (myself included) use Calibre to manage and upload our ebooks. Something about Calibre messes up Kepub files and strips out a lot of the formatting (including the book’s cover).

If I want to appreciate a nice Kepub from Standard Ebooks, I upload it directly to the Kobo.

wyclif · 5 months ago
A Kobo would be a great choice. I use a Kobo Libra 2 and love it a lot more than my old Kindle Paperwhite that got stolen: https://gl.kobobooks.com/products/kobo-libra-2 The Kobo Sage is also good because it has an 8" screen.

Standard Ebooks offers the kepub format for Kobo devices; kepub files use Kobo's more advanced Webkit-based renderer: https://standardebooks.org/help/how-to-use-our-ebooks#kobo-f...

jussih · 5 months ago
I recently purchased a Pocketbook Era. It is pretty much the perfect device for me - supports open standards and does not require any cloud account signups to start using it. It is not hostile to the user, 3rd party applications such as Koreader can be simply dropped in and they appear in the menus without any shenanigans like jailbreaking or custom launchers needed.

In my ideal world all devices would be like this.

kps · 5 months ago
Piggybacking: for computers, what is a good epub viewer?

What I'm personally looking for:

- Linux and/or OS X

- No ‘import’ requirement (a viewer, not a collection manager)

- Single page or continuous (no forced double spread)

- No required animations

- At least basic control over font size, spacing, margins.

- Keyboard navigation (at least next/previous page)

pidgeon_lover · 5 months ago
KOReader for Kindle? https://github.com/koreader/koreader

It does a good job of modernising old Kindles.

rodolphoarruda · 5 months ago
For Android, Moon Reader Pro.

Unmatched UI tweaking features which make reading a pleasure. Syncs bookmarks with cloud services, thus across different devices.

carlosjobim · 5 months ago
My Kindle is 8 years old and works excellent with standard ebooks. I think you can select any device that you prefer and it will be good.

frereubu · 5 months ago
I love this. However, I couldn't find an alphabetical list of authors, which is the way I wanted to browse on my first visit. Instead my only option is to show 48 on a page and paginate through, which is tedious. I know there are author pages - e.g. https://standardebooks.org/ebooks/william-makepeace-thackera... - so I presume it's feasible. An author index would significantly increase my likelihood of understanding what's available and engaging with the content.
acabal · 5 months ago
We don't have a list of authors yet, but that's a good idea to add!
Erlangen · 5 months ago
Hi, Alex. Is there any way to browse the ebooks filtered by language? I tried to find some texts in French, but there don't seem to be any.
acabal · 5 months ago
Standard Ebooks only works on English-language books, as typography varies between languages and we're only experts in English.
LtWorf · 5 months ago
Same for me. I think it's English only.
theyinwhy · 5 months ago
Great work! Gutenberg project books have always been a pain to read. Thank you for caring!
jayanmn · 5 months ago
I am from India. Could you add a local UPI-based donation option at some point? Not everyone has a card here.
mourner · 5 months ago
Wonderful project! One thing I wish the website would have is being able to find the right book to read out of this enormous list — e.g. showing / sorting by Goodreads ratings (which I realize you might not want to do), or at least having some kind of a "Featured" section with the most critically acclaimed / must read books of the project on one page.
cxr · 5 months ago
There are around a dozen collections on the (not prominently featured) collections page[1] like Le Monde's 100 Best Books of the Century and Modern Library's 100 Best Novels, etc.

1. <https://standardebooks.org/collections>

sgustard · 5 months ago
Steinbeck was the first name I searched for, so this was great to see even if his major works won't be available for some time. I do wonder how badly the Steinbeck or Faulkner estates are hurt by the sudden loss of royalties? Imagine working hard to write a book to make a living and then after just under a hundred years it's taken away from you. Also, AI.
bodantogat · 5 months ago
Is there an API or downloadable catalog of the titles? Happy to feature them on meetnewbooks.com so more readers can find them.
acabal · 5 months ago
Yes, we have complete feeds available for our Patrons: https://standardebooks.org/feeds

agiacalone · 5 months ago
Been using Standard Ebooks for a while now, but wanted to drop by here and say how great this site is! It's replaced P.G. for me (for whatever is on this site, at least) and I like the much nicer formatting on the texts. It's great on both my physical Kindle and Apple Books on my iPhone.
htunnicliff · 5 months ago
I’d love to know more about the pattern of keeping each book in individual repos, rather than in a singular repo.
acabal · 5 months ago
Each repo is a history of the ebook including editorial changes, typos fixes, and the like. Having a single repo containing thousands of ebooks and their histories would be pretty annoying to browse.
remus · 5 months ago
Presumably to keep the repo size reasonable. Say I want to make an ad hoc contribution to a book, if step 1 is "download this multi-gigabyte repo" then that's a fairly big hurdle.
HexPhantom · 5 months ago
Really appreciate the work Standard Ebooks puts into making these texts not just available, but readable
fauria · 5 months ago
Roughly speaking, how long does it take you to produce a single ebook?
acabal · 5 months ago
Once you're very familiar with the process, you could get a draft of a basic prose novel ready for proofreading in a few hours. Then it has to be proofread and completed.

Beginners, and people working on more advanced books, can take much, much, much longer.

contact9879 · 5 months ago
it varies widely depending on the length and type of book and how much free time the volunteer has to devote to it

Anywhere between 1 week for the simplest (straight narrative, not too much verse or endnotes) and ~1 year (thousands of endnotes, pages of verse, drama, in-line references to book titles, use of technical terms, etc)

crorella · 5 months ago
In your opinion, what is the ebook reader you like the most?
greenie_beans · 5 months ago
ooo tempted to reprint faulkner as part of a small press, thanks for the idea
ssttoo · 5 months ago
I recently started on my first title contribution to the project; it’s a rewarding experience: https://github.com/stoyan/edith-wharton_the-custom-of-the-co... It’s HTML all the way down.

The step-by-step: https://standardebooks.org/contribute/producing-an-ebook-ste...

In a nutshell: start with a Project Gutenberg text, clean it up to a high standard, have it peer reviewed and published

Touche · 5 months ago
Love this. So many in the archivist community are only interested in preservation and don't care at all about making the material accessible. Love to see a project like this prioritizing the latter.
stog · 5 months ago
You’re spot on with this. I recently converted a local history book from 1911 to Markdown, ePub and HTML and tracked the changes on GitHub. Only a handful of copies of this book exist in physical form and it has been photocopied (which is great).

However, I was completely shot down by the local library when I was discussing it with them. They said they already had a photocopy and didn’t need any more digital editions. I tried to explain the benefits of having it in a machine-readable format but they wouldn’t entertain it. I completed the project for myself, so I wasn’t too bothered; I thought they might be interested in archiving it, but they weren’t.

My general feeling is that they didn’t like an outsider contributing and touching on a format they didn’t know so got slightly defensive.

frereubu · 5 months ago
Do you "claim" a book, to make sure that no-one else is trying to work on the same book? I presume that's part of step 4 in your link, given that it would be heartbreaking to get 90% of the way through and then be beaten to it by someone who'd started at roughly the same time!
contact9879 · 5 months ago
Yes, you signal your intent on the mailing list subject to approval by the editor-in-chief
pidgeon_lover · 5 months ago
I am interested in ebook production, as I do so for my own personal use, but the copyright issues put me off contributing on the clearnet to legit projects. I have a whole section in my Calibre library of books I've edited or converted from Archive.org scans, but can't share any of them because a) legit channels only accept public domain works, and they're all under copyright, and b) the current main ebook pirate channels don't accept any contributions
miles · 5 months ago
Some of the higher ranking previous discussions:

2017, 441 points, 97 comments https://news.ycombinator.com/item?id=14570035

2019, 820 points, 131 comments https://news.ycombinator.com/item?id=20594802

2022, 1578 points, 256 comments https://news.ycombinator.com/item?id=32215324

2024, 701 points, 154 comments https://news.ycombinator.com/item?id=38831219

Sverigevader · 5 months ago
It's thanks to this site that I learned that Kobo uses a really bad renderer for epubs unless converted to their own ebook format (Kepub). It makes a huge difference in appearance and performance on a Kobo device.

https://standardebooks.org/help/how-to-use-our-ebooks#kobo-f...

Uvix · 5 months ago
You don't even have to convert it, just rename the extension to .kepub.epub. https://github.com/kobolabs/epub-spec?tab=readme-ov-file#sid...
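
A minimal sketch of that rename in Python, assuming a hypothetical helper name `to_kepub_name`. It only computes the new filename and does not touch the book's contents, so it does nothing beyond the extension change described above.

```python
from pathlib import Path

def to_kepub_name(path):
    """Return the path with its .epub extension changed to .kepub.epub.

    Hypothetical helper for illustration; it only builds the new name.
    """
    p = Path(path)
    if p.name.endswith(".kepub.epub"):
        return p  # already has the Kobo-specific extension
    if p.suffix != ".epub":
        raise ValueError(f"not an .epub file: {p.name}")
    return p.with_name(p.stem + ".kepub.epub")

print(to_kepub_name("novel.epub").name)
```

To actually rename a file on disk, something like `src = Path("novel.epub"); src.rename(to_kepub_name(src))` would do it.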
acabal · 5 months ago
This is not entirely correct - Kobo also expects a bunch of special <span>s inserted for things like highlighting and page numbers to work.

It kills me that Kobo is so close to having plain epubs rendered with Webkit but for some reason they just won't take the leap!

stog · 5 months ago
I discovered this too. However, I now use Plato Reader on my Kobo with standard ePub and it’s lovely.
lazyeye · 5 months ago
You can use kepubify to convert epubs to kepubs (and calibre will do this as well)

https://pgaskin.net/kepubify/

_shantaram · 5 months ago
And https://send.djazz.se automatically performs the conversion for you with kepubify and sends it to your ereader! No affiliation, just a happy camper chiming in
crtasm · 5 months ago
I assume KOReader has a better renderer for epub but will have to test how it compares to the stock software+kepub. So far I've only used KOReader on my device.
contact9879 · 5 months ago
the only issues i've found with koreader are its default margin size and its display of standard ebooks' titlepages, but (I believe) these can be fixed with a fairly simple user tweak css
RVuRnvbM2e · 5 months ago
Wow I never knew this!
robin_reala · 5 months ago
Yeah, if you just load normal epubs it defaults to an old version of Adobe Digital Editions unfortunately.
kseistrup · 5 months ago
I love Standard Ebooks.

See also Global Grey ebooks: https://www.globalgreyebooks.com/ One woman has formatted hundreds of ebooks herself.

Animats · 5 months ago
Most of the big print-on-demand companies will now make hardcovers, for about $10. You can't feed raw Gutenberg files into those mills, but these "standard ebooks" have enough formatting info for that. So that would be a useful service.
m-hodges · 5 months ago
What are some examples of companies that do this?
SamBam · 5 months ago
Are there any non-English books? When I go to the search page, language isn't even a pull-down option, so I'm guessing not.

There is a huge world of out-of-copyright non-English texts, and Project Gutenberg has many thousands of them. I wonder if any interest could be generated to help bring them in by posting on foreign language subreddits or something.

slevis · 5 months ago
Just looked through the entire website to answer this question. Seems like they only accept English books :( "Types of ebooks we don’t accept: - Non-English-language books. Translations to English are, of course, OK." (https://standardebooks.org/contribute/collections-policy)
SamBam · 5 months ago
Weird. Why the explicit rule against them?

I understand if the existing editors can't personally proofread the submissions, but that's why peer-review exists. Or an open-source project in general where people can post corrections. Jimbo Wales didn't need to speak a hundred languages to launch Wikipedia.