Visualizing all books of the world in ISBN-Space

Wow.

When we started Amazon, this was precisely what I wanted to do, but using Library of Congress triple classifications instead of ISBN.

It turned out to be impossible because the data provider (a mixture of Baker & Taylor (book distributors) and Books In Print) munged the triple classification into a single string, so you could not find the boundaries reliably.

Had to abandon the idea before I even really got started on it, and it would certainly have been challenging to do this sort of "flythrough" in the 1994-1995 version of "the web".

Kudos!

dredmorbius · a year ago

What are you referring to as the LoC triple classification?

I've spent quite some time looking at both the LoC Classification and the LoC Subject Headings. Sadly the LoC don't make either freely available in a useful machine-readable form, though it's possible to play games with the PDF versions. I'd been impressed by a few aspects of this; one point that particularly sticks in my mind is that the state-law section of the Classification shows a very nonuniform density of classifications amongst states. If memory serves, NY and CA are by far the most complex, with PA a somewhat distant third, and many of the "flyover" states having almost absurdly simple classifications, often quite similar. I suspect that this reflects the underlying statutory, regulatory, and judicial / caselaw complexity.

Another interesting historical factoid is that the classification and its alphabetic top-level segmentation apparently spring directly from Thomas Jefferson's personal library, which formed the origin of the LoC itself.

For those interested, there's a lot of history of the development and enlargement of the Classification in the annual reports of the Librarian of Congress to Congress, which are available at Hathi Trust.

Classification: <https://www.loc.gov/catdir/cpso/lcco/>

Subject headings: <https://id.loc.gov/authorities/subjects.html>

Annual reports:

- Recent: <https://www.loc.gov/about/reports-and-budgets/annual-reports...>

- Historical archive to ~1866: <https://catalog.hathitrust.org/Record/000072049>

smcin · a year ago

Never knew about LoC book Classification till now; based on what I read I'd call it a failed US-wide attempt to standardize US collections (not international ones). Neat as it is, it's not free to access ($; why??), it's not used outside US(/Canada) and it's not used as standard by US booksellers or libraries, and it's anglocentric as noted in [0] (an alternative being Harvard–Yenching Classification, for Chinese books). Also, it's disappointing to hear that the states vary so greatly in applying that segmentation.

[0]: https://en.wikipedia.org/wiki/Library_of_Congress_Classifica...

CRConrad · a year ago

> What are you referring to as the LoC triple classification?

Lines of actually working code; lines of commented-out inactive code; lines of explanatory comments. HTH!

Naah, gotcha, the other "LoC"... But only got it on about the third occurrence.

ilamont · a year ago

> a mixture of Baker & Taylor (book distributors)

Having dealt with Baker & Taylor in the past, this doesn't surprise me in the least. It was one of the most technologically backwards companies I've ever dealt with. Purchase orders and reconciliations were still managed with paper, PDFs, and emails as of early 2020 (when I closed my account). I think at one point they even had me faxing documents in.

PaulDavisThe1st · a year ago

A bit tangential but one of my favorite early amzn stories is when a small group from Ingram (at the time, the other major US book distributor) came to visit us in person (they were not very far away ... by design).

It was clear that they were utterly gobsmacked that a team of 3 or 4 people could have done what we had done in the time we had done it. They had apparently contemplated getting into online retail directly, but saw two big problems: (a) legal and moral pushback from publishers who relied on Ingram just being a distributor, and (b) the technological challenge. I think at the time their IT staff numbered about 20 or so. They just couldn't believe what they were seeing.

Good times (there weren't very many of those for me in the first 14 months) :)

It’s not uncommon for an ISBN to have been assigned multiple times to different books [0]. Thus “all books in ISBN space” may be an overstatement.

There’s also the problem of books with invalid ISBNs, i.e. where the check digit doesn’t match the rest of the ISBN, but where correcting the check digit would match a different book. These books would be outside of the ISBN space assumed by the blog post.

[0] https://scis.edublogs.org/2017/09/28/the-dreaded-case-of-dup...
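For anyone who wants to test the invalid-ISBN case described above, here is a minimal sketch of the standard ISBN-13 check-digit rule (weights alternate 1 and 3 over the first 12 digits, and the check digit brings the total to a multiple of 10). The function names are my own, chosen for illustration; they are not from the blog post or from any Anna's Archive tooling.

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting with the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Return True if the string is 13 digits with a matching check digit."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


def with_corrected_check_digit(isbn: str) -> str:
    """Hypothetical helper: keep the first 12 digits, recompute the last."""
    digits = isbn.replace("-", "").replace(" ", "")
    return digits[:12] + str(isbn13_check_digit(digits[:12]))


if __name__ == "__main__":
    printed = "9780306406158"  # deliberately wrong last digit
    print(is_valid_isbn13(printed))
    print(with_corrected_check_digit(printed))
```

Note that the "corrected" number is just another point in ISBN space, so it may already belong to a different book, which is exactly the problem: the check digit tells you that something is wrong, not which book was meant.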

mormegil · a year ago

And possibly not even assigned at all. I looked at the lowest known ISBNs for Czech publishers and a different color stood out: no, https://books.google.cz/books?vid=ISBN9788000000015&redir_es... is not a correct ISBN, I'd say :-) (But I don't know if the book includes such obviously-fake ISBN, or the error is just in Google Books data.)

Finnucane · a year ago

Publishers buy blocks of ISBNs based on expected need; how they actually assign them may be arbitrary.

layer8 · a year ago

rsecora · a year ago

Impressive presentation.

Note: The presentation reflects the contents of Anna's archive exclusively, rather than the entire ISBN catalog. There is a discernible bias towards a limited range of languages, due to Anna's collection bias to those languages. The sections marked in black represent the missing entries in the archive.

phiresky · a year ago

That's not entirely accurate since AA has separate databases for books they have as files, and one for books they only know the metadata of. The metadata database comes from various sources and as far as I know is pretty complete.

Black should mostly be sections that have no assigned books

bloak · a year ago

I found some books which are available from dozens of online bookshops but which are not in this visualisation. Perhaps they're not yet in any library that feeds into worldcat.org, though some of them were about five years old.

keepamovin · a year ago

Wow, that is really cool. What an amazing passion project and what an incredible resource!

Zooming in, you can see the titles and the barcode, and hovering gets you a book cover and details. Incredible, everything you could want!

Some improvement ideas: checkbox to hide the floating white panel at top left, and the thing at top right. Because I really like to "immerse" in these visualizations, those floaters lift you out of that experience to some extent, limiting fun and functionality for me a bit.

robwwilliams · a year ago

Ah, this is a perfect application for Microsoft Silverlight PivotViewer, a terrific web interface we used for neuroimaging until Microsoft pulled the plug.

There is an awe-inspiring TED talk by Gary W. Flake demonstrating its use.

https://m.youtube.com/watch?v=LT_x9s67yWA

And here is our IEEE paper from 2011.

Really sorry this is not a web standard.

https://www.dropbox.com/scl/fi/bl8zkjs3y47q3377hh3ya/Yan_Wil...

c-fe · a year ago

Very cool visualisation!

There are more cool submissions here https://software.annas-archive.li/AnnaArchivist/annas-archiv...

Mine is at https://isbnviz.pages.dev

255 · a year ago

When you zoom in it's book shelves! That's so cool

MeteorMarc · a year ago

Possible improvement: paperback and hardbound editions are shown next to each other, but look the same. I don't know about the e-books.

greenie_beans · a year ago

those would be totally different isbns. to connect the related editions you'd probably need to get something like the FRBR records for each work and idk if anna's archive has related books like that?

grues-dinner · a year ago

Awesome. A real life Library of Babel: https://libraryofbabel.info/

Out of all the VR vapourware, a real life infinite library or infinite museum is the one thing that could conceivably get me dropping cash.

WillAdams · a year ago

Unfortunately, the writers won't see any of that for this particular implementation.

It would be far more interesting as a project which tried to make all legitimately available downloadable texts accessible, say as an interface to:

https://onlinebooks.library.upenn.edu/