When we started Amazon, this was precisely what I wanted to do, but using Library of Congress triple classifications instead of ISBN.
It turned out to be impossible because the data provider (a mixture of Baker & Taylor (book distributors) and Books In Print) munged the triple classification into a single string, so you could not find the boundaries reliably.
Had to abandon the idea before I even really got started on it, and it would certainly have been challenging to do this sort of "flythrough" in the 1994-1995 version of "the web".
What are you referring to as the LoC triple classification?
I've spent quite some time looking at both the LoC Classification and the LoC Subject Headings. Sadly the LoC don't make either freely available in a useful machine-readable form, though it's possible to play games with the PDF versions. I've been impressed by a few aspects of this. One point that particularly sticks in my mind is that the state-law section of the Classification shows a very nonuniform density of classifications amongst states. If memory serves, NY and CA are by far the most complex, with PA a somewhat distant third, and many of the "flyover" states having almost absurdly simple classifications, often quite similar. I suspect that this reflects the underlying statutory, regulatory, and judicial / caselaw complexity.
Another interesting historical factoid is that the classification and its alphabetic top-level segmentation apparently spring directly from Thomas Jefferson's personal library, which formed the origin of the LoC itself.
For those interested, there's a lot of history of the development and enlargement of the Classification in the annual reports of the Librarian of Congress to Congress, which are available at Hathi Trust.
Never knew about LoC book Classification till now; based on what I read I'd call it a failed US-wide attempt to standardize US collections (not international ones).
Neat as it is, it's not free to access ($; why??), it's not used outside the US (and Canada), it's not used as a standard by US booksellers or libraries, and it's anglocentric as noted in [0] (an alternative being the Harvard–Yenching Classification, for Chinese books). Also, it's disappointing to hear you say that the states vary greatly in applying that segmentation.
Having dealt with Baker & Taylor in the past, this doesn't surprise me in the least. It was one of the most technologically backwards companies I've ever dealt with. Purchase orders and reconciliations were still managed with paper, PDFs, and emails as of early 2020 (when I closed my account). I think at one point they even had me faxing documents in.
A bit tangential but one of my favorite early amzn stories is when a small group from Ingram (at the time, the other major US book distributor) came to visit us in person (they were not very far away ... by design).
It was clear that they were utterly gobsmacked that a team of 3 or 4 people could have done what we had done in the time in which we had done it. They had apparently contemplated getting into online retail directly, but saw two big problems: (a) legal and moral pushback from publishers who relied on Ingram just being a distributor, and (b) the technological challenge. I think at the time their IT staff numbered about 20 or so. They just couldn't believe what they were seeing.
Good times (there weren't very many of those for me in the first 14 months) :)
It’s not uncommon for an ISBN to have been assigned multiple times to different books [0]. Thus “all books in ISBN space” may be an overstatement.
There’s also the problem of books with invalid ISBNs, i.e. where the check digit doesn’t match the rest of the ISBN, but where correcting the check digit would match a different book. These books would be outside of the ISBN space assumed by the blog post.
And possibly not even assigned at all. I looked at the lowest known ISBNs for Czech publishers and a different color stood out: no, https://books.google.cz/books?vid=ISBN9788000000015&redir_es... is not a correct ISBN, I'd say :-) (But I don't know if the book includes such obviously-fake ISBN, or the error is just in Google Books data.)
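For anyone who wants to check these cases themselves, here is a minimal sketch of the standard ISBN-13 check-digit rule (weights alternate 1 and 3 over the first twelve digits). The function names are my own, not from the blog post, and this only tests arithmetic validity; it says nothing about whether an ISBN was ever assigned or assigned twice.

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit for the first 12 digits."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                for i, ch in enumerate(first12))
    return (10 - total % 10) % 10


def normalize(isbn: str) -> str:
    """Strip hyphens and spaces (hypothetical helper for this sketch)."""
    return isbn.replace("-", "").replace(" ", "")


def is_valid_isbn13(isbn: str) -> bool:
    """True if the string is 13 digits and the last one matches the rule."""
    digits = normalize(isbn)
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


def with_corrected_check_digit(isbn: str) -> str:
    """Return the ISBN with its check digit recomputed.

    Note the caveat from the comment above: the "corrected" number
    may belong to a different book entirely.
    """
    digits = normalize(isbn)
    return digits[:12] + str(isbn13_check_digit(digits[:12]))


if __name__ == "__main__":
    for candidate in ("978-0-306-40615-7", "9780306406158", "9788000000015"):
        print(candidate, is_valid_isbn13(candidate))
```

Incidentally, the 9788000000015 from the Google Books link does pass this check, so a suspicious-looking ISBN can still be arithmetically valid.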
Note: The presentation reflects the contents of Anna's Archive exclusively, rather than the entire ISBN catalog. There is a discernible bias towards a limited range of languages, because Anna's Archive's collection is itself biased toward those languages. The sections marked in black represent the entries missing from the archive.
That's not entirely accurate since AA has separate databases for books they have as files, and one for books they only know the metadata of. The metadata database comes from various sources and as far as I know is pretty complete.
Black should mostly be sections that have no assigned books.
I found some books which are available from dozens of online bookshops but which are not in this visualisation. Perhaps they're not yet in any library that feeds into worldcat.org, though some of them were about five years old.
Wow, that is really cool. What an amazing passion project and what an incredible resource!
Zooming in, you can see the titles and the barcode, and hovering gets you a book cover and details. Incredible, everything you could want!
Some improvement ideas: checkbox to hide the floating white panel at top left, and the thing at top right. Because I really like to "immerse" in these visualizations, those floaters lift you out of that experience to some extent, limiting fun and functionality for me a bit.
Ah, this is a perfect application for Microsoft Silverlight PivotViewer, a terrific web interface we used for neuroimaging until Microsoft pulled the plug.
There is an awe inspiring TED talk by Gary W. Flake demonstrating its use.
Those would be totally different ISBNs. To connect the related editions you'd probably need to get something like the FRBR records for each work, and I don't know if Anna's Archive has related books like that.
Kudos!
Classification: <https://www.loc.gov/catdir/cpso/lcco/>
Subject headings: <https://id.loc.gov/authorities/subjects.html>
Annual reports:
- Recent: <https://www.loc.gov/about/reports-and-budgets/annual-reports...>
- Historical archive to ~1866: <https://catalog.hathitrust.org/Record/000072049>
[0]: https://en.wikipedia.org/wiki/Library_of_Congress_Classifica...
Lines of actually working code; lines of commented-out inactive code; lines of explanatory comments. HTH!
Naah, gotcha, the other "LoC"... But only got it on about the third occurrence.
[0] https://scis.edublogs.org/2017/09/28/the-dreaded-case-of-dup...
There is an awe inspiring TED talk by Gary W. Flake demonstrating its use.
https://m.youtube.com/watch?v=LT_x9s67yWA
And here is our IEEE paper from 2011.
Really sorry this is not a web standard.
https://www.dropbox.com/scl/fi/bl8zkjs3y47q3377hh3ya/Yan_Wil...
There are more cool submissions here https://software.annas-archive.li/AnnaArchivist/annas-archiv...
Mine is at https://isbnviz.pages.dev
Out of all the VR vapourware, a real life infinite library or infinite museum is the one thing that could conceivably get me dropping cash.
It would be far more interesting as a project which tried to make all legitimately available downloadable texts accessible, say as an interface to:
https://onlinebooks.library.upenn.edu/