There have been so many instances since it's been down that I tried to access IA resources and realized they were unavailable. I'm still bitter that of all the targets a hacker could've chosen, it was the IA. Couldn't have happened to a better website. I plan on upping my monthly donation as soon as I can.
What makes you think the "hacker" was just a person looking for an easy target? More likely that this was a targeted attack by those that don't like IA.
Some source information about the group that has claimed responsibility:
"A group known as SN_Blackmeta claimed responsibility for the attack, with a confusing antisemitic message that the archive “belongs to the USA” as if it were a government project."
You say this as if it is an original idea. Of course the IA is working on this, and has been for over 6 years. There already is a DWeb version. They have been advancing DWeb infrastructure. The IA hosts all kinds of DWeb developer events.
But it is over 50 petabytes and the IA gets a huge amount of traffic through the regular web that they need to serve quickly and efficiently to their users.
Guess what has happened over 6 years of decentralization of 50 PB? People only seed what they want or care about, and there aren't enough seeders to host it. They set all this up and nobody volunteers. You're a DWeb advocate and you haven't been seeding. That's a recipe for disaster if they rely on the goodness of volunteer seeders. The IA's mission is broader. DWeb will only ever complement the IA's mission.
How does one contribute? In the article you linked:
> there is no information on how users can get involved in the decentralized version of Archive.org and who the peers are that are distributing the content.
The other link doesn't mention how people could help host data either. If there is a way, then it seems like more of a marketing issue if those willing are unaware or unable to figure out how. I can't find any actionable steps on how to contribute.
edit - it seems the dweb version was a frontend for archive.org testing serving IA content over alternative protocols. It was never finished or expanded on unfortunately. Links to it are dead but here's the github repo https://github.com/internetarchive/dweb-archive
Can confirm that issue about people only seeding what they are interested in.
I found a dataset I wanted to hoard, but the author's website was gone. A dataset site had a torrent, and I said great, I'll just torrent and seed that and help keep the thing alive. Turns out I can't find a single seeder for the torrent.
It seems to me the various efforts are dead or stalled. Anything in actual current development or production? IPFS was supposed to go in that direction and still exists, sure, but not to provide IA duplication (that is advertised.)
You're describing a network effects problem, specifically a collaborative game failure. Need some mechanism designers and big tech cos to jump in, stat!
I'm working on this, ArchiveBox v0.8 adds the beginnings of a content addressable store, with plans for bittorrent-backed instance-to-instance sharing in a later version.
I think Archive.org should still exist too (and ArchiveBox donates + submits URLs to Archive.org too), but having a self-hosted option where you can archive personal stuff that requires a login, and do P2P sharing with fine-grained permissions, is a gap that should be filled.
Aiming to archive the entire internet is Archive.org's goal, aiming to archive the part of the internet YOU care about is our goal.
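For readers unfamiliar with the term, here is a minimal sketch of what a content-addressable store does in general: each blob is filed under the hash of its own bytes, so identical snapshots deduplicate and any peer can verify what it received. This is not ArchiveBox's implementation; the `ContentStore` class, the SHA-256 choice, and the two-level directory layout are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


class ContentStore:
    """Toy content-addressable store (hypothetical, for illustration only)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path_for(self, digest: str) -> Path:
        # Shard by the first two hex characters to keep directories small.
        return self.root / digest[:2] / digest[2:]

    def put(self, data: bytes) -> str:
        """Store a blob and return its address (the SHA-256 hex digest)."""
        digest = hashlib.sha256(data).hexdigest()
        path = self._path_for(digest)
        if not path.exists():  # identical content is stored only once
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        """Load a blob and verify that it still matches its address."""
        data = self._path_for(digest).read_bytes()
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored blob does not match its address")
        return data


store = ContentStore(Path(tempfile.mkdtemp()) / "cas")
key = store.put(b"<html>snapshot</html>")
same_key = store.put(b"<html>snapshot</html>")  # deduplicated, same address
restored = store.get(key)
```

Because the address is derived from the content, two instances that archived the same page independently end up with the same key, which is what makes instance-to-instance sharing straightforward to verify.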
[I know that some percentage between 95 and 100 of crypto projects are a scam. I personally believe this one isn't, after much diligent reading. Whether it gets released or does what it claims it will do is another question, but please do spare me the kneejerk anti-crypto reactions, if you can. Just because they're almost all money-making scams, doesn't mean they're all money-making scams.]
That would be an awful lot of replication or a very shitty archive. Decentralization works when each node can serve all the functions and content alone or when you don't care about completeness.
Unless I'm missing something, an archive is not something small or something that's just as good when part of it is missing.
I kind of agree, but the way the internet is going, with everyone being behind carrier-grade NAT, it's not much of a decentralized network of computers anymore, not to mention all the kids with their laptops and tablets not even hosting anything :(
There are ways around this. I've experimented with setting up a cluster of ArchiveBox instances that share snapshots over Tailscale. Tailscale lets users sign up for free accounts, and you can share machines between separate accounts. A (CGNAT-compatible) decentralized invite-only network could conceivably spread that way.
That will never happen. No one is going to be able to seed the amount of data that IA has. The only thing they can hope for is that a company like Google or CF provides another data center for them.
I don't know if bittorrent has improved - but 20 years ago I had a personal issue with it.
At that time our son was using it for games. He went away to college and came home for the first school break. I got a phone call from our internet provider asking if our son was home. I was so shocked that I handed the phone to our son.
Apparently at that time bittorrent was optimizing for the most efficient path to a host. Since we had a relatively good connection, the mighty weight of the internet was funnelling through our tiny internet provider to our son's computer. The provider (without our knowing it) had made a deal with our son that he would only turn on bittorrent between midnight and 6 AM. I doubt other providers would be so generous.
I have been sceptical of bittorrent since that day.
All clients today (and probably back then) have options to limit bandwidth consumption including throttling, scheduling, and total data transfer caps. For serving mostly HTML and images, dedicating even 10% of a home broadband connection to serving content would allow many, many people per day to access archived pages.
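A back-of-the-envelope version of that claim, as a sketch only. The 20 Mbit/s upload speed and the 2 MB average page weight are assumptions picked for illustration, not measurements; `pages_per_day` is a hypothetical helper.

```python
def pages_per_day(upload_mbit_s: float, share: float, page_mb: float) -> float:
    """Estimate how many pages a capped share of an uplink can serve in a day.

    upload_mbit_s: total upload speed in megabits per second (assumed)
    share:         fraction of the uplink dedicated to serving (0.10 = 10%)
    page_mb:       average page weight in megabytes (assumed)
    """
    bytes_per_second = upload_mbit_s * 1_000_000 / 8 * share
    bytes_per_day = bytes_per_second * 86_400  # seconds in a day
    return bytes_per_day / (page_mb * 1_000_000)


# 10% of an assumed 20 Mbit/s uplink, serving assumed 2 MB pages.
estimate = pages_per_day(upload_mbit_s=20, share=0.10, page_mb=2.0)
print(f"about {round(estimate):,} pages per day")
```

Under those assumptions the result is on the order of ten thousand page loads per day from a single home connection, ignoring protocol overhead and uneven demand.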
I seed a torrent of the SICP lectures that originally came from IA. I'll have to see if that's still up and if there's some way of getting the other torrents from the tracker.
If you're lucky there's other seeds around, and not just the IA web seeds which (I assume?) are down too.
The Internet Archive is not, in fact, completely online (as the article explains but the title doesn’t). The Wayback Machine, which is part of it, is kind of online but (in my experience) you are going to experience HTTP 504 timeouts from time to time on the first query for a given (URL, date) pair as it seemingly goes out to slower storage. (Long delays happened in the past occasionally as well but not to the point of a 504.)
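If you are scripting lookups while the service is in this state, retrying on a 504 with a growing delay is one way to cope with the slow first query. A minimal sketch using only the standard library; `fetch_with_retry`, its parameters, and the delay values are assumptions for illustration, not an official client.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Plain GET; raises urllib.error.HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(
    url: str,
    attempts: int = 4,
    base_delay: float = 2.0,
    fetcher: Callable[[str], bytes] = fetch,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Retry only on HTTP 504, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetcher(url)
        except urllib.error.HTTPError as err:
            # Anything other than a gateway timeout is not worth retrying,
            # and the last attempt should surface the error to the caller.
            if err.code != 504 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A call would look like `fetch_with_retry("https://web.archive.org/web/2024/https://example.com/")`. The `fetcher` and `sleep` parameters exist so the logic can be exercised without touching the network.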
Or perhaps it was to make IA better.
"A group known as SN_Blackmeta claimed responsibility for the attack, with a confusing antisemitic message that the archive “belongs to the USA” as if it were a government project."
https://9to5mac.com/2024/10/15/internet-archive-data-breach-...
"Internet Archive Cyber Attacked by Pro-Palestinian Hackers"
https://www.cybersecurityintelligence.com/blog/internet-arch...
"Anti-Israel hacker group hacks 'Internet Archive', exposing 31 million users"
https://www.ynetnews.com/business/article/bkird2rjke
Maybe they needed this wakeup call before someone could, say, remove all of their data
https://blog.cloudflare.com/cloudflares-always-online-and-th...
https://blog.archive.org/2021/02/18/behind-the-scenes-of-the...
https://www.bleepingcomputer.com/news/technology/archiveorg-...
I’ve always been fascinated by this post.
Bittorrent works well for popular things but fails for marginal content (unless some really dedicated individuals step in.)
What the internet archive provides is a way to have access to many many resources which you didn't know you needed in advance.
The videos are present on archive.org, but it is down and I was unable to find them anywhere else online.
Also, yt-dlp is not working: https://github.com/yt-dlp/yt-dlp/issues/10128
Example: https://ocw.mit.edu/courses/7-016-introductory-biology-fall-...
magnet:?xt=urn:btih:1814b8e2673e8a4547fd9c4f1a417b05860230b4&dn=MIT_Structure_of_Computer_Programs_1986&tr=http%3A%2F%2Fbt1.archive.org%3A6969%2Fannounce&tr=http%3A%2F%2Fbt2.archive.org%3A6969%2Fannounce&ws=https%3A%2F%2Farchive.org%2Fdownload%2F&ws=http%3A%2F%2Fia600204.us.archive.org%2F15%2Fitems%2F
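To see why this torrent depends on other seeders while the IA is down, it helps to pull the magnet link above apart. A small sketch with the standard library; `parse_magnet` is a hypothetical helper, and the field names in the returned dict are my own.

```python
from urllib.parse import parse_qs, urlsplit

MAGNET = (
    "magnet:?xt=urn:btih:1814b8e2673e8a4547fd9c4f1a417b05860230b4"
    "&dn=MIT_Structure_of_Computer_Programs_1986"
    "&tr=http%3A%2F%2Fbt1.archive.org%3A6969%2Fannounce"
    "&tr=http%3A%2F%2Fbt2.archive.org%3A6969%2Fannounce"
    "&ws=https%3A%2F%2Farchive.org%2Fdownload%2F"
    "&ws=http%3A%2F%2Fia600204.us.archive.org%2F15%2Fitems%2F"
)


def parse_magnet(uri: str) -> dict:
    """Split a magnet URI into info hash, name, trackers, and web seeds."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parts.query)  # also undoes the percent-encoding
    return {
        # xt looks like "urn:btih:<hash>"; keep only the hash itself.
        "info_hash": params["xt"][0].rsplit(":", 1)[-1],
        "name": params.get("dn", [""])[0],
        "trackers": params.get("tr", []),   # where peers are discovered
        "web_seeds": params.get("ws", []),  # plain HTTP download sources
    }


info = parse_magnet(MAGNET)
for key, value in info.items():
    print(key, value)
```

Both trackers (`tr`) and both web seeds (`ws`) are archive.org hosts, so with those unreachable a client has to find peers some other way, for example through DHT if it is enabled.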
The Internet Archive itself is still down.
“Lisp lore : a guide to programming the Lisp machine”
https://archive.org/details/lisploreguidetop0000brom
I discovered this reference and, boom, the Internet Archive book is not available :(
“Wayback Machine (provisional, read-only) service.
Other Internet Archive services are temporarily offline.
Please check our official accounts, including Twitter/X, Bluesky or Mastodon for the latest information.
We apologize for the inconvenience.”