I think a bigger problem than 38% of webpages being dead is that a lot of entities/groups/businesses now use Facebook pages almost exclusively and have no other web presence outside of Facebook. In other words, a Facebook account becomes a requirement to interact with them.
The same happened with forums. They're all subreddits, Facebook groups or Discord chats now. A lot of valuable information is kept hidden in those groups now, and it makes me really sad.
I love forums. I've kept the DIY Book Scanner forum online since... 2009? Recently (last two years) these damn AI scrapers have killed phpBB over and over again. They got me kicked off my shared web hosting plan by abusing search and other forum features.
I upgraded to a VPS for $500. The other admin spent 15-20 hours fixing/troubleshooting/transferring. And you know what? At the end of all this, I paid to give my data to these jerks, to keep it online for them to harvest. The forums are dead quiet.
Now I think Discord is fine. They'll just sell the data to AI companies directly, and the burden won't fall on me.
Reddit at least shows up in searches. I also think it's important not to look at the past with rose-colored glasses. I think some random forum is much more likely to disappear than a subreddit.
Car forums are still alive, but yea, the shift from thread discussions to comment and/or video discussions really kills a lot of knowledge. It's great to find old forum posts showing you how to work on your car. It's tiresome skipping through videos to find what you need, or even searching Reddit.
The big thing about Discord is you can chat with people right now, but the knowledge is not in a good format to come back to later.
I'm probably in the minority of people who appreciate that trend. Valuable information being hidden means that community comes before information. If you want to gain access to other people's knowledge you have to opt-in and interact with and understand the people who made it, and that creates an incentive to contribute back and use knowledge in an appropriate way.
The open internet seems increasingly predatory and a place where some gigantic ML company just vacuums up your stuff or resells your content for ad revenue, parasitic.
I don't mind the fact and think it's honestly a natural reaction to this that people guard their information. It's sort of like a medieval monastery version of the internet where people recognize that information is cultivated rather than just some commodity you scrape off the web.
AI will make all that even worse.
Data staying hidden behind nice UX is VERY bad news.
Maybe all that will lead to an equivalent of open source, but for data.
Not only are a lot of communities hidden because of Discord (at least with Reddit they were more discoverable), the worst part is that they are unsearchable or behind a paywall.
Something like "join my Discord if you pay at least $3/mo!" is pretty innocent, but you are gatekeeping a community that used to be public.
If we are talking about something like a content creator focused on a hobby or PC problems, you can see how Google will become even more useless.
Reddit was the least bad choice between it and Discord, but it has failed at the "I want to be a social network" part.
I only use Facebook to stay in touch with widely dispersed family members. Nothing else. One peek a day to see what's up. Assuming you have an account, I find this makes the task much easier:
And Meta keeps things endlessly. Not just a hyper-compressed picture and a set of references to local files. That part of the siloed web vanishes too, just less dangly and obvious.
Are there any businesses of any notable size that are using Facebook alone? Local businesses near me have plenty of info on Google Maps. The website, if they have one, is usually out of date, but calling them directly answers my questions.
Also 38% of a web filled with diversity, no hidden agenda, and amateurs (in the best of ways). This number is probably now .00001% of a much bigger, far more homogeneous web. A web 1.0 site > today's walled garden "group page".
I've been to restaurants where they only have the menu in digital and uploaded to FB. And they looked at me as if I was a weirdo when I told them I don't use FB.
Many times I have recommended that my clients use Facebook instead of their own websites. A website was overkill for them. Often having your own website is a waste of money.
You used to be able to see a custom feed of a selected friend list, but since they removed that option the site has been completely unusable, unless perhaps you do something like remove 90% of your "friends" and groups, but that would hurt usability in different ways.
Some of the interactive stuff in old BBC election coverage still almost works to this day.
Hard to imagine that with many sites now 20 years on. It's not even that it's impossible with the technology, it's probably closer to how writing got worse after the invention of the word processor. Everything is managed and structured now, so the freedom / bubble needed to make things good in a way that can't be easily explained is gone.
Be sure to donate some quid to the Internet Archive (archive.org) to support their efforts to preserve (not just) old content, then do your best to make local copies of anything you find of value, just in case they disappear one day. A good number of mostly technical pages I have in my bookmarks file, that grew steadily and has been moved during installations for over 20 years, now point to their latest complete backup before the said page went silent. The Internet Archive is a huge boon to everyone.
I realized I was overusing bookmarks. I now save a webpage (perhaps as PDF) if it contains information I want to refer to later, such as an insightful article, technical information, a humorous bit, or the like.
Bookmarks are good only for links to things for which only the most current version is worth accessing. That’s my banking websites, a shopping site, my employer’s remote desktop system, etc.
There's also https://archivebox.io which can take your bookmarks and archive them in many ways. Unfortunately, back when I last tried it, it was a bit buggy. I wish there was a better solution to build a nice archive of the sites I visit more often, just in case.
I save webpages as PDF because they retain the images and fonts of the original page. One issue I run into is that sticky headers/footers used on websites often obscure top/bottom text of the page when exported to PDF. This can be addressed by using UBO to remove the sticky DOM elements before saving, but it's a bit of a hassle.
Others have recommended ArchiveBox; I will recommend using any bookmarking tool that fires off a web request to the Wayback Machine to archive a page when you create the bookmark.
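If your bookmarking tool doesn't do this, a rough sketch of the idea in Python is below. It assumes the Wayback Machine's public "Save Page Now" URL form (`https://web.archive.org/save/<url>`), which can be rate limited or change behaviour; the function names and the User-Agent string are made up for illustration.

```python
from urllib.parse import quote, urldefrag
from urllib.request import Request, urlopen

# Assumption: the public Save Page Now endpoint accepts the target URL appended to this prefix.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def wayback_save_url(page_url: str) -> str:
    """Build the Save Page Now URL for a bookmarked page (hypothetical helper)."""
    # Fragments (#section) are never sent to a server, so drop them first.
    without_fragment = urldefrag(page_url).url
    # Percent-encode unsafe characters such as spaces, but keep URL delimiters readable.
    return SAVE_ENDPOINT + quote(without_fragment, safe=":/?&=%+,;@~")


def archive_on_bookmark(page_url: str, timeout: float = 30.0) -> int:
    """Ask the Wayback Machine to capture page_url and return the HTTP status code."""
    request = Request(
        wayback_save_url(page_url),
        headers={"User-Agent": "bookmark-archiver-sketch/0.1"},  # illustrative value
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

You would call `archive_on_bookmark(url)` from whatever hook runs when a bookmark is created. Captures can be slow, so it's probably best done in the background.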
I wish the Internet Archive would split itself into two entities: one that simply archives web sites, and the other that does everything else (e.g., edgy IP testing of ebooks and video games). That way if the "other" entity gets sued into oblivion, the web sites remain. I think what the former is doing is a critical service for humankind, and I do donate, but I worry about their future.
I have run a news website since 2019. Every hour, I have a crawler look for dead links. I replace about one link a day with a link to archive.org. The funniest ones are the day after an election when all the candidate websites go blank. The saddest are the government websites that go offline from 3am to 5am every week.
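For anyone curious what a check like that might look like, here is a minimal sketch in Python. It is not the crawler described above, just one way to do it, assuming the Wayback Machine availability API (`https://archive.org/wayback/available`) and its JSON response shape. The function names, and the choice to treat only 404/410 and connection failures as "dead", are assumptions.

```python
import json
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def is_dead(url: str, timeout: float = 10.0) -> bool:
    """Treat 404/410 responses and connection failures as a dead link (assumed policy)."""
    request = Request(url, method="HEAD", headers={"User-Agent": "linkcheck-sketch/0.1"})
    try:
        with urlopen(request, timeout=timeout):
            return False
    except HTTPError as error:
        # Other error codes (403, 500, ...) may be temporary, so do not count them.
        return error.code in (404, 410)
    except OSError:
        # DNS failures, refused connections and timeouts all end up here.
        return True


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the closest snapshot URL out of an availability API response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def archived_replacement(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return an archive.org link to swap in for url, or None if nothing is archived."""
    query = urlencode({"url": url})
    with urlopen(f"{AVAILABILITY_API}?{query}", timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

A site that goes offline for a few hours every week would trip `is_dead` during that window, so a real crawler would probably want to see several failures in a row before replacing a link.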
I'm surprised it's not more. 2013 was long after the days of hobbyist websites of the early net, and into the time when most new sites were business driven. Given how long businesses last, I'd expect many more sites to be long gone 11 years later. I guess maybe the death of a lot of community-building spaces (Angelfire, Geocities, etc) probably accounts for a lot of them going.
What would be particularly interesting would be to graph how long websites last for. I suspect quite a lot of the content from the early days is still around, and this period (2008 - 2018) is the peak of sites vanishing.
I hope not all things last forever. A while back I stumbled upon my first .com, from the 90s, which was hosted on Angelfire and dutifully rehosted by archive.org and it went about how you'd imagine.
Despite being in 4th grade when my little friend and I made the webpage, things on there (while fine for the era) are just not okay by today's standards even if I understand the context for what led to it being there. It was nothing terrible, but just distasteful in a blissfully unaware way a 4th grader in the 90's would be. I realize that stuff will probably never be off my conscience and I just have to deal with it and hope nobody sees it.
I have similar material. If it's reassuring, we all were just kids/teens and learning of the world. I feel a lot for the youth after us that made the Internet more accessible and, at times, more permanent.
Everything on the internet is intrinsically ephemeral. Embrace that instead of fighting against it. If you want to archive stuff then make offline copies. PDF/A (especially the -1 and -2 versions) is a format explicitly designed for archiving and works well for static content.
I think it is a bit of a shame that mirroring is not more readily built into the web stack (=http/html); if you could trivially make links that included a local copy (as fallback?), this linkrot would be a far lesser concern. The way Wikipedia, for example, links everything through archive.org is a bit of a hack imho.
Agree. Sometimes you just experiment with something, put up a tiny website somewhere... forget about it until you decide it's no longer relevant for whatever reason and you pull the plug on it... it's not a bad thing. But it's great to have stuff like web archives though, to keep our collective memory for worthwhile content. I especially hope that accurate accounts of events get preserved, as they were originally written, somewhere they can't be changed. That's because rewriting history seems to be a favourite these days, and preserving the original accounts as things were happening can help combat this. Even if the accounts were not completely accurate, they can help in understanding the actions of contemporary actors, i.e. you may be able to understand what they thought was true at the time, even if that was later revealed to be incorrect.
https://www.facebook.com/?filter=friends
If a business is only on Facebook, I don't do business with them as I don't use Facebook.
A win-win in my book, as I prefer doing business with people whose ethics overlap with my own.
http://news.bbc.co.uk/hi/english/static/in_depth/americas/20...
http://edition.cnn.com/SPECIALS/2001/trade.center/index.html
Don't expect many of the links to work properly, but it's still interesting to see what the web used to look like.
0: https://epub.press
I like the idea that in addition to saving the page, you can annotate it as well.
What would be particularly interesting would be to graph how long websites last for. I suspect quite a lot of the content from the early days is still around, and this period (2008 - 2018) is the peak of sites vanishing.
- Geocities
- University-provided FTP folder (deleted after you graduate)
- ISP-provided FTP folder (all those Earthlink, Juno, Comcast sites: probably deleted)
Thankfully, even the archive occasionally takes stuff off.