The idea is interesting and even kind of tech-funny.
But man, you really have to explain how it works a bit better. At first I thought you meant we should redirect 404s to your website, and I was like: "??".
What I understood:
With each iteration of the website, you archive the old one on a specific subdomain. Then you redirect all 404s of the new website to the old one. That way, no link is broken.
Yeah, I didn't understand and jumped straight to the comments to see if there was an easy explanation here. Guess I should have clicked "How?" at the top.
Even after reading that page, I didn't understand that this was a suggestion to start regularly archiving old versions of the site and only sending visitors there if their link didn't point to anything on the current site. Instead, I thought the idea was that, for software reasons I don't understand, web developers commonly changed the subdomain name for the main site and this was just a method for reducing the number of broken links when such a change was made.
FWIW Brave does that. If you reach a page with a 404, a banner appears at the top with a button to try to navigate to the latest archived version of the page you're looking for.
Ignoring database-defined URLs, which make this harder (or at least different), you could automate this to some degree with snapshots (for filesystems that support them): use the snapshot time as the subdomain or path prefix that references an older version, and have the 404 handler note the current domain, find the next-oldest snapshot, and redirect to that subdomain or path prefix.
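A rough sketch of that 404 handler in PHP, assuming snapshots are exposed under date-named directories and served at matching path prefixes (the layout and names here are invented):

    <?php
    // Hypothetical layout: each snapshot is mounted at /srv/snapshots/YYYY-MM-DD/
    // and served at a matching path prefix, e.g. /2019-06-01/about.html.
    $root = '/srv/snapshots';
    $snapshots = array_filter(scandir($root, SCANDIR_SORT_DESCENDING),
        fn($d) => preg_match('/^\d{4}-\d{2}-\d{2}$/', $d));

    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    // If the request already targets a snapshot, only consider older ones.
    if (preg_match('#^/(\d{4}-\d{2}-\d{2})(/.*)$#', $path, $m)) {
        $snapshots = array_filter($snapshots, fn($d) => $d < $m[1]);
        $path = $m[2];
    }

    // Redirect to the next-oldest snapshot that actually has the file.
    foreach ($snapshots as $snap) {
        if (is_file("$root/$snap$path")) {
            header("Location: /$snap$path", true, 302);
            exit;
        }
    }
    http_response_code(404);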
404s are useful though... how annoying would it be if you wanted to get to a specific part of the site, and you kept getting redirected somewhere else without being informed that the part you're trying to access doesn't actually exist. 404 errors exist for a reason.
This is right - 404 pages are the right UX when you have a 'stateful' resource that can be deleted: you need to show that the URL (or ID param within) is correct and once pointed to a resource, but that resource has now been permanently deleted and can't be shown any more.
In a sense this information conveyed by the 404 page is now the immutable 'resource' that will stay permanently at that URL. Doing a redirect breaks this, it's lossy and usually a bad UX.
Quite the contrary: if the _resource_ still exists but on the older site, the redirect will point the browser to the other location. OP's stratagem is doing exactly what it's supposed to do (if the identifier of a page is its path and not the full URL, of course).
The subpage called "How" [1] covers this though. The idea is to redirect you through all the historical versions and then to a 404 if no match is found on any of them:

> www → 2017 → 1997 → Show 404 error

[1] https://4042302.org/how/
doesn't this mean that when they end up on your 1997 site (without realizing it, because all they did was click a link) and then try to navigate around, they're stuck in the old version of your site?
edit: maybe not, because the old site was written to assume it's running at the current site's subdomain? i guess it depends on how much you've changed your URL structure since then. that thought makes me a little squeamish.
it seems like a nice approach would be to return the 404, and make your 404 page render a link that says "try an archived version?". you gotta let the user know that what they're about to see might be stale.
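As a sketch, such a 404 page could be as small as this in PHP (the archive host is a placeholder):

    <?php
    // Return an honest 404, but offer the archived version as an opt-in link,
    // so the user knows what they're about to see might be stale.
    $archived = 'https://1997.example.com' . $_SERVER['REQUEST_URI'];

    http_response_code(404);
    echo 'Page not found. <a href="' . htmlspecialchars($archived) . '">'
       . 'Try an archived version?</a>';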
The home page is slightly misleading: the nginx config is of course the easy part; the hard part is correctly archiving all the previous versions of your website every time you make a change. I thought this website provided an archival service, but it does not.
People who care enough about preserving history and current links probably already do that. People who don't care aren't going to start now because of this page. Especially those who have dynamic content and probably don't want to keep running a million different versions of their backend forever.
If you like something on the web then make a copy.
Doing this as a consumer and doing this as a webmaster are different processes.
As a webmaster, if it's at all possible to go static (whatever your flavor of that is), then do that. A static website is easy to host and keep forever, and it's usually easy for consumers to archive too.
Not to hate on PHP, but keeping older PHP sites around securely has become a major undertaking. You can't safely run a WordPress site that hasn't been updated in 5 years, because its security vulnerabilities are exposed to the wide web. If your static site generator has security flaws... well, that doesn't affect your current build artifacts, and you can still run the thing in secure ways.
A better solution might be to start the archiving process from the beginning: have a main page that links to content stored on 2020.website.com. The following year, publish new content on 2021.website.com, etc.
Remember "Cool URIs Don't Change"? I think about it a lot.
It was written toward the end of the BOFH's reign, when a technical specialist of the web had quite a lot of sway, when their decisions about a site's information architecture and how it was run were, if not the law, at least a very heavy hand on the tiller.
Those days are long past. Now Ted in Marketing wants a URL and who are you not to give it to him? I remember the pain of creating vanity top-level URLs in SharePoint 2003 because some functionary wanted them, and then they would promptly forget what they demanded. Yes, I used to use 410 Gones where appropriate.
That sort of thing has not been in our hands for quite a while, even if it is probably the best thing to do. After all, has the product URL changed? Or will it be back? Or has it been discontinued? The correct HTTP response, properly and widely used, would be very helpful in moving so much of the web forward, but that is not under our control. Hasn't been for a while.
First, it unduly burdens the server, in sending multiple redirects to cover the entire search space of possible versions of a URL - for a mature site, this could be a lot of redirects. It also unduly burdens the client, in following them, and the network between the two.
Second, 302 is the wrong type of redirect to use here, because it is temporary; a well-tempered user agent will treat it as such, necessitating the same cascade of redirects on followup visits. The right way to do this is with a 301, which has a semantic of permanency, and is treated as such by user agents. But it's still the wrong thing to do.
Maintaining access to older versions of websites is, again, an entirely desirable thing to do. But if you're going to do it in a way that requires work on the server (as this design also does), you're better off just having the server maintain version information and serve the latest available page at a given URL, in a 200 response, when the URL is accessed.
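For illustration, that 200-based alternative could be a front controller that falls back through version roots, newest first (the directory layout and years are invented):

    <?php
    // Try each version of the site, newest first, and serve the first match
    // in place with a 200 -- one response, no redirect chain.
    $versions = ['2017', '2007', '1997']; // newest first; placeholder years
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    foreach ($versions as $v) {
        $file = "/srv/site/$v$path";
        if (is_file($file)) {
            header('Content-Type: ' . mime_content_type($file));
            readfile($file);
            exit;
        }
    }
    http_response_code(404);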
I don't have any opinion on whether 302 or 301 is the better choice, I'd just like to point out that using 302 seems to have been a deliberate decision. From https://4042302.org/how/ :

>We use a 302 and not a 301 (permanent redirect) because we want the latest site to have the chance to override the URL in the future.
Google treats 302s and 301s differently when it crawls. A 302 is effectively ignored and the search index isn't updated - the original URL that threw the 302 is left in. Whereas 301s result in the new redirect target URL replacing the original in the search index.
Filling up your site with nested 302s (following this to its conclusion, in ~10 years' time) is not only a management headache, but may fall foul of Google (I'm not sure nested 302s will send positive signals) and result in your whole site being de-indexed.
So I'm aware of at least the BBC using this approach, where opening really old articles reveals they have their 90s site still up. I'm also aware that at my last employer, an unmaintained wiki with open security issues, which nonetheless had vital information for still-in-use legacy internal software, was replaced by saved static HTML grabs so the information wasn't lost.
But for a lot of medium-sized companies with dynamic websites, this isn't always practical. They may not have the know-how to dump their 2000s Drupal install to static HTML files, and don't have the IT staff to upgrade and secure it.
I wish people would change their language regarding tech debt. The companies in your second paragraph choose not to upgrade and secure their websites. It's not something unavoidable.
The point your parent poster is making is that these companies could instead choose to shut the website down, which is a legitimate alternative to updating and securing it -- unlike just leaving it up without maintenance, which is not.
Have to agree with all of the disagreements to this. The 404 is a 404 for a reason, just like 301 and 302 are different for a reason. It's not uncommon, though, for WordPress to do things like this, or blogs for that matter. If an author changes the title or date of their post, and the URL structure is reliant on those two pieces of data, then the URL will change. The old URL is preserved in a DB and, if accessed again, 301s to the newly named resource. Others will throw a 404 and give a cutesy Levenshtein message, "did you mean x?", at which point the user can decide to go to the new resource. It's all circumstantial... It shouldn't be enforced.
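That Levenshtein suggestion is only a few lines in PHP, which has levenshtein() built in (the URL list here is made up):

    <?php
    // On a 404, suggest the known URL closest to the requested path
    // by Levenshtein distance, and let the user decide.
    $known = ['/posts/cool-uris-dont-change', '/posts/404-to-302', '/about'];
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    usort($known, fn($a, $b) => levenshtein($path, $a) <=> levenshtein($path, $b));

    http_response_code(404);
    echo 'Not found. Did you mean <a href="' . htmlspecialchars($known[0]) . '">'
       . htmlspecialchars($known[0]) . '</a>?';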
Re: Google and PageRank, pretty certain they've addressed this and now recognize 302s and 301s and treat them the same. Previously, this was an issue.
1. Create a 404.php (or whatever your preferred back-end is)
2. mod_rewrite real 404s to serve that script.
3. In that script have a lookup table/db/file that lists all the redirects you need.
4. Extract the requested URL from the server variables.
5. Use the lookup table to find the correct URL, and issue a 302 for it to the user's browser.
It's kinda seamless, and I've been doing it for years.
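A minimal sketch of that script (the lookup table and paths are illustrative; step 2 could equally be done with Apache's ErrorDocument directive instead of mod_rewrite):

    <?php
    // 404.php -- steps 3-5: lookup table, extract the requested URL,
    // redirect if we know where the page went, otherwise a real 404.
    $redirects = [
        '/old-product.html'       => '/products/new-product',
        '/blog/2009/01/some-post' => '/posts/some-post',
    ];

    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    if (isset($redirects[$path])) {
        header('Location: ' . $redirects[$path], true, 302);
    } else {
        http_response_code(404);
        echo 'Not found.';
    }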
https://4042302.org/how/
4042302.org definitely needs better explanations.
> Here’s the simplest solution I could come up with:
> 1. Serve the current site from a subdomain (e.g., 2017.ar.al)
> 2. Make my 404s into 302s that point to the previous version of the site.
> 3. If I change the site again in the future, rinse and repeat.
> I call the technique 404 to 302.
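A minimal sketch of that recipe as a PHP 404 handler (hosts are placeholders; the site itself does this with an nginx config):

    <?php
    // "404 to 302" in miniature: this runs as the current site's 404 handler
    // and bounces the same path to the previous version of the site.
    $previous = 'https://2016.example.com'; // next-oldest version (placeholder)

    header('Location: ' . $previous . $_SERVER['REQUEST_URI'], true, 302);

Each older version does the same, pointing one hop further back, until the oldest one serves a real 404.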
I think there's always been a lot to be said for good, static sites, and there still is.
The actual solution is to put in the work and redirect the old missing page to a relevant new one.
If I had a link from Vogue, the BBC, etc. back in 1996 pointing to a product page, I'd want to redirect that now-broken link with a 301.