Why can't HTML alone do includes?

This was the rabbit hole that I started down in the late 90s and still haven’t come out of. I was the webmaster of the Analog Science Fiction website and I was building tons of static pages, each with the same header and side bar. It drove me nuts. So I did some research and found out about Apache server side includes. Woo hoo! Keeping it DRY (before I knew DRY was a thing).

Yeah, we’ve been solving this over and over in different ways. For those saying that iframes are good enough, they’re not. Iframes don’t expand to fit content. And server side solutions require a server. Why not have a simple client side method for this? I think it’s a valid question. Now that we’re fixing a lot of the irritation in web development, it seems worth considering.

EvanAnderson · 9 months ago

Server-side includes FTW! When a buddy and I started making "web stuff" back in the mid-90s the idea of DRY also just made sense to us.

My dialup ISP back then didn't disable using .htaccess files in the web space they provided to end users. That meant I could turn on server-side includes! Later I figured out how to enable CGI. (I even went so far as to code rudimentary webshells in Perl just so I could explore the webserver box...)

matchagaucho · 9 months ago

I've become a fan of https://htmx.org for this reason.

A small 10KB lib that augments HTML with the essential good stuff (like dynamic imports of static HTML)

HumanOstrich · 9 months ago

Seems like overkill to bring in a framework just for inlining some static html. If that's all you're doing, a self-replacing script tag is neat:

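    <script>
      // Replace the calling <script> element with the HTML fetched from `url`.
      function includeHTML(url) {
        // Capture the running script now; currentScript is null inside async callbacks.
        const s = document.currentScript
        fetch(url).then(r => r.text()).then(h => {
          s.insertAdjacentHTML('beforebegin', h)
          s.remove()
        })
      }
    </script>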

...

    <script>
      includeHTML('/footer.html')
    </script>

The `script` element is replaced with the html from `/footer.html`.

gforce_de · 9 months ago

The minified version of htmx is ~51 kilobytes (16 compressed):

  $ curl --location --silent "https://unpkg.com/htmx.org@2.0.4" | wc -c
  50917
  
  $ curl --location --silent "https://unpkg.com/htmx.org@2.0.4" | gzip --best --stdout | wc -c
  16314

unilynx · 9 months ago

> Iframes don’t expand to fit content

Actually, that was part of the original plan - https://caniuse.com/iframe-seamless

omneity · 9 months ago

I used the seamless attribute extensively in the past. It still doesn't work the way GP intended, which is to fit in the layout flow: for example, to take the full width provided by the parent, or to automatically resize the height (the pain of years of my career).

It worked rather like a reverse shadow DOM, allowing CSS from the parent document to leak into the child, removing borders and other visual chrome that would make it distinguishable from the host, except you still had to use fixed CSS layouts and resize it with JS.

atoav · 9 months ago

I mean in 1996's Netscape you could do this (I run the server for a website that still uses this):

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
    <html>
      <frameset cols="1000, *">
        <frame src="FRAMESET_navigation.html" name="navigation">
        <frame src="FRAMESET_home.html" name="in">
      </frameset>
    </html>

The thing that always bugged me about frames is that they are too clever. I don't want to reload only the frame HTML when I right-click and reload. Sure, the idea was to cache those separately, but come on: frames and caching are meant to solve two different problems, and by mashing them together they somewhat sucked at solving either.

To me includes for HTML should work in the dumbest way possible. And that means: Take the text from the include and paste it where the include was and give the browser the resulting text.

If you want to cache a nav section separately because it appears the same on every page, let's add a cache attribute that solves the problem independently:

  <nav cache-id="deadbeefnav666">
    <some-content></etc>
  </nav>

This tells the browser that it should load the inner HTML or the src of that element from cache if it has it.

Now you could convince me that the include should allow for more, but it being dumb is a feature, not a bug.
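For illustration only, the "dumb" paste-in-place behaviour can be sketched in a few lines of JavaScript. Everything here is an assumption made up for the sketch: the `<include src="...">` syntax, the `expandIncludes` name, and the `files` map standing in for whatever would actually load the fragment.

```javascript
// Hypothetical sketch: paste-in-place expansion of <include src="..."> tags.
// The tag syntax and all names here are assumptions, not part of any spec.
const INCLUDE_TAG = /<include\s+src="([^"]+)"\s*\/?>(?:\s*<\/include>)?/g;

function expandIncludes(html, files) {
  return html.replace(INCLUDE_TAG, (tag, src) => {
    // Unknown sources are left untouched rather than guessed at.
    return files.has(src) ? files.get(src) : tag;
  });
}

const files = new Map([['nav.html', '<nav>Home | About</nav>']]);
const page = '<body><include src="nav.html"><p>Hello</p></body>';
console.log(expandIncludes(page, files));
```

Note that this version is deliberately not recursive: an include inside `nav.html` would be pasted as-is, which keeps the behaviour as dumb as described.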

lodovic · 9 months ago

Nitpick: the HTML4 spec was released in December 1997, and HTML4.01 only in December 1999, so it probably wouldn't have run in 1996's Netscape.

codr7 · 9 months ago

The optimal solution would be using a template engine to generate static documents.

JadeNB · 9 months ago

> The optimal solution would be using a template engine to generate static documents.

This helps the creator, but not the consumer, right? That is, if I visit 100 of your static documents created with a template engine, then I'll still be downloading some identical content 100 times.

keeganpoppen · 9 months ago

macros!

econ · 9 months ago

You can message the page dimensions to the parent. To do it cross-domain you can load the same URL into the parent with the height in the #location hash. It won't refresh that way.
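A rough sketch of the messaging approach using `postMessage`. The message shape (`{ type: 'frame-height', height }`), the function names, and the `https://example.com` origin are all assumptions for illustration.

```javascript
// Hypothetical sketch: the framed page reports its height, the parent resizes the iframe.
// The message shape and all names are assumptions for illustration.

// Runs inside the framed page.
function reportHeight(win, targetOrigin) {
  const height = win.document.documentElement.scrollHeight;
  win.parent.postMessage({ type: 'frame-height', height }, targetOrigin);
}

// Runs in the parent page; returns a handler for "message" events.
function makeResizeHandler(iframe, allowedOrigin) {
  return (event) => {
    // Ignore messages from other origins or other frames.
    if (event.origin !== allowedOrigin) return;
    if (event.source !== iframe.contentWindow) return;
    const data = event.data;
    if (!data || data.type !== 'frame-height' || !Number.isFinite(data.height)) return;
    iframe.style.height = `${data.height}px`;
  };
}

// Browser wiring in the parent page (skipped when no DOM is available).
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  const iframe = document.querySelector('iframe');
  if (iframe) {
    window.addEventListener('message', makeResizeHandler(iframe, 'https://example.com'));
  }
}
```

The framed page would call something like `reportHeight(window, 'https://parent.example')` on load and whenever its content changes. It works, but it still leaves both pages cooperating through script just to get a box to fit its content.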

dimal · 9 months ago

I know it’s possible to work around it, but that’s not the point. This is such a common use case that it seems worthwhile to pave the cowpath. We’ve paved a lot of cowpaths that are far less trodden than this one. This is practically a cow superhighway.

We’ve built an industry around solving this problem. What if, for some basic web publishing use cases, we could replace a complex web framework with one new tag?

rbanffy · 9 months ago

> Woo hoo! Keeping it DRY (before I knew DRY was a thing)

I still remember the script I wrote to replace thousands (literally) of slightly different headers and footers in some large websites of the 90s. How liberating to finally have that.

RenThraysk · 9 months ago

Doesn't the Service Worker API provide this now? It essentially acts like an in-browser proxy to the server.

https://developer.mozilla.org/en-US/docs/Web/API/Service_Wor...
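As a sketch of what that could look like: a service worker intercepts page loads and stitches fragments into the HTML before the page sees it. The `<include src="...">` syntax and the `stitchIncludes` helper are assumptions invented for this example; only `fetch`, `FetchEvent.respondWith`, and `Response` are standard APIs.

```javascript
// Hypothetical sketch: a service worker that stitches includes into HTML responses.
// The <include src="..."> syntax and all names are assumptions for illustration.
const INCLUDE_RE = /<include\s+src="([^"]+)"\s*\/?>/g;

// Replace each include tag with the text returned by loadText(src).
async function stitchIncludes(html, loadText) {
  // Collect each distinct src once so repeated includes are only loaded once.
  const sources = [...new Set([...html.matchAll(INCLUDE_RE)].map((m) => m[1]))];
  const loaded = new Map(
    await Promise.all(sources.map(async (src) => [src, await loadText(src)]))
  );
  return html.replace(INCLUDE_RE, (tag, src) => loaded.get(src));
}

// Service worker wiring; skipped when not running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    if (event.request.mode !== 'navigate') return; // only rewrite page loads
    event.respondWith((async () => {
      const response = await fetch(event.request);
      const type = response.headers.get('Content-Type') || '';
      if (!response.ok || !type.includes('text/html')) return response;
      const html = await stitchIncludes(await response.text(), async (src) => {
        // Fragments are resolved relative to the page being loaded.
        const part = await fetch(new URL(src, event.request.url));
        return part.ok ? part.text() : '';
      });
      return new Response(html, { status: response.status, headers: { 'Content-Type': type } });
    })());
  });
}
```

One caveat with this approach: the worker only takes over after it has been installed, so the very first visit would still show the unexpanded page unless the server also handles it.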

bradly · 9 months ago

Rational or not, some of us try very hard to avoid JavaScript based solutions.

fooker · 9 months ago

> Why not have a simple client side method for this?

Like writing a line of js?

sbarre · 9 months ago

A line of JS that has to run through the Javascript interpreter in your browser rather than a simple I/O operation?

If internally this gets optimized to a simple I/O operation (which it should) then why add the JS indirection in the first place?

DemocracyFTW2 · 9 months ago

The difference between "a line of JS" and a standardized declarative solution is of course that a meek "line of $turing_complete_language" can not, in the general case, be known and trusted to do what it purports to do, and nothing else; you've basically enabled any kind of computation, and any kind of behavior. With an include tag or attribute that's different; its behavior is described by standards, and (except for knowing what content we might be pulling in) we can 100% tell the effects from static analysis, that is, without executing the code. With "a line of JS" the only way, in the general case, to know what it does is to run it (an infinite number of times). Also, because it's not standardized, it's much harder to save to disk, to index and to archive it.

rbanffy · 9 months ago

A block of in-line JavaScript stops the renderer until it runs because its output cannot be determined before it completes.

api · 9 months ago

The web seems like it was deliberately designed to make any form of composability impossible. It’s one of the worst things about it as a platform.

I’m sure some purist argument has driven this somewhere.

PaulHoule · 9 months ago

I think of all the “hygienic macro” sorts of problems. You really ought to be able to transclude a chunk of HTML and the associated CSS into another document but you have to watch out for ‘id’ being unique never mind the same names being used for CSS classes. Figuring out the rendering intent for CSS could also be complicated: the guest CSS might be written like

   .container .style { … }

Where the container is basically the whole guest document but you still want those rules to apply…. Maybe you want the guest text to appear in the same font as the host document but you still want colors and font weights to apply. Maybe you want to make the colors muted to be consistent with the host document, maybe the background of the host document is different and the guest text doesn’t contrast enough anymore, etc.

giantrobot · 9 months ago

I look back longingly at the promise of XML services in the early days of Web 2.0. Before the term just meant JavaScript everywhere.

All sorts of data could be linked together to display or remix by user agents.

luotuoshangdui · 9 months ago

HTML is a markup language, not a programming language. It's like asking why Markdown can't handle includes. Some Markdown editors support them (just like some server-side tools do for HTML), but not all.

franga2000 · 9 months ago

Including another document is much closer to a markup operation than a programming operation. We already include styles, scripts, images, videos, fonts...why not document fragments?

Markdown can't do most of those, so it makes more sense why it doesn't have includes, but I'd still argue it definitely should. I generally dislike LaTeX, but about the only thing I liked about it when writing my thesis was that I could have each chapter in its own file and just include all of them in the main file.

dimal · 9 months ago

This isn’t programming. It’s transclusion[0]. Essentially, iframes and images are already forms of transclusion, so why not transclude html and have the iframe expand to fit the content?

As I wrote that, I realized there could be cumulative layout shift, so that’s an argument against. To avoid that, the browser would have to download all transcluded content before rendering. In the past, this would have been a dealbreaker, but maybe it’s more feasible now with http multiplexing.

[0] https://en.m.wikipedia.org/wiki/Transclusion#Client-side_HTM...

lenkite · 9 months ago

Well, AsciiDoc, a markup language, supports includes, so the "markup languages" analogy doesn't hold.

https://docs.asciidoctor.org/asciidoc/latest/directives/incl...

paulddraper · 9 months ago

That’s the Hyper part of HTML, and what makes it special.

It’s made to pull in external resources (as opposed to other document formats like PDF).

Scripts, stylesheets, images, objects, favicons, etc. HTML is thematically similar.

crazygringo · 9 months ago

I think this is the most likely answer.

I'm not defending it, because when I started web development this was one of the first problems I ran into as well -- how the heck do you include a common header.

But the original concept of HTML was standalone documents, not websites with reusable components like headers and footers and navbars.

That being said, I still don't understand why then the frames monstrosity was invented, rather than a basic include. To save on bandwidth or something?

actinium226 · 9 months ago

Markdown doesn't have this common HTML pattern of wanting to include a header/footer in all pages of a site.

The feature proposal was called HTML Imports [1], created as part of the Web Components effort.

> HTML Imports are a way to include and reuse HTML documents in other HTML documents

There were plans for <template> tag support and everything.

If I remember correctly, Google implemented the proposed spec in Blink but everyone else balked for various reasons. Mozilla was concerned with the complexity of the implementation and its security implications, as well as the overlap with ES6 modules. Without vendor support, the proposal was officially discontinued.

[1] https://www.w3.org/TR/html-imports/

xg15 · 9 months ago

That matches with the comment [1] on the article, citing insufficient demand, no vendor enthusiasm, etc.

The thing is that all those are non-reasons that don't really explain anything: Low demand is hard to believe if this feature is requested for 20 years straight and there are all kinds of shim implementations using scripts, backend engines, etc. (And low demand didn't stop other features that the vendors were interested in for their own reasons)

Vendor refusal also doesn't explain why they refused it, even to the point of rolling back implementations that already existed.

So I'd be interested to understand the "various reasons" in more detail.

"Security implications" also seem odd as you already are perfectly able to import HTML cross origin using script tags. Why is importing a script that does document.write() fine, but a HTML tag that does exactly the same thing hugely problematic?

(I understand the security concern that you wouldn't want to allow something like "<import src=google.com>" and get an instant clone of the Google homepage. But that issue seems trivially solvable with CORS.)

[1] https://frontendmasters.com/blog/seeking-an-answer-why-cant-...

athrowaway3z · 9 months ago

That is a bit of a large ask.

There are various specs/semantics you can choose, which prescribe the implementation & required cross-cutting complexity. Security is only relevant in some of them.

To give you some idea:

- HTML load ordering is a pretty deeply held assumption. People understand JS can change those assumptions (document.write). Adding an obscure HTML tag that does so is going to be an endless parade of bugs & edge cases.

- To keep top-to-bottom fast we could define preload semantics (Dropping the linear req-reply, define client-cache update policy when the template changes, etc). Is that added complexity truly simpler than having the server combine templates?

- <iframe> exists

In other words, to do the simplest thing 75% of people want, requires a few lines of code. Either client side or server side.

To fit the other 25% (even to 'deny' it) is endlessly complex in ways few if any can oversee.

NoahZuniga · 9 months ago

Maybe something that adds to this low demand is that:

1. Web pages developed on the assumption that the user has JS can trivially implement something that provides the same result.

2. Web pages developed for user agents that don't run JS probably want some interaction, so they already have a server runtime that can provide this feature.

2b. And if a page doesn't have any user interaction, it's probably a static content site, and nobody is writing content in HTML, so there is already a build step that provides this feature.

mildred593 · 9 months ago

HTML Imports could not include markup within the body; they could only be used to reference template elements for custom elements.

brundolf · 9 months ago

JS-first developers want something that works the same way client-side and server-side, and the mainstream front-end dev community shifted to JS-first, for better or worse

uallo · 9 months ago

HTML Imports went in a similar direction but they do not do what the blog post is about. HTML should be imported and displayed in a specific place of the document. HTML Imports could not do this without JavaScript.

See https://github.com/whatwg/html/issues/2791#issuecomment-3112... for details.

thayne · 9 months ago

To be fair, it was pretty complicated. IIRC, using it required using Javascript to instantiate the template after importing it, rather than just having something like <include src="myinclude.html">.

riedel · 9 months ago

https://caniuse.com/imports says FF even had it as a config flag

paulddraper · 9 months ago

Tbf, HTML Imports were significantly more complex than includes, which this article requests.

AtlasBarfed · 9 months ago

Frames essentially could do html import

dwheeler · 9 months ago

HTML was historically an application of SGML, and SGML could do includes. You could define a new "entity", and if you created a "system" entity, you could refer to it later and have it substituted in.

    <!DOCTYPE html example [
      <!ENTITY myheader SYSTEM "myheader.html">
    ]>
    ....
    &myheader;

SGML is complex, so various efforts were made to simplify HTML, and that's one of the capabilities that was dropped along the way.

int_19h · 9 months ago

We also had a brief detour into XML with XHTML, and XML has XInclude, although it's not a required feature.

echelon · 9 months ago

It's too bad we didn't go down the XHTML/semantic web route twenty years ago.

Strict documents, reusable types, microformats, etc. would have put search into the hands of the masses rather than kept it in Google's unique domain.

The web would have been more composable and P2P. We'd have been able to slurp first class article content, comments, contact details, factual information, addresses, etc., and built a wealth of tooling.

Google / WhatWG wanted easy to author pages (~="sloppy markup, nonstandard docs") because nobody else could "organize the web" like them if it was disorganized by default.

Once the late 2010s came to pass, Google's need for the web started to wane. They directly embed lifted facts into the search results, tried to push AMP to keep us from going to websites, etc.

Google's decisions and technologies have been designed to keep us in their funnel. Web tech has been nudged and mutated to accomplish that. It's especially easy to see when the tides change.

tannhaeuser · 9 months ago

The XML subset of SGML still includes most forms of entity usage SGML has, including external general entities as described by grandparent. XInclude can include any fragment, not just a complete document, but apart from that it was redundant, and what remains of XInclude in HTML today (<svg href=...>) doesn't make use of fragments and also does away with the xinclude and other namespaces. For reusing fragments, OTOH, SVG has the more specific <use href=...> construct. XInclude also worked really badly in the presence of XML Schema.

j45 · 9 months ago

Neat reference, going to look into that.

The <object> tag appears to include/embed other html pages.

An embedded HTML page:

https://www.w3schools.com/tags/tag_object.asp

nephyrin · 9 months ago

Like iframe, it "includes" a full subdocument as a block element, which isn't quite what the OP is hinting at.

jazzypants · 9 months ago

Yeah, that is just a crappier version of HTML Frames [1]

1 - https://en.m.wikipedia.org/wiki/Frame_(World_Wide_Web)

timewizard · 9 months ago

Well, that is an entire attack surface on its own.

https://en.wikipedia.org/wiki/Billion_laughs_attack

bawolff · 9 months ago

https://en.wikipedia.org/wiki/XML_external_entity_attack would be the more relevant link.

lkuty · 9 months ago

It also existed in DTD (Document Type Definition), used with HTML 4 and below, and XML. It came from SGML too, I guess.

Yes it did, and there are HTML 5.x DTDs for HTML versions newer than HTML 4.x at [1], including post-HTML 5.2 review drafts until 2023; see notes at [2].

[1]: https://sgmljs.net/docs/html5.html

[2]: https://sgmljs.net/blog/blog2303.html

throwup238 · 9 months ago

Lammy · 9 months ago

Netscape 4 has this with inflow layers — `<ILAYER SRC=included.html></ILAYER>`

https://web.archive.org/web/19970630074729fw_/http://develop...

https://web.archive.org/web/19970630094813fw_/http://develop...

masswerk · 9 months ago

As far as I'm aware of it, changing the SRC-attribute was quite crash-y and the functionality was stripped soon. (I remember playing with this in beta, and then it was gone in the production version.)

blorto · 9 months ago

I always wondered why it was called ILAYER. Ty

Null-Set · 9 months ago

The name of this feature is transclusion.

https://en.wikipedia.org/wiki/Transclusion

It was part of Project Xanadu, and originally considered to be an important feature of hypertext.

Notably, mediawiki uses transclusion extensively. It sometimes feels like the wiki is the truest form of hypertext.

jes5199 · 9 months ago

Ward Cunningham (inventor of the Wiki) spent some time trying to invent a transclusion-first wiki, where everyone had their own wiki-space and used transclusion socially https://en.wikipedia.org/wiki/Federated_Wiki

it never quite took off

I think true transclusion would be more than that.

In Xanadu you could transclude just an excerpt from one document into another document.

If you wanted to do this with HTML you need an answer for the CSS. In any particular case you can solve it, making judgements about which attributes should be consistent between the host document, the guest document and the guest-embedded-in-host. The general case, however, is unclear.

For a straightforward <include ...> tag the guest document is engineered to live inside the CSS environment (descendant of the 3rd div child of a p that has class ".rodney") that the host puts it in.

Another straightforward answer is the Shadow DOM which, for the most part, lets the guest style itself without affecting the rest of the document. I think in that case the host can still put some styles in to patch the guest.

Linux-Fan · 9 months ago

Isn't this what proper framesets (not iframes) were supposed to do a long time ago (HTML 4?). At least they autoexpanded just fine and the user could even adjust the size to their preference.

There was a lot of criticism for frames [1] but still they were successfully deployed for useful stuff like Java API documentation [2].

In my opinion the whole thing didn't stay mostly because of too little flexibility for designers: framesets were probably good enough for useful information pages but didn't account for all the designers' needs, with their bulky scrollbars and limited number of subspaces on the screen. Today it is too late to revive them because framesets as-is probably wouldn't work well on mobile...

[1] <https://www.nngroup.com/articles/why-frames-suck-most-of-the...> - I love how much of it is not applicable anymore and all of these problems mentioned with frames are present in today's web in an even nastier way?

[2] <https://www.eeng.dcu.ie/~ee553/ee402notes/html/figures/JavaD...>

johannes1234321 · 9 months ago

The issue with framesets was way more fundamental: no deep linking. People coming via bookmarks or Google (or a predecessor) were left on a page without navigation, which people then tried working around with JavaScript, which never gave a good experience.

Nowadays it is sometimes the other way around: pages are all JavaScript, so there is no good experience in the first place. I have had difficulty getting a proper “link” to something multiple times. Also, given that browsers love to reduce/hide the address bar, I wonder if it is really still that important a feature.

Of course "back then" this was an important feature and one of the reasons for getting rid of frames :)

rchaud · 9 months ago

"Includes" functionality is considered to be server-side, i.e. handled outside of the web browser. HTML is client-side, and really just a markup syntax, not a programming language.

As the article says, the problem is a solved one. The "includes" issue is how every web design student learns about PHP. In most CMSes, "includes" become "template partials" and are one of the first things explained in the documentation.

There really isn't any need to make includes available through just HTML. HTML is a presentation format and doesn't do anything interesting without CSS and JS anyway.

naasking · 9 months ago

> "Includes" functionality is considered to be server-side, i.e. handled outside of the web browser. HTML is client-side, and really just a markup syntax, not a programming language.

That's not an argument that client-side includes shouldn't happen. In fact HTML already has worse versions of this via frames and iframes. A client-side equivalent of a server-side include fits naturally into what people do with HTML.

tgv · 9 months ago

I think it feels off because an HTML file can include scripts, fonts, images, videos, styles, and probably a few other things. But not HTML. It can probably be coded with a custom element (<include src=.../>). I would be surprised if there wasn't a github repo with something similar.

benstigsen · 9 months ago

I created something like this relatively recently. The downside is of course that it requires JavaScript.

https://github.com/benstigsen/include.js

cantSpellSober · 9 months ago

Well said; this is many students' intro to PHP. Why not `<include src=header.html/>` though?

Some content is already loaded asynchronously such as images, content below the fold etc.

> HTML is really just a markup syntax, not a programming language

flamebait detected :) It's a declarative language, interpreted by each browser engine separately.

gyesxnuibh · 9 months ago

What's the ML in HTML stand for? I think that's probably the crux of the argument. Are we gonna evolve it past its name?

assimpleaspossi · 9 months ago

Agree with what you said, however, HTML is a document description language and not a presentation format. CSS is for presentation (assuming you meant styling).

PaulDavisThe1st · 9 months ago

They didn't mean styling.

HTML is a markup language that identifies the functional role of bits of text. In that sense, it is there to provide information about how to present the text, and is thus a presentation format.

It is also a document description language, because almost all document description languages are also a presentation format.

c-smile · 9 months ago

> "Includes" functionality is considered to be server-side

Exactly! Include makes perfect sense on server-side.

But a client-side include means that the client should be able to modify the original DOM at an unknown moment in time. Options are:

1. At HTML parse time (before the DOM is even generated). This requires a synchronous request to the server for the inclusion. Not desirable.

2. After DOM creation: <include src=""> (or whatever) needs to appear in the DOM, the chunk is loaded asynchronously, and then the <include> DOM element (sic!) needs to be replaced (or how?) by the external fragment. This disables any existing DOM structure validation mechanism.

Having said that...

I've implemented <include> in my Sciter engine using strategy #1. It works there as HTML in Sciter usually comes from local app resources / file system where price of issuing additional "get chunk" request is negligible.

See: https://docs.sciter.com/docs/HTML/html-include

amadeuspagel · 9 months ago

This argument applies just as much to CSS and JS. Why do they include "includes" when you can just bundle on the server?

adregan · 9 months ago

For caching and sharing resources across the whole site, I suppose.

lelanthran · 9 months ago

> As the article says, the problem is a solved one.

It's "solved" only in the sense that you need to use a programming language on the server to "solve" it. If all you are doing is static pages, it's most definitely not solved.

NorwegianDude · 9 months ago

Then you just pre-build the page before publishing it. It's way cheaper as you do the work once, instead of every client being much slower because they have to do additional requests.

hearing someone assert that

> the problem is a solved one

is a sure-fire way to know that a problem is not solved

socalgal2 · 9 months ago

There are all kinds of issues with HTML include, as others have pointed out.

If main.html includes child/include1.html and child/include1.html has a link src="include2.html" then when the user clicks the link where does it go? If it goes to "include2.html", which by the name was meant to be included, then that page is going to be missing everything else. If it goes to main.html, how does it specify this time, use include2.html, not include1.html?
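To make the ambiguity concrete, the same relative link resolves to two different URLs depending on whether it is resolved against the host page or against the included file. A small sketch using the standard `URL` constructor; `example.com` is just a placeholder origin:

```javascript
// Sketch: where does a relative link inside an included fragment point?
// example.com is a placeholder; the file names follow the example above.
const hostPage = 'https://example.com/main.html';
const fragment = new URL('child/include1.html', hostPage).href;

// Resolved against the host page (what pasting the text in place would imply):
const fromHost = new URL('include2.html', hostPage).href;
// Resolved against the fragment's own URL (how a framed document behaves):
const fromFragment = new URL('include2.html', fragment).href;

console.log(fromHost);     // https://example.com/include2.html
console.log(fromFragment); // https://example.com/child/include2.html
```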

You could do the opposite: you can have article1.html, article2.html, article3.html, etc., each include header.html, footer.html, navi.html. OK, that works, but now you've made it so that a global change to the structure of your articles requires editing all articles. In other words, if you want to add comments.html to every article you have to edit all articles, and you're back to wanting to generate pages from articles based on some template, at which point you don't need the browser to support include.

I also suspect there would be other issues, like the header wanting to know the title, or the footer wanting a next/prev link, which now require some way to communicate this info between includes, and you're basically back to generating the pages and include not being a solution.

I think if you work through the issues you'll find an HTML include would be practically useless for most use cases.

These are all solvable issues with fairly obvious solutions. For example:

> If main.html includes child/include1.html and child/include1.html has a link src="include2.html" then when the user clicks the link where does it go? If it goes to "include2.html", which by the name was meant to be included, then that page is going to be missing everything else. If it goes to main.html, how does it specify this time, use include2.html, not include1.html?

There are two distinct use cases here: snippet reuse and embeddable self-contained islands. But the latter is already handled by iframes (the behavior being your latter case). So we only need to do the former.

> These are all solvable issues with fairly obvious solutions.

No, they are a can of worms and decades of arguments, incompatibilities, and versioning.

> But the latter is already handled by iframes

iframes don't handle this case because the page can not adjust to the iframe's content. There have been proposals to fix this but they always run into issues.

https://github.com/domenic/cooperatively-sized-iframes/issue...

john_the_writer · 9 months ago

The include logic of include2.html missing everything else would also apply to all other includes.

If a user clicked a link with src="include.css" then it'll be rubbish.

It would be good for static data: images, CSS, and static HTML content.
