Then I added features like news and an RSS feed, a way to automatically list my research publications and course materials, a list of books filterable by tags, etc. It is still a Makefile, and the Makefile itself is actually a bit simpler than it used to be, but it now calls a few Bash scripts that in particular make use of the awesome xml2 and 2xml utilities to manipulate HTML in a line-oriented manner with the core utils (mostly grep and sed).
On top of that I have a few git hooks that call make automatically when needed, in particular on the remote server where the website is hosted, so that the public version is rebuilt whenever I push updates to the repository there.
It's been working like a charm for years! My git history goes back to 2009.
EDIT: I just had a look at the first commits…
beccad7 (FIRST_VERSION) Initial commit
d1cc6d7 adding link to Google Reader shared items
6ccfd0c fix typo
d337959 adding link to Identi.ca account
One of the nifty parts about having a static site generated offline from trusted inputs is that it doesn't matter whether the generator components are "abandoned" or simply complete: nothing in the pipeline can rot out from under you.
If converting markup to/from a line format so you can put awk, perl, and other line-oriented tools to use is your thing, there's also the ESIS format, which is understood by traditional SGML tools and even used by SGML's formal test suites.
This defeats a big part of why you’d want a build system in the first place (incremental builds), but at least if you know the page you want to regenerate you can still `make` that file directly.
If there’s a common workaround for this pattern in makefiles I’d love to learn it.
Not sure if it’s a common pattern, but my solution to this was to always run a command that deletes all “unexpected” files, using GNU Make’s “shell” function to enumerate files and the “filter-out” function to filter out “expected” outputs. Edit: I ensure this runs every time using an ugly hack: running the command as part of variable expansion via the “shell” function.
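For the curious, here is a runnable sketch of that hack with a made-up layout (src/*.md rendered to out/*.html); the "ugly" part is that the $(shell ...) assignment executes during variable expansion, so the prune happens on every make invocation:

```shell
# Demo of pruning "unexpected" outputs via GNU Make's shell/filter-out.
set -eu
dir=$(mktemp -d)
cd "$dir"
mkdir -p src out
echo '# a post' > src/a.md
echo 'stale'    > out/zombie.html    # no matching source; should be pruned

printf '%s\n' \
  'EXPECTED := $(patsubst src/%.md,out/%.html,$(wildcard src/*.md))' \
  '_prune   := $(shell rm -f $(filter-out $(EXPECTED),$(wildcard out/*.html)))' \
  'all: $(EXPECTED)' > Makefile
printf 'out/%%.html: src/%%.md\n\tcp $< $@\n' >> Makefile

make -s all
ls out    # zombie.html is gone; a.html was built
```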
File deletions and renames are common problems with many revision control / build systems.
Other than the nuclear option ("make clean"), another is to have a specific rename / remove make target, so:
make rm sourcefile
or
make mv sourcefile newsourcefile
... which will handle the deletion and renaming of both the original and generated targets.
In practice for even fairly large blog and online projects, a make clean / make all cycle is reasonably quick (seconds, perhaps minutes), and is often called for when revising templates or other elements of the site design. If you're operating at a scale where rebuild time is a concern, you probably want to be using an actual CMS (content management system) in which source is managed in a database and generated dynamically on client access.
But I think the best solution (that also works with make) is to have a "make dist" target that creates a final .tar.gz archive of the result. If the rule is written properly then it won't contain any stale files. The disadvantage is that for large projects it may be slow, but you are not supposed to use this rule during development (where it is useless anyway), only for releases (which can still be built incrementally -- only the final .tar.gz needs to be created from scratch).
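"Written properly" here means the dist rule tars an explicit file list rather than a whole directory; a sketch with hypothetical paths:

    # Archive the explicit target list, not build/ wholesale, so a stale
    # build/old-post.html with no remaining source can never sneak into
    # the release tarball.
    PAGES := $(patsubst src/%.md,build/%.html,$(wildcard src/*.md))

    dist: $(PAGES)
    	tar -czf site.tar.gz $(PAGES)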
Not sure if anyone actually uses it, but I would approach the problem with find, comm, and a sprinkle of sed:
comm -23 <(find build -type f -iname "*.html" -printf "%P\n" | sed 's/\.html$//' | sort) \
<(find source -type f -iname "*.md" -printf "%P\n" | sed 's/\.md$//' | sort)
The find commands get you a list of all the files (and only files - directories will have to be removed in a separate step) in each of the build and source folders, sed chops off the extension, and comm -23 compares them, printing only the files unique to the build folder, which you can then deal with as you see fit (e.g., by feeding them to xargs rm).
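An end-to-end version of that pipeline on a throwaway tree, including the xargs rm step. It assumes GNU find (-printf), GNU xargs (-r), and bash process substitution; filenames containing newlines would need the -print0/-z variants instead:

```shell
set -eu
dir=$(mktemp -d)
cd "$dir"
mkdir -p build source
touch source/keep.md build/keep.html build/stale.html

comm -23 \
    <(find build  -type f -iname '*.html' -printf '%P\n' | sed 's/\.html$//' | sort) \
    <(find source -type f -iname '*.md'   -printf '%P\n' | sed 's/\.md$//'   | sort) |
  sed 's/$/.html/' |                  # re-attach the extension
  (cd build && xargs -r rm --)        # -r: skip rm entirely if nothing is stale
ls build    # only keep.html remains
```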
I think the best solution is to use something like webpack or vite or whatever. These usually have their own dev server and can watch directories for changes.
My personal site is also using a custom make-like ssg, but after spending a disproportionate amount of time writing the bundling/packaging code, I decided to just switch over to one of these tools. It’s a solved problem, and it greatly reduced the complexity of my site.
I was instantly inspired by Karl's work on his "blog.sh" shell script[0] that he mentions in this article. I took it and tweaked it to create my own minimalist SSG called "barf"[1]. That wouldn't exist if Karl didn't share his wonderful work publicly!
Adding a pinch of m4 [1] can give you a bit more flexibility while sticking with the same barebones approach.
I used to maintain a small website built like that some 20 years back. But I can't see the model working today, personal websites excluded. The problem is that the approach essentially enforces Web 1.0 roles: You either need every contributing user to be html-proficient, or someone willing to assume the drudgery of being the "webmaster".
There is no such thing as a "pinch of m4". You start a clean project promising that you won't touch m4 this time. Then you add a small m4 invocation to save yourself from some boilerplate.
A year later, when you are trying to figure out why all instances of the word "cat" are silently disappearing from your website, you dig through 5 layers of macro expansions to discover that a junior dev tried implementing a for loop instead of copying it from the manual and messed up the quotation marks.
Having solved the immediate issue, you decide that debugging your DSL is too hard, so you import the M4 macro file you have been copying between projects. You then spend a day replacing all usages of 'define' with your macro-creating macro that adds comments to the output, enabling your stacktrace generation script to work.
Next project, I am putting down a hard rule: no m4! (Except for maybe that one instance)
I've only ever used m4 via autoconf and sendmail configuration files, so I don't know if it's m4 that has the bizarre syntax or whether it's autoconf's and sendmail's use of it. I'm not sure I've ever tried to use m4 directly for anything.
Rather than relying on generic text substitution using m4 or perl or whatever, I suggest using SGML, the basis and common superset of HTML and XML, which comes with easy type-checked text macro (entity) expansion for free, or even type-aware parametric macro expansion. Here "type" refers to the regular content type of a markup element (i.e. its allowed child elements and their expected order) but also covers expansion and escaping into attributes or other contexts such as CDATA or RCDATA. Only SGML can extend to properly expanding/escaping potentially malicious user comments with custom rules (e.g. allowing span-level markup but disallowing script elements), do markdown or other wiki-syntax expansion into HTML, import external/syndicated HTML content, produce RSS and outlines for navigation, etc. It works well for nontrivial static site preparation tasks on the command line; cf. the linked tutorial and command-line reference.
Instead of `m4` or `sed` find and replace, the author should try `envsubst`. It's a program that replaces bash style variable references (for example `$TITLE`) with their value in the environment.
I agree that `envsubst` is a good choice for this. Unfortunately, it is not part of posix, so you can't rely on it being present everywhere. But as part of gettext, it is still very common.
At the dawn of the age of PHP, I created a user management system (registration, verification, admin interface, …) that was based on well-established ideas (how login worked at Yahoo, Amazon, and every other major site) but got no traction at all as an open source project. In any language that wasn’t PHP it would be necessary to write an “authentication module”, which was about 50 lines of cookie handling code. Multiple times I managed to put several existing apps together and make an advanced web site.
About 10 years ago the idea suddenly got traction once it was legitimized by the SaaS fad. I would tell people “don’t you know they’re going to end the free tier or go out of business, or both?” and sure enough they did.
Anyhow, I bring it up because the system used M4 to interpolate variables into PHP, other languages, shell scripts, SQL scripts, etc.
I too had a small web site with M4 around 1999/2000. Why M4? Because I'd learned enough of it to be useful/dangerous when wrestling with Sendmail, and it seemed to do the trick (at least when the trick was simply "be easier than manually editing lots of HTML files every time there's a site-wide change").
I suspect I was never doing anything complicated enough to encounter the gotchas mentioned by other commenters...
I like it that (almost) every dev blog I come across on HN has an RSS feed.
For every interesting article that I read here I follow the feed. Whether you have a Wordpress site, a Bear Blog, a Micro blog, a blog on Havenweb, or a feed on your self-built site, I add them to the 'Really Social Sites' module of Hey Homepage.
Ultimately, I would like to publish this list of blogs, just like Kagi now does with their Small Web initiative. But I guess curating is key to adding quality. And when I think about curating, starting some kind of online magazine seems only natural.
I'm trying to understand (as a dev) if there is something "wrong with me" for not wanting to have my own blog. Where do people get the "entitlement" (I mean that in the best way possible) to share with other people/assume other people care what they are working on? It feels like a competition sometimes. "I need to work on something as cool as possible so I'll get some likes/impressions on my blog".
Collaboration is obviously cool and only works when it's all public; I just don't know where "I'm doing this because I think it's cool" ends and "I'm going to put effort in to share it with others to get reactions" begins.
I have a blog, but I mostly assume people _don't_ care what I'm doing or thinking. Some of my posts have probably never been read by anybody. I still personally find it worthwhile for a few reasons:
- The mere possibility that someone will see it pushes me to put more thought and effort into what I write. Sometimes this reveals weaknesses in my ideas that I would have glossed over if I were just writing private notes for myself; sometimes it leads me to actually change my opinions. It also means the blog posts are easier for me to understand / get value out of than notes are if I come back and reread them years later.
- It creates opportunities for people to connect with me which can pay off at unexpected times. Occasionally people have reached out to me to say a post helped them or resonated with them, or to give a thoughtful reply or ask a question. Those sorts of interactions are really satisfying even if they're rare. (One time, I was interviewing for a dev job and the interviewer asked a question about a post I'd written on the philosophy of John Rawls, and how it could connect to software engineering. I found that absolutely delightful.)
- It's just nice to have an outlet when I feel like writing about something.
I don't have a blog myself but am this close to creating one.
Some guy said that it's a progression.
You start using the web as a casual reader. At some point you get more comfortable in public spaces and start leaving small replies, the way you would reply to someone afk.
Then you start reading more and more about specific subjects, amassing knowledge, and your replies have more content. They start being organized. They have a structure, to guide future readers and show them how you came up with your conclusion. They have links to sources. They leave open doors for the parts you don't know.
Then you start writing more and more comments, with more and more content, as a result of your experience.
Then comes a moment where you realize you're going to write the same thing for the nth time, and being a good engineer with a focus on DRY, you want to write your thoughts once and for all and link to it every time. This is the moment you start writing a blog that you actually maintain: you write not because you feel the need to write more, but because you want to write less and direct people to it rather than repeating yourself.
I don't think there's something wrong with you. I also think there's nothing wrong with people sharing _interesting_ stuff, whether they do it ultimately for shallow likes or for ... you know... just sharing _interesting_ stuff.
On a side note, I get the "entitlement" from nobody. I take it. I also mean that in the best way possible. Nobody's asking for my software, my (future) articles, my point of view, etc. Still, I make stuff and sometimes share stuff. I think it can be a net value for some people (definitely not for everyone). This is only the reasoning behind it, the main motivator was me realizing I matter as a human being and I have only one life to live. I learned that because of experiencing a 'dark night of the soul' a couple of years back. Luckily I got through. And to be honest, if it wasn't for the internet - made up of personal websites and real people sharing their own experience on forums - that taught me everything there is to know about Cluster B disordered personalities (just an example, cough nothing personal cough), I don't think I would be sitting here typing this lengthy response.
I realized I can not sit back, enjoy the decline of the internet, and only complain about it. I would love to see the web have a lot of personal websites and blogs about every kind of subject, so I started to build a website software. The web/internet, and all the information shared and made easily accessible, made me able to save myself. I was probably helped more by some random dude who put up a website fifteen years ago with everything he knew about certain stuff than I was helped by anything else.
Odd take. If you spend several hours figuring something out, it’s quite neighborly to write it up for the next person. “Shoulders of giants” and all that.
I’m certainly grateful for their help, and have even written up a few of my own.
A friend of mine described using make to generate scientific papers. He explained that if he changed a single test file, the entire paper could be regenerated with a single command, including rerunning the tests and regenerating the graphs for the changed test.
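The names below are hypothetical, but the dependency chain is the whole trick: touch run_tests.sh (or anything it depends on) and `make paper.pdf` reruns the tests, replots the graph, and rebuilds the PDF, and nothing more.

    paper.pdf: paper.tex graphs/perf.svg
    	pdflatex paper.tex

    graphs/perf.svg: results/perf.csv plot.py
    	python plot.py results/perf.csv > graphs/perf.svg

    results/perf.csv: run_tests.sh
    	./run_tests.sh > results/perf.csv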
It's a neat idea, though I have to point out that if you're already pushing to Github, you could just push the source and Github will publish your markdown as a hosted page: https://pages.github.com/
I love the code [1]. Mine [2] is a bit over engineered because I wanted hot-reloading (without JS), and it was a delightful yak shave.
But the basic idea is the same --- heredocs for templating, using a plaintext -> html compiler (pandoc in my case), an intermediate CSV for index generation. Also some handy sed-fu [3] to lift out front matter. Classic :)
case ${file_type} in
    org )
        # Multiline processing of org-style header/preamble syntax, boxed
        # between begin/end markers we have defined. We use org-mode's own
        # comment line syntax to write the begin/end markers.
        # cf. https://orgmode.org/guide/Comment-Lines.html
        sed -n -E \
            -e '/^\#\s+shite_meta/I,/^\#\s+shite_meta/I{/\#\s+shite_meta.*/Id; s/^\#\+(\w+)\:\s+(.*)/\L\1\E,\2/Ip}'
        ;;
    md )
        # Multiline processing of Jekyll-style YAML front matter, boxed
        # between `---` separators.
        sed -n -E \
            -e '/^\-{3,}/,/^\-{3,}/{/^\-{3,}.*/d; s/^(\w+)\:\s+(.*)/\L\1\E,\2/Ip}'
        ;;
    html )
        # Use HTML meta tags and parse them, according to this convention:
        #   <meta name="KEY" content="VALUE">
        # cf. https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML/The_head_metadata_in_HTML
        sed -n -E \
            -e 's;^\s?<meta\s+name="?(\w+)"?\s+content="(.*)">;\L\1\E,\2;Ip'
        ;;
esac
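To see the `md` branch in action, the same sed can be run over a sample file with Jekyll-style front matter; it emits lowercase key,value CSV lines (this assumes GNU sed, for \w, \s, and \L..\E in -E mode):

```shell
dir=$(mktemp -d)
cat > "$dir/post.md" <<'EOF'
---
Title: Hello
Date: 2024-01-01
---
Body text is ignored.
EOF
sed -n -E \
    -e '/^\-{3,}/,/^\-{3,}/{/^\-{3,}.*/d; s/^(\w+)\:\s+(.*)/\L\1\E,\2/Ip}' \
    "$dir/post.md"
# → title,Hello
# → date,2024-01-01
```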
Huh, I previously skim-read the code and didn't notice the GEMINI regex detail. I wonder why they're doing that.
Re: namespace organisation. I thought about that a lot, and decided to adopt namespace-only convention for symmetry between text file layout, html file layout, and url scheme.
I've treated Date/time as metadata, which I can use to organise index pages. If I get to years worth of posts, then I'll group them by year/month or something reasonable. Likewise tags. I debated tags _and_ categories. But I decided on "everything is a post with tags, and categories will emerge based on topical coverage + post format".
… 15 years have passed indeed.

Seems somewhat abandoned?
https://github.com/cryptorick/xml2
https://manpages.debian.org/unstable/xml2/2xml.1.en.html
In my own projects, simply rebuilding the whole site is fast enough, so I opt to remove the whole build folder before a rebuild:
https://github.com/jez/jez.github.io/blob/source/Makefile#L1...
Edit to link my Makefile: https://github.com/jaredkrinke/make-blog/blob/main/Makefile
"make clean"?
http://neilmitchell.blogspot.com/2015/04/cleaning-stale-file...
I did save a list of generated files and compared them. This one liner is the meat of the whole solution:
Full source here: https://gist.github.com/hadrianw/060944011acfcadd889d937b960...
[0]: https://github.com/karlb/karl.berlin/blob/master/blog.sh [1]: https://barf.bt.ht
What I like most about it is I haven't had to upgrade anything, and don't expect to forever. And a close second; it "hot reloads" without javascript.
[1] https://github.com/adityaathalye/shite
[2] https://evalapply.org
Yours are much more advanced, but a few years back I made a minimal PHP static page generator and named it...
PHP keep It Stupid Simple, or in short P.I.S.S.
https://blog.nyman.re/2020/10/11/introducing-piss-a.html
[1] https://en.wikipedia.org/wiki/M4_(computer_language)
Now I use Node.js to replace every m4 file with mustache.js and some JS logic, and I don't feel limited anymore. The complexity doesn't increase much.
[1]: https://sgmljs.net/docs/producing-html-tutorial/producing-ht...
[2]: https://sgmljs.net/docs/sgmlproc-manual.html
nononononononononono for the love of everything please no
m4 isn't even a good esolang!
Just because it worked for sendmail is not sufficient justification for anything.
sendmail, bind, apache, older X11, sudo are examples that come to mind.
Very nice!
[1] https://github.com/karlb/karl.berlin/blob/master/blog.sh
[2] https://github.com/adityaathalye/shite
[3] I'm doing this: https://github.com/adityaathalye/shite/blob/master/bin/templ...
There is a bit of a limitation, though - I organize posts by namespace and with the date in the URL, and make can’t really handle that directly.
Do you mean the regexp in https://github.com/karlb/karl.berlin/blob/master/blog.sh#L4 ? It doesn't remove the formatting, just HTML comments (because they would show up on the page, otherwise) and rel="me" attributes (because they don't work with md2gemini). Feel free to read the blog post about adding Gemini support for more details: https://www.karl.berlin/gemini-blog.html
For example,
A) Most importantly, I wanted to tinker and have fun!
B) I already use Bash at work and stuff, so it's easy for me.
C) I am generally averse to fast-changing dependencies, and giant dependency trees, so that rules out most scripting languages.
Besides, if you peruse the README, you will see that my code guarantee is "works on my machine". Your mileage will vary tremendously :)