Then I added features like news and an RSS feed, a way to automatically list my research publications and course materials, a list of books filterable by tags, etc. It is still a Makefile, and the Makefile itself is actually a bit simpler than it used to be, but it now calls a few Bash scripts that in particular make use of the awesome xml2 and 2xml utilities to manipulate HTML in a line-oriented manner with the core utils (mostly grep and sed).
On top of that I have a few git hooks that call make automatically when needed, in particular on the remote server where the website is hosted, so that the public version is rebuilt whenever I push updates to the repository there.
It's been working like a charm for years! My git history goes back to 2009.
EDIT: I just had a look at the first commits…
beccad7 (FIRST_VERSION) Initial commit
d1cc6d7 adding link to Google Reader shared items
6ccfd0c fix typo
d337959 adding link to Identi.ca account
One of the nifty parts about having a static site generated offline from trusted inputs is that it doesn't matter whether the generator components are "abandoned" or simply complete: nothing in the pipeline can rot out from under you.
If converting markup to/from a line format so you can put awk, perl, and other line-oriented tools to use is your thing, there's also the ESIS format, which is understood by traditional SGML tools and even used by SGML's formal test suites.
This defeats a big part of why you’d want a build system in the first place (incremental builds), but at least if you know the page you want to regenerate you can still `make` that file directly.
If there’s a common workaround for this pattern in makefiles I’d love to learn it.
Not sure if it’s a common pattern, but my solution to this was to always run a command that deletes all “unexpected” files, using GNU Make’s “shell” function to enumerate files and the “filter-out” function to filter out “expected” outputs. Edit: I ensure this runs every time using an ugly hack: running the command as part of variable expansion via the “shell” function.
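For the curious, here is a runnable sketch of that hack with a made-up layout (src/*.md rendered to out/*.html); the "ugly" part is that the $(shell ...) assignment executes during variable expansion, so the prune happens on every make invocation:

```shell
# Demo of pruning "unexpected" outputs via GNU Make's shell/filter-out.
set -eu
dir=$(mktemp -d)
cd "$dir"
mkdir -p src out
echo '# a post' > src/a.md
echo 'stale'    > out/zombie.html    # no matching source; should be pruned

printf '%s\n' \
  'EXPECTED := $(patsubst src/%.md,out/%.html,$(wildcard src/*.md))' \
  '_prune   := $(shell rm -f $(filter-out $(EXPECTED),$(wildcard out/*.html)))' \
  'all: $(EXPECTED)' > Makefile
printf 'out/%%.html: src/%%.md\n\tcp $< $@\n' >> Makefile

make -s all
ls out    # zombie.html is gone; a.html was built
```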
File deletions and renames are common problems with many revision control / build systems.
Other than the nuclear option ("make clean"), another is to have a specific rename / remove make target, so:
make rm sourcefile
or
make mv sourcefile newsourcefile
... which will handle the deletion and renaming of both the original and generated targets.
In practice for even fairly large blog and online projects, a make clean / make all cycle is reasonably quick (seconds, perhaps minutes), and is often called for when revising templates or other elements of the site design. If you're operating at a scale where rebuild time is a concern, you probably want to be using an actual CMS (content management system) in which source is managed in a database and generated dynamically on client access.
But I think the best solution (that also works with make) is to have a "make dist" target that creates a final .tar.gz archive of the result. If the rule is written properly then it won't contain any stale files. The disadvantage is that for large projects it may be slow, but you are not supposed to use this rule during development (where it is useless anyway), only for releases (which can still be built incrementally -- only the final .tar.gz needs to be created from scratch).
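"Written properly" here means the dist rule tars an explicit file list rather than a whole directory; a sketch with hypothetical paths:

    # Archive the explicit target list, not build/ wholesale, so a stale
    # build/old-post.html with no remaining source can never sneak into
    # the release tarball.
    PAGES := $(patsubst src/%.md,build/%.html,$(wildcard src/*.md))

    dist: $(PAGES)
    	tar -czf site.tar.gz $(PAGES)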
Not sure if anyone actually uses it, but I would approach the problem with find, comm, and a sprinkle of sed:
comm -23 <(find build -type f -iname "*.html" -printf "%P\n" | sed 's/\.html$//' | sort) \
<(find source -type f -iname "*.md" -printf "%P\n" | sed 's/\.md$//' | sort)
The find commands get you a list of all the files (and only files - directories will have to be removed in a separate step) in each of the build and source folders, sed chops off the extension, and comm -23 compares them, printing only the files unique to the build folder, which you can then deal with as you see fit (e.g., by feeding them to xargs rm).
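An end-to-end version of that pipeline on a throwaway tree, including the xargs rm step. It assumes GNU find (-printf), GNU xargs (-r), and bash process substitution; filenames containing newlines would need the -print0/-z variants instead:

```shell
set -eu
dir=$(mktemp -d)
cd "$dir"
mkdir -p build source
touch source/keep.md build/keep.html build/stale.html

comm -23 \
    <(find build  -type f -iname '*.html' -printf '%P\n' | sed 's/\.html$//' | sort) \
    <(find source -type f -iname '*.md'   -printf '%P\n' | sed 's/\.md$//'   | sort) |
  sed 's/$/.html/' |                  # re-attach the extension
  (cd build && xargs -r rm --)        # -r: skip rm entirely if nothing is stale
ls build    # only keep.html remains
```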
I think the best solution is to use something like webpack or vite or whatever. These usually have their own dev server and can watch directories for changes.
My personal site is also using a custom make-like ssg, but after spending a disproportionate amount of time writing the bundling/packaging code, I decided to just switch over to one of these tools. It’s a solved problem, and it greatly reduced the complexity of my site.
I was instantly inspired by Karl's work on his "blog.sh" shell script[0] that he mentions in this article. I took it and tweaked it to create my own minimalist SSG called "barf"[1]. That wouldn't exist if Karl didn't share his wonderful work publicly!
Adding a pinch of m4 [1] can give you a bit more flexibility while sticking with the same barebones approach.
I used to maintain a small website built like that some 20 years back. But I can't see the model working today, personal websites excluded. The problem is that the approach essentially enforces Web 1.0 roles: You either need every contributing user to be html-proficient, or someone willing to assume the drudgery of being the "webmaster".
There is no such thing as a "pinch of m4". You start a clean project promising that you won't touch m4 this time. Then you add a small m4 invocation to save yourself from some boilerplate.
A year later, when you are trying to figure out why all instances of the word "cat" are silently disappearing from your website, you dig through 5 layers of macro expansions to discover that a junior dev tried implementing a for loop instead of copying it from the manual and messed up the quotation marks.
Having solved the immediate issue, you decide that debugging your DSL is too hard, so you import the M4 macro file you have been copying between projects. You then spend a day replacing all usages of 'define' with your macro-creating macro that adds comments to the output, enabling your stacktrace generation script to work.
Next project, I am putting down a hard rule: no m4! (Except for maybe that one instance)
I've only ever used m4 via autoconf and sendmail configuration files, so I don't know if it's m4 that has the bizarre syntax or whether it's autoconf's and sendmail's use of it. I'm not sure I've ever tried to use m4 directly for anything.
Rather than relying on generic text substitution using m4 or perl or whatever, I suggest using SGML, the basis and common superset of HTML and XML, which comes with easy type-checked text macro (entity) expansion for free, or even type-aware parametric macro expansion. Here "type" refers to the regular content type of a markup element (i.e. its allowed child elements and their expected order) but also covers expansion and escaping into attributes or other contexts such as CDATA or RCDATA. Only SGML can extend to properly expanding/escaping potentially malicious user comments with custom rules (e.g. allowing span-level markup but disallowing script elements), do markdown or other wiki-syntax expansion into HTML, import external/syndicated HTML content, produce RSS and outlines for navigation, etc. It works well for nontrivial static site preparation tasks on the command line; cf. the linked tutorial and command-line reference.
Instead of `m4` or `sed` find and replace, the author should try `envsubst`. It's a program that replaces bash style variable references (for example `$TITLE`) with their value in the environment.
I agree that `envsubst` is a good choice for this. Unfortunately, it is not part of posix, so you can't rely on it being present everywhere. But as part of gettext, it is still very common.
At the dawn of the age of PHP, I created a user management system (registration, verification, admin interface, …) that was based on well-established ideas (how login worked at Yahoo, Amazon, and every other major site) but got no traction at all as an open source project. In any language that wasn’t PHP it would be necessary to write an “authentication module”, which was about 50 lines of cookie handling code. Multiple times I managed to put several existing apps together and make an advanced web site.
About 10 years ago the idea suddenly got traction once it was legitimized by the SaaS fad. I would tell people “don’t you know they’re going to end the free tier or go out of business, or both?” and sure enough they did.
Anyhow, I bring it up because the system used M4 to interpolate variables into PHP, other languages, shell scripts, SQL scripts, etc.
I too had a small web site with M4 around 1999/2000. Why M4? Because I'd learned enough of it to be useful/dangerous when wrestling with Sendmail, and it seemed to do the trick (at least when the trick was simply "be easier than manually editing lots of HTML files every time there's a site-wide change").
I suspect I was never doing anything complicated enough to encounter the gotchas mentioned by other commenters...
I like it that (almost) every dev blog I come across on HN has an RSS feed.
For every interesting article that I read here I follow the feed. Whether you have a Wordpress site, a Bear Blog, a Micro blog, a blog on Havenweb, or a feed on your self-built site, I add them to the 'Really Social Sites' module of Hey Homepage.
Ultimately, I would like to publish this list of blogs, just like Kagi now does with their Small Web initiative. But I guess curating is key to adding quality. And when I think about curating, starting some kind of online magazine seems only natural.
I'm trying to understand (as a dev) if there is something "wrong with me" for not wanting to have my own blog. Where do people get the "entitlement" (I mean that in the best way possible) to share with other people/assume other people care what they are working on? It feels like a competition sometimes. "I need to work on something as cool as possible so I'll get some likes/impressions on my blog".
Collaboration is obviously cool and only works when it's all public; I just don't know where "I'm doing this because I think it's cool" ends and "I'm going to put effort in to share it with others to get reactions" begins.
I have a blog, but I mostly assume people _don't_ care what I'm doing or thinking. Some of my posts have probably never been read by anybody. I still personally find it worthwhile for a few reasons:
- The mere possibility that someone will see it pushes me to put more thought and effort into what I write. Sometimes this reveals weaknesses in my ideas that I would have glossed over if I were just writing private notes for myself; sometimes it leads me to actually change my opinions. It also means the blog posts are easier for me to understand / get value out of than notes are if I come back and reread them years later.
- It creates opportunities for people to connect with me which can pay off at unexpected times. Occasionally people have reached out to me to say a post helped them or resonated with them, or to give a thoughtful reply or ask a question. Those sorts of interactions are really satisfying even if they're rare. (One time, I was interviewing for a dev job and the interviewer asked a question about a post I'd written on the philosophy of John Rawls, and how it could connect to software engineering. I found that absolutely delightful.)
- It's just nice to have an outlet when I feel like writing about something.
I don't have a blog myself but am this close to creating one.
Some guy said that it's a progression.
You start using the web as a casual reader. At some point you get more comfortable in public spaces and start leaving small replies, the way you would reply to someone afk.
Then you start reading more and more about specific subjects, amassing knowledge, and your replies have more content. They start being organized. They have a structure, to guide future readers and show them how you came up with your conclusion. They have links to sources. They leave open doors for the parts you don't know.
Then you start writing more and more comments, with more and more content, as a result of your experience.
Then comes a moment where you realize you're going to write the same thing for the nth time, and being a good engineer with a focus on DRY, you want to write your thoughts once and for all and link to it every time. This is the moment you start writing a blog that you actually maintain: you write not because you feel the need to write more, but because you want to write less and direct people to it rather than repeating yourself.
I don't think there's something wrong with you. I also think there's nothing wrong with people sharing _interesting_ stuff, whether they do it ultimately for shallow likes or for ... you know... just sharing _interesting_ stuff.
On a side note, I get the "entitlement" from nobody. I take it. I also mean that in the best way possible. Nobody's asking for my software, my (future) articles, my point of view, etc. Still, I make stuff and sometimes share stuff. I think it can be a net value for some people (definitely not for everyone). This is only the reasoning behind it, the main motivator was me realizing I matter as a human being and I have only one life to live. I learned that because of experiencing a 'dark night of the soul' a couple of years back. Luckily I got through. And to be honest, if it wasn't for the internet - made up of personal websites and real people sharing their own experience on forums - that taught me everything there is to know about Cluster B disordered personalities (just an example, cough nothing personal cough), I don't think I would be sitting here typing this lengthy response.
I realized I can not sit back, enjoy the decline of the internet, and only complain about it. I would love to see the web have a lot of personal websites and blogs about every kind of subject, so I started to build a website software. The web/internet, and all the information shared and made easily accessible, made me able to save myself. I was probably helped more by some random dude who put up a website fifteen years ago with everything he knew about certain stuff than I was helped by anything else.
Odd take. If you spend several hours figuring something out, it’s quite neighborly to write it up for the next person. “Shoulders of giants” and all that.
I’m certainly grateful for their help, and have even written up a few of my own.
A friend of mine described using make to generate scientific papers. He explained that if he changed a single test file, the entire paper could be regenerated with a single command, including rerunning the tests and regenerating the graphs for the changed test.
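The names below are hypothetical, but the dependency chain is the whole trick: touch run_tests.sh (or anything it depends on) and `make paper.pdf` reruns the tests, replots the graph, and rebuilds the PDF, and nothing more.

    paper.pdf: paper.tex graphs/perf.svg
    	pdflatex paper.tex

    graphs/perf.svg: results/perf.csv plot.py
    	python plot.py results/perf.csv > graphs/perf.svg

    results/perf.csv: run_tests.sh
    	./run_tests.sh > results/perf.csv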
It's a neat idea, though I have to point out that if you're already pushing to Github, you could just push the source and Github will publish your markdown as a hosted page: https://pages.github.com/
I love the code [1]. Mine [2] is a bit over engineered because I wanted hot-reloading (without JS), and it was a delightful yak shave.
But the basic idea is the same --- heredocs for templating, using a plaintext -> html compiler (pandoc in my case), an intermediate CSV for index generation. Also some handy sed-fu [3] to lift out front matter. Classic :)
case ${file_type} in
    org )
        # Multiline processing of org-style header/preamble syntax, boxed
        # between begin/end markers we have defined. We use org-mode's own
        # comment line syntax to write the begin/end markers.
        # cf. https://orgmode.org/guide/Comment-Lines.html
        sed -n -E \
            -e '/^\#\s+shite_meta/I,/^\#\s+shite_meta/I{/\#\s+shite_meta.*/Id; s/^\#\+(\w+)\:\s+(.*)/\L\1\E,\2/Ip}'
        ;;
    md )
        # Multiline processing of Jekyll-style YAML front matter, boxed
        # between `---` separators.
        sed -n -E \
            -e '/^\-{3,}/,/^\-{3,}/{/^\-{3,}.*/d; s/^(\w+)\:\s+(.*)/\L\1\E,\2/Ip}'
        ;;
    html )
        # Use HTML meta tags and parse them, according to this convention:
        #   <meta name="KEY" content="VALUE">
        # cf. https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML/The_head_metadata_in_HTML
        sed -n -E \
            -e 's;^\s?<meta\s+name="?(\w+)"?\s+content="(.*)">;\L\1\E,\2;Ip'
        ;;
esac
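To see the `md` branch in action, the same sed can be run over a sample file with Jekyll-style front matter; it emits lowercase key,value CSV lines (this assumes GNU sed, for \w, \s, and \L..\E in -E mode):

```shell
dir=$(mktemp -d)
cat > "$dir/post.md" <<'EOF'
---
Title: Hello
Date: 2024-01-01
---
Body text is ignored.
EOF
sed -n -E \
    -e '/^\-{3,}/,/^\-{3,}/{/^\-{3,}.*/d; s/^(\w+)\:\s+(.*)/\L\1\E,\2/Ip}' \
    "$dir/post.md"
# → title,Hello
# → date,2024-01-01
```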
Huh, I previously skim-read the code and didn't notice the GEMINI regex detail. I wonder why they're doing that.
Re: namespace organisation. I thought about that a lot, and decided to adopt namespace-only convention for symmetry between text file layout, html file layout, and url scheme.
I've treated Date/time as metadata, which I can use to organise index pages. If I get to years worth of posts, then I'll group them by year/month or something reasonable. Likewise tags. I debated tags _and_ categories. But I decided on "everything is a post with tags, and categories will emerge based on topical coverage + post format".
… 15 years have passed indeed.

Seems somewhat abandoned?
https://github.com/cryptorick/xml2
https://manpages.debian.org/unstable/xml2/2xml.1.en.html
In my own projects, simply rebuilding the whole site is fast enough, so I opt to remove the whole build folder before a rebuild:
https://github.com/jez/jez.github.io/blob/source/Makefile#L1...
Edit to link my Makefile: https://github.com/jaredkrinke/make-blog/blob/main/Makefile
"make clean"?
http://neilmitchell.blogspot.com/2015/04/cleaning-stale-file...
I did save a list of generated files and compared them. This one liner is the meat of the whole solution:
Full source here: https://gist.github.com/hadrianw/060944011acfcadd889d937b960...
[0]: https://github.com/karlb/karl.berlin/blob/master/blog.sh [1]: https://barf.bt.ht
What I like most about it is I haven't had to upgrade anything, and don't expect to forever. And a close second; it "hot reloads" without javascript.
[1] https://github.com/adityaathalye/shite
[2] https://evalapply.org
Yours are much more advanced, but a few years back I made a minimal PHP static page generator and named it...
PHP keep It Stupid Simple, or in short P.I.S.S.
https://blog.nyman.re/2020/10/11/introducing-piss-a.html
[1] https://en.wikipedia.org/wiki/M4_(computer_language)
Now I use Node.js to replace every m4 file with mustache.js and some JS logic, and I don't feel limited anymore. The complexity doesn't increase much.
[1]: https://sgmljs.net/docs/producing-html-tutorial/producing-ht...
[2]: https://sgmljs.net/docs/sgmlproc-manual.html
nononononononononono for the love of everything please no
m4 isn't even a good esolang!
Just because it worked for sendmail is not sufficient justification for anything.
sendmail, bind, apache, older X11, sudo are examples that come to mind.
Very nice!
[1] https://github.com/karlb/karl.berlin/blob/master/blog.sh
[2] https://github.com/adityaathalye/shite
[3] I'm doing this: https://github.com/adityaathalye/shite/blob/master/bin/templ...
There is a bit of a limitation, though - I organize posts by namespace and with the date in the URL, and make can’t really handle that directly.
Do you mean the regexp in https://github.com/karlb/karl.berlin/blob/master/blog.sh#L4 ? It doesn't remove the formatting, just HTML comments (because they would show up on the page, otherwise) and rel="me" attributes (because they don't work with md2gemini). Feel free to read the blog post about adding Gemini support for more details: https://www.karl.berlin/gemini-blog.html
For example,
A) Most importantly, I wanted to tinker and have fun!
B) I already use Bash at work and stuff, so it's easy for me.
C) I am generally averse to fast-changing dependencies, and giant dependency trees, so that rules out most scripting languages.
Besides, if you peruse the README, you will see that my code guarantee is "works on my machine". Your mileage will vary tremendously :)