Even back in the 1990s, CGI programs written in C were lightning fast. It just was (and is) an error-prone environment. Any safer modern alternative, like the article's Go program or Nim or whatever, that isn't making database connections will be very fast & low latency to localhost - really similar to a CLI utility where you fork & exec. It's not free, but it's not that expensive compared to network latencies then or now.
People/orgs do tend to get kind of addicted to certain technologies that can interact poorly with the one-shot model, though. E.g., high start up cost Python interpreters with a lot of imports are still pretty slow, and people get addicted to that ecosystem and so need multi-shot/persistent alternatives.
The one-shot model in early HTTP was itself a pendulum swing from other concerns, e.g. ftp servers not having enough RAM for 100s of long-lived, often mostly idle logins.
You know, CGI with pre-forking (for latency hiding) and a safer language (like Rust) would be a great system to work on. Put the TLS termination in a nice multi-threaded web server (or in a layer like CloudFront).
No lingering state, very easy to dump a core and debug, nice mostly-linear request model (no callback chains, etc.) and trivially easy to scale. You're just reading from stdin and writing to stdout. Glorious. Websockets adds a bit of complexity but almost none.
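That stdin/stdout model is small enough to show whole; Go's stdlib even ships a `net/http/cgi` package for exactly this. A minimal sketch (the greeting and the `name` query parameter are invented for illustration):

```go
// guestbook.cgi-style program: compiled to a single binary and exec'd
// once per request by the web server. No lingering state anywhere.
package main

import (
	"fmt"
	"net/http"
	"net/http/cgi"
	"os"
)

// greet builds the response body; kept as a pure function so it is
// trivial to test outside a web server.
func greet(name string) string {
	if name == "" {
		name = "world"
	}
	return fmt.Sprintf("hello, %s\n", name)
}

func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain")
	fmt.Fprint(w, greet(r.URL.Query().Get("name")))
}

func main() {
	// cgi.Serve parses the CGI environment variables and stdin into an
	// *http.Request and writes the handler's response to stdout.
	if err := cgi.Serve(http.HandlerFunc(handler)); err != nil {
		// stderr ends up in the web server's error log
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Debugging is exactly as pleasant as described: run the binary by hand with `REQUEST_METHOD=GET` in the environment and watch stdout.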
The big change in how we build things was the rise of Java. Java was too big, too bloated, too slow, etc., so people rapidly moved to multi-threaded application servers, all to avoid the cost of fork() and the dangers of C. We can Marie Kondo this shit and get back to things that are simple if we want to.
I don't even like Rust and this sounds like heaven to me. Maybe someone will come up with a way to make writing this kind of web-tier backend code in Rust easy, by hiding a lot of the tediousness and/or complexity in a way that appeals to node/js, PHP, and Python programmers.
This is not to disagree, but to agree adding some detail... :-)
Part of the Java rise was C/C++ being error-prone and Java having a similar syntax, but this was surely intermingled with a full-scale marketing assault by Sun Microsystems, who at the time had big multi-socket SMP servers they wanted to sell with Solaris, and part of that pitch was Solaris/Java threading. Really, for a decade or two prior to that, the focus was on true MMU-based, hardware-enforced isolation with OS-kernel clean-up (more like CHERI these days), not the compiler-enforced stuff like Rust does.
I think you could have something more ergonomic than Perl/Python ever was, and as practically fast as C/Rust, with Nim (https://nim-lang.org/). E.g., I just copied that guy's benchmark with the Nim stdlib std/cgi and got over 275M CGI/day to localhost on a 2016 CPU, using only 2 requester & 2 http server threads. With some nice DSL (easily written if you don't like any current ones) you could get the "coding overhead" down to a tiny footprint. In fairness, I did zero SQLite whatever, but he was also using a computer over 4x bigger and probably a GHz faster, with some IPC lift as well. So, IF you had the network bandwidth (hint - usually you don't!), you could probably support billions of hits/day off a single server.
To head off some lazy complaints, GC is just not an issue with a single threaded Nim program whose lifetime is hoped/expected to be short anyway. In many cases (just as with CLI utilities!) you could probably just let the OS reap memory, but, of course, it always "all depends" on a lot of context. Nim does reference counting anyway whereas most "fighting the GC" is actually fighting a "separate GC thread" (Java again, Go, D, etc.) trashing CPU caches or consuming DIMM bandwidth and so on. For this use, you probably would care more about a statically linked binary so you don't pay ld.so shared library set up overhead on every `exec`.
I got my start in the CGI era, and it baked into me an extremely strong bias against running short-lived subprocesses for things.
We invented PHP and FastCGI mainly to get away from the performance hit of starting a new process just to handle a web request!
It was only a few years ago that I realized that modern hardware means that it really isn't prohibitively expensive to do that any more - this benchmark gets to 2,000/requests a second, and if you can even get to a few hundred requests a second it's easy enough to scale across multiple instances these days.
I have seen AWS Lambda described as the CGI model reborn and that's a pretty fair analogy.
I think you might have found that with CGI scripts deployed as statically-linked C binaries, with some attention given to size, you wouldn't have been so disappointed.
The "performance hit of starting a new process" is bigger if the process is a dynamically-linked PHP interpreter with gobs of shared libraries to load, plus a source file to read, parse, and compile - and not just by a little bit; it always has been. So what the author is doing using Go would, I think, still have been competitive 25 years ago, if Go had been around 25 years ago.
Opening an SQLite database is probably (surprisingly?) competitive with passing a few sockets through a context switch, across all server(ish) CPUs of this era and that. But both are much faster than opening a socket and authenticating to a remote MySQL process, and programs that are not guestbook.cgi often have many more resource acquisitions, which is why I think FastCGI is still pretty good for new applications today.
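Go's stdlib makes that trade-off easy to see, since it also ships `net/http/fcgi`: the FastCGI version of a CGI program is nearly the same code, but long-lived, so expensive acquisitions happen once at startup instead of once per request. A sketch, where the `FCGI_ADDR` variable, the address, and the page text are all assumptions for illustration:

```go
// Persistent FastCGI worker: open databases/pools once, then serve
// many requests over a socket from the front-end web server.
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/fcgi"
	"os"
)

// page is the cheap per-request work, done after the expensive
// resources already exist in the long-lived process.
func page(visits int) string {
	return fmt.Sprintf("<p>Visitor number %d</p>\n", visits)
}

func main() {
	addr := os.Getenv("FCGI_ADDR") // e.g. "127.0.0.1:9000" in the nginx config
	if addr == "" {
		return // not launched as a FastCGI backend
	}
	// Open SQLite / authenticate to MySQL / build caches here, once.
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	visits := 0
	err = fcgi.Serve(ln, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		visits++ // real code needs a mutex: requests run concurrently
		fmt.Fprint(w, page(visits))
	}))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

The shared mutable `visits` counter is also the first taste of what plain CGI never has to worry about.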
That's likely true - but C is a scary language to write web-facing applications in because it's so easy to have things like buffer overflows or memory leaks.
CGI never was prohibitively expensive for low load, and for high load a persistent process (e.g. FastCGI) is still better. CGI may allow you to handle 2k rps, but a FastCGI app doing the same job should handle more. You would need to run an additional server process (and restart it on upgrade), but it's worth doing if performance matters.
> We invented PHP and FastCGI mainly to get away from the performance hit of starting a new process just to handle a web request!
Yes! Note that the author is using a technology that wasn't available when I too was writing cgi_bin programs in the 00's: Go. It produces AOT compiled executables but is also significantly easier to develop in and safer than trying to do the same with C/C++ in the 00's. Back then we tended to use Perl (now basically dead). Perl and Python would incur significant interpreter startup and compilation costs. Java was often worse in practice.
> I have seen AWS Lambda described as the CGI model reborn and that's a pretty fair analogy.
Yes, it's almost exactly identical to managed FastCGI. We're back to the challenges of deployment: can't we just upload and run an executable? But of course so many technologies make things much, much more complicated than that.
I know of two large telecoms that internally develop with Perl, and a telecom product sold by Oracle that heavily relies on Perl. For text munging etc. it is still used, though I grant that other languages like Python are more popular.
> These days, we have servers with 384 CPU threads. Even a small VM can have 16 CPUs. The CPUs and memory are much faster as well.
With this hardware, if you reach for Kestrel you can easily do a few trillion requests per day. The development experience would be nearly identical - you can leverage the string interpolation operator for a PHP-like experience. LINQ and String.Join() open the door to some very terse HTML template syntax for tables and other nested elements.
The hard part is knowing how to avoid certain landmines in the ecosystem (MVC/Blazor/EF/etc.). The whole thing can live in one top-level program file that is run from the CLI, but you need to know the magic keywords - "Minimal APIs" - or you will find yourself in the middle of the wrong documentation.
The nice thing about CGI is that you don't have to reinvent isolation primitives for multi-tenant use cases. A bug in one request doesn't corrupt another request, due to process isolation. An infinite loop in one request doesn't DoS other requests, due to preemptive scheduling. You can kill long-running requests with rlimit. You can use per-tenant cgroups to fairly allocate resources like memory, CPU and disk/network I/O. You can use namespaces/jails and privilege separation to restrict what a request has access to.
While all you say is true, it bears noting that it didn't need to be decisive. The current mob branch of tcc is such that a `#!/bin/tcc -run` "script" is about 1.3x faster than `perl </dev/null` on two CPUs I tried.
Besides your two slower examples, the Julia and Java VMs, and (as mentioned elsewhere in the thread) PHP, also have really big start-up times. As I said up top, people just get addicted to "big environments". Lisp culture would do that with images, and this is part of where the "emacs is bloated" meme came from.
Anyway, at the time getline wasn't even standardized (that was 2008 POSIX - and still not in Windows; facepalm emoji), but you could write a pretty slick little library for CGI in a few hundred..few thou' lines of C. Someone surely even did.
But things go by "reputation" and people learn what their friends tell them to, by and large. So, CGI was absolutely the thing that made the mid to late 90s "Perl's moment".
Tomcat/Jakarta EE/JSP is a surprisingly solid stack. I only tried it once. Everything mostly just worked, and worked pretty well. You get to write pages PHP-style (interspersed HTML and code) but with the full power of Java instead of a hack language like PHP. Of course that paradigm may not suit everyone, but you also don't have to handle requests that way, as you can also install pure Java routes. It supports websockets.

You can share data between requests since it's a single-process multi-threaded model, so you can write something with real-time communication. You can also not do that; JSP code (and of course local variables) is scoped to a request.

Deployment is very easy: drop the new webapp (a single file) in the webapps directory, by any method you like, e.g. scp, and when Tomcat notices the new file is there, it transparently loads the new app and unloads the old one. You do have to watch out for classloader leaks that would prevent the old app being garbage-collected, though - downside of a single-process model.
It depends on the application usage pattern. For heavily used applications, sure, it's an excellent choice.
But imagine having to host 50 small applications, each serving a couple of hundred requests per day. In that case, the memory overhead of Tomcat with 50 war files is much bigger than a simple Apache/Nginx server with a CGI script.
The other issue with Tomcat is that a single bad actor can more easily compromise the server.
Not saying that can't happen with CGI, but since Tomcat is a shared environment, it's much more susceptible to it.
This is why shared, public Tomcat hosting never became popular compared to shared CGI hosting. A rogue CGI program can be managed by the host accounting subsystem (say, it runs too long, takes up too much memory, etc.), plus all of the other guards that can be put on processes.
The efficiency of CGI, specifically for compiled executables, is that the code segments are shared in virtual memory, so forking a new one can be quite cheap. While forking a new Perl or PHP process shares that, they still need to repeatedly go through the parsing phase.
The middle ground of "p-code" can work well, as those files are also shared in the buffer cache. The underlying runtime can map the p-code files into the process, and those are shared across instances also.
So, the fork startup time, while certainly not zero, can be quite efficient.
Because a lot of production software is half-baked. If you have to hand over an application to an operations team you need documentation, instrumentation, useful logging, error handling and a ton of other things. Instead software is now stuffed into containers that never receive security updates, because containers make things secure apparently. Then the developers can just dump whatever works into a container and hide the details.
To be fair most of that software is also way more complex today. There are a ton of dependencies and integrations and keeping track of them is a lot of work.
I did work with an old-school C programmer who complained that a system we deployed was a ~2GB war file, running on Tomcat, requiring at least 8GB of memory, and still crashing constantly. He had on multiple occasions offered to rewrite the whole thing in C, which he figured would be <1MB and require at most 50MB of RAM to run. Sadly the customer never agreed; I would have loved to see if it had worked out as he predicted.
Ops here. I mean, you still can if you use something like Golang or Java/.NET self-contained. However, the days of "just transfer over PHP files" ignore the massive setup that Ops had to do to get the web server into a state where those files could just be transferred over, and the care and feeding required to keep it in that state.
Not to mention endless frustration any upgrades would cause since we had to get all teams onboard with "Hey, we are upgrading PHP 5, you ready?" and there was always that abandoned app that couldn't be shut down because $BusinessReasons.
Containers have greatly helped with those frustration points and languages self-hosting HTTP have really made stuff vastly better for us Ops folks.
Docker helps with this nowadays. Of course you need to understand setting things up the first time you do it but once you know, it can apply to any tech stack.
I develop and deploy Flask + Rails + Django apps regularly and the deploy process is the same few Docker Compose commands. All of the images are stored the same with only tiny differences in the Dockerfile itself.
It has been a tried and proven model for ~10 years. The core fundamentals have held up; there are new features, but when I look at Dockerfiles I wrote in 2015 vs. today you can still see a lot of common ideas.
Perhaps. Over SSH? With a password or with a key? Do all employees share the same private key or do keys need to get added and removed when employees come and go. Is there one server or three (Are all deployment instructions done manually in triplicate?). When tomcat itself is upgraded, do you just eat the downtime? What about the system package upgrades or the OS? Which file should be copied over - whatever a particular Dev feels is the latest?
I've created a visualizer for Apache requests with the workers, queues and whatnot [0]. You can load the demo to view real traffic coming from HN earlier this year.
This is something I've suspected for some time. We're moving toward complicated architectures while we could be using good ol' tech on the newest CPUs.
I've been asked about architecture of a stocks ticker that would serve millions of clients to show them on their phone the current stock price. First thought was streams, Kafka, pubsub etc but then I came up with static files on a server.
AFAICT the latency of any non-trivial web API is determined by the latency of DB queries, ML model queries, and suchlike. The rest is trivial in comparison, even when using slow languages like Python.
If all you need is to return rarely-changing data, especially without wasting time on authorization, you can easily approach the limits of your NIC.
There's websocketd, which makes your program just a matter of reading from stdin and writing stdout. http://websocketd.com/
I just did a `time perl -e ''` (starting perl, executing an empty program); it took 5ms. 33ms with python3, 77ms with ruby.
I wouldn't be surprised to learn that Perl startup time has drifted. Need benchmark.
One shared JVM for maximum performance!
It can also share db connection pools, caches, etc. among those applications!
Wow!
[0]: https://www.ibrahimdiallo.com/reqvis
Note: works best on desktop browsers for now.
I wonder how much that would cost, though.