I'd be fine with this, even totally on-board, if C weren't so awful with respect to text. You don't even have to worry too much about free()ing your malloc()s if you design around short-lived processes. But this is just asking for security concerns among the tangled web of string and input processing your bespoke C routines are likely to develop into.
Pair it with a better, more modern, and safer native-compiled language and get the same effect. Zig, Nim, Go, hell even Carp.
I just wish there were better tools for navigating C codebases.
There’s been more than one time where I’m in some large Autotools-based project trying to figure something out and there’s a call out to some dependency I know nothing about.
Also, many of the projects lack any sort of documentation or source code comments. These aren’t someone’s pet projects either. One of them was from a notable name in the open source community and the other was a de facto driver in a certain hardware space.
The strbuf library that's part of git.git is a pleasure to work with. It's C-string compatible (just a char */size_t pair), guarantees that the "char *" member is always NUL-terminated, but can also be used for binary data. It also guarantees that the "buf" member is never NULL: https://github.com/git/git/blob/v2.34.0/strbuf.h#L6-L70
You will have to define "good". My string library[1][2] is "good" for me because:
1. It's compatible with all the usual string functions (doesn't define a new type `string_t` or similar, uses existing `char *`).
2. It does what I want: a) Works on multiple strings so repeated operations are easy, and b) Allocates as necessary so that the caller only has to free, and not calculate how much memory is needed beforehand.
The combination of the above means that many common string operations that I want to do in my programs are both easy to do and easy to visually inspect for correctness in the caller.
Others will say that this is not good, because it still uses and exposes `char *`.
No? Asking for code nav and you get three answers; asking for this and you get crickets. In the 90s I worked at a place where we embedded Tcl into all the apps and rolled our own templating systems. I had to do a little string stuff in C after a few years of Go, and it sucked. Ugh. buf[len] = '\0';
Using Go, I thought I was getting back to low-level stuff, but this C experience made me appreciate strings in Go. Web servers in C are a crazy bad idea, especially if they are spitting out HTML. Lisp would be better. Node would be better. Go would be better.
Considering that it's a stack that uses OpenBSD, my first thought would be Perl, although it's not a language one could call “modern”, heh. It's included in the base system and has rich libraries for text processing, (Fast)?CGI, HTML, and all that.
I wrote my first web app in 2000 using C/MySQL. It was insanely fast but very awkward to implement. I used C because it was (and still is) the only language I knew well.
At least if you are going to use C, you (should) know to be extremely paranoid about how you process anything received from the user. That doesn't remove the risk but at least you are focused on it.
One of the most rewarding parts of back-end web application development is re-writing the same database routines and same JSON export routines over and over. Then changing the requirements and starting over. It's what makes web application developers such well-balanced folks, right?
Unfortunately, in the flux of user requirements—each addition or modification of a table column changing select routines, insertions, validation, exporting, regression tests, and even (especially?) front-end JavaScript—we make mistakes. What BCHS tools beyond the usual can help in this perennial test of our patience?
Well, there is still a lot of large software written in C, e.g. nginx, lighttpd, even the Linux kernel.
I checked out BCHS a few years back. The key piece is that it's OpenBSD; if it were Linux it might have caught on, due to Linux's popularity, good or bad. This could be useful for embedded devices, for example, but not many embedded devices run OpenBSD, if any at all.
The C library used (https://github.com/kristapsdz/kcgi) is portable and works on Linux as well. Putting it behind nginx as FastCGI seems quite doable.
Why would you ever choose C anymore? The killer feature of C++ is “you don’t pay for what you don’t use”. There’s virtually no reason ever not to use C++.
I can't speak to why you'd want to use C in a web stack, but I can weigh in in the more general sense:
A while ago I thought I'd try my hand at the Cryptopals challenges, and I figured, hey all the security guys know C (and python, but ugh) so I'll use this as an opportunity to really learn C. Prior to starting that project, I "knew" C, in the sense that I took CS036 which was taught in that small subset of C++ that might as well be C.
So I jumped in and it felt really liberating, at first. You want to compare chars? It's just a number, use >= and <= ! Cast it back and forth to different types! Implement base64 by just using a character array of the encoding alphabet and just walk through it like an array and do magic casts! No stupid boilerplate like namespaces or static class main{}!
Then by about the 2nd set where you have to start attacking AES ECB I realized I was spending more time debugging my mallocs/frees and my fucking off-by-one errors than I was spending on actually learning crypto. I stuck with it until I think part way through the third set but by that point I couldn't take it any more.
So I bailed out of C and never looked back. I can see how a certain type of programmer (who is more practiced with malloc/fastidious with their array indices and loop bounds than I am) can really enjoy C for a certain type of work. But I can actually say now, hand on heart, that I know C; and I don't like it.
The last time I looked at C++ (mid-1990s), it generated hugely bloated binaries and was generally slower at runtime than plain C. Is that still true today?
Is there any language (other than assembly) that is faster at runtime than C today?
This is very cool. I only have a passing familiarity with Tcl, but I've been building my own toy web framework and this is a fantastic reference! They made a lot of the same API choices I made, but the way they went about it is worth studying.
I'd like to point out that Wapp doesn't necessarily need to be run as a plain-old CGI application; I've had success running it with its own built-in web server behind nginx, for example.
It seems like the database libraries they recommend for security, ksql and sqlbox, mitigate the risk with process separation and RBAC, so the CGI process doesn't have full access to the database.
It's definitely contrary to modern assumptions about web app security, but it's interesting to see web apps that are secure because they use OS security features as they were designed to be used, rather than web apps that do things that are insecure from an OS-perspective, like handling requests from multiple users in the same process, but are secure because they do it with safe programming languages.
ksql exports "ksql_exec", while sqlbox exports "sqlbox_exec" -- both of those allow execution of arbitrary SQL.
So no, the web apps cannot be made secure via OS support alone, because the OS security features are not adequate for high-level problems. Any sort of code exploit allows an attacker to trivially access the entire database, either to read anything or to overwrite anything.
"pledge" and "unveil" can prevent new processes from being spawned, but they cannot prevent authentication bypass, database dumping, or database deletion.
Yes, but they are...better built than your quick social network poll application thingy with customer's special sauce that you had 5 days to specify, develop and deploy.
C is a tremendous tool, but I don't think it's the best for customer facing web apps.
Funny to reflect that there was a time not so long ago when writing web apps (CGI usually) in C wasn't at all unusual (shortly before Perl became much more popular for this). And today, it is indeed kind of crazy.
Depends on your definition of "not so long ago" - it's certainly most of the history of the web. The point when Perl, PHP, and Java started to become the dominant web app technologies is about as far from the present day as that point was from the moon landing.
I remember writing CGI scripts in Perl in 1993 (the year before Netscape). I am not sure when CGI even became a thing, but it could not have been long before that.
Not only was “not so long ago” at the very beginning of meaningful web history, it was also a very brief moment in time (if we are talking pre-Perl). Pre-Perl CGI may never have been a thing, though, since Perl is older than CGI.
I recall PHP being the next wave after Perl. One could argue it never lost its place even if it now has many neighbours.
Not a Perl advocate by the way though it did generate some pretty magical “dynamic” web pages from text files and directory listings back in the day. Similar story with PHP.
How about a web-facing shell that allows arbitrary code execution? [0]
There's nothing fundamentally insecure about allowing C or any arbitrary code to execute on behalf of a user -- this is basically what cloud computing (especially "serverless") is.
As you identify, though, you need a Controlled Interface (CI) that accounts for this model across all kinds of resources, and many tools do not (yet) allow for it.
The big difference is that with bash (Python, Perl, PHP, etc.) exploits, all you need is to upgrade a package and you are secure. No need to touch any of the application code.
Compare it with C, where the bugs are likely unique per app, and require non-trivial effort to detect and fix.
Execution of user-specific code by serverless services requires non-trivial isolation, and is predicated on "each user has its own separated area" to work. This is not the case with most websites. Take HN for example -- there is a shared state (list of posts) and app-specific logic of who can edit the posts (original owner or moderator). No OS-based service can enforce this for you.
Writing C might be challenging for some, but as others have mentioned, one can use some other language which gives a statically linked binary to place in the httpd chroot. It won’t be BCHS then.
For uptime.is I’ve used a stack which I’ve started calling BLAH because of LISP instead of C.
People love to talk all sorts of trash on this kind of stack, but it's really quite solid for what it does. If anyone was ever curious what a sizeable codebase in this kind of code would even look like, check out the source code for undeadly.org [1]. Yeah, these people may be crazy, but they're also OpenBSD developers, and we really love to see what we can get away with using nothing other than what's available in the base distribution. I think a lot of what you see written for production ends up being very similar to this approach, maybe just using Rust or Go as the web application backend language if that's more comfortable. Nothing but the base system and a single binary, without relying on an entire interpreter stack, sure can be smooth.
There are other examples of this approach, too: straight C Common Gateway Interface web applications in public-facing production use. What comes to mind is cgit [2], the version control web frontend used by the people who write WireGuard. If it's really so crazy, then how come the OpenBSD and WireGuard people, presumably better hackers than you, are just out there doing it?
Other places you see C web application interfaces include embedded devices (SCADA, etc.) and even the web interfaces for routers, which unfortunately ARE crazy; just look at all the security problems! Good thing the people at our favorite good old research operating system have done the whole pledge(2) [3] syscall to try to mitigate things when those applications go awry. Understanding this part of the stack is probably key to seeing how any of it makes sense at all in 2022. It sure would be nicer if those programs just crashed instead of opening up wider holes. Maybe we can hope these mitigations, and higher code quality under limited-resource device constraints, all become more widespread.
> If it's really so crazy then how come the openbsd and wireguard people - presumably better hackers than you - are just out there doing it?
Probably precisely because they're better? I can see why people who are struggling with malloc and off-by-ones (https://news.ycombinator.com/item?id=29990985) would think it's crazy.
I think the correct pronunciation is “Breaches”. Using C in this place, as others have mentioned, is very, very likely to lead to security issues. Even C++, with its better string handling, would be a step up.
I love how trollish it is not to talk about Rust in that context.
Then there are tools like Sourcegraph and CppDepend, among others.
https://clang.llvm.org/docs/JSONCompilationDatabase.html
That being said, I love seeing a push for simple stacks like this.
*gasp!* Such lack of symmetry... it disturbs something deep in my soul.
https://github.com/antirez/sds
However the moment you call into other C libraries, they naturally only expect a char *.
[1] https://github.com/lelanthran/libds/blob/master/src/ds_str.h
[2] Currently the only bug I know of is the quadratic runtime in many of the functions. I intend to fix this at some point.
Otherwise, yes, using anything safer, where lack of bounds checking isn't considered a feature, is a much better option.
Will generate Rust and typescript if ya want.
;)
https://learnbchs.org/kwebapp.html
In what way does C make you pay for what you don't use?
[1] Wapp - A Web-Application Framework for TCL:
https://wapp.tcl.tk/home
[2] EuroTcl2019: Wapp - A framework for web applications in Tcl (Richard Hipp):
https://www.youtube.com/watch?v=nmgOlizq-Ms
(They do have "pledge", but even in the most restricted case this still leaves full access to the database.)
By 1999 I was already using our own version of mod_tcl and, unfortunately, fixing exploits every now and then in our native libs called by Tcl.
[0] https://rkeene.dev/js-repl/?arg=bash
[1] http://undeadly.org/src/
[2] https://git.zx2c4.com/cgit/
[3] https://learnbchs.org/pledge.html
pkg_add sqlite3
Can't get away.
Berkeley DB with a header date of 1994 :) In base, and of course it still works.
SQLite was removed from base, again, in 6.1 (2017) -- https://www.openbsd.org/faq/upgrade61.html
with this BSDCan '18 PDF briefly explaining the issues (unmaintainable) -- https://www.openbsd.org/papers/bsdcan18-mandoc.pdf
We like seeing what we can get away with using what's available in the base distribution and a few well-chosen, well-audited packages.
Database stuff took a good deal of doing, but with little in terms of abstraction, it was also quite fast.
I would like to see a renaissance of using protocols other than HTTP and content markup other than HTML.