Writing C for Curl - Readit News

> We count about 40% of our security vulnerabilities to date to have been the direct result of us using C instead of a memory-safe language alternative. This is however a much lower number than the 60-70% that are commonly repeated, originating from a few big companies and projects.

There has been discussion in an Arch Linux internal channel about the accuracy of these classifications. We noticed many advisories contain a "This bug is not considered a C mistake. It is not likely to have been avoided had we not been using C."-disclaimer, but it was unclear what the agenda was and how "C mistake" is defined.

It was brought up because this disclaimer was also present in the CVE-2025-0665 advisory[0], which is essentially a double-free but on file descriptor level. The impact is extremely low (it's more "libcurl causing unsoundness in your process rather than can-be-exploited-into-RCE"), but it's a direct result of how C manages resources. This kind of bug can also occur in Python, but you're unlikely to find this kind of bug in Rust.

Could this bug have occurred with a programming language that isn't C? Yes. Could this bug have been avoided by using a programming language that isn't C? Also yes.

[0]: https://curl.se/docs/CVE-2025-0665.html
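To make the "double-free on file descriptor level" idea concrete, here is a minimal C sketch. It is not curl's code: the names `close_once` and `double_close_hits_unrelated_fd` are hypothetical, and the demonstration assumes a single-threaded POSIX process in which `/dev/null` can be opened.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from curl): close *fd at most once and then
 * overwrite it with -1, so repeating the call on the same variable is a
 * no-op instead of a second close(). */
int close_once(int *fd)
{
    int rc = 0;
    if (*fd >= 0) {
        rc = close(*fd);
        *fd = -1;
    }
    return rc;
}

/* Demonstrates the hazard in a single-threaded process.
 * Returns 1 if the stray second close() destroyed an unrelated descriptor,
 * 0 if the number was not reused, -1 if /dev/null could not be opened. */
int double_close_hits_unrelated_fd(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;
    int stale = first;           /* a second copy of the same number */
    close(first);                /* the legitimate close */

    /* POSIX hands out the lowest free number, so this normally reuses it. */
    int other = open("/dev/null", O_RDONLY);
    if (other < 0)
        return -1;
    if (other != stale) {
        close(other);
        return 0;
    }

    close(stale);                /* the bug: this closes `other` */

    /* `other` is now invalid even though its owner never closed it. */
    return fcntl(other, F_GETFD) == -1 && errno == EBADF;
}
```

Poisoning the variable with -1 only helps when a single variable holds the descriptor. Copies of the number stored elsewhere are not protected, which is the gap that ownership tracking in other languages is meant to close.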

uecker · 5 months ago

The question is: Could such bugs be avoided in C using the right tools and strategies? And the answer is also often: yes.

This is why a large component of the argument for switching to other languages is usually that it is impossible to avoid such bugs in C, even for experts. But I think this argument, while having some small amount of truth to it, is also partially deceptive. One cannot simply look at the number of CVEs and conclude this; one needs to compare apples to apples, and then I find the reality looks different. E.g. if a simple mitigation for the bug in C could not be used for some reason, but this reason would also prevent the use of another language in the first place, then using this as an argument is misleading.

pornel · 5 months ago

> Could such bugs be avoided in C using the right tools and strategies

"right tools and strategies" is very open-ended, almost tautological: if you didn't catch the bug, then obviously you haven't used the right tools and the right strategies! In reality, the tools and strategies have flaws and limitations that turn such a problem into "yes but actually no".

Static analysis of C code has fundamental limits, so there are bugs it can't find, and there are non-trivial bugs that it can't find without also finding false positives. False positives make developers needlessly tweak code that was correct, and lead to fatigue that makes them downplay and ignore the reports. The more reliable tools require catching problems at run-time, but problems like double-free often happen only in rare code paths that are hard to test for, and fuzzers can't reach all code either.

kpcyrd · 5 months ago

The blogpost claims they are already running "all the tools", can you please be more specific which one they are missing? Maybe the tool to avoid this kind of lifetime issue just happens to be rustc?

pjmlp · 5 months ago

It would help if since lint was created in 1979, the large majority of C developers actually used the right tools and strategies.

In practice, people only seem to care when forced into MISRA-like processes, in contrast to how relevant secure programming has been considered in other programming language communities since the 1960s.

Secure programming was part of the Burroughs and Multics designs, so why is the answer from the community of a systems language designed a decade later, nearly 40 years after the Morris worm, still "we don't use the right tools and strategies over here"?

im3w1l · 5 months ago

For all its many faults, even C++ fstreams is not vulnerable to double freeing (and as a partial reply to @Galanwe, the way they avoid issues is runtime checking).

kllrnohj · 5 months ago

In C++ you can also make the FD-equivalent of std::unique_ptr, like Android does with unique_fd: https://cs.android.com/android/platform/superproject/main/+/...

It doesn't guarantee the issue never happens, like Rust would, but it does make it dramatically less likely to occur.

Also I think in general people vastly under-appreciate how severe an issue EBADF actually is. Outside of extremely specific, single-thread-only scenarios, that error is essentially the kernel telling you that heap corruption occurred, but almost nobody treats it with that level of severity
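As a rough illustration of treating EBADF with that level of severity, here is a hedged C sketch. The names `classify_close` and `close_or_abort` are hypothetical and not taken from any real library. The policy of aborting is one possible choice under the assumption that EBADF from `close()` always indicates a descriptor ownership bug.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

enum close_result {
    CLOSE_OK,            /* descriptor was open and is now closed */
    CLOSE_IO_ERROR,      /* e.g. EIO or EINTR: descriptor state is OS-specific */
    CLOSE_OWNERSHIP_BUG  /* EBADF: this number was not an open descriptor */
};

/* Close fd and classify the outcome instead of ignoring the return value. */
enum close_result classify_close(int fd)
{
    if (close(fd) == 0)
        return CLOSE_OK;
    return errno == EBADF ? CLOSE_OWNERSHIP_BUG : CLOSE_IO_ERROR;
}

/* Assumed policy: EBADF means some code closed a descriptor it did not
 * own, so stop immediately rather than continue in an unknown state. */
void close_or_abort(int fd)
{
    if (classify_close(fd) == CLOSE_OWNERSHIP_BUG) {
        fprintf(stderr, "close(%d) failed with EBADF: descriptor ownership bug\n", fd);
        abort();
    }
}
```

Note that this only catches the case where the number has not been reused yet. If another thread has already received the same number from the kernel, the stray `close()` succeeds and silently closes that thread's descriptor.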

pjmlp · 5 months ago

Already in the early 1990s, with C++ARM as the first standard, there were plenty of advantages to using C++ instead of plain C.

RAII, streams instead of stdio patterns, collection classes shipped with compilers for common types (string, array, ...) with configurable bounds checking, ...

Galanwe · 5 months ago

> It was brought up because this disclaimer was also present in the CVE-2025-0665 advisory[0], which is essentially a double-free but on file descriptor level

I don't see how Rust would have prevented calling close() two times with the same eventfd.

Munksgaard · 5 months ago

The same way Rust prevents calling close() two times on a file.

technion · 5 months ago

The standard for rust is that close() gets called automatically when the file descriptor goes out of scope. I believe you could choose to do it manually but that's unusual coding.

coliveira · 5 months ago

The problem is not as much C, but coding practices that make it seem like we're still in the 1970s. Codebases like curl use C at a very low level. But C has functions, has structures, has a lot of functionality to allow you to write at a higher level, instead of chasing pointers at each while statement. Code that handles pointers could be abstracted in the same way people will have to do in other languages.
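One way to read this suggestion is sketched below in C: a small struct plus a few functions that keep the pointer arithmetic and bounds checks in one place. The `bytebuf` type and its functions are hypothetical names invented for this illustration and are not taken from curl.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity byte buffer. Callers go through the
 * functions below and never index or advance the pointer themselves. */
struct bytebuf {
    unsigned char *data;  /* caller-provided storage */
    size_t len;           /* bytes currently in use */
    size_t cap;           /* total size of the storage */
};

void bytebuf_init(struct bytebuf *b, unsigned char *storage, size_t cap)
{
    b->data = storage;
    b->len = 0;
    b->cap = cap;
}

/* Append n bytes; returns 0 on success, -1 if they would not fit.
 * The subtraction cannot underflow because len <= cap always holds. */
int bytebuf_append(struct bytebuf *b, const void *src, size_t n)
{
    if (n > b->cap - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Bounds-checked read of byte i; returns 0 on success, -1 if out of range. */
int bytebuf_get(const struct bytebuf *b, size_t i, unsigned char *out)
{
    if (i >= b->len)
        return -1;
    *out = b->data[i];
    return 0;
}
```

With helpers like these, the length check lives in one function instead of being repeated at every loop that walks the buffer. The trade-off is an extra function call and a return value that every caller has to check.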

pjmlp · 5 months ago

Bell Labs 1970s, I advise learning about what already existed elsewhere in systems programming languages.

sgarland · 5 months ago

Pointers are not a difficult concept.

zxilly · 5 months ago

What about performance overheads? I think the reason a lot of people write C is that they have direct control over the generated assembly to maximise performance.

The guidelines feel out of sync with the directions I've seen people push coding styles over the years

"Identifiers should be short" when I've mostly seen people decry how annoying it is to find yourself in a codebase where everything is abbreviated C-style (htons, strstr, printf, wchar_t, _wfopen, fgetws, wcslen)

There's a case for more verbosity and if you look at modern Curl code it reflects that as well, new identifiers aren't short

https://github.com/curl/curl/blob/master/lib/vquic/vquic.c

"Functions should be short" where I've mostly seen very negative feedback on codebases written following the trend of Uncle Bob's short functions. Complaints that hiding code in 10 levels of function calls isn't helpful, and that following rabbit holes is tedious even with modern editors

"Code should be narrow", "we enforce a strict 80 column maximum line length" I don't think I've seen that take lately. I remember seeing a few posts fly by about the number 80 specifically

You want to prevent dragging your eyes. For my IDE on default settings on a 1080p monitor, half of a 15" screen fits 100 characters

If you take away 20 columns to fit your text on less of the screen, do you really get any benefits?

What about the cascading effects on the code, like worse names, split lines, ...?

In the end it's semi-interesting but we're all building sheds and these are mostly debates on what color the shed should be

tuetuopay · 5 months ago

Everything is a balance. IMHO, the "identifiers should be short", "functions should be short" and such are knee-jerk reactions to overly long things that are common in some other languages (looking at you, Java). Like the practice of indicating the type, pointer, etc. Stuff like `pWcharInputBuffer` and such.

There is a balance between `*p` and `inputPointerToMiddleOfBufferThatFrobnicates`.

dahauns · 5 months ago

>Everything is a balance.

Very true, or as I like to put it: everything is a tradeoff.

Over decades of programming, I'm fairly certain my preferences for things like function/identifier length could be plotted along a damped oscillation curve. :)

dfox · 5 months ago

> https://github.com/curl/curl/blob/master/lib/vquic/vquic.c

When somebody says "short identifiers" in relation to C, this is exactly the style meant by that, not the cryptic style of the C standard library.

creatonez · 5 months ago

> "Code should be narrow", "we enforce a strict 80 column maximum line length" I don't think I've seen that take lately. I remember seeing a few posts fly by about the number 80 specifically

To be fair, this is more doable in C than most other languages. No namespacing, no generics, etc. means you're not using as many columns.

I'm still not convinced, though. It's a crunch. Would rather just set a 120 to 160 column limit and make identifiers as descriptive as they should be. And I'd use prefix namespacing all over the place anyways -- fuzzy autocomplete can make it convenient.

GuB-42 · 5 months ago

For short identifiers, I think you missed an important detail.

> Also related: (in particular local) identifiers and names should be short.

The general idea is that the more distant the identifier is, the more descriptive it should be. Because you don't have as much context, and it is also a hint: if you see a long, descriptive name, it is more likely to be global.

And descriptive doesn't mean long. You still need to try making your descriptive names as short as possible. For example "timeSinceMidnightInSeconds" can be shortened to "secondsSinceMidnight" without loss of information: seconds are a unit of time, no need to repeat it.

stzsch · 5 months ago

Seems similar to the linux kernel coding style: https://www.kernel.org/doc/html/latest/process/coding-style....

Almondsetat · 5 months ago

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.

I don't think good names will be solved soon

sebstefan · 5 months ago

timhh · 5 months ago

When I counted I got about 55% which is pretty close to the standard 2/3.

https://blog.timhutt.co.uk/curl-vulnerabilities-rust/

johnisgood · 5 months ago

> Code should be easy to read. It should be clear. No hiding code under clever constructs, fancy macros or overloading.

I highly agree with this. I do not always want highly abstracted code, and some programming languages aiming to replace C are much more difficult to read, that said, Rust is supposed to replace C++, not C, right?

Thank you for the article!

Zambyte · 5 months ago

I have been playing around a lot with Zig lately, and though it's still in beta, it really feels like it has the best chance at being a true C successor. While Rust feels like they started with C++ and worked on making it harder to write incorrectly, Zig feels like they started with C and worked on making it easier to write correctly.

They also have a few pillars that they call the "Zen" of Zig[0], of which three out of the first five are directly related to readability.

[0] https://ziglang.org/documentation/0.14.0/#Zen

Arch-TK · 5 months ago

I used to think rust was like C++ but harder to write badly but I don't think it's anything like C++ now that I've spent a couple of years writing it.

Rust is its own thing, it has none of the extensive baggage of C++, and doesn't appear to be set to reach that level of baggage at any point in time soon if ever. It's a much cleaner, clearer, and easier to reason about programming language.

voidUpdate · 5 months ago

I really want to start trying to learn zig, but right now I feel like it's not quite finished enough. When it hits 1.0, I'll probably give it a more serious look

masklinn · 5 months ago

> Rust is supposed to replace C++, not C, right?

Rust is intended as a systems language. To the extent that it’s “supposed” to replace anything, it’s both.

MrMcCall · 5 months ago

The reality is that we spend FAR more time reading code than writing it. That is why readability is far more important than clever, line saving constructs.

The key to further minimizing the mental load of reacquainting yourself with older existing code is to decide on a set of code patterns and then be fastidious in using them.

And then, if you want to be able to easily write a parser for your own code (without every detail in the spec), it's even more important.

And now that I have read TFA, I see he wrote:

> We have tooling that verify basic code style compliance.

His experience and diligence have led him to the mountaintop, that being we must make ourselves mere cogs in a larger machine, self-limiting ourselves for the greater good of our future workload and production quality.

baumschubser · 5 months ago

> The reality is that we spend FAR more time reading code than writing it. That is why readability is far more important than clever, line saving constructs.

In JS I sometimes chain two or three inline arrow functions specifically for readability. When you read code, you often search for the needle of "the real thing" in a haystack of data formatting, API response prepping, localization, exception handling etc.

Sometimes those shorthand constructs help me to skip the not-so-relevant parts instead of mentally climbing down and up every sort and rename function.

That being said, I would not want this sentiment formalized in code guidelines :) And JS is not C except both have curly braces.

> That is why readability is far more important than clever, line saving constructs.

Yes, I agree, that is why I am put off by some supposed C replacements that are trying to be clever with their abstractions or constructs.

veltas · 5 months ago

> So many people will now joke and say something about wide screens being available

And this is a silly point because I want to be able to put 2-3 files side-by-side, on that big monitor. Who are all these people asking for long code that means I don't get more than one file on screen at a time?

It's not even just that. The reason newspapers have multiple columns rather than lengthy lines is because it's strictly easier to read shorter lines.

I don't think anyone disagrees with that, but 80 characters is clearly waaay too restrictive. I think 120 is much more reasonable.

dspillett · 5 months ago

There are many, usually non-technical people though some devs & such too, who maximise everything then complain about how much space is wasted on the right of their fancy screen.

I have a 32" screen running at "standard" pixel pitch (matching the 24" 1080p screen I have in portrait next to it) which I sometimes use full-screen but usually have split 50/50, 33/66, 25/75, or 33/33/33, depending on what I'm doing. One of our testers doesn't understand and can't see the benefit I get from the flexibility ("why not just have two monitors?" has been asked several times). It seems to actively annoy her that such a wide screen exists. If she ever saw the ultra-wide my friend uses for gaming I think she'd have a seizure.

Admittedly, when I'm sat down, this monitor plus the other in portrait is a bit wide in total (so the other screen is usually relegated to just being mail/chat windows that I only interact with when something pings for my attention) and a touch too tall. It is much more comfortable when I use the desk raised so I stand, which is how I work >⅔ of the time.

kwon-young · 5 months ago

Curl is one of the very few projects I managed to contribute to with a very simple PR.

At the time, I was a bit lost with their custom testing framework, but was very impressed by the ease of contributing to one of the most successful open-source projects out there.

I now understand why. It is because of their rules around testing and readability (and the friendly attitude of Daniel Stenberg) that a novice like me managed to do it.

kobzol · 5 months ago

Great post!

I have some random guesses as to why the 40% vs 60-70% memory issues percentage:

- 180k is not that much code. The 60-70% number comes from Google and Microsoft, and they are dealing with way larger codebases. Of course, the size of the codebase in theory shouldn't affect the percentage, but I suspect in practice it does, as the larger the codebase is, the harder it is to enforce invariants and watch for all possible edge cases.

- A related aspect to that is that curl is primarily maintained by one person (you), or at most a handful of contributors. Of course many more people contribute to it, but there is a single maintainer who knows the whole codebase perfectly and can see behind all (or most) corners. For larger codebases with hundreds of people working on them, that is probably not the case.

- Curl is used by clients a lot (probably it's used more by clients than servers, for whatever definition of these words) over which you have no control and monitoring. That means that some UB or vulnerabilities that were triggered "in the wild", on the client side, might not ever be found. For Google/Microsoft, if we're talking about Chrome, Windows, web services etc., which are much more controlled and monitored by their companies, I suspect that they are able to detect a larger fraction of vulnerabilities and issues than we are able to detect in curl.

- You write great code, love what you're doing and take pride in a job done well (again, if we scale this to a large codebase with hundreds of developers, it's quite hard to achieve the same level of quality and dedication there).

(sent this as a comment directly on the post, but it seems like it wasn't approved)

janoelze · 5 months ago

This is remarkably clear writing — you sense how it was formed by thousands upon thousands of hours spent communicating, really cool.