One thing I've been wondering recently about Fil-C - why now? And I don't mean that in a dismissive way at all, I'm genuinely curious about the history. Was there some relatively recent fundamental breakthrough or other change that prevented a Fil-C-like approach from being viable before? Was it a matter of finding the right approach/implementation (i.e., a "software" problem), or is there something about modern hardware which makes the approach impractical otherwise? Something else?
I wrote a bounds checking patch to GCC (mentioned in a link from the article) back in 1995. It did full bounds checking of C & C++ while being compatible with existing libraries and ABIs, making it a bit more practical than Fil-C to deploy in the real world. You only had to recompile your application, if you trusted the libraries (although the bounds checking obviously didn't extend into the libraries unless you recompiled them). It didn't do the GC thing, but instead detected use after free at the point of use.
https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html
> Was there some relatively recent fundamental breakthrough or other change that prevented a Fil-C-like approach from being viable before?
The provenance model for C is very recent (and still a TS, not part of the standard). Prior to that, there was a vague notion that the C abstract machine has quasi-segmented memory (you aren't really allowed to do arithmetic on a pointer to an "object" to reach a different "object") but this was not clearly stated in usable terms.
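To illustrate (my own toy example, not one from the TS), this is the kind of cross-object arithmetic the provenance rules pin down as undefined:

    #include <stdio.h>

    int main(void) {
        int a[4] = {1, 2, 3, 4};
        int b[4] = {5, 6, 7, 8};
        (void)b;   /* only here to provide a neighbouring object */

        int *p = &a[3];
        /* Even if b happens to sit right after a in memory, walking a pointer
           derived from a past a's one-past-the-end limit to reach b is undefined
           behaviour: the pointer's provenance covers a only, not b. */
        int *q = p + 2;
        printf("%d\n", *q);   /* UB: uses a's provenance to touch another object */
        return 0;
    }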
Also, in practical terms, you have a lot more address space to "waste" in 64-bit. It would have been frivolous in 32-bit and downright offensive in 16-bit code.
The memory model always had segmented memory in mind, and safe C approaches are not new. The provenance model makes this more precise, but the need for it was to deal with corner cases such as pointer-to-integer round trips or access to the representation bytes of a pointer. Of course, neither GCC nor clang get this right, to the extent that those compilers are internally inconsistent and miscompile even code that did not need any clarification to be considered correct.
Here’s a rough timeline:
- 2004-2018: I had ideas of how to do it but I thought the whole premise (memory safe C) was idiotic.
- 2018-2023: I no longer thought the premise was idiotic but I couldn’t find a way to do it that would result in fanatical compatibility.
- 2023-2024: early Fil-C versions that were much less compatible and much less performant
- end of 2024: InvisiCaps breakthrough that gives current fanatical compatibility and “ok” performance.
It’s a hard problem. Lots of folks have tried to find a way to do it. I’ve tried many approaches before finding the current one.
Beyond the Git history, is there any write-up of the different capability designs you've gone with?
I'm interested in implementing a safe low-level language with less static information around than C has (e.g. no static pointer-int distinction), but I'd rather keep around the ability to restrict capabilities to only refer to subobjects than have the same compatibility guarantees Invisicaps provide, so I was hoping to look into Monocaps (or maybe another design, if there's one that might fit better).
That's a really interesting timeline! Sounds like it's been stewing for a lot longer than I expected. Was there anything in particular around 2018 that changed your opinion on the idiotic-ness of the premise?
If a hypothetical time machine allowed you to send the InvisiCaps idea back to your 2004-era self, do you think the approach would have been feasible back then as well?
Long long ago, in 2009, Graydon was my official on-boarding mentor when I joined the Mozilla Javascript team. Rust already existed then but, as he notes, was quite different then. For one thing, it was GC'd, like Fil-C. Which I like -- I write a lot of my C/C++ code using Boehm GC, have my own libraries designed knowing GC is there, etc.
This has obviously been 'rust'ling some feathers, as it challenges some of the arguments laid out in the past; but once the dust settles, it is a major net benefit to the community.
I hope you get funded and can support platforms other than Linux again.
> This has obviously been 'rust'ling some feathers,
I'm a Rust user and a fan. But memory safe C is actually an exciting prospect. I was hoping that the rise of Rust would encourage others to prioritize memory safety and come up with approaches that are much more ergonomic to the developers.
> as it challenges some of the arguments laid out in the past
Genuinely curious. What are the assumptions you have in mind that Fil-C challenges? (This isn't a rhetorical question. I'm just trying to understand memory safety concepts better.)
> but once the dust settles, it is a major net benefit to the community.
Agreed, this is big! If Fil-C can fulfill its promise to make old C code memory safe, it will be a massive benefit to the world. God knows how many high-consequence bugs and vulnerabilities hide in that code.
"But almost all programs have paths that crash, and perhaps the density of crashes will be tolerable."
This is a very odd statement. Mature C programs written by professional coders (Redis is a good example) basically never crash in the experience of users. Crashing, in such programs, is a rare occurrence mostly obtained by attackers on purpose, looking for code paths that generate a memory error and that - if the program is used as it should be - are never reached.
This does not mean that C code never segfaults: it happens, especially when developed without care and the right amount of testing. But the code that is the most security sensitive, like C Unix servers, is high quality, and crashes there are mostly a security problem and much less a stability problem.
Notice that it says "almost all programs" and not "almost all _C_ programs".
I think if you understand the meaning of "crash" to include any kind of unhandled state that causes the program to terminate execution then it includes things like unwrapping a None value in Rust or any kind of uncaught exception in Python.
That interpretation makes sense to me in terms of the point he's making: Fil-C replaces memory unsafety with program termination, which is strictly worse than e.g. (safe) Rust, which replaces memory unsafety with a compile error. But it's also true that most programs (irrespective of language, and including Rust) have some codepaths in which the program can terminate because assumed invariants aren't upheld, so in practice that's often acceptable behaviour, as long as the defect rate is low enough.
Of course there is also a class of programs for which that behaviour is not acceptable, and in those cases Fil-C (along with most other languages, including Rust absent significant additional tooling) isn't appropriate.
> Rust which replaces memory unsafety with a compile error
Rust uses panics for out-of-bounds access protection.
The benefit of dynamic safety checking is that it's more precise. There's a large class of valid programs that are not unsafe that will run fine in Fil-C but won't compile in Rust.
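As a toy illustration (mine, not from the article): the following is perfectly well-defined C and runs fine under Fil-C's dynamic checks, while a literal safe-Rust translation (two simultaneous &mut borrows of the same element) is rejected at compile time:

    #include <stdio.h>

    /* Both parameters may alias; that is fine in C as long as every access
       stays in bounds, which is all Fil-C checks at runtime. */
    static void bump_both(int *x, int *y) {
        *x += 1;
        *y += 1;
    }

    int main(void) {
        int buf[2] = {10, 20};
        bump_both(&buf[0], &buf[0]);          /* deliberate aliasing */
        printf("%d %d\n", buf[0], buf[1]);    /* prints "12 20" */
        return 0;
    }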
I don't think it's an odd statement. It's not about segfaults, but about use-after-free (and similar) bugs, which don't crash in C but do crash in Fil-C. With Fil-C, if there is such a bug, it will crash; if the density of such bugs is low enough, that is tolerable: it will just crash the program, but will not cause an expensive and urgent CVE ticket. The bug itself may still need to be fixed.
The paragraph refers to detecting such bugs during compilation versus crashing at runtime. The "almost all programs have paths that crash" means all programs have a few bugs that can cause crashes, and that's true. Professional coders do not attempt to write 100% bug-free code, as that wouldn't be an efficient use of their time. Now the question is, should professional coders convert the (existing) C code to e.g. Rust (where likely the compiler detects the bug), or should they use Fil-C, and so save the time needed to convert the code?
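A minimal sketch (my example) of the kind of bug in question - with a conventional toolchain it will often run silently, because the allocator may not have reused the memory yet, whereas the point above is that Fil-C turns the stale access into an immediate, loud failure:

    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *p = malloc(16);
        if (!p) return 1;
        strcpy(p, "hello");
        free(p);
        p[0] = 'H';   /* use-after-free: usually silent in plain C, caught at the point of use under Fil-C */
        return 0;
    }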
Doesn't Fil-C use a garbage collector to address use-after-free? For a real use-after-free to be possible there must be some valid pointer to the freed allocation, in which case the GC just keeps it around and there's no overt crash.
> it will just crash the program, but will not cause an expensive and urgent CVE ticket.
Unfortunately, security hysteria also treats any crash as "an expensive and urgent CVE ticket". See, for instance, ReDoS, where auditors will force you to update a dependency even if there's no way for a user to provide the vulnerable input (for instance, because the input is fixed in the configuration file).
I think what you've written is pretty much what the "almost all programs have paths that crash" was intended to convey.
I think "perhaps the density of crashes will be tolerable" means something like "we can reasonably hope that the crashes from Fil-C's memory checks will only be of the same sort, that aren't reached when the program is used as it should be".
I think the point is that Fil-C makes programs crash which didn't crash before because use-after-free didn't trigger a segfault. If anything, I'd cite Redis as an example that you can build a safe C program if you go above and beyond in engineering effort... most software doesn't, sadly.
Redis uses a whole lot of fiddly data structures that turn out to involve massive amounts of unsafe code even in Rust. You'd need to use something like Frama-C to really prove it safe beyond reasonable doubt. (Or the Rust equivalents that are currently in the works, and being used in an Amazon-funded effort to meticulously prove soundness of the unsafe code in libstd.) Compiling it using Fil-C is a nice academic exercise but not really helpful, since the whole point of those custom data structures is peak performance.
It is a question of probability and effort. My personal estimation rule for my type of projects is that it takes 3 times longer to get from my prototype to something I'm comfortable having others use, and another such factor to get to an early resemblance of a product. In a recent interview I read, an AI expert said that each additional 9 in terms of error probability takes the same effort.
Most software written does not serve a serious nation-level user base but caters to a relatively small set of users. The effort spent eradicating errors needs to be justified against the cost of workarounds, remediation work and customer impact. "Will not be fixed" can be a rational decision.
I think the focus should be on tools with a large attack surface that enforce security boundaries, especially those where performance is not so important: sudo, openssh, polkit, PAM modules. That would make a lot more sense than these half-baked Rust rewrites that just take away features. (I'm biased: I personally had a backup script broken by uutils.) I think Rust rewrites need 100% bit-for-bit feature parity before replacing the battle-tested existing tools in the C userland. I say this as someone who writes Rust security tools for Linux.
A lot of my programs crash, and that’s a deliberate choice. If you call one of them like “./myprog.py foo.txt”, and foo.txt doesn’t exist, it’ll raise a FileNotFoundError and fail with a traceback. Thing is, that’s desirable here. I could wrap that in a try/except block, but I’d either be adding extraneous info (“print(‘the file does not exist’); raise”) or throwing away valuable info by swallowing the traceback so the user doesn’t see the context of what failed.
My programs can’t do anything about that situation, so let it crash.
Same logic for:
* The server in the config file doesn’t exist.
* The given output file has bad permissions.
* The hard drive is full.
Etc. And again, that’s completely deliberate. There’s nothing I can do in code to fix those issues, so it’s better to fail with enough info that the user can diagnose and fix the problem.
That was in Python. I do the same in Rust, again deliberately. While of course we all handle the weird cases we’re prepared to handle, I definitely write most database calls like “foo = db.exec(query)?” because if PostgreSQL can’t execute the query, the safest option is to panic instead of trying foolhardily to get back to the last known safe state.
And of course that’s different for different use cases. If you’re writing a GUI app, it makes much more sense to pop up a dialog and make the user go fix the issue before retrying.
That strategy is ok if the expected user is a fellow developer. For anyone else, a backtrace is spilling your guts on the floor and expecting the user to clean up. A tool for wider usage should absolutely detect the error condition and explain the problem as succinctly as possible, with all the information necessary for a human to be able to solve the issue. Backtrace details are extraneous information, only useful for a software developer familiar with the codebase. There's of course a difference when talking about unexpected incorrect state, such as a fatal filesystem or allocation error that shouldn't happen unless the environment is in an invalid state; a nice human-readable error is then not as necessary.
> Mature C programs written by professional coders (Redis is a good example) basically never crash in the experience of users
That is a very difficult assertion to validate. It might well be true! But so many conversations about memory safety and C/C++ devolve to assertions with “get gud” at one extreme and “change platforms to one that avoids certain errors” at the other.
Without data, even iffy data, those groups talk past each other. Are memory-error CVE counts on C projects the data we need here? Is there some other quantitative measure of real world failures that occur due to memory unsafety?
This is all by way of saying that I’d love to see some numbers there. That’s not on you, or meant to question your claim. As you implied, errors in code don’t always translate to errors in behavior for users.
It just always sucks to talk about this because broad-spectrum quantitative data on software error rates and their causes is lacking.
Keep in mind he's limited his assertion to UX. That narrow point is almost certainly true in the case of his C codebase.
But read the rest-- he literally wrote how security researchers find memory safety errors in C codebases!
Dollars to donuts he came up with this UX-on-accident vs. security-researcher-on-purpose bug dichotomy in his head as a response to some actual CVE in his own C codebase.
In short, he's agreeing with the research that led to programming languages like Rust in the first place. And even though he's found an odd way to agree, there's no assertion to validate here (at least wrt security).
I've heard this argument about Rust vs. C: Rust might be memory safe, but the reason memory safety issues are so prominent in C programs is that basically every other kind of problem has been fixed throughout their lifetime, so these are the only kinds of issues that remain, both in terms of security and stability.
This is very much not the case for programs that are much newer. Even if they are written in Rust, they still need years of maturation before they reach the quality of older C programs, since Rust programs suffer from non-memory-safety issues just as much. That's why just rewriting things in Rust isn't a panacea.
The perfect example of this is the Rust coreutils drama that has been going on.
I don't agree with that assessment at all. The reason memory safety issues are so prominent is that they are extremely likely to be exploitable. Of course you can write exploitable bugs in any language, but most bug classes are unlikely to be exploitable. A bug that always crashes is about a trillion times less severe than a bug that allows someone else to take control of your computer.
I can only quote (from the top of my head) the Android team's findings, that having a C++ codebase extended with Rust cut down significantly on the number of memory safety-related issues. The reasoning was that since the stable C++ codebase was no longer actively changed, only patched, and new features were implemented in Rust, the C++ codebase could go through this stabilization phase where almost all safety issues are found.
How many "mature C programs" try to recover in a usable way when malloc() returns NULL? That's a crash - a well-behaved one (no UB involved) hence not one that would be sought by most attackers other than a mere denial of service - but still a crash.
On 64-bit systems (especially Linux ones) malloc almost never returns NULL but keeps overallocating (aka overcommitting). You don't get out-of-memory errors / kills until you access the memory.
Wrong, dereferencing a NULL pointer is UB.
One divide when it comes to using Fil-C is C as an application (git) vs C as a library from another language (libgit2).
Suppose we assume that many C applications aren’t performance sensitive and can easily take a 2-4x performance hit without noticing. Browsers and OS internals being obvious exceptions. The ideal candidates are like the ones djb writes, and he’s already a convert to Fil-C. sudo, sshd, curl - all seem like promising candidates.
But as far as I can tell, Fil-C doesn’t work for C libraries that can be called from elsewhere. Even if it could be made to work, the reason other languages like Python or Node use C libraries is for speed. If they were ok with it being 2-4x slower, they would just write ordinary Python or Javascript.
C (and C++) are fundamentally important because of their use in performance sensitive contexts like operating systems, browsers and libraries. If we’re restricting Fil-C to pure C/C++ applications that aren’t performance sensitive, that might still be very important and useful, but it’s a small slice of the large C/C++ pie.
Also, it’s a great tool for an existing C application, certainly. A performance hit in exchange for security is a reasonable trade off while making a battle hardened application work. But for a new application, would someone choose Fil-C over other performant GC languages like Go or Java or C#? I’d be keen to hear why.
Still, I want to stress - this is a great project and it’ll generate a lot of value.
> If they were ok with it being 2-4x slower, they would just write ordinary Python or Javascript.
Python and JavaScript are much more than 4x slower than C/C++ for workloads that are git-like (significant amount of compute, not just I/O bound)
> C (and C++) are fundamentally important because of their use in performance sensitive contexts like operating systems, browsers and libraries
That's a fun thing to say but it isn't really true. C/C++ are fundamentally important for lots of reasons. In many cases, folks choose C and C++ because that's the only way to get access to the APIs you need to get the job done.
Yeah I suppose that’s the real question - what is the size of the target market for Fil-C? I believe it’s a small % because many C applications are performance sensitive. You say performance is only one consideration among many, so the % of C applications that would benefit from your compiler is much higher.
You’re the guy actually building this, so you would have talked to potential customers more than me. You’re more likely to be correct, but I would be interested to know what these applications are.
Also, do you hope people pick Fil-C for a new code base over other performant GC languages like Go, Java and C#? These have less than a 2x overhead for CPU bound tasks.
Why can't it work? You need to assume that the C library is only ever passed well-behaved pointers and callbacks in order to avoid invoking UB that it can't know about - but other than that it's just a matter of marshaling from the usual C ABI to the Fil-C ABI, which should be doable.
I’m assuming the calling program is a GC’d language like Python or Node (the most popular runtimes by far), but the same holds for other popular languages like Ruby. Why would a GC’d language call out to slow code that runs its own separate GC? Now you have two GCs running, neither of which knows about the other. I’m not declaring it’s impossible, I’m asking why someone would want to do this.
An example: GitHub’s entire business revolves around calling libgit2 (C) from Ruby. Are they more likely to slow down libgit2 and make it substantially more complex by running 2 GCs side by side, or are they going to risk accept any potential unsafety in regular C? It’s 100% the latter, I’ll bet on that.
Yes, safety got more important, and it's great to support old C code in a safe way. The performance drop and especially the GC of Fil-C do limit the usage, however. I read there are some ideas for Fil-C without GC; I would love to hear more about that!
But all existing programming languages seem to have some disadvantage: C is fast but unsafe. Fil-C is C compatible but requires GC, more memory, and is slower. Rust is fast and uses little memory, but is verbose and hard to use (borrow checker). Python, Java, C#, etc. are easy to use and concise but, like Fil-C, require a tracing GC and so more memory, and are slow.
I think the 'perfect' language would be as concise as Python, statically typed, not require a tracing GC (using reference counting instead, like Swift), and support some kind of borrow checker like Rust (for the most performance-critical sections). And it would leverage the C ecosystem by transpiling to C, and so would run on almost all existing hardware and could even be used in the kernel.
These might all be slower than well-written C or Rust, but they're not nearly the same magnitude of slow. Java is often within an order of magnitude of C/C++ in practice, and threading is less of a pain. Python can easily be 100x slower, and until very recently threading wasn't even an option for getting more CPU due to the GIL, so you needed extra complexity to deal with that.
There's also Golang, which is in the same ballpark as Java and C.
You are right that languages with tracing GC are fast. Often they are faster than C or Rust, if you measure peak performance of a micro-benchmark that does a lot of memory management. But that is only true if you just measure the speed of the main thread :-) Tracing garbage collection does most of its work in separate threads, and so is often not visible in benchmarks. Memory usage is also not easily visible, but languages with tracing GC need about twice the amount of memory compared to e.g. C or Rust. (When using an arena allocator in C, you can get faster still, at the cost of memory usage.)
Yes, Python is especially slow, but I think that's probably more because it's dynamically typed and not compiled. I found PyPy is quite fast.
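To make the arena-allocator remark above concrete, here is a minimal bump-pointer arena (a generic sketch, not code from any of the projects discussed): allocation is just a pointer bump, nothing is freed individually, and everything is released in one call - fast, but it holds on to memory longer.

    #include <stdlib.h>

    typedef struct {
        char  *base;
        size_t used;
        size_t cap;
    } arena;

    static int arena_init(arena *a, size_t cap) {
        a->base = malloc(cap);
        a->used = 0;
        a->cap  = a->base ? cap : 0;
        return a->base != NULL;
    }

    static void *arena_alloc(arena *a, size_t n) {
        n = (n + 15) & ~(size_t)15;              /* keep allocations 16-byte aligned */
        if (n > a->cap - a->used) return NULL;   /* no per-object free, just a bump */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    static void arena_release(arena *a) {        /* everything is freed at once */
        free(a->base);
        a->base = NULL;
        a->used = a->cap = 0;
    }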
> The performance drop and specially the GC of Fil-C do limit the usage however. I read there are some ideas for Fil-C without GC; I would love to hear more about that!
I love how people assume that the GC is the reason for Fil-C being slower than C and that somehow, if it didn't have a GC, it wouldn't be slower.
Fil-C is slower than C because of InvisiCaps. https://fil-c.org/invisicaps
The GC is crazy fast and fully concurrent/parallel. https://fil-c.org/fugc
Removing the GC is likely to make Fil-C slower, not faster.
Well I didn't mean GC is the reason for Fil-C being slower. I mean the performance drop of Fil-C (as described in the article) limits the usage, and the GC (independently) limits the usage.
I understand that the raw speed (of the main thread) of Fil-C can be faster with a tracing GC than without. But I think there's a limit on how fast and memory-efficient Fil-C can get, given that it necessarily has to do a lot of things at runtime rather than at compile time. Energy usage and memory usage of a programming language that uses a tracing GC are higher than of one without, at least if the memory management logic can be done at compile time.
For Fil-C, a lot of the memory management logic, and the checks, necessarily need to happen at runtime. Unless the code is annotated somehow, but then it wouldn't be pure C any longer.
Nim fits most of those descriptors, and it’s become my favorite language to use. Like any language, it’s still a compromise, but it sits in a really nice spot in terms of compromises, at least IMO. Its biggest downsides are all related to its relative “obscurity” (compared to the other mentioned languages) and resulting small ecosystem.
The advantage of Fil-C is that it's C, not some other language. For the problem domain it's most suited to, you'd do C/C++, some other ultra-modern memory-safe C/C++ system, or Rust.
I agree. Nim is memory safe, concise, and fast. In my view, Nim lacks a very clear memory management strategy: it supports ARC, ORC, manual (unsafe) allocation, and move semantics. Maybe supporting fewer options would be better? Usually, adding things that are lacking is easier than removing features, especially if the community is small and you don't want to alienate too many people.
Yes, they might lose the meaningless benchmarks game that gets thrown around; what matters is whether they are fast enough for the problem that is being solved.
If everyone actually cared about performance above anything else, we wouldn't have an Electron crap crisis.
Seems like Windows is trying to address the Electron problem by adopting React Native for their WinAppSDK. RN is not just a cross-platform solution, but a framework that allows Windows to finally tap into the pool of devs used to that declarative UI paradigm. They appear to be standardizing on TypeScript, with C++ for the performance-critical native parts. They leverage the scene graph directly from WinAppSDK. By prioritizing C++ over C# for extensions and TS for the render code, they might actually hit the sweet spot.
I don't know; I think what matters is that performance is close to the best you can reasonably get in any other language.
People don't like leaving performance on the table. It feels stupid and it lets competitors have an easy advantage.
The Electron situation is not because people don't care about performance; it's because they care more about some other things (e.g. not having to do 4x the work to get native apps).
> And leverage the C ecosystem, by transpiling to C
I heavily doubt that this would work reliably on arbitrary C compilers, as the interpretation of the standard gets really wonky and certain constructs that should work might not even compile. Typically such things target GCC because it has such a large backend of supported architectures. But LLVM supports a large overlapping set too - that's why it's possible to build the Linux kernel under clang and why Rust can support so many microcontrollers. For Rust, that's also why there's the rustc codegen gcc effort, which uses GCC as the backend instead of LLVM to flesh out the supported architectures further. But generally transpilation is used as a stopgap in this space, not as an ultimate target, for lots of reasons - not least of which is that there are optimizations that aren't legal in C but are legal in another language, and transpilation would inhibit them.
> Rust is fast, uses little memory, but is verbose and hard to use (borrow checker).
It’s odd to me: in my experience, picking up the borrow checker was about as hard as the first time I came across list comprehensions. In essence it’s something new I’d never seen before, but once I got it, it went into the background noise and is trivial to deal with most of the time, especially since the compiler infers most lifetimes anyway. Resistance to learning is different from something being difficult to learn.
Well, "transpiling to C" does include GCC and clang, right? Sure, trying to support _all_ C compilers is nearly impossible, and not what I mean. Quite a few languages support transpiling to C (even Go and Lua), but in my view that alone is not sufficient for a C replacement in places like the Linux kernel: for this to work, tracing GC cannot be used. And this is what prevents Fil-C and many other languages from being used in that area.
Rust borrow checker: the problem I see is not so much that it's hard to learn, but that it requires constant effort. In Rust, you are basically forced to use it, even if the code is not performance critical. Sure, Rust also supports reference-counting GC, but that is more _verbose_ to use... It should be _simpler_ to use, in my view, similar to Python. The main disadvantage of Rust, in my view, is that it's verbose. (Also, there is a tendency to add too many features, similar to C++, but that's a secondary concern.)
There are surprisingly many languages that support transpiling to C: Python (via Cython), Go (via TinyGo), Lua (via eLua), Nim, Zig, Vlang. The main advantage (in my view) is to support embedded systems, which might not match your use case.
I suppose /some/ performance loss is inevitable. But this could be quite a game changer. As more folks play with it, performing benchmarks, etc., it should reveal which C idioms incur the most/least performance hits under Fil-C. So with some targeted patching of C code, we may end up with a rather modest price for memory safety.
And I'm not done optimizing. The perf will get better. Rust and Yolo-C will always be faster, but right now we can't know what the difference will be.
Top optimization opportunities:
- InvisiCaps 2.0. While implementing the current capability model, when I was about 3/4 of the way done with the rewrite, I realized that if I had done it differently I would have avoided two branch+compares on every pointer load. That's huge! I just haven't had the appetite for doing yet another rewrite recently. But I'll do it eventually.
- ABI. Right now, Fil-C uses a binary interface that relies on lowering to what ELF is capable of. This introduces a bunch of overhead on every global variable access and every function call. All of this goes away if Fil-C gets its own object file format. That's a lot of work, but it will happen if Fil-C gets more adoption.
- Better abstract interpreter. Fil-C already has an abstract interpreter in the compiler, but it's not nearly as smart as it could be. For example, it doesn't have octagon domain yet. Giving it octagon domain will dramatically improve the performance of loops.
- More intrinsics. Right now, a lot of libc functions that are totally memory safe but are normally implemented in assembly are implemented in plain Fil-C instead, just because of how the libc ports happened to work out. Like, say you call some <math.h> function that takes doubles and returns doubles - it's going to be slower in Fil-C today because you'll end up in the generic C code version compiled with Fil-C. No good reason for this! It's just grunt work to fix!
- The calling convention itself is trash right now - it involves passing things through a thread-local buffer. It's less trashy than the calling convention I started out with (that allocated everything in the heap lmao), but still. There's nothing fundamentally preventing a Fil-C register-based calling convention, but it would take a decent amount of work to implement.
There are probably other perf optimization opportunities that I'm either forgetting right now or that haven't been found yet. It's still early days!
A lot of remarkably unusual stuff has been shoved into the format without breaking the tooling, so wondering what the restrictions are.
I've always been firmly in the 'let it crash' camp for bugs: the sooner and the closer to the offending piece of code you can generate a crash, the better. Maybe it would be possible to embed Fil-C in a test suite combined with a fuzzing-like tool that varies inputs to try really hard to get the program to trigger an abend. As long as it is possible to fuzz your way to a crash in Fil-C, that would be a sign that there is more work to do.
That way 'passes Fil-C' would be a bit like running code under valgrind, moving the penalty to the development phase rather than to runtime. Is this feasible or am I woolgathering, and is Fil-C only ever going to work by using it to compile the production code?
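Something along those lines seems doable today: build the code under test with the Fil-C toolchain, then let a fuzzer drive it in CI so that any out-of-bounds or use-after-free becomes a loud failure during testing rather than silent corruption in production. A rough sketch of a stdin-driven harness (parse_record() is a hypothetical stand-in for whatever function you want to exercise):

    #include <stdio.h>

    int parse_record(const unsigned char *buf, size_t len);   /* code under test (hypothetical) */

    int main(void) {
        unsigned char buf[4096];
        size_t len = fread(buf, 1, sizeof buf, stdin);         /* the fuzzer feeds stdin */
        return parse_record(buf, len);
    }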
graydon points in that direction, but since you're here: how feasible is a hypothetical Fil-Unsafe-Rust? would you need to compile the whole program in Fil-Rust to get the benefits of Fil-Unsafe-Rust?
If you are not writing anything performance sensitive, you shouldn't be using C in the first place. Even if Fil-C greatly reduces its overhead, I can't see it ever being a good idea for actual release builds.
As a Linux user of two decades, memory safety has never been a major issue that I would be willing to trade performance for. It doesn't magically make my application work; it just panics instead of crashing, same end result for me. It does make it so the issue cannot be exploited by an attacker, which is good, but Linux has already been safe enough to be the main choice to run on servers, so meh. The whole memory safety cult is weird.
I guess Fil-C could have a place in the testing pipeline. Run some integration tests on builds made with it and see if stuff panics.
That said, Fil-C is a super cool project. I don't mean to throw any shade at it.
People with Linux servers keep getting hacked so idk if I buy the argument “if it’s in use it’s good enough”. That’s like saying “everyone else runs Pentium 2, why would I upgrade to Pentium 3?”
Then why are all of the IO-bound low level pieces of Linux userland written in C?
Take just one example: udevd. I have a Fil-C version. There is zero observable difference in performance.
Getting a "not available in your state" page, does anyone have an archive? I've only recently tried out fil-c and hope to use it in some work projects.
https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html