Back in the 90s, I implemented precompiled headers for my C++ compiler (Symantec C++). They were very much like modules. There were two modes of operation:
1. all the .h files were compiled, and emitted as a binary that could be rolled in all at once
2. each .h file created its own precompiled header. Sounds like modules, right?
Anyhow, I learned a lot, mostly that without semantic improvements to C++, precompiled headers made compilation much faster but were too sensitive to breakage.
This experience was rolled into the design of D modules, which work like a champ. They were everything I wanted modules to be. In particular,
The semantic meaning of the module is completely independent of wherever it is imported from.
Anyhow, C++ is welcome to adopt the D design of modules. C++ would get modules that have 25 years of use, and are very satisfying.
Yes, I do understand that the C preprocessor macros are a problem. My recommendation is, find language solutions to replace the preprocessor. C++ is most of the way there, just finish the job and relegate the preprocessor to the dustbin.
> just finish the job and relegate the preprocessor to the dustbin.
Yup, I think this is the core of the problem with C++. The standards committee has drawn a bad line that makes encoding the modules basically impossible. Other languages with good module systems and fast incremental builds don't allow for preprocessor-style craziness without some pretty strict boundaries. Even languages that have gotten it somewhat wrong (such as Rust with its proc macros) have bounded where and how that sort of metaprogramming can take place.
Even if the preprocessor isn't dustbinned, it should be excluded from the module system. Metaprogramming should be a feature of the language with clear interfaces and interactions. For example, in Java the annotation processor is ultimately what triggers code generation capabilities. No annotation, no metaprogramming. It's not perfect, but it's a lot better than C/C++'s free-for-all macro system.
Or the other option is the Go route: don't make the compiler generate code; instead have the build system be responsible for code generation (calling code generators). That would be miles better, as it'd allow devs to opt in to that slowdown when they need it.
I can't think of a C++ project I've worked on that didn't rely on being able to include C headers and have things usually just work. Are there ways of banning C macros from "modular" C++ without breaking that? (Many would find it unacceptable if you had to go through every C dependency and write/generate some sort of wrapper.)
D resolved this problem by creating D versions of the C system headers.
Yes, this was tedious, but we do it for each of our supported platforms.
But we can't do it for various C libraries. This created a problem for us, as it is indeed tedious for users. We created a repository where people shared their conversions, but it was still inadequate.
The solution was to build a C compiler into the D compiler. Now, you can simply "import" a C .h file. It works surprisingly well. Sure, some things don't work, as C programmers cannot resist putting some really crazy stuff in the .h files. The solution to that problem turned out to be that the D compiler could create D modules from C code; the user could then tweak the nutburger bits by hand.
Did anyone reach out to you for input during the modules standardization process? D seems like the most obvious prior art, but the process seems like it was especially cursed.
Nobody from C++ reached out to me for the modules.
Herb Sutter, Andrei Alexandrescu, and I once submitted an official proposal for "static if" for C++, based on the huge success it has had in D. We received a vehement rejection. It demotivated me from submitting further proposals. ("static if" replaces the C preprocessor #if/#ifdef/#ifndef constructions.)
C++ has gone on to adopt many features of D, but usually with modifications that make them less useful.
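To make the contrast concrete, here is a rough sketch using C++17's if constexpr, the closest analog C++ eventually shipped; unlike D's static if, it only works inside function bodies, which is exactly the kind of limiting modification described above.

    // Preprocessor version: purely textual, invisible to the language.
    #include <cstddef>
    #include <cstdint>

    #if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFull
    using word = std::uint64_t;
    #else
    using word = std::uint32_t;
    #endif

    // "static if" style version: a real language construct, but C++ only
    // allows it inside function bodies, while D's static if also works at
    // module and class scope.
    constexpr std::size_t word_bits() {
        if constexpr (sizeof(void*) == 8)
            return 64;
        else
            return 32;
    }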
I've often wondered how the evolution of C and C++ might have been different if a more capable preprocessor (in particular, with more flexible recursive expansion and a better grammar for pattern matching) had caught on. The C++ template engine can be used to work around some of those limits, but always awkwardly, not least due to the way you need Knuth's arrow notation to express the growth in compiler error message volume with template complexity. By the time C++ came out we already had tools like m4 and awk with far more capability than cpp. It's pretty ridiculous that everything else about computing has radically changed since 1970 except the preprocessor and its memory-driven constraints.
Including or importing a templated class/function should not require bringing in the definition. That's why #includes and imports are so expensive, as we have to parse the entire definition to determine if template instantiations will work.
For normal functions or classes, we have forward declarations. Something similar needs to exist for templates.
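For concreteness, a sketch of what exists today (Widget, widget.h, and MyList are made-up names): ordinary classes can be forward-declared, while the closest thing templates have is an explicit instantiation declaration, which only suppresses redundant code generation; the full definition still has to be parsed everywhere it is used.

    // Ordinary class: a forward declaration is enough for pointers and
    // references, so widget.h never needs to be included here.
    class Widget;
    void log_widget(const Widget&);

    // Template: the full definition still has to be included and parsed...
    #include "my_list.h"   // hypothetical: template <class T> class MyList { ... };

    // ...but an explicit instantiation declaration at least tells this
    // translation unit not to generate code for MyList<int> itself.
    extern template class MyList<int>;

    // Exactly one .cpp in the project then provides the single instantiation:
    //     template class MyList<int>;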
I think modularization of templates is really hard. The best thing I can think of is a cache, e.g. for signatures. But then again, that is basically what name mangling already does, as I understand it.
The sensible way to speed up compilation 5x was implemented almost 10 years ago, worked amazingly well, and was completely ignored. I don't expect progress from the standards committees. Here it is if you're interested: https://github.com/yrnkrn/zapcc
The next major advance to be completely ignored by standards committees will be the 100% memory safe C/C++ compiler, which is also implemented and works amazingly well: https://github.com/pizlonator/fil-c
> The sensible way to speed up compilation 5x was implemented almost 10 years ago, worked amazingly well, and was completely ignored. I don't expect progress from the standards committees. Here it is if you're interested: https://github.com/yrnkrn/zapcc
Tools like ccache have been around for over two decades, and all you need to do to onboard them is to install the executable and set an environment flag.
What value do you think something like zapcc brings that tools like ccache haven't been providing already?
> What value do you think something like zapcc brings that tools like ccache haven't been providing already?
It avoids instantiating the same templates over and over in every translation unit, instead caching the first instantiation of each. ccache doesn't do this: it only caches complete object files, but does not avoid the repeated instantiation cost in each object file.
ccache works at a translation unit level which means it isn't any better than just make-style incremental rebuilds when you aren't throwing away the build directory - it still needs to rebuild the whole translation unit from scratch if a single line in some header changes.
From my perspective there is a pretty big difference between a persistent compiler daemon and a simple cache that constantly restarts the compiler over and over again.
> The next major advance to be completely ignored by standards committees will be the 100% memory safe C/C++ compiler, which is also implemented and works amazingly well: https://github.com/pizlonator/fil-c
You’d be surprised! “Lots of software packages work in Fil-C with zero or minimal changes, including big ones like openssl, CPython, SQLite, and many others.”
It wraps all of the typical API surface used by Linux code.
I’m told it has found real bugs in well-known packages as well, as it will trap on unsafe but otherwise benign accesses (like reading one past the end of a stack buffer).
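As a made-up illustration of the kind of bug meant here: the off-by-one read below usually goes unnoticed with a conventional toolchain, while a bounds-checking implementation can trap on it.

    #include <cstdio>

    int sum_first_n(const int* vals, int n) {
        int total = 0;
        // Off-by-one: <= reads one element past the end of the caller's buffer.
        for (int i = 0; i <= n; ++i)
            total += vals[i];
        return total;
    }

    int main() {
        int stack_buf[4] = {1, 2, 3, 4};
        // Usually "works" with an ordinary compiler; a bounds-checking
        // implementation would trap on the out-of-bounds read.
        std::printf("%d\n", sum_first_n(stack_buf, 4));
    }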
What is this sorcery? I've been reading HN for years, and this is the first time I've seen someone bring up a memory-safe C++. How is that not in the headlines? What's the catch, build times? Do I have to sell my house to get it?
EDIT: Oh, found the tradeoff:
hollerith on Feb 21, 2024:
>Fil-C is currently about 200x slower than legacy C according to my tests
The catch is performance. It's not 200x slower though! 2x-4x is the actual range you can expect. There are many applications where that could be an acceptable tradeoff for achieving absolute memory safety of unmodified C/C++ code.
But also consider that it's one guy's side project! If it was standardized and widely adopted I'm certain the performance penalty could be reduced with more effort on the implementation. And I'm also sure that for new C/C++ code that's aware of the performance characteristics of Fil-C that we could come up with ways to mitigate performance issues.
The latest version of Fil-C with -O1 is only around 50-100% slower than ASAN, which is very acceptable in my book. I'm actually more "bothered" by its compilation time (roughly 8x that of clang with ASAN).
> The sensible way to speed up compilation 5x was implemented almost 10 years ago, worked amazingly well, and was completely ignored. I don't expect progress from the standards committees. Here it is if you're interested: https://github.com/yrnkrn/zapcc
Of course it was completely ignored. Did you expect the standards committee to enforce caching in compilers? That's just not its job.
> The next major advance to be completely ignored by standards committees will be the 100% memory safe C/C++ compiler, which is also implemented and works amazingly well: https://github.com/pizlonator/fil-c
Again—do you expect the standards committee to enforce usage of this compiler or what? The standards committee doesn't "standardize" compilers...
Both zapcc and Fil-C could benefit from the involvement of the standards committee. While both are very compatible, there are certain things that they can't fully support and it would be useful to standardize small language changes for their benefit (and for the benefit of other implementations of the same ideas). Certainly more useful than anything else the standards committees have done in the past 10 years. They would also benefit from the increased exposure that standardization would bring, and the languages would benefit from actual solutions to the problems of security and compile time that C/C++ developers face every day.
Why should anyone use zapcc instead of ccache? It certainly sounds expensive to save all of the compiler's internal data, if that is what it does.
I'm sure you must be aware that these compiler tools do not constitute a language innovation. I'd also imagine that both are not production-ready in any sense, and would be very difficult to debug if they were not working correctly.
Zapcc can, and frequently does, speed up the compilation of single files by 20x or more in the incremental case. CCache can't do that. And it's by far the common case when compiling iteratively during development. A speedup that large is transformational to your development workflow.
ccache only caches the individual object files produced by compiling a .cpp file.
C++ build times are actually dominated by redundant parsing of headers included in multiple .cpp files. And also redundant template instantiations in different files. This redundancy still exists when using ccache.
By caching the individual language constructs, you eliminate the redundancy entirely.
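A tiny sketch of that redundancy: both translation units below parse <vector> in full and each instantiates std::vector<int> on its own, and ccache can only cache the finished object files, not the shared work.

    // a.cpp
    #include <vector>                      // parsed in full here...
    std::vector<int> make_a() { return {1, 2, 3}; }

    // b.cpp
    #include <vector>                      // ...and parsed in full again here
    std::vector<int> make_b() { return {4, 5, 6}; }

    // Each .cpp repeats the parse and the std::vector<int> instantiation.
    // A construct-level cache (the zapcc approach) can reuse both; ccache
    // only stores the resulting a.o and b.o as opaque blobs.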
All the C++ committee needed to do was introduce "import" as "this is the same as #include except that no context can leak into it".
Would have been dirt simple to migrate existing codebases over to using it (find and replace include with import, mostly), and initial implementations of it on the compiler side could have been nearly identical to what's already there, while offering some easy space for optimizing it significantly.
Instead they wanted to make an entirely new thing that's impossible to retrofit into existing projects, so it's basically DOA.
That could be said of every additional C++ feature since C++0x.
The committee has taken backwards compatibility backwards, refusing to introduce any nuanced change in favor of a completely new modus operandi, which never jibes with existing ways of doing things because no one wants to fix that 20-year-old codebase.
My experience has been that everyone _wants_ to fix the 20 year old codebase! But it’s hard to justify putting engineers on nebulous refactoring projects that don’t directly add value.
What do you mean by "no context can leak into it"? Do you mean it shouldn't export transitive imports?
As in `#include <vector>` also performs `#include <iterator>` but `import vector` would only import vector, requiring you to `import iterator`, if you wanted to assign `vec.begin()` to a variable?
Or is it more like it shouldn't matter in which order you do an import and that preprocessor directives in an importing file shouldn't affect the imported file?
#define private public
import <iostream>; // muahaha
Or any such nonsense. Nothing I define with the preprocessor before importing something should affect how that something is interpreted, which means not just #defines, but import ordering too. (Importing a before b should be the same as importing b before a.) Probably tons of other minutiae, but "not leaking context into the import" is a pretty succinct way of putting it.
Modules provide more than just speed. Compile time benefits are great and the article is right about build-time bloat being the bane of every developer. But modules serve a deeper purpose.
Explicit sub-unit encapsulation. True isolation. No more weird forward declarations, endlessly nested ifdef guards, or insane header dependency graphs. Things just exist as they are: separate, atomic, deterministic, and reliable.
Modules probably need a revision, and yes, adoption has been slow, but once you start using modules you will never go back. The clarity of explicitly declared interfaces and the freedom from header hell fundamentally changes how you think about organizing C++ code.
Start a new project with modules if you don’t believe me. Try them. That is the largest barrier right now - downstream use. They are not supported because they are not used and they are not used because they are not well supported. But once you use them you will start to eagerly await every new compiler release in hopes of them receiving the attention they deserve.
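For anyone who has not seen one, a minimal sketch of a named module (the file names and module name are illustrative, and build flags still vary by toolchain):

    // math.cppm (module interface unit)
    export module math;

    export int add(int a, int b) {
        return a + b;
    }

    // main.cpp (consumer)
    import math;

    int main() {
        return add(2, 3) == 5 ? 0 : 1;
    }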
Use a feature that doesn't really work, after so many years, hoping compilers will actually make it work if people use it - seems like a disastrous plan. People have tried to use modules, and have generally found that they fail, and have dumped them.
It's unlikely at this point that modules in their current form will ever be anything but a legacy feature in C++. Maybe someday a new implementation will arise, just like noexcept replaced the failed throw() exception specifications.
Thing is (correct me if I'm wrong), if you use modules, all of your code needs to use modules (e.g. you can't mix #include <vector> and import <vector>; in your project), which rules out a lot of 3rd-party code you might want to depend on.
You're wrong.
You can simply use modules with includes.
If you #include <vector> inside your module purview, you will just get a copy of vector in each translation unit. Not good, but it works. On the other hand, if you include <vector> inside the global module fragment, the number of definitions will actually be 1, even if you include it twice in different modules.
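A short sketch of that second option, with made-up names; the classic #include sits in the global module fragment, before the module purview begins:

    // widget.cppm
    module;                     // start of the global module fragment
    #include <vector>           // classic includes go here, outside the purview

    export module widget;       // the module purview starts here

    export std::vector<int> make_widget_ids() {
        return {1, 2, 3};
    }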
The only thing that a forward declaration buys you is not having to include a header, which is not that expensive in C, but thanks to templates something as innocent as including a header can become infinitely expensive in C++.
The standardization process here feels similar to what happened with JavaScript modules. Introduced in ES2015, but the language standard only had syntax and some invariants. It had no notion of how to load modules, or how they might be delivered, or how a program that had a module at its root might be started. But there was a similar urgency, of "we must have this in ES2015".
I made it one of my first projects after joining the Chrome team to fix that gap, which we documented at [1]. (This reminds me of the article's "The only real way to get those done is to have a product owner...".)
You could even stretch the analogy to talk about how standard JS modules compete against the hacked-together solutions of AMD or CommonJS modules, similar to C++ modules competing against precompiled headers.
That said, the C++ modules problem seems worse than the JavaScript one. The technical design seems harder; JS's host/language separation seems cleaner than C++'s spec/compiler split. Perhaps most importantly, organizationally there was a clear place (the WHATWG) where all the browsers were willing to get together to work on a standard for JS module loading. Whereas it doesn't seem like there's as much of a framework for collaboration between C++ compiler writers.
I did C++ for over 10 years, and now have been doing rust for about 4. On the whole, I like rust much better, but I really miss header files.
Modules are horrible for build times. If you change an implementation (i.e. something that would not normally involve editing a header), the amount of rebuilding that happens is crazy compared to any C++ project that was set up with a minimal amount of care.
I often hear about a lot of advantages of D. So I don't understand why it is so unpopular.
I probably need to give it a chance, but I'm not sure I would find a real job with the D stack.
While I can see compiler authors not wanting to have to turn the compiler into a build system, I'd really appreciate if they did do that. Having to create makefiles or other build artifacts is such a waste of energy for most applications.
Shifting more of this to the compiler is logical, because import rules are defined by the language, not by the build system. In fact, today build systems function by asking the compiler to output a list of transitive includes, so it knows which files to rebuild when someone touches a header.
But on the other hand, nothing is better at dealing with edge cases and weird things than writing your build scripts in an actual programming language. All that an integrated C++ build system would do is add a new std::build API and have the compiler discover a build.cpp file, then compile and run it to perform the build. All the interesting "build system stuff" would be implemented as regular stdlib code.
See Zig for a working example of that idea (which also turned out to be a viable cmake replacement for C and C++ projects)
It could even be reduced to compiler vendors agreeing on a convention to compile and run a build.cpp file. Everything else could live in 3rd party code shipped with a project.
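Purely as a sketch of the idea (nothing like std::build exists today, and every name below is hypothetical), such a build.cpp might look like:

    // build.cpp -- hypothetical std::build API, discovered and run by the compiler
    #include <build>            // imaginary header

    int main() {
        std::build::executable app;
        app.name      = "myapp";
        app.sources   = {"src/main.cpp", "src/util.cpp"};
        app.defines   = {"NDEBUG"};
        app.libraries = {"z"};
        return std::build::run(app);   // nonzero on build failure
    }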
You can't include _any_ header downstream if you import std, and it is also unknown how you're going to export and share modules across dependencies you have no intention of 'porting' to modules...
It seems particularly tricky to define a template in a module and then instantiate it or specialize it somewhere else.
D also has an `alias` feature, where you can write something like `alias Q = abc.T;`, and from then on `abc.T` can be referred to simply as `Q`. This also eliminates a large chunk of the purpose behind the preprocessor.
> It's not even possible to link to unsafe code.
This makes it rather theoretical.
There are also quite a few compiler cache systems around.
For example, anyone can onboard tools like ccache by installing it and setting an environment variable.
ccache doesn't add anything over make for a single project build.
> No more weird forward declarations
We C++ devs have just collectively accepted that this hack is still totally OK in the year 2025, just to improve build times.
[1]: https://blog.whatwg.org/js-modules
In terms of rebuild performance when separating interface from implementation, there is interest in having rustc handle that automatically; see https://rust-lang.github.io/rust-project-goals/2025h2/relink...