> It is [...] entirely optional to call free. If you don’t call free, memory usage will increase over time, but technically, it’s not a leak. As an optimization, you may choose to call free to reduce memory, but again, strictly optional.
This is beautiful! Unless your program is long-running, there's no point in ever calling free in your C programs. The system will free the memory for you when the program ends. Free-less C programming is an exhilarating experience that I recommend to anybody.
In the rare cases where your program needs to be long-running, leaks may become a real problem. In that case, it's better to write a long-running shell script that calls a pipeline of elegant free-less C programs.
Reminds me of the HFT shop that built in Java and simply turned the garbage collector off. Then when the market closed they would restart the process for the next day.
I’ve done something similar in one of our production services. There was a problem with extremely long GC pauses during Gen2 garbage collection (.NET uses a multigenerational GC design). Pauses could be many seconds long or more than a minute in extreme cases.
We found the underlying issue that caused memory pressure in the Gen2 region, but fixing it would have meant changing some very fundamental aspects of the service and doing significant refactoring. Since this was a legacy service (.NET Framework) that we were refactoring anyway to run on modern .NET (5+), we decided to ignore the issue.
Instead we adjusted the GC to just never do the expensive Gen2 collections (GCLatencyMode) and moved the service to run on higher memory VMs. It would hit OOM every 3 days or so, so we just set instances to auto-restart once a day.
Then 1 year later we deployed the replacement for the legacy service and the problem was solved.
I had a friend who worked for one of the big Market Makers, and he told me that they would indeed turn the GC off, but what they'd do is just pre-allocate everything into bigass arrays before-hand, and have incrementers to simulate the "new" keyword. They might do this in something more or less like a threadlocal to avoid having to deal with locks or race conditions or anything like that.
It is somewhat common in garbage-collected languages to fight the garbage collector like that. Sure, manual free probably adds up to more CPU time, but it is more spread out and thus not noticeable (normally, real-time code still cannot allocate in the sensitive areas).
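In C, that pre-allocation trick boils down to a per-thread pool plus a bump index. A minimal sketch, with made-up names (order_t is just a stand-in payload):

    #include <stdlib.h>

    typedef struct { double price; long qty; } order_t;   /* hypothetical payload */

    enum { POOL_CAP = 1 << 20 };

    /* One pool per thread: no locks, and no allocator activity on the hot path. */
    static _Thread_local order_t *pool = NULL;
    static _Thread_local size_t   next_slot = 0;

    /* "new order_t", simulated by bumping an index into a pre-allocated array. */
    static order_t *new_order(void)
    {
        if (pool == NULL)                            /* one big allocation up front */
            pool = malloc(POOL_CAP * sizeof *pool);
        if (pool == NULL || next_slot == POOL_CAP)
            return NULL;                             /* pool exhausted: caller must cope */
        return &pool[next_slot++];
    }

Nothing is ever handed back; the pool lives for the whole trading day and is thrown away wholesale when the process restarts.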
I did a watch face on pebble once... the programming style was "uncommon" when you're hardcoding the ONE watch face that is currently rendering the screen. It "felt" very leaky and illegal... but it's a watch face with limited functionality, so... shrug?
A professor actually told us that freeing prior to exit was harmful, because you may spend all of that time resurrecting swapped pages for no real benefit.
Counterpoint is that debugging leaks is ~hopeless unless you have the ability to prune “intentional leaks” at exit
> debugging leaks is ~hopeless unless you have the ability to prune “intentional leaks” at exit
Not in general. It depends on your debugger. For example, valgrind distinguishes between harmless "visible leaks", memory blocks allocated from main or on global variables, and "true leaks" that you cannot free anymore. The first ones are given a simple warning by the leak detector, while the true leaks are actual errors.
I had to debug a program that did just that once, long ago, and the fix was to not free on exit. The program's behavior had been that it took ~20m and then one day it ran for hours and we never found out how long it would have taken. Fortunately it was a Tcl program, and the fix was to remove `unset`s of large hash tables before exiting.
Actually that is not too far from reality. Data that will be allocated only once does not need to be freed. You really only need to free memory that may grow iteratively. If the memory is not used frequently it will end up in swap without major implications. If it is used during the whole execution, it will only be freed when the program ends, and at that point there's no difference between it being released by a 'free' or by the OS.
As an example, constant data that is allocated by GNU Nano is never freed. AFAIK, the same happens when you use GTK or QT; there were even tips on how to suppress valgrind warnings when using such libs.
If you're writing a library, you're not writing a program.
If you write a C library, it is a good practice to leave the allocations to the library user, or at least provide a way to override the library's allocator. Allowing your user to write a free-less *program*.
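A sketch of what such an interface could look like; mylib_allocator and mylib_init are made-up names, not from any particular library:

    #include <stddef.h>
    #include <stdlib.h>

    /* The caller supplies the allocation functions plus an opaque context
       (for example, an arena the caller owns and discards in one go). */
    typedef struct {
        void *(*alloc)(size_t size, void *ctx);
        void  (*release)(void *ptr, void *ctx);
        void  *ctx;
    } mylib_allocator;

    static void *default_alloc(size_t size, void *ctx) { (void)ctx; return malloc(size); }
    static void  default_release(void *ptr, void *ctx) { (void)ctx; free(ptr); }

    static const mylib_allocator mylib_default = { default_alloc, default_release, NULL };

    void mylib_init(const mylib_allocator *a)
    {
        if (a == NULL)
            a = &mylib_default;
        /* ... store `a` and route every internal allocation through it ... */
        (void)a;
    }

If the user routes release through their own arena, every "free" inside the library becomes a no-op and everything is reclaimed at exit: the free-less program described above.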
Early in my career I saw the aftermath of someone trying to deal with this problem. The solution they went with was to replace all allocations in the offending code with a custom allocator and then just throw the allocator away every so often
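That approach amounts to an arena (region) allocator. A minimal sketch with hypothetical names: allocations are carved out of one block, and "every so often" the whole thing is discarded in O(1), with no individual frees at all.

    #include <stdlib.h>

    typedef struct { char *base; size_t used, cap; } arena;

    static arena arena_create(size_t cap) { return (arena){ malloc(cap), 0, cap }; }

    static void *arena_alloc(arena *a, size_t n)
    {
        n = (n + 15) & ~(size_t)15;                  /* keep returned pointers aligned */
        if (a->base == NULL || a->used + n > a->cap)
            return NULL;                             /* out of room */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    /* "Throw the allocator away": everything it handed out dies at once. */
    static void arena_discard(arena *a) { a->used = 0; }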
Best criticism to this tongue-in-cheek solution yet. I’ll add my own more minor criticism: it will explode if the leaksaver struct allocation ever fails because it doesn’t check the result.
The Go guys knew this all along, because if I understood it correctly this is what happens when you disable the garbage collector. Voilà, now you can claim your garbage collector is totally optional and your language fundamentally doesn't require a garbage collector.
[I love golang, and I think it's one of the best languages around. If only it had a truly optional garbage collector. But then again, it wouldn't be go I guess...]
>In the rare cases where your program needs to be long-running, leaks may become a real problem.
Where did this idea come from? I have seen leaks where a program can consume all available memory in just a few seconds because the programmer (definitely not me...) forgot to free something in a function called millions of times.
Next thing they'll do is optimize the hell out of the kernel's process management until spawning a process catches up with the overhead of calling a function in a GC language.
I always thought that was one reason the UNIX terminal login process (with `getty` and `exec` to shell, getty restarting when the shell exits) was the way it was.
... or unless someone decides to convert it into a daemon "because of that ticket" and then QA goes all "oh, ah, the routing is dead, the sshd is dead and the whole box is all but bricked, what could've possibly caused that".
TFA is clearly written in jest. The provided code is not supposed to be run in production, just to illustrate how easy it is to trick a "leak detector" in your debugger. You just put all your mallocs in a list, and at the end of your program you can free them all.
Yet the idea of not freeing some memory in your program is not entirely stupid. Unless your memory is allocated inside a loop of unpredictable length, it's not really necessary to ever free it. Worse: the call to "free" may even fail after your program has already run successfully. Thus, skipping the useless (but typical) freeing spree at the end of your program may actually make it more robust!
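The trick itself boils down to something like this minimal sketch (hypothetical names, not the article's actual code; note that it inherits the problems discussed below, e.g. it will double-free anything the program already freed itself):

    #include <stdlib.h>

    struct saved { void *ptr; struct saved *next; };
    static struct saved *saved_head = NULL;

    /* Free every recorded allocation in one go, right before exit. */
    static void free_all(void)
    {
        struct saved *s = saved_head;
        while (s != NULL) {
            struct saved *next = s->next;
            free(s->ptr);
            free(s);
            s = next;
        }
        saved_head = NULL;
    }

    /* Use instead of malloc(); every block is remembered for free_all(). */
    void *xmalloc(size_t size)
    {
        static int registered = 0;
        if (!registered) { atexit(free_all); registered = 1; }

        void *p = malloc(size);
        struct saved *s = malloc(sizeof *s);
        if (p == NULL || s == NULL) {       /* check both allocations */
            free(s);
            return p;                       /* untracked on failure */
        }
        s->ptr = p;
        s->next = saved_head;
        saved_head = s;
        return p;
    }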
This solution doesn't do anything to prevent leaking memory in anything but the most pedantic sense, and actually creates leaks and dangling pointers.
The function just indirects malloc with a wrapper so that all of the memory is traversable by the "bigbucket" structure. Memory is still leaked in the sense that any unfreed data will continue to consume heap memory and will still be inaccessible to the code unless the application does something with the "bigbucket" structure--which it can't safely do (see below).
There is no corresponding free() call, so data put into the "bigbucket" structure is never removed, even when the memory allocation is freed by the application. This, by definition, is a leak, which is ironic.
In an application that does a lot of allocations, the "bigbucket" structure could exhaust the heap even though there are zero memory leaks in the code. Consider the program:
    #include <stdlib.h>

    int main(int argc, char** argv) {
        for (long i = 0; i < 1000000; i++) {
            void *foo = malloc(sizeof(char) * 1024);
            free(foo);
        }
        return 0;
    }
At the end of the million iterations, there will be zero allocated memory, but the "bigbucket" structure will have a million entries (8MB of wasted heap space on a 64-bit computer). And every pointer to allocated memory in the "bigbucket" structure is pointing to a memory address previously freed so now points to a completely undefined location--possibly in the middle of some memory block allocated later.
Why isn't it serious? I can imagine a smart pointer that actually accomplished the stated goal. So the joke is that they took a good idea and made a rubbish solution?
Obviously this is a joke, but the real message should be: if you can't manage your own memory, you should be using a language implementation with automatic memory management. If your code really needs to run "close to the metal", you really need to figure out how to manage your memory. In 2024, language implementations without automatic memory management should be reserved for applications that absolutely need them.
Funny, I think the lesson is the opposite: just because you have mechanisms that technically prevent memory leaks doesn't mean that you don't need to think about memory and its allocation/freeing. Or rather, memory leaks generally are not the problem; unbounded memory consumption is, regardless of whether that consumption is technically due to a leak or to some reference stashed somewhere.
> if you can't manage your own memory, you should be using a language implementation with automatic memory management.
I agree, and would expand the idea to all kinds of resources (files for example). Sadly not many languages have "automatic resource management". For example Go has automatic memory management, but if you read an HTTP request's body you have to remember to call req.Body.Close(). If you open a file you have to call file.Close(). If you launch a goroutine, you have to think about when it's going to end.
I'd like to know if some languages manage to automatically manage resources, and how they do it.
C# has the using statement. When the block ends, Dispose() is called.
VB Classic has reference counting and the terminate event is fired when the count goes to zero. So as long as you don't store the reference in a global the terminate event is guaranteed to run when it goes out of scope (so long as you don't have circular references of course).
C++ has Resource acquisition is initialization (RAII)
It obviously is a joke, but at the same time it's actually a viable approach for the right problem. Sometimes leaking is very tolerable and in terms of programmer time, very cheap!
I bet this is what some people on my team would come up with if the ticket acceptance criteria said "program must not leak memory when checked with valgrind"
You can easily turn a non-leaking program into a faster, leaking one, but the inverse direction is hard, so that criterion is entirely justified. Any optimization of this sort should be guarded behind a compile-time switch that simply swaps `free` with a placeholder. I think CPython did this for a long time, before the global initialization step was completely removed.
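Such a switch can be as small as a macro; a sketch with made-up names (FAST_EXIT, maybe_free):

    /* Build with -DFAST_EXIT for release: frees that only matter for tidiness
       become no-ops and the OS reclaims everything at exit. Leave it unset for
       debug/valgrind builds, where every allocation must really be released. */
    #ifdef FAST_EXIT
    #  define maybe_free(p) ((void)(p))
    #else
    #  include <stdlib.h>
    #  define maybe_free(p) free(p)
    #endif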
We have a counter that goes up by 1 every time you call malloc.
And down by one every time you call free.
And when the program quits, if the counter isn't zero, an email is fired off and a dollar gets sent from the developers bank account to the users bank account...
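Minus the e-mail and the bank transfer, that counter is just a pair of wrappers and an exit report; a sketch with made-up names:

    #include <stdio.h>
    #include <stdlib.h>

    static long live_allocations = 0;   /* +1 on every malloc, -1 on every free */

    /* Register with atexit(report) early in main() to get the tally at exit. */
    void report(void)
    {
        if (live_allocations != 0)
            fprintf(stderr, "leak check: %ld allocation(s) never freed\n",
                    live_allocations);
    }

    void *counted_malloc(size_t size)
    {
        void *p = malloc(size);
        if (p != NULL)
            live_allocations++;
        return p;
    }

    void counted_free(void *p)
    {
        if (p != NULL)
            live_allocations--;
        free(p);
    }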
That's essentially how all leak detection tools work, minus the money part.
And it is not even always appropriate. It is common to allocate some memory for the entire lifetime of the process. For example, if your app is GUI-based and has a main window, there is no need to free the resources tied to the main window, because closing it means quitting the app which will cause all memory to be reclaimed by the OS. You can properly free your memory but it will only make quitting slower. Usually programmers only do that to satisfy leak detection tools, and if the overhead is significant, it may only be done in debug mode.
There is a bug here... Clearly the author intended to cache the value of nextmalloc to avoid calling dlsym() on every malloc. The correct code should be:
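Presumably something along these lines; this is a sketch that assumes the wrapper resolves the real malloc via dlsym(RTLD_NEXT, "malloc"), as interposing allocators typically do, not the author's actual code:

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stddef.h>

    void *malloc(size_t size)
    {
        /* Resolve the real malloc once and cache it, instead of calling
           dlsym() on every single allocation. (Still not thread safe.) */
        static void *(*nextmalloc)(size_t) = NULL;
        if (nextmalloc == NULL)
            nextmalloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

        void *p = nextmalloc(size);
        /* ... record p in the leak-saver list, as in the article ... */
        return p;
    }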
    #ifdef HAVE_VALGRIND_VALGRIND_H
    if (RUNNING_ON_VALGRIND)
    #endif
        free_all();
free is way too slow if not needed, so detect valgrind via its API and do the unnecessary free dance only when running under valgrind. ASan's leak detector is disabled via its environment variable instead.
Perl5 does its final destruction similarly, only when it has important destructors (like IO, DB handles and such) to call.
That style of #ifdef use, while common in the past, tends to be somewhat fragile and also gets more and more tangled as more conditions are added.
A better way is to have an always existing inline running_on_valgrind() function, and use the #ifdef only for that function definition, either within it or around it (having it around the function also allows it to be inline only for the trivial not-defined case). Examples of this "inline function" style are found all over the Linux kernel (which has lots of conditionally-compiled code).
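Concretely, for the snippet above that could look like this (running_on_valgrind and maybe_free_all are made-up names; RUNNING_ON_VALGRIND is the real macro from valgrind/valgrind.h). In this variant a build without the valgrind header skips the free pass entirely, which matches the "free is way too slow if not needed" intent:

    /* The function always exists; only its body depends on the #ifdef. */
    #ifdef HAVE_VALGRIND_VALGRIND_H
    #include <valgrind/valgrind.h>
    static inline int running_on_valgrind(void) { return RUNNING_ON_VALGRIND; }
    #else
    static inline int running_on_valgrind(void) { return 0; }
    #endif

    void free_all(void);    /* the expensive "free everything" pass */

    /* Call sites stay #ifdef-free; when valgrind support isn't compiled in,
       the constant 0 lets the compiler drop the call entirely. */
    void maybe_free_all(void)
    {
        if (running_on_valgrind())
            free_all();
    }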
https://devblogs.microsoft.com/oldnewthing/20180228-00/?p=98...
But I remember the first time I saw such a program which never freed anything: jitterbug, the simple bug tracker which ran as a CGI script.
It indeed allows a very simple style!
Meanwhile, use ccan/tal (https://github.com/rustyrussell/ccan/blob/master/ccan/tal/_i...) and be happy :)
That does it. I am not going to use it.
In short: If it works until it crashes, it doesn't work.
There are already tools to identify memory leaks, such as LeakSanitizer (https://clang.llvm.org/docs/LeakSanitizer.html). Use those instead.
Clearly the author of TFA is aware of such tools, since the idea is to trick them.
You know this isn't serious right?
Define "it" here.
Because "just don't free" is pretty different from what's in the post!
Isn’t that close to how ARC (automatic reference counting) works?
the other bug of course is that it's not thread safe