I have to disagree. Lumping them together is like lumping together formal verification with machine learning...
Two fundamental characteristics of garbage collection (in pretty much every programmer's experience) are that (a) it can handle cycles (read: more general programs), and (b) it does not provide hard performance guarantees in the general case. Reference counting is literally the opposite, and that's exactly what people love about it.
First of all, I recommend giving the paper a read, because I think you're misunderstanding the claim (plus, it is a very good paper). The claim is not that they are equivalent, but that tracing GC and reference counting are dual solutions to the same problem, two ends of the same spectrum if you will, with hybrid solutions existing in between.
Second, what you seem to consider to be fundamental characteristics of tracing GC and RC is not in fact so fundamental.
For starters, RC absolutely can handle cycles (e.g. through trial deletion). Such implementations may be difficult or impossible to implement as pure library solutions, but there is nothing that says it can't be done. The most prominent example of a programming language that uses such an approach is probably Python.
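To make that concrete, here is a small CPython-specific sketch (the `Node` class is invented for illustration): pure reference counting can never reclaim the cycle below, but CPython's cycle collector, whose algorithm is conceptually close to trial deletion, cleans it up on demand.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

gc.disable()                 # rely on pure reference counting for the moment

a, b = Node(), Node()
a.other, b.other = b, a      # reference cycle: refcounts never reach zero
probe = weakref.ref(a)       # lets us observe when `a` is actually freed

del a, b
assert probe() is not None   # unreachable, but the cycle keeps itself alive

gc.collect()                 # CPython's cycle detector steps in
gc.enable()
assert probe() is None       # the cycle has been reclaimed
```

The `weakref` probe is just a way to watch the object die without keeping it alive.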
Nor does the claim that tracing GC cannot provide hard performance guarantees in the general case (while RC does) hold up under closer examination. Leaving aside the problem that it's already non-trivial to provide hard real-time performance guarantees for malloc()/free(), and ignoring the issue of cascading deletions, it doesn't hold under the more relaxed assumptions discussed downthread.
For starters, we have such predictability only for the single-threaded case, not for arbitrary multi-threaded situations. And even in the single-threaded case, there are real use cases where predicting performance becomes functionally intractable. Examples are implementations of binary decision diagrams or certain persistent data structures, where the presence of shared subgraphs of arbitrary size makes predicting the cost of individual deallocations impractical.
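The shared-subgraph point can be demonstrated in a few lines of CPython (the `Node` class and the count of 10,000 are arbitrary choices for this sketch): two handles share one long chain, and the cost of a `del` depends entirely on whether it happens to drop the last reference.

```python
freed = 0

class Node:
    __slots__ = ("next",)
    def __init__(self, nxt):
        self.next = nxt
    def __del__(self):
        global freed
        freed += 1

N = 10_000
chain = None
for _ in range(N):
    chain = Node(chain)      # build a chain of N nodes

head_a = chain
head_b = chain               # a second owner of the same subgraph
del chain

del head_a                   # refcount drops 2 -> 1: frees nothing, O(1)
assert freed == 0

del head_b                   # last reference gone: cascades through all N nodes
assert freed == N
```

The two `del` statements are syntactically identical, yet one is constant time and the other frees ten thousand objects, which is exactly why per-operation bounds are hard to state for naive RC.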
In contrast, in the single-threaded case we can absolutely bound individual operations of a tracing GC by either a constant or (in the case of arbitrarily sized allocations) make them linear in the number of bytes allocated (e.g. Baker's treadmill).
What is true is that in the absence of cycles, (naive) reference counting will free memory at the earliest opportunity, which is not something we can say for tracing GC.
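In CPython, for example (setting aside cycles and the usual caveats around `__del__`), the finalizer runs at the exact point the last reference disappears, which is a promise a tracing collector does not make:

```python
events = []

class Resource:
    def __del__(self):
        events.append("freed")

r = Resource()
events.append("before del")
del r                        # refcount hits zero *here*, so __del__ runs now
events.append("after del")

assert events == ["before del", "freed", "after del"]
```

Under a tracing GC, "freed" could appear at any later point, or not at all before process exit.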
Never freeing and periodically being killed by the OOM killer is rarely used, but it can be useful in limited circumstances. Some shops arbitrarily kill any worker over X hours old on the hypothesis that a memory leak is present or probably present.
AFAIK it's how you get memory safety in missile guidance software: you just have to put enough RAM on the board so it never runs out of memory before the end of its flight.
Edit: TFA just talks about that in the “never free” section.
I'm aware of several Java service backends that just disable the GC entirely and take a once-a-day crash as a cost of doing business. As long as you have enough nodes that the crashes aren't correlated, your overall service can maintain decent uptime.
>I was once working with a customer who was producing on-board software for a missile. In my analysis of the code, I pointed out that they had a number of problems with storage leaks. Imagine my surprise when the customer's chief software engineer said "Of course it leaks". He went on to point out that they had calculated the amount of memory the application would leak in the total possible flight time for the missile and then doubled that number. They added this much additional memory to the hardware to "support" the leaks. Since the missile will explode when it hits its target or at the end of its flight, the ultimate in garbage collection is performed without programmer intervention.
> "Linear reference counting is an elusive concept, where we can completely eliminate the counter integer, and do all of the reference counting at compile time."
Right. I've occasionally thought about that for Rust, as a way to do back references. You can have an Rc owning an object, which can own another object, which can have a weak RC reference back to the original. This allows you to find an object's owner when needed, so you can do inheritance-like things when you need to. The single-ownership and dangling-pointer checks are made at run time and will panic if they fail.
Now, the question is, can you statically check access behavior and determine that those panics will never occur? If you can, you can eliminate the reference counts and checks.
That looks 1) possible, and 2) a PhD-thesis-sized problem.
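For what it's worth, the run-time half of that pattern can be sketched in Python with `weakref` (this is only an analogue of Rust's `Rc`/`Weak`; the `Parent`/`Child` classes and the `parent()` accessor are invented for the sketch): the back reference does not keep the owner alive, and an upgrade-style check fails at run time instead of producing a dangling pointer.

```python
import weakref

class Child:
    def __init__(self, parent):
        # Weak back reference: does not keep the owner alive,
        # roughly analogous to Rust's std::rc::Weak.
        self._parent = weakref.ref(parent)

    def parent(self):
        p = self._parent()       # like Weak::upgrade: checked at run time
        if p is None:
            raise RuntimeError("owner already freed")
        return p

class Parent:
    def __init__(self):
        self.child = Child(self)

p = Parent()
c = p.child
assert c.parent() is p           # upgrade succeeds while the owner lives

del p                            # owner freed immediately (CPython refcounting)
try:
    c.parent()                   # the run-time check fires; no dangling pointer
    upgraded = True
except RuntimeError:
    upgraded = False
assert not upgraded
```

The static-analysis question above is precisely whether a compiler could prove the `RuntimeError` (or Rust's panic) branch is dead and delete the check.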
It's weird to see no mention in this post of separation logic, which would seem to provide a basic, general foundation for most of these techniques if perhaps not all of them. For instance, the RustBelt paper gives a description of Rust borrow checking that can arguably be understood as relying on some form of separation logic.
I am surprised that call stacks aren't mentioned. The call stack is a great way to free up memory that's no longer being used by a function. Maybe it was too obvious to include?
Where is the "use after free" problem? A function leaves its computed result on the stack or modifies data higher up the stack. There are no pointers, nothing to free.
Most of the fancier runtime-less non-GC techniques fail when it comes to graph data structures (or even linked lists) without a tremendous amount of upfront work from the programmer.
> NOTE: It is almost always better to use Vec or VecDeque because array-based containers are generally faster, more memory efficient, and make better use of CPU cache.
> And as it turns out, reference counting can be blended with immutable region borrowing to greatly reduce its cache misses and make it data-race safe, something no language has done yet.
Rust prevents data races, so I'm not sure what this is referring to.
https://doc.rust-lang.org/std/collections/struct.LinkedList....
https://docs.rs/petgraph 78 M downloads