A curiously recurring lifetime issue

Author of Cap'n Proto here.

The main innovation of Cap'n Proto serialization compared to Protobuf is that it doesn't copy anything: it generates a nice API where all the accessor methods are directly backed by the underlying buffer. Hence the generated classes that you use all act as "views" into the single buffer.

C++, meanwhile, is famously an RAII language, not garbage-collected. In such languages, you have to keep track of which things own which other things so that everyone knows who is responsible for freeing memory.

Thus in an RAII language, you generally don't expect view types to own the underlying data -- you must separately ensure that whatever does own the backing data structure stays alive. C++ infamously doesn't really help you with this job -- unlike Rust, which has a type system capable of catching mistakes at compile time.
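A minimal sketch of that owner/view split, for readers who have not seen the pattern. `Message` and `PointReader` are hypothetical names made up for illustration, not Cap'n Proto's real classes, and the 8-byte layout is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view type: reads fields in place and owns nothing.
class PointReader {
public:
  explicit PointReader(const std::uint8_t* data) : data_(data) {
    assert(data != nullptr);
  }
  std::int32_t getX() const { return load(0); }
  std::int32_t getY() const { return load(4); }

private:
  std::int32_t load(std::size_t offset) const {
    std::int32_t value;
    std::memcpy(&value, data_ + offset, sizeof value);  // no copy of the message, only of one field
    return value;
  }
  const std::uint8_t* data_;  // non-owning: the buffer must outlive this reader
};

// Hypothetical owner of the backing buffer (assumed layout: x at byte 0, y at byte 4).
class Message {
public:
  Message(std::int32_t x, std::int32_t y) : bytes_(8) {
    std::memcpy(bytes_.data(), &x, sizeof x);
    std::memcpy(bytes_.data() + 4, &y, sizeof y);
  }
  PointReader getRoot() const { return PointReader(bytes_.data()); }

private:
  std::vector<std::uint8_t> bytes_;
};

std::int32_t sumOfPoint() {
  Message message(3, 4);                  // the owner is a named object...
  PointReader point = message.getRoot();  // ...so the view stays valid in this scope
  return point.getX() + point.getY();
}
```

If `message` were a temporary instead, `point` would be left pointing at freed memory, and nothing in the types would say so.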

You might argue that backing buffers should be reference counted and all of Cap'n Proto's view types should hold a refcount on the buffer. However, this creates new footguns. Should the refcounting be atomic? If so, it's really slow. If not, then using a message from multiple threads (even without modifying it) may occasionally blow up. Also, refcounting would have to keep the entire backing buffer alive if any one object is pointing at it. This can lead to hard-to-understand memory bloat.
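The memory-bloat point can be sketched with `std::shared_ptr`'s aliasing constructor. This is an illustration of the general effect under an assumed design, not something Cap'n Proto does; `viewOfElement` and `bytesKeptAliveByOneView` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical refcounted "view": points at one element but shares
// ownership of the whole buffer (shared_ptr aliasing constructor).
std::shared_ptr<const int> viewOfElement(
    const std::shared_ptr<std::vector<int>>& buffer, std::size_t index) {
  assert(index < buffer->size());
  return std::shared_ptr<const int>(buffer, &(*buffer)[index]);
}

// Returns how many bytes of element storage are still alive after the
// original owner has released its reference.
std::size_t bytesKeptAliveByOneView() {
  auto buffer = std::make_shared<std::vector<int>>(1000000, 7);
  std::weak_ptr<std::vector<int>> watcher = buffer;

  std::shared_ptr<const int> one = viewOfElement(buffer, 42);
  buffer.reset();  // the "real" owner lets go

  // The single element view still pins the entire vector.
  std::shared_ptr<std::vector<int>> locked = watcher.lock();
  if (!locked || *one != 7) return 0;
  return locked->size() * sizeof(int);
}
```

Here one small view keeps a million-element buffer allocated, which is the kind of bloat that is hard to spot from the call site.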

In short, the design of Cap'n Proto's C++ API is a natural result of what it implements, and the language it is implemented in. It is well-documented that all these types are "pointer-like", behaving as views. This kind of API is very common in C++, especially high-performing C++. New projects should absolutely choose Rust instead of C++ to avoid these kinds of footguns.

In my experience each new developer makes this mistake once, figures it out, and doesn't have much trouble using the API after that.

foxhill · 2 years ago

apologies, perhaps i’m missing something here, having not used cap’n proto in any context at all before.

is it not possible to delete the rvalue reference overload of ‘getList’?

as far as i can tell, the error producing code wouldn’t have produced a diagnostic, but failed to build in the first instance, like the rust case?

kentonv · 2 years ago

That would catch some legitimate use cases, where you get the list and immediately use it on the same line. Admittedly this is not so common for lists, but very common for struct readers, e.g.:

    int i = call.send().getSomeStruct().getValue();

Here, even though `send()` returns a response that is not saved anywhere, and a struct reader is constructed from it, the struct reader is used immediately in the same line, so there's no use-after-free.
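The ordering can be checked with a small sketch. The types below are hypothetical stand-ins that only mimic the shape of the call chain above; they record when the value is read and when the temporary response is destroyed (assuming C++17, where the returned temporary is a single object):

```cpp
#include <cassert>
#include <string>
#include <vector>

// All names below are hypothetical stand-ins, not Cap'n Proto's real classes.
std::vector<std::string>& eventLog() {
  static std::vector<std::string> log;
  return log;
}

struct Response;

struct StructReader {
  const Response* response;  // non-owning pointer back into the response
  int getValue() const;
};

struct Response {
  int value = 42;
  ~Response() { eventLog().push_back("response destroyed"); }
  StructReader getSomeStruct() const { return StructReader{this}; }
};

int StructReader::getValue() const {
  assert(response != nullptr);
  eventLog().push_back("value read");
  return response->value;
}

struct Call {
  Response send() const { return Response{}; }
};

int readOnOneLine() {
  eventLog().clear();
  Call call;
  // The temporary Response lives until the end of this full expression,
  // so the reader is used before the response is destroyed.
  int i = call.send().getSomeStruct().getValue();
  return i;
}
```

After `readOnOneLine()` returns, the log holds "value read" followed by "response destroyed", in that order. Saving the reader in a variable and using it on the next line would reverse the order and dangle.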

Someone else mentioned using lifetimebound annotations. This will probably work a lot better, avoiding the false positives. It just hadn't been done because the annotations didn't exist at the time that most of Cap'n Proto was originally written.
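A rough sketch of what such an annotation could look like. `Response`, `ListView` and the `EXAMPLE_LIFETIMEBOUND` macro are hypothetical; only the `[[clang::lifetimebound]]` attribute itself is real, and the macro expands to nothing on compilers that do not know it:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper macro: use the Clang attribute when available.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::lifetimebound)
#    define EXAMPLE_LIFETIMEBOUND [[clang::lifetimebound]]
#  endif
#endif
#ifndef EXAMPLE_LIFETIMEBOUND
#  define EXAMPLE_LIFETIMEBOUND
#endif

// Hypothetical non-owning view over a list of ints.
class ListView {
public:
  explicit ListView(const std::vector<int>& items) : items_(&items) {}
  std::size_t size() const { return items_->size(); }
  int operator[](std::size_t i) const {
    assert(i < items_->size());
    return (*items_)[i];
  }

private:
  const std::vector<int>* items_;
};

// Hypothetical owner. The attribute after `const` applies to the implicit
// `this` parameter: the returned view may refer to *this.
class Response {
public:
  ListView getList() const EXAMPLE_LIFETIMEBOUND { return ListView(list_); }

private:
  std::vector<int> list_{1, 2, 3};
};

Response makeResponse() { return Response{}; }

int sumKeepingOwnerAlive() {
  Response response = makeResponse();  // owner outlives the view
  ListView list = response.getList();
  int sum = 0;
  for (std::size_t i = 0; i < list.size(); ++i) sum += list[i];
  return sum;
}

// The pattern the annotation is meant to let Clang diagnose:
//   ListView dangling = makeResponse().getList();
```

Unlike deleting the rvalue overload, this should leave same-line uses of a temporary alone, since the view never outlives the full expression there.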

> Is this on Cap'n Proto? Honestly, I don't know.

It absolutely is. It's a fairly basic principle that APIs should be difficult to misuse, and the fact that you made the same mistake 2/3 times shows that it is very easy to misuse. In other words it is a badly designed API. At the very least it should be called ListView.

I am not a big fan of CapnProto. It has some neat features but it's very complicated, full of footguns like this, and the API is extremely unergonomic.

dmeybohm · 2 years ago

Yeah I agree, the ListView naming is more appropriate.

I haven't used non-owning types like string_view or span too much because I haven't needed that level of performance or memory optimization yet, so without those needs they just seem like footguns compared to a plain reference. I do like to use a technique in classes that hand out non-owning references, which would work for those types too, to prevent this particular problem.

For that, there are two methods with the same name, but different access - an lvalue version and an rvalue version. Then, you delete the rvalue method like this:

  class Response {
  public:
    auto getListView() & -> ListView {
      return ListView(m_List);
    }
    void getListView() && = delete;
  };

Then you get a compile error like in Rust when you try to call getListView() from a temporary object, but if you call the method from an lvalue it still works at least as long as the object is in scope.
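One way to check that claim without triggering the compile error is a small detection trait. This is a sketch assuming C++17; the `ListView` layout, the `m_List` member type and the `CanGetListView` trait are hypothetical additions for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical non-owning view.
struct ListView {
  const std::vector<int>* list;
};

class Response {
public:
  ListView getListView() & { return ListView{&m_List}; }
  void getListView() && = delete;  // calling on a temporary is ill-formed

private:
  std::vector<int> m_List{1, 2, 3};
};

// Detects whether `std::declval<T>().getListView()` is a valid expression.
template <class T, class = void>
struct CanGetListView : std::false_type {};

template <class T>
struct CanGetListView<T, std::void_t<decltype(std::declval<T>().getListView())>>
    : std::true_type {};

static_assert(CanGetListView<Response&>::value,
              "lvalue Response should provide a view");
static_assert(!CanGetListView<Response>::value,
              "rvalue Response should be rejected");

std::size_t listSizeFromNamedResponse() {
  Response response;                       // named object: lvalue overload is chosen
  ListView view = response.getListView();
  assert(view.list != nullptr);
  return view.list->size();
}
```

Note that this also rejects a same-line use such as `Response{}.getListView().list->size()`, which would have been safe.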

bsder · 2 years ago

> In other words it is a badly designed API.

I don't agree. The API is what it is because it is specifically a zero copy API for performance. If you don't care about performance, why are you using C++ (stupid) and a zero-copy API (doubly stupid)?

I absolutely do NOT expect a zero copy API to own things. If I drop the underlying reference that is really an alias, how on earth is that the fault of the zero copy API?

The combination of aliasing and lifetimes are C++ footguns--full stop. This is aptly demonstrated by how quickly Rust kills this cold.

If you use sharp knives, sometimes you cut your fingers. People like you would claim the knife is the problem.

IshKebab · 2 years ago

I think you misunderstood. It's not a bad API because it uses non-owning references. It's a bad API because it doesn't make that clear.

> If you use sharp knives, sometimes you cut your fingers. People like you would claim the knife is the problem.

This is more like cutting yourself on a razor sharp butter knife. If it's a sharp knife it should look like a sharp knife.

nyanpasu64 · 2 years ago

I'd say that method chaining (referential transparency, etc.) and implicit destructor calls with side effects don't mix.

I have a general rule that "resource" types which own a heap allocation should usually be given a variable name with explicit scope (and likely even an explicit type, rather than `auto response` like in this post). This is a general guideline to avoid holding a reference to a temporary that gets destroyed, but doesn't protect against returning a dangling reference into a resource type from a function.

In other places, where languages make the opposite decision (from this blog post) to extend the lifetime of a temporary variable with a destructor when you call methods on it, you get things like C++'s temporary lifetime extension (not a bug, note that I don't understand it well), and footguns like Rust's `match lock.lock().something {}` (https://github.com/rust-lang/lang-team/blob/master/design-me...).

phendrenad2 · 2 years ago

Referential transparency be damned, I guess. This feels like an inherent downside to languages where you have to manage lifetimes.

GuB-42 · 2 years ago

The problem I see here is that one of the functions returns a pointer and it doesn't use the usual pointer syntax.

I see no *, no & and no -> in the code. So I would assume everything to behave as if it was owned or even copied. Had it returned actual pointers, or pointer-like objects like iterators, it would have been more obvious.

amluto · 2 years ago

This is C++ we’re talking about.

    auto x = y();

Is x a pointer or reference? There’s no way to tell. Maybe if you then do

    x->foo();

You have some idea that x is pointer-ish, but unique_ptr works like this and isn’t very pointer-ish.

HarHarVeryFunny · 2 years ago

Seems like someone trying to be too clever to me, and perhaps a case of premature optimization. Non-owning references are a problem waiting to happen. Even if your language/api allows you to check if the reference is still valid before use, you can obviously forget to do so.

Rather than use a non-owning reference I'd rather use a design that didn't need it, or just use a std::shared_ptr owning reference instead. I realize there are potential cases (i.e. one can choose to design such cases) of circular references where a non-owning reference might be needed to break the circular chain, or where one wants a non-owning view of a data structure, but without very careful API design and code review these are easy to mess up.

This sounds nice but it just isn't realistic. If you try to write a complex system in C++ without non-owning references, you're basically heap-allocating every single object and using slow atomic refcounting everywhere. Performance will likely be much worse than just using a garbage collected language to start with.

Worrying about the speed of smart pointer reference counting sounds like premature optimization at best. If this is really what's slowing your app down in 2023, then you've got bigger problems than choosing owning vs non-owning references.

dataflow · 2 years ago

I think [[clang::lifetimebound]] would let the compiler detect this at compile time?

jimberlage · 2 years ago

Is there a good tutorial on Valgrind for beginners? It’s a tool I’ve only ever seen praised so I’m curious to play with it a bit.

In my experience all you really have to do is:

1. Write a test program that exercises your segfault.

2. Build it in debug mode.

3. Run `valgrind <my-program>`

And it just tells you where your problem is. Not much more to it.

gavinhoward · 2 years ago

As an avid Valgrind user, I don't know of one.

But if you would like, I could post one to my blog.

If you would like me to, contact me privately. [1]

[1]: https://gavinhoward.com/contact/