Generic Containers in C: Vec

petters · a month ago

> Many vector types include a capacity field, so that resizing on every push can be avoided. I do not include one, because simplicity is more important to me and realloc often does this already internally. In most scenarios, the performance is already good enough.

I think this is the wrong decision (for a generic array library).

ethan_smith · a month ago

Without a capacity field, each push operation potentially triggers a realloc, causing O(n) copying and possible memory fragmentation - especially problematic for large vectors or performance-critical code.
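
A minimal sketch of what a capacity field buys, assuming a hypothetical `struct int_vec` and `int_vec_push` (neither name is from the article). The allocation doubles when it is full, so n pushes call `realloc` only O(log n) times and the amortized cost per push is O(1), whatever the allocator does internally.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical int vector with an explicit capacity field. */
struct int_vec {
    int *data;
    size_t len;  /* elements in use */
    size_t cap;  /* elements allocated */
};

/* Append x, doubling the allocation when it is full.
   Returns 0 on success, -1 on failure (the vector is left unchanged). */
static int int_vec_push(struct int_vec *v, int x)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        /* Refuse to grow if the doubled byte count would overflow size_t. */
        if (v->cap > SIZE_MAX / 2 / sizeof *v->data)
            return -1;
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (!p)
            return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}
```

Starting from `struct int_vec v = {0};`, a hundred pushes trigger five reallocations (8, 16, 32, 64, 128 elements) instead of one hundred.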

tialaramex · a month ago

> realloc often does this already internally

Is Martin claiming that realloc is "often" maintaining an O(1) growable array for us?

That's what the analogous types in C++ or Rust, or indeed Java, Go, C# etc. provide.

uecker · a month ago

No, I claim that the performance of realloc is good enough for most use cases because it also does not move the memory in case there is already enough space left.

I then mention that for other use cases, you can maintain a capacity field only in the part of the code where you need this.

Whether this is the right design for everybody, I do not know, but so far it is what I prefer for myself.

cyber1 · a month ago

Many C programmers need proper generic programming mechanisms (perhaps something like Zig's comptime) in C, but macros are the worst possible approach, and they don't want to switch to a different language like C++. As a result, they struggle with these issues. This is what I think the standardization committee should focus on, but instead, they introduced _Generic.

sparkie · a month ago

The biggest issue is the ABI for C - it's the lingua-franca of language interoperability and can't really be changed - so whatever approach is taken it needs to be fully compatible with the existing ABI. `_Generic` is certainly flawed but doesn't cause any breaking ABI changes.

That's also a major reason why you'd use C rather than C++. The C++ ABI is terrible for language interoperability. It's common for C++ libraries to wrap their API in C so that it can be used from other languages' FFIs.

Aside from that another reason we prefer C to C++ is because we don't want vtables. I think there's room for a `C+` language, by which I mean C+templates and not C+classes - perhaps with an ABI which is a subset of the C++ ABI but superset of the C ABI.

IAmLiterallyAB · a month ago

> we don't want vtables

Then don't use virtual functions. Then there will be no vtables.

You might have known that already, but in general I'm surprised how many engineers think that all C++ classes have vtables. No, most in fact do not. C++ classes generally have the same memory layout as a C struct as long as you don't use virtual functions.

signa11 · a month ago

> I think there's room for a `C+` language, by which I mean C+templates and not C+classes - perhaps with an ABI which is a subset of the C++ ABI but superset of the C ABI.

Indeed, I have spoken to a lot of my colleagues about just that. If overloading is not allowed, perhaps there is still some hope for a backwards-compatible ABI?

cyber1 · a month ago

This is true. I agree with this statement. It's the holy cow of C. However, the problem with generic programming and metaprogramming isn't going away, and many people continue to struggle with it. Introducing something like compile-time reflection might be a solution...

up2isomorphism · a month ago

They showed something they think is neat. You start a topic with the assumption that they struggle. I'm not sure how you got that from the original post, or do you just want to state that claim anyway?

sirwhinesalot · a month ago

The most insulting thing about _Generic is the name. Really? _Generic? For a type-based switch with horrific syntax? What were they thinking...
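
For readers who have not used it, here is a small sketch of `_Generic` acting as a compile-time switch on the type of an expression. The `type_name` macro is a made-up illustration, not something from the article.

```c
/* Selects one of the listed expressions based on the static type of x.
   No code runs to make the choice; it is resolved at compile time. */
#define type_name(x) _Generic((x), \
    int:     "int",                \
    double:  "double",             \
    char *:  "char *",             \
    default: "other")
```

With this definition, `type_name(1)` yields `"int"` and `type_name(2.0f)` falls through to `"other"`, because `float` is not listed.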

That said, generic programming in C isn't that bad, just very annoying.

To me the best approach is to write the code for a concrete type (like Vec_int), make sure everything is working, and then do the following:

A macro Vec(T) sets up the struct. It can then be wrapped in a typedef like typedef Vec(int) Vec_i;

For each function, like vec_append(...), copy the body into a macro VEC_APPEND(...).

Then for each relevant type T: copy paste all the function declarations, then do a manual find/replace to give them some suffix and fill in the body with a call to the macro (to avoid any issues with expressions being executed multiple times in a macro body).

Is it annoying? Definitely. Is it unmanageable? Not really. Some people don't even bother with this last bit and just use the macros to inline the code everywhere.

Some macros can delegate to void*-based helpers to minimize the bloating.
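
A rough sketch of the steps above, under my own assumptions about the details: the `len`/`cap` fields, the `ok` out-parameter and the `_i`/`_d` suffixes are illustrative choices, and overflow checks are omitted for brevity.

```c
#include <stdlib.h>

/* Vec(T) sets up the struct; wrap it in a typedef per element type. */
#define Vec(T) struct { T *data; size_t len; size_t cap; }

/* The body of append, written once. `ok` is set to 1 on success, 0 on failure.
   Arguments may be evaluated more than once, so this is only invoked from the
   typed wrapper functions below, where they are plain variables. */
#define VEC_APPEND(v, x, ok)                                              \
    do {                                                                  \
        (ok) = 1;                                                         \
        if ((v)->len == (v)->cap) {                                       \
            size_t new_cap_ = (v)->cap ? (v)->cap * 2 : 8;                \
            void *p_ = realloc((v)->data, new_cap_ * sizeof *(v)->data);  \
            if (p_) { (v)->data = p_; (v)->cap = new_cap_; }              \
            else    { (ok) = 0; }                                         \
        }                                                                 \
        if (ok) (v)->data[(v)->len++] = (x);                              \
    } while (0)

typedef Vec(int) Vec_i;
typedef Vec(double) Vec_d;

/* One real function per type; each body is just a call to the macro. */
static int vec_append_i(Vec_i *v, int x)    { int ok; VEC_APPEND(v, x, ok); return ok; }
static int vec_append_d(Vec_d *v, double x) { int ok; VEC_APPEND(v, x, ok); return ok; }
```

Because callers only ever see `vec_append_i` and `vec_append_d`, an argument such as `i++` is evaluated exactly once, which is the point of wrapping the macro in functions.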

EDIT: I almost dread to suggest this but CMake's configure_file command works great to implement generic files...

uecker · a month ago

There are less annoying ways to implement this in C. There are at least two different common approaches which avoid having macro code for the generic functions:

The first is to put this into an include file

  #define type_argument int
  #include <vector.h>

Then inside vector.h the code looks like regular C code, except where you insert the argument.

  foo_ ## type_argument ( ... )

The other is to write generic code using void pointers or container_of as regular functions, and only have one-line macros as type safe wrappers around it. The optimizer will be able to specialize it, and it avoids compile-time explosion of code during monomorphization.

I do not think that templates are less annoying in practice. My experience with templates is rather poor.
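
One way the second approach (a regular void-pointer function behind a thin typed macro) might look. This is a sketch under my own assumptions, not the article's code: `vec_room`, `VEC_OF` and this `vec_push` are hypothetical names, and the `tmp` member exists only so the macro can check the result of the helper before overwriting `data`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Ordinary, untyped function shared by every element type.
   Returns a buffer with room for at least len + 1 elements, or NULL on
   failure (in which case the old buffer is left untouched). */
static void *vec_room(void *data, size_t len, size_t *cap, size_t elem_size)
{
    if (len < *cap)
        return data;
    if (*cap > SIZE_MAX / 2 / elem_size)
        return NULL;                       /* byte count would overflow */
    size_t new_cap = *cap ? *cap * 2 : 8;
    void *p = realloc(data, new_cap * elem_size);
    if (p)
        *cap = new_cap;
    return p;
}

/* Typed handle: `data` has the real element type, so indexing is checked. */
#define VEC_OF(T) struct { T *data; size_t len, cap; void *tmp; }

/* Thin typed wrapper: evaluates to 1 on success, 0 on allocation failure.
   `v` is evaluated several times, so pass a plain pointer expression. */
#define vec_push(v, x)                                                        \
    (((v)->tmp = vec_room((v)->data, (v)->len, &(v)->cap, sizeof *(v)->data)) \
         ? ((v)->data = (v)->tmp, (v)->data[(v)->len++] = (x), 1)             \
         : 0)
```

The growth logic is compiled once, while the store `(v)->data[...] = (x)` happens in the macro, so the compiler checks the pushed value against the element type.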

cyber1 · a month ago

Hey, I understand you and know this stuff well, having worked with it for many years as a C dev. To be honest, this isn't how things should generally be done. Macros were invented for very simple problems. Yes, we can abuse them as much as possible (for example, in C++, we discovered SFINAE, which is an ugly, unreadable technique that wasn't part of the programming language designer's intent but rather like a joke that people started abusing), but is it worth it?

SAI_Peregrinus · a month ago

The name has to be ugly, new names in C are always taken from the set of reserved identifiers: those starting with an underscore & a capital letter, or with two underscores. Since they didn't reserve any "normal" names, all new keywords will be stuff like `_Keyword` or `__keyword`, unless they break backwards compatibility. And they really hate breaking backwards compatibility, so that's quite unlikely.

ioasuncvinvaer · a month ago

username checks out

uecker · a month ago

I don't struggle, I switched from C++ to C and find this much nicer.

cyber1 · a month ago

I'm currently at a crossroads: C++ or Zig. One is very popular with a large community, amazing projects, but has lots of ugly design decisions and myriad rules you must know (this is a big pain, it seems like even Stroustrup can't handle all of them). The other is very close to what I want from C, but it's not stable and not popular.

rurban · a month ago

Macros are the best possible approach, compared to C++ templates or _Generic.

codr7 · a month ago

From my experience, trying to make C type safe is counter productive.

It's perfectly possible to do generic vectors in C without twisting the language. This implementation isn't as safe as alternatives in other languages, but plays well on C's strengths.

https://github.com/codr7/hacktical-c/tree/main/vector

uecker · a month ago

Why would you think this? My implementation is type and bounds safe and nice to use.

codr7 · a month ago

Because that level of (type) safety and C is a bad fit.

hyperbolablabla · a month ago

I think the overwhelmingly better approach for C is codegen here. Better ergonomics, tab completion, less error prone, etc. As long as your codegen is solid!

uecker · a month ago

Why? I do not find the ergonomics bad.

It is also not clear how you get tab completion with code generation. But you could also get tab completion here, somebody just has to add this to the tab completion logic.

itay2805 · a month ago

By far the best implementation for type-safe containers in C I found is stb_ds, it provides both a vector and a hashmap (including a string one).

https://nothings.org/stb_ds

lor_louis · a month ago

I do something similar, but I don't implement the logic in a macro, instead I have a Vec struct which looks like

    struct Vec {
        void *data;
        size_t len;
        size_t cap;
        size_t sizeof_ty;
    };

I then use a macro to define a new type

    struct IntVec {
        struct Vec inner;
        int ty[0]; /* zero-length array (GNU extension); occupies no storage */
    };

Using the zero-sized field I can do typeof(*ty) to get some type safety back.

All of the methods are implemented on the base Vec type and have a small wrapper which casts/asserts the type of the things you are trying to push.
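
A guess at how such a wrapper could be put together; the comment does not show its macros, so `DEFINE_VEC`, `VEC_INIT`, `VEC_PUSH`, `VEC_AT` and `vec_push_raw` are all hypothetical. The sketch relies on GNU extensions (`__typeof__` and zero-length arrays) as supported by GCC and Clang, and omits overflow checks for brevity.

```c
#include <stdlib.h>
#include <string.h>

struct Vec {
    void *data;
    size_t len;
    size_t cap;
    size_t sizeof_ty;
};

/* All logic lives on the untyped base type. Returns 1 on success, 0 on failure. */
static int vec_push_raw(struct Vec *v, const void *elem)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        void *p = realloc(v->data, new_cap * v->sizeof_ty);
        if (!p)
            return 0;
        v->data = p;
        v->cap = new_cap;
    }
    memcpy((char *)v->data + v->len * v->sizeof_ty, elem, v->sizeof_ty);
    v->len++;
    return 1;
}

/* Typed wrapper struct; `ty` takes no storage and only records the type. */
#define DEFINE_VEC(Name, T) struct Name { struct Vec inner; T ty[0]; }
#define VEC_INIT(T) { .inner = { .sizeof_ty = sizeof(T) } }

/* Copies x into a temporary of the element type, so a mismatched argument is
   converted or rejected by the compiler before the untyped call. */
#define VEC_PUSH(v, x) \
    vec_push_raw(&(v)->inner, &(__typeof__((v)->ty[0])){ (x) })
#define VEC_AT(v, i) (((__typeof__((v)->ty[0]) *)(v)->inner.data)[(i)])

DEFINE_VEC(IntVec, int);
```

A caller would write `struct IntVec v = VEC_INIT(int);` and then `VEC_PUSH(&v, 42)`; passing a struct or a pointer where an `int` is expected fails to compile.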

eps · a month ago

The post is more of a quick-n-dirty (and rather trivial) proof of concept, as the code includes only sporadic checks for allocation errors and then adds a hand-wavy disclaimer to improve it as needed.

E.g. in production code this

  if (!vec_ptr) // memory out
    abort();

  for (int i = 0; i < 10; i++)
    vec_push(int, &vec_ptr, i);

should really be

  if (!vec_ptr) // memory out
    abort();

  for (int i = 0; i < 10; i++)
    if (! vec_push(int, &vec_ptr, i))
      abort();

but it doesn't really roll off the tongue.

johnisgood · a month ago

If this is in library code, then I tend to disagree. As a user of a library, I would rather be able to handle errors the way I want. I do not want the library to decide this for me, so just return an error value, like "VEC_ERR_NOMEM", or whatever.
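
A sketch of that interface style, assuming a hypothetical `struct ivec` with `ivec_reserve` and `ivec_push`; only the `VEC_ERR_NOMEM` name comes from the comment above. The library reports the failure and leaves the vector intact, and the caller picks the policy (abort, retry, shed load, and so on).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum vec_err { VEC_OK = 0, VEC_ERR_NOMEM };

struct ivec { int *data; size_t len, cap; };

/* Ensure room for at least `want` elements. On failure the vector is unchanged. */
static enum vec_err ivec_reserve(struct ivec *v, size_t want)
{
    assert(v != NULL);
    if (want <= v->cap)
        return VEC_OK;
    if (want > SIZE_MAX / sizeof *v->data)
        return VEC_ERR_NOMEM;              /* byte count would overflow */
    int *p = realloc(v->data, want * sizeof *p);
    if (!p)
        return VEC_ERR_NOMEM;
    v->data = p;
    v->cap = want;
    return VEC_OK;
}

/* Append x, growing geometrically. Errors are returned, never handled here. */
static enum vec_err ivec_push(struct ivec *v, int x)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        if (v->cap > SIZE_MAX / 2)
            return VEC_ERR_NOMEM;
        enum vec_err e = ivec_reserve(v, v->cap ? v->cap * 2 : 8);
        if (e != VEC_OK)
            return e;
    }
    v->data[v->len++] = x;
    return VEC_OK;
}
```

A caller that does want to abort can still write `if (ivec_push(&v, i) != VEC_OK) abort();`, while a long-running service could log the error and drop the request instead.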

uecker · a month ago

If all you do is call abort anyway, you do not need an interface that makes you test for errors.

gsliepen · a month ago

It's amazing how many people try to write generic containers for C, when there is already a perfect solution for that, called C++. It's impossible to write generic type-safe code in C, and this version resorts to using GCC extensions to the language (note the ({…}) expressions).

For those afraid of C++: you don't have to use all of it at once, and compilers have been great for the last few decades. You can easily port C code to C++ (often you don't have to do anything at all). Just try it out and reassess the objections you have.

uecker · a month ago

Except that I find C++ far from being perfect. In fact, I switched from C++ to C (a while ago) to avoid its issues and I am much happier. I also find my vec(int) much nicer.

In fact, we are at the moment ripping out some template code in a C code base which has some C++ for cuda in it, and this one file with C++ templates almost doubles the compilation time of the complete project (with ~700 source files). IMHO it is grotesque how bad it is.

serbuvlad · a month ago

My problem with C++, and maybe this is just me, is RAII.

Now, Resource Acquisition Is Initialization is correct, but the corollary is not generally true, which is to say, my variable going out of scope does not generally mean I want to de-acquire that resource.

So, sooner or later, everything gets wrapped in a reference counting smart pointer. And reference counting always seemed to me to be a primitive or last-resort memory management strategy.

spacechild1 · a month ago

> my variable going out of scope does not generally mean I want to de-acquire that resource.

But it does! When an object goes out of scope, nobody can/shall use it anymore, so of course it should release its (remaining) resources. If you want to hold on to the object, you need to revisit its lifetime and ownership, but that's independent from RAII.

gpderetta · a month ago

Your problem is not with RAII, but with reference counting, which you correctly identified should be the last resort, not the default; at least for the applications typically written in C++.

Lvl999Noob · a month ago

Instead of reference counting, consider having two types: an "owner" type, which actually contains the resource and whose destructor releases it, and "lender" types, which contain a reference to the resource (a pointer, or just a logical reference; e.g., an fd can be copied into the lender but only closed by the owner) and do not release it on destruction.

Same thing as what Rust does with `String` and `str`.

secondcoming · a month ago

If you want to take back manual control, use the release() function