Sadly, this is representative of the new trend of C++ programmers.
I will get downvoted, hated, and laughed at, but I do prefer the original version of the code.
It is infinitely simpler and more readable, and the layers of abstraction added are of pretty much no value here.
If I stumbled across the latter version, I would just scratch my head. And I've seen sooo many projects just end up in a big code bloat just because some programmers wanted to add tons of "new" features like this that add nothing and just complexify the code for no reason.
It's simpler at the expense of being more bug prone. Notice that the "simpler" struct has no housekeeping code to "delete[]" the memory. The programmer now has to manually "manage" that cleanup somewhere.
>, and the layers of abstraction added are of pretty much no value here.
The "value" he's addressing from the "simpler" but more fragile code:
1) remove bug-prone code which requires manual synchronization (copy & paste) of pointer types between the lines defining the variables and the subsequent lines casting the pointers from a raw buffer: "mesh->positions = (Vector3 *)mesh->memory_block;"
2) remove brittle manual arithmetic of pointer offsets embedded inside contiguous arrays such as: "mesh->uvs = (Vector2 *)(mesh->memory_block + positions_size + indices_size);"
Calculating offsets (and/or enforcing logical boundaries) inside of arrays is very error prone; the OpenSSL Heartbleed bug was an example of that type of defect.
3) manage the release of memory using unique_ptr. Obviously, it is to help the programmer (and other coworkers using Mesh data structs) avoid memory leaks.
>features like this that add nothing and just complexify the code for no reason.
It doesn't look like "no reason" to me. The attempt was to reduce bugs, and add memory safety.
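For readers who skipped the article, the "original version" being debated is roughly this (a reconstruction from the quoted lines above; names like init_mesh and positions_size are my guesses, not the article's exact code):

#include <cstddef>
#include <cstdint>

struct Vector2 { float x, y; };
struct Vector3 { float x, y, z; };

struct Mesh {
    char*    memory_block;
    Vector3* positions;
    int32_t* indices;
    Vector2* uvs;
    int32_t  num_verts;
    int32_t  num_indices;
};

// Every pointer below has to be kept in sync with the member declarations
// and with the offset arithmetic by hand.
void init_mesh(Mesh* mesh, int32_t num_verts, int32_t num_indices)
{
    const size_t positions_size = num_verts * sizeof(Vector3);
    const size_t indices_size   = num_indices * sizeof(int32_t);
    const size_t uvs_size       = num_verts * sizeof(Vector2);

    mesh->num_verts    = num_verts;
    mesh->num_indices  = num_indices;
    mesh->memory_block = new char[positions_size + indices_size + uvs_size];

    mesh->positions = (Vector3 *)mesh->memory_block;
    mesh->indices   = (int32_t *)(mesh->memory_block + positions_size);
    mesh->uvs       = (Vector2 *)(mesh->memory_block + positions_size + indices_size);
    // ...and somebody still has to remember to delete[] memory_block later.
}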
I don't understand how you can say that with a straight face. The ArrayView class is incredibly simple and takes about 10 seconds to understand. The "complex" stuff is in the make_contiguous code, which in the end is simply adding integers taking into account alignment requirements (which you still need to understand if you want to write the first version in the first place).
And seriously, did you read the initial point? The problem is that the first approach must be fully rewritten for every new class with different members that exists in your code. Given how HN is butthurt over off-by-one and out-of-buffer errors, you can't say that the first example can really be debugged easily when you have 30 of them in your code. If you only had the Mesh class in your code and nothing else similar maybe it would not be worth it.. but only maybe.
The final example is incredibly simple to use, hides the complexity in one single point, and prevents you from ever making mistakes outside of it. And seriously, it is not that complex.
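For reference, an ArrayView in this spirit is just a pointer plus a length (a minimal sketch; the article's class may differ in details):

#include <cstddef>

// Non-owning view over a contiguous run of T: stores a pointer and a count,
// never allocates, never frees.
template <typename T>
class ArrayView {
public:
    ArrayView() : data_(nullptr), size_(0) {}
    ArrayView(T* data, size_t size) : data_(data), size_(size) {}

    T&       operator[](size_t i)       { return data_[i]; }
    const T& operator[](size_t i) const { return data_[i]; }

    T*       begin()       { return data_; }
    T*       end()         { return data_ + size_; }
    const T* begin() const { return data_; }
    const T* end()   const { return data_ + size_; }

    size_t size() const { return size_; }

private:
    T*     data_;
    size_t size_;
};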
Sorry, but again I have to disagree with you on every point. I may appear as an old grumpy programmer, but this kind of article exactly falls in what Zed Shaw calls "the expert", and I totally agree with him. (http://zedshaw.com/archive/the-master-the-expert-the-program...)
There are three wrong premises in this article (IMHO):
- It makes you think that using raw pointers (so called 'low level types' in the article) is more error prone than using combined types.
- It makes you think that adding abstraction removes complexity.
- It makes you think that you should combine/factor everything because repeating yourself is forbidden.
These assertions are straight false.
- Adding abstraction does NOT remove complexity. It ADDS complexity, and then HIDES it. The ArrayView class has absolutely no use whatsoever. Every C/C++ programmer is used to accessing arrays through pointers. Creating a class for that just makes a reader wonder "what the hell is this thing?". And then you have to crawl through the code to understand what this thing is. And of course, 3-5 years from now, this ArrayView class will have tripled in size. For something as simple as reading numbers in memory, I don't want any abstraction.
- There is nothing wrong with repeating yourself once in a while => if it improves readability! Factoring too much IS wrong. It makes reading the code a pain.
When I read code, I don't want to read an abstract version of what the code does. I want to grasp the maximum possible information with the minimum noise.
After reading the final version of this Mesh class, I have no idea of its memory layout. I see some ArrayView and a magical create_contiguous_memory function. I'm lost. I don't know where the memory comes from, whether or not I own it, or what will happen if I hand a pointer to it around. Everything is hidden, and that's a pure pain to understand.
Post author here. Sorry you feel that way. I'm surprised you don't think that moving out the messy code into make_contiguous is a win. Isn't it good to factor out clearly reusable code into functions, so you can reuse it? Would you really prefer writing out code like that in a dozen places for a dozen classes that needed contiguous storage, to writing one function, testing it, and then calling it from all those other places?
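(For what it's worth, the "messy" part being factored out is essentially arithmetic like the following. This is a sketch of the idea, treating Vector3/Vector2 as plain float structs; it is not the article's actual make_contiguous.)

#include <cstddef>
#include <cstdint>

// Round an offset up to the next multiple of `alignment` (a power of two).
inline size_t align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

struct Layout {
    size_t positions_offset, indices_offset, uvs_offset, total_size;
};

// Lay out three arrays back to back in one block, padding each start
// to the alignment of the element type that follows.
inline Layout compute_layout(size_t num_verts, size_t num_indices)
{
    Layout l;
    size_t offset = 0;

    l.positions_offset = align_up(offset, alignof(float));     // Vector3
    offset = l.positions_offset + num_verts * 3 * sizeof(float);

    l.indices_offset = align_up(offset, alignof(int32_t));
    offset = l.indices_offset + num_indices * sizeof(int32_t);

    l.uvs_offset = align_up(offset, alignof(float));            // Vector2
    offset = l.uvs_offset + num_verts * 2 * sizeof(float);

    l.total_size = offset;
    return l;
}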
The 'functional people' are currently anti-reusable code. It gets in their way and complexifies their function graph. Which are bad things for them.
As a lifetime C++ guy I find this new paradigm (inline code a-la structured code) brittle and fragile. App assumptions are laced throughout; nothing can be changed without a rewrite. And my go-to code-reading techniques (follow a class through all its incarnations with grep etc) fail - they rarely subclass, just overriding existing classes locally to create callbacks etc.
So I'm not a fan of the new order. I see it's a productivity tool for the app writers out there. Especially with the one-off code that startups write, expecting to do a rewrite once they're funded. It really makes sense for them.
But I weep for the craft.
Hi, Another Game dev here.
Firstly, neat article, thanks for writing it.
I'm curious as to the problem you're exploring/solving here?
"Concatenated struct's in single chunk" is a neat trick. The use case seems a bit broken[1], but I'm assuming that it's from the original video, and your c++-ifying it? Or at least trying to find a nicer way to write this style of code?
If you're going this route, I'd be tempted not to use pointers at all, and just use offsets and accessors.[2]
* Lack of pointers means you can load in 1 read. (endian needs to be watched, and assumes you can get the entire size elsewhere)
* You can also shuffle it around in memory should you be doing something fancy. (defragging heaps might be useful if doing a sandbox)
* Better encapsulation? maybe..
But the basic point I'd like to make is, there is no nice way to write this type of code, because C and C++ give us no nice[3] way of expressing it.
[1] Verts and indices are generally uploaded to a GPU and then discarded, but this has the counts, which are needed CPU-side for draw calls, in the same chunk, so they would need to be copied out before freeing.
[2]
struct Mesh
{
    int32_t num_indices,
            num_verts;

    // inline to this struct..
    // assumes Vector3 is aligned 4 bytes.
    // and that Mesh has been alloced to same alignment.
    //
    // Vector3 positions[num_verts];
    // int32_t indices[num_indices];
    // Vector2 uvs[num_verts];

    inline const Vector3* PositionsBegin() const { return reinterpret_cast<const Vector3*>(this + 1); }
    inline const Vector3* PositionsEnd()   const { return PositionsBegin() + num_verts; }
    inline const int32_t* IndicesBegin()   const { return reinterpret_cast<const int32_t*>(PositionsEnd()); }
    inline const int32_t* IndicesEnd()     const { return IndicesBegin() + num_indices; }
    // etc
};
[3] Maintainable, fast at runtime, low mental friction to people other than the author, etc
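(Allocating one of the structs from [2] could then look roughly like this; the malloc-based strategy, and the assumption that Vector2/Vector3 are plain 4-byte-aligned float structs, are mine, not the comment author's.)

// Assumes the Mesh struct from [2] above.
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

Mesh* create_mesh(int32_t num_verts, int32_t num_indices)
{
    const size_t payload = num_verts   * sizeof(Vector3)   // positions
                         + num_indices * sizeof(int32_t)   // indices
                         + num_verts   * sizeof(Vector2);  // uvs
    void* block = std::malloc(sizeof(Mesh) + payload);     // header + inline arrays
    if (!block)
        return nullptr;
    Mesh* mesh = new (block) Mesh;                          // placement-new the header
    mesh->num_verts   = num_verts;
    mesh->num_indices = num_indices;
    return mesh;                                            // release with std::free()
}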
I agree with you. While I don't code C++ now, it used to be my tool of choice (right back to when it was a C preprocessor) and every time I look in, the language is uglier and conceptually heavier. If there are two ways to solve a problem, C++ will provide three - none of which work on their own without gotchas, none of which play nicely with each other, and each of which will be required for working with some 3rd-party library you need.
I could not agree more with you. You would never write a class like that for a skinning system, it's full of bloat and template guff.
Also, at some point you're going to have to promote this to the GPU, and at that point the original code will have a closer affinity to the GPU code than the nonsense C++. I can forgive this though, as he states he's not a games programmer.
There is a third way... don't write ArrayView<> and put all the "C style" code in the constructor and destructor of a Mesh class. Then you get something you find intuitive and you get exception safety.
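A minimal sketch of that third way (the member names and the non-copyable choice are mine, and as with the original it assumes Vector2/Vector3 are trivial types):

#include <cstddef>
#include <cstdint>

struct Vector2 { float x, y; };
struct Vector3 { float x, y, z; };

class Mesh {
public:
    Mesh(size_t num_verts, size_t num_indices)
        : num_verts_(num_verts), num_indices_(num_indices),
          block_(new char[num_verts * sizeof(Vector3) +
                          num_indices * sizeof(int32_t) +
                          num_verts * sizeof(Vector2)])
    {
        // Same C-style layout as before, but confined to one place.
        positions_ = reinterpret_cast<Vector3*>(block_);
        indices_   = reinterpret_cast<int32_t*>(block_ + num_verts * sizeof(Vector3));
        uvs_       = reinterpret_cast<Vector2*>(block_ + num_verts * sizeof(Vector3)
                                                       + num_indices * sizeof(int32_t));
    }

    ~Mesh() { delete[] block_; }

    // One owner for the block; copying would double-free.
    Mesh(const Mesh&) = delete;
    Mesh& operator=(const Mesh&) = delete;

    Vector3* positions() { return positions_; }
    int32_t* indices()   { return indices_; }
    Vector2* uvs()       { return uvs_; }

private:
    size_t   num_verts_, num_indices_;
    char*    block_;
    Vector3* positions_;
    int32_t* indices_;
    Vector2* uvs_;
};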
Your comment that this is less readable is valid. Readability varies widely with background, skillset, consistency, and a host of other factors. If your team finds C-style code more intuitive, don't write Java style code. If your team finds malloc error prone, use smart pointers.
Perhaps the biggest advantage of C++ is its flexibility, at least compared to the other mainstream languages. This means everyone can write code the way they want and others can use it (a).
(a) OK. There is still a large list of gotchas around types, the preprocessor, header inclusions, namespacing, etc. that you need to know about to be a good citizen, at least until C++ can figure out how to support modules.
The current trend is for C++11 libraries to be header-only, lightweight and specifically without the hacks that have traditionally been required to work around all sorts of compiler issues.
The original version is terrible. C code written in C++. Basically a high-level assembly language. Write all your code this way and soon you'll stumble with all the nastiness of manual memory management.
Code like that is the main reason C++ is still considered an "unsafe" language, despite the fact that correct usage of it, although more verbose and with more abstractions, brings both the benefits of C-like speed and higher-level languages' safety.
>Write all your code this way and soon you'll stumble with all the nastiness of manual memory management.
See, this is what I never quite understand. I use C++ when I want to do C type things. Manage my own memory. Bit twiddle with speed. Avoid all dynamic allocations. Increase cache coherency. Call inline assembler. Use the CRTP to achieve static polymorphism. Design my own containers. Lock free containers with atomics. Etc. If I'm using C++ I want nastiness. I want to get down and dirty.
Why on earth would you ever want to use it as a higher level language? It sucks balls at that. Just go and use a proper, nice, well-designed high level language instead. If I want a language that holds my hand and protects me from the evil pointers, I'm not choosing C++. Using C# or Java (for example) is complete luxury compared to C++.
But apparently, there are a whole bunch of people that do attempt to use C++ for this purpose. Who strive to achieve a language where you never see a raw pointer, but you are still stuck with maybe 30% of the productivity you'd have in a proper higher level language. It boggles my mind. Congratulations, now you have no garbage collection, a crap IDE, you're still wrestling with forward declarations of classes and header files and the preprocessor and linker options and slow compilation and you've now buried the machine under a layer of abstraction. The absolute worst of both worlds.
>brings both the benefits of C-like speed and higher-level languages' safety.
It doesn't. You end up with neither!
The original version is not only bug prone. In some scenarios, both versions have a potential vulnerability. Can you spot it?
In a scenario where users can access elements of a joint array of requested size without limitations, they can access memory outside the array. Memory block's total requested size may overflow size_t. Fortunately, overflow prevention is easy to add to the generic version.
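(A sketch of the kind of guard meant here, assuming the helper accumulates a total byte count before allocating; this is not code from the article.)

#include <cstddef>
#include <limits>
#include <new>

// Grow `total` by count * elem_size, refusing to continue if the
// multiplication or the addition would wrap around size_t.
inline void add_array_bytes(size_t& total, size_t count, size_t elem_size)
{
    const size_t max = std::numeric_limits<size_t>::max();
    if (elem_size != 0 && count > max / elem_size)
        throw std::bad_alloc();        // count * elem_size overflows
    const size_t bytes = count * elem_size;
    if (total > max - bytes)
        throw std::bad_alloc();        // total + bytes overflows
    total += bytes;
}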
> both versions have a potential vulnerability. Can you spot it?
What the hell are you talking about? It's a game.
> In a scenario where users can access elements of a joint array of requested size without limitations, they can access memory outside the array
Welcome to modern computing! A process can access any part of its mapped address space without limitation.
Joke aside, are you suggesting a process should hide memory from itself?
Forget it, I can't even make sense out of your comment.
> Memory block's total requested size may overflow size_t
Are you serious? Then please suggest a fix for C, C++, Java, and whatnot, because nothing in pretty much any language will prevent you from asking for more memory than you can handle. This is not a _bug_, it's just expecting the programmer not to fuck around.
The second version is quite convoluted but it makes sense. But as you say, in this instance I don't think it is a problem that really needs to be fixed when the first one is quite obvious and simple to read.
And to me, the first version does look like someone used to C who comes to write C++ but writes their C++ like it is C. Wrapping it in the second version (as others have pointed out) is safer because it is reusable, and cleans itself up.
Regrettably this trend is all but new. Boost is god knows how old now and just look how popular it is in certain circles.
Quite. It's been going on from the outset. Basically they got it wrong and they've been hacking at it ever since in a vain attempt to get it right. They should have killed the C linking compatibility from the outset (header files are devil spawn), and they should have had a native string type as well.
D represents a much better attempt at what C++ ought to have been.
I watched the linked original video to understand the motivation, and read the article, and I think this is a misguided solution to a real problem.
These "joint allocations" seem to be "a small pool of heterogenous types." The motivation is that heap allocations are expensive, so we want to coalesce them. The only way this would matter at all is if we are creating lots of these Mesh objects.
The given solutions make one heap allocation per Mesh object, as opposed to three. Create two million Meshes and you get two million allocations instead of six million.
Instead, one pool should be created for each of the three array types: Vector2, int, and Vector3. Now, create six million Meshes and you can do somewhere between three and a few dozen allocations, depending on your pool's initial size and growth strategy.
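A sketch of that direction, using plain vectors as the simplest possible per-type "pools" and offsets instead of owning pointers (a real pool would allocate in chunks so growth never relocates existing data; the names here are mine):

#include <cstddef>
#include <cstdint>
#include <vector>

struct Vector2 { float x, y; };
struct Vector3 { float x, y, z; };

// One backing pool per element type, shared by every Mesh.
struct MeshPools {
    std::vector<Vector3> positions;
    std::vector<int32_t> indices;
    std::vector<Vector2> uvs;
};

// A Mesh stores offsets into the pools instead of owning memory.
struct Mesh {
    size_t positions_begin, uvs_begin, num_verts;
    size_t indices_begin, num_indices;
};

Mesh make_mesh(MeshPools& pools, size_t num_verts, size_t num_indices)
{
    Mesh m;
    m.num_verts       = num_verts;
    m.num_indices     = num_indices;
    m.positions_begin = pools.positions.size();
    m.indices_begin   = pools.indices.size();
    m.uvs_begin       = pools.uvs.size();
    pools.positions.resize(pools.positions.size() + num_verts);
    pools.indices.resize(pools.indices.size() + num_indices);
    pools.uvs.resize(pools.uvs.size() + num_verts);
    return m;
}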
I don't know as I'm not a game developer, but my guess is that individual Meshes may correspond to things that need to be created and destructed on a regular basis during runtime. If that's the case, then depending on the details it may be preferable to have an array of Meshes, and not try to pool all the Meshes together.
I think it's risky to say that it's misguided without the full context of the problem.
I didn't watch the presentation so I may be wrong, but it's possible you're misinterpreting how Mesh was intended to be used.
The "million" objects is meant to be inside one Mesh. Mesh is singular. The vertices are plural (millions).
If there's more than one Mesh, it would be dozens or hundreds of Meshes (per video game character).
But if you only ever create a single Mesh, then you wouldn't care to coalesce the allocations, because three allocations instead of one is inconsequential in the course of an entire program lifetime. I agree that there are many vertices and edges within each Mesh, but there better also be tons of Meshes within one program, or there is no problem to begin with.
Exactly. Stateful allocators were added to the standard library especially for this sort of problem. Before, allocators were assumed to have no internal state, which made it problematic to manage memory from an arena or pool.
Perhaps there's a reason allocators wouldn't work here, but if so, it deserves a little discussion.
I did discuss stateful allocators. They're nice, I'm a big fan. But as I said, they would impose pointless space overhead of one extra pointer per array. Also, you now have more complicated issues: each of the arrays is now an owner. What if you try to make a copy of it? Move construction? What if the container tries to deallocate, how does our allocator handle that? When I copy Mesh, would I need to change the allocator's state in the copies? How would that even work? I think that once you think about trying to accomplish the very specific and relatively simple thing being done here with allocators, you'll see that it would be quite a bit more complexity to no benefit.
I think I did give it a "little" discussion :-). Guess it depends on your definition of a little. I didn't want to talk more about it because I didn't want to get sidetracked, and this is ultimately the way I chose to do it.
Hope that sheds some light on the post and the choices I made with it.
There are prebuilt slab allocators that do this well. I used the arena allocator in facebook's libfolly recently as the base for a huge page allocator, and the code turned out quite nice. I think the allocator is a much better abstraction here. You could use something like the arena allocator to carve out the memory for all the vertex and mesh storage and amortize the cost across hundreds of objects as well, so I actually don't like the argument in the first few sentences of the article that allocators would take more space.
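For readers who haven't used one, the core of an arena is tiny; a generic bump-allocator sketch (this is not folly's actual interface):

#include <cstddef>
#include <new>

// One big upfront block; allocate() just bumps a cursor.
// Everything is released at once when the arena is destroyed.
class Arena {
public:
    explicit Arena(size_t capacity)
        : buffer_(new char[capacity]), capacity_(capacity), used_(0) {}
    ~Arena() { delete[] buffer_; }

    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    void* allocate(size_t size, size_t alignment)   // alignment: power of two
    {
        const size_t offset = (used_ + alignment - 1) & ~(alignment - 1);
        if (offset + size > capacity_)
            throw std::bad_alloc();
        used_ = offset + size;
        return buffer_ + offset;
    }

private:
    char*  buffer_;
    size_t capacity_;
    size_t used_;
};

// e.g. carving vertex storage for many meshes out of one arena:
// Vector3* positions = static_cast<Vector3*>(
//     arena.allocate(num_verts * sizeof(Vector3), alignof(Vector3)));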
I was recently bit using a similar technique to the original code for an intrusive type in C. Using manual non-typesafe raw offsets for things can definitely lead to nasty bugs.
Automating the calculation of offsets and assigning pointers definitely eliminates a lot of potential bugs, but I do wonder if this isn't a bigger deficiency in C++. Why force storing an extra pointer per array? I am still not sure why C++ doesn't allow FAMs [1].
There's probably a question of which would be more efficient, storing pointers in the object and calculating the offset from the pointer, or dynamically calculating the offset within the object. Still, it doesn't seem like a problem developers should need to solve.
[1] https://en.wikipedia.org/wiki/Flexible_array_member
It's not really an extra pointer, unless you assume there are no alignment issues. If you assume that there's zero padding between members, then yes, you can do 1 pointer for storage + 1 per view. In the follow-up post, I plan to either do one pointer per view, or one integer per view, haven't decided which.
The thing is, who should solve it? The different approaches you listed have different advantages, there isn't one right answer. If the language itself solves it for you, you are stuck with whatever solution the language picked. This would be fine in a higher level language, but not in C++.
You're right though that individual devs shouldn't be solving it, it should be in a library. If there's enough interest in these posts, I'm happy to put up my work (fully fleshed out and documented) on a github for people to use.
I suspect the compiler will be better able to optimize offsets than pointers just because of the semantics required by the language, but I think you're right that it isn't necessarily one size fits all. Another possible solution might even be storing static bounds at specific intervals. Without numbers, I can only guess which would give optimal performance though.
I think the library approach is right. It would be nice to see the language transparently support contiguous array members, but supporting it in a library will allow more people to use it regardless of compiler. It avoids the standards approval process and implementation as well, which take non-trivial amounts of time. The 0x/1y features definitely make it a lot more feasible.
It would probably give people a better chance to play with the source if you could post a link to it somewhere. I'm not sure what you plan to license it under.
I'll keep an eye out for the next post.
To allow FAMs for non-trivially constructible/destructible types would be slightly more complicated than C FAMs, but it doesn't seem like an unreasonable thing to expect from the language/library, or intractable from a standard or compiler perspective.
Regardless, I'm still not sure what the justification is for requiring it to be UB even for just primitive types.
If you ever start from something like this, you should be asking a lot of more-basic questions first, such as:
1. Why is the goal to put everything in one block with a single allocation? Could everything still work if they were separated?
2. What do Vector2 and Vector3 look like? What if they say "int a, b" and "int a, b, c" respectively? If a perfectly-aligned "new int[total]" would have fixed the problem, it should have been tried from the start and the code refactored accordingly to not necessarily use structure types to look up the data.
3. Conversely, are Vector2 and Vector3 complex classes with special constructors (or equally important, could they someday become that way)? If so, even a "fixed" memory-allocation solution will be equally fragile to maintain because there is a responsibility to ensure that constructors are called correctly.
4. What is the memory profile of the rest of the application, e.g. how many Mesh objects themselves are created and is the entire approach to managing Mesh objects wrong? Maybe the focus on optimizing one piece has missed an entire problem somewhere else that is more fundamental.
Clearly the original code has bugs but it is also only 14 lines, the fixes for the bugs are straightforward and the right solution (after other analysis) may well have been to remove the code entirely instead of doing the same thing in a different way. Beware of the tendency to "fix" things without looking more deeply at the actual problem.
The whole point is that they shouldn't, because there are multiple loops over the different attributes and for best performance they should be in individual contiguous memory regions. Remember that you only have a few milliseconds per frame in a game and those things can matter here.
The two attributes are pos and uv, which are almost always tied to each other (and therefore updated together). The GPU will access attributes by index, hence they should be interleaved; otherwise you are just forcing cache misses.
Indices are not vertex attributes and thus should not be interleaved with the other data.
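To make the two layouts under discussion concrete (a sketch; real vertex formats vary by engine):

#include <cstdint>

struct Vector2 { float x, y; };
struct Vector3 { float x, y, z; };

// Interleaved ("array of structs"): an index fetch pulls pos and uv from
// the same neighborhood of memory, which matches per-vertex GPU access.
struct Vertex {
    Vector3 pos;
    Vector2 uv;
};
// Vertex  vertices[num_verts];
// int32_t indices[num_indices];   // indices stay separate: not a per-vertex attribute

// De-interleaved ("struct of arrays"): better when a pass touches only one
// attribute, e.g. a position-only shadow or skinning pass.
// Vector3 positions[num_verts];
// Vector2 uvs[num_verts];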