In my personal experience, "a bit more than a pointer" works best as a pair of (start, end) pointers (where "end" points just beyond the last element). The most obvious reasons for this are:
- slices become a total non-issue since a pair of (start, end) already is a slice and you can just move start and end.
- comparing against an end pointer is generally easier than adding up a length value first, particularly if you're slicing at the same time.
- the end pointer value is independent of the array element type, so if you e.g. cast to uint8_t * (which arguably you shouldn't in most cases) it stays exactly the same. If you store a count, you need to adjust a multiplier. If you store a byte length, you need to do a lot of divides or casts to deal with pointer arithmetic.
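A minimal sketch of the (start, end) approach being described, with hypothetical names of my own:

```c
#include <stddef.h>

/* A (start, end) slice; "end" points just beyond the last element. */
typedef struct { int *start; int *end; } int_slice;

static size_t islice_len(int_slice s) { return (size_t)(s.end - s.start); }

/* Slicing is just moving the two pointers; no length bookkeeping. */
static int_slice islice_sub(int_slice s, size_t from, size_t to) {
    int_slice r = { s.start + from, s.start + to };
    return r;
}

/* Iteration compares directly against the end pointer. */
static int islice_sum(int_slice s) {
    int total = 0;
    for (int *p = s.start; p != s.end; ++p)
        total += *p;
    return total;
}
```

Note how islice_sub never touches a length field, and the loop condition needs no addition.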
#define is ==
#define isnt !=
#define not !
#define and &&
#define or ||
#define in ,
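For context, here is roughly what code written against those macros looks like (a hypothetical snippet of mine, not taken from Cello):

```c
#include <stddef.h>

/* The macros under discussion, as defined in the library's header. */
#define is   ==
#define isnt !=
#define not  !
#define and  &&
#define or   ||

/* Hypothetical usage: reads like English, but it is no longer
   recognizable as idiomatic C. */
static int check(const char *p, int n) {
    if (p isnt NULL and not (n is 0))
        return 1;
    return 0;
}
```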
P.S.: This also is a "try to invent a new programming language without inventing a new programming language" thing. Have your cake and eat it... either it's C or it isn't, and this library is leaving the space of "normal" C.
I don’t think you’re too far off with ‘leaving the space of normal C’, but I think it may help to see the context from which the author was coming when writing Cello [1] and evaluating it in that context.
It’s been a while since I watched the talk but I believe his intention was to do just that, to push the bounds of what could be done in a header file purely for the fun of it. The second half of the talk specifically addresses the “why are you doing this?” in quite a charming way.
I'll admit I've never written more than small programs in C, but the criticism that this "isn't C" isn't fair to me. He's not doing anything more than any other library can do. If he were to write a compiler that mapped directly to really boring C, would it be more or less C than this? I don't feel like those questions are useful. We need experiments like this, and for me personally, Cello was a revelation when it was posted here years ago. There are no rules that say you can't do it.
I know he's doing a little hackiness by placing the size before the start of the object, but they also take that approach here (https://www.piumarta.com/software/cola/objmodel2.pdf) so maybe it's not that uncommon. How would you do the dual pointer setup? Is there any overhead from that, or is it small enough to not worry about?
The "isn't C" argument is more about the library as a whole, not specifically about the fat pointer suggestions. (Please look at the github repo, IMHO it's obvious.)
Placing a length "before" a pointer is perfectly fine on a _technical_ level. It's also how glibc's malloc works, it has its own data before any allocation it returns. However, hackiness is not the question here - it's whether it's the "best" approach. I simply believe, based on my own experiences, that twin pointers cover/win out in a much larger subset of applicable scenarios.
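A minimal sketch of the length-before-the-data layout being discussed (hypothetical names; glibc's real bookkeeping differs in detail):

```c
#include <stdlib.h>
#include <stddef.h>

/* Allocate space for a size_t header plus the elements; hand the
   caller a pointer to the data, with the count hidden just before it. */
static int *fat_array_alloc(size_t count) {
    size_t *block = malloc(sizeof(size_t) + count * sizeof(int));
    if (!block) return NULL;
    *block = count;            /* header holds the element count */
    return (int *)(block + 1); /* caller sees only the data pointer */
}

/* Peek at the hidden header to recover the length. */
static size_t fat_array_len(const int *data) {
    return ((const size_t *)data)[-1];
}

/* free() must be given the start of the real allocation. */
static void fat_array_free(int *data) {
    free((size_t *)data - 1);
}
```

This relies on malloc returning storage suitably aligned for size_t, which also covers the int data that follows the header on common platforms.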
As for how to implement it - declare a struct with 2 pointers in it. Or just pass 2 pointers around.
> In my personal experience, "a bit more than a pointer" works best as a pair of (start, end) pointers (where "end" points to just beyond the last element.)
There’s a PARC paper from the Cedar team benchmarking base-and-bounds vs marker-terminated (e.g. null-terminated) strings, and base-and-bounds won hands down on a variety of use cases. Won on speed, not just on safety grounds. In those days some people actually worried about the space taken up by the extra bounds variable.
While people did worry about the space taken up by the bounds-checking information, systems 10 years older than the PDP-11 had enough hardware resources to support high-level systems programming languages with bounds-checked data structures, let alone the beefy (by comparison with those older models) PDP-11.
I actually wrote a (sort of) Go-style slice library. It's a little heavier than Go slices because it allows for dynamic array resizing and tracks the parentage of slices.
Go has a rather idiosyncratic take on arrays and how they're used, which is reflected in its slices. I can't think of any other language or framework that did it this way.
The "struct Header* self = head" part is UB. The alignment requirement of the local char array is 1, but the alignment requirement of struct Header is that of void*, which is probably 8.
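A minimal sketch of one way to avoid that alignment UB, assuming C11 and hypothetical names; copying through memcpy also sidesteps effective-type (strict aliasing) concerns:

```c
#include <stddef.h>
#include <string.h>

/* A hypothetical header like the one under discussion. */
struct Header { size_t len; void *ptr; };

/* A plain `char buf[sizeof(struct Header)]` only guarantees alignment 1;
   casting it to `struct Header *` is UB. C11's _Alignas fixes the
   alignment, and memcpy keeps the accesses well-defined. */
static size_t read_len(void) {
    _Alignas(struct Header) char buf[sizeof(struct Header)];
    struct Header tmp = { 3, NULL };
    memcpy(buf, &tmp, sizeof tmp);  /* store into the raw buffer */
    struct Header out;
    memcpy(&out, buf, sizeof out);  /* load it back out */
    return out.len;
}
```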
Perhaps a stupid question: Why isn't a vector type similar to { ptr, count } a normal thing to pass around in C? It's what you reach for in any other language; why did it become idiomatic to pass pointers and lengths separately in C?
The C standard library has a header file for complex math but doesn't define a simple fixed-size array struct? Why is that? Is it because such a struct becomes pointless when there are no generics to deal with the stride?
> why did it become idiomatic to pass pointers and lengths separately in C?
I've read that it's because there used to be binary interface issues with structures. They can be returned from functions and passed as parameters but it isn't immediately clear how that happens: is it on the stack, in one register or in several registers? Even today there are compiler options that affect the generated code in those cases:
-fpcc-struct-return
Return “short” struct and union values in memory like
longer ones, rather than in registers.
-freg-struct-return
Return struct and union values in registers when possible.
> They can be returned from functions and passed as parameters but it isn't immediately clear how that happens: is it on the stack, in one register or in several registers?
Why does it have to be clear? It can be unspecified, and the compiler will do what it thinks is best given the struct, e.g. return `struct {int x, y;}` in registers, or return `struct {int x[80];}` as a pointer to memory or write it in place to the caller's stack via RVO.
It doesn't have to be a return value though. You could pass pointers to it as parameters.
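As an illustration of passing and returning such a pair by value (a sketch; `span` is a hypothetical name, and where the struct actually travels depends on the ABI):

```c
#include <stddef.h>

/* A {ptr, len} pair; on common ABIs (e.g. x86-64 SysV) a two-word
   struct like this is typically passed and returned in registers. */
typedef struct { const int *ptr; size_t len; } span;

static span make_span(const int *p, size_t n) {
    span s = { p, n };
    return s;          /* returned by value, no out-pointer needed */
}

static long span_sum(span s) {
    long total = 0;
    for (size_t i = 0; i < s.len; ++i)
        total += s.ptr[i];
    return total;
}
```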
To answer the question though, a number of people do define structures containing a buffer and a length (and potentially capacity), there just isn't such a structure standardized so everybody who wants to do this has to bring their own.
Some examples from Unix: iovec, sendmsg/recvmsg. Surely there are others I'm just not thinking of right now.
In the Windows world you have UNICODE_STRING and similar structures. SChannel has "PSecBufferDesc". Again, surely there are others.
And prominent libraries might also have their own.
Because C is a thin layer above assembly language. A pointer fits in one register, { ptr, count } would require two registers. Also if the count is being passed around, it should surely be checked when doing ptr + i, which slows things down further and is unnecessary if the caller knows what they are doing. If you start trying to make C safe and idiot-proof you also make it slower.
Yet plenty of assembly languages have opcodes for bounds-checked memory accesses, some of them in computer systems developed in the early '60s, 10 years before C was born.
> Why isn't a vector type similar to { ptr, count } a normal thing to pass around in C?
For one thing, as I recall in the original K&R version of C (before ANSI C89), the language didn't support passing a struct as a function argument or return value.
That means if you did make a struct, then every time you wanted to pass one of these pairs around, you'd have to pass a pointer to the struct, and you'd have to dereference that on every use. Which is arguably just as cumbersome as just passing two arguments, at least in terms of how much code you have to type. Plus it was probably slower.
From there it's no surprise if using separate arguments becomes the normal, idiomatic way to do it.
It's indeed a weird thing. It should have been an easy addition to the standard library. But instead people either pass around pointer/length pairs all the time - or even worse: They rely on null-terminated strings / arrays.
I had discussions with people who claimed that null terminated strings are the only idiomatic thing to do in C - because that is how C does strings. They assumed that since the standard library only provided methods which acted on those kinds of strings it was a preferred way to do things. Even though that is a lot less efficient than the string/array types that other languages use as defaults.
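A sketch of the efficiency point, with hypothetical names: every strlen() walks the whole buffer, while a counted string answers length queries and takes prefixes in O(1).

```c
#include <string.h>
#include <stddef.h>

/* A string that carries its length instead of a terminator. */
typedef struct { const char *data; size_t len; } counted_str;

static counted_str cs_from_cstr(const char *s) {
    counted_str c = { s, strlen(s) };  /* pay for the scan exactly once */
    return c;
}

/* A prefix of a null-terminated string needs a copy or a mutation;
   with a counted string it is just arithmetic on the length field. */
static counted_str cs_prefix(counted_str c, size_t n) {
    counted_str r = { c.data, n < c.len ? n : c.len };
    return r;
}
```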
Given that years ago we added a mandatory piece of hardware to most systems to implement virtual memory, I'm now starting to wonder what security and/or performance benefits could be achieved by delegating memory allocation to (or through) hardware.
For years now, Oracle has shipped SPARC Solaris with ADI turned on.
Since the iPhone X, iOS makes use of memory tagging for pointers.
Starting with Android 11, hardware memory tagging is a required feature on ARM platforms on CPUs that support it; on other CPUs the kernel randomly attaches GWP-ASan to user processes, and it is enabled by default on all system processes during the ongoing preview releases.
Every new (post-2018) iPhone ships with this. iOS developers can build code for the architecture but I believe Apple currently strips it out before distribution, so its use is limited to the OS for now. I would assume at some point they’ll flip the switch to allow it; until then developers can use the toolchain to test if their code still works (generally it does, but messing with function pointers in ways unspecified by the standard can occasionally cause problems). ‘pjmlp is fairly interested in this topic so they might be able to share some more examples of it being used if they drop by the thread.
The summary version (from Walter Bright's article) is:
> C can still be fixed. All it needs is a little new syntax:
> void foo(char a[..])
> meaning an array is passed as a so-called "fat pointer", i.e. a pair consisting of a pointer to the start of the array, and a size_t of the array dimension.
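As a sketch, the proposed `char a[..]` parameter would presumably desugar to something like this hypothetical struct-based emulation (names are mine, not from the article):

```c
#include <stddef.h>

/* What `char a[..]` would carry: a pointer to the start of the
   array plus a size_t with the array dimension. */
typedef struct { char *ptr; size_t len; } char_fat;

/* A function taking `char a[..]` would, under the hood, look like: */
static size_t count_spaces(char_fat a) {
    size_t n = 0;
    for (size_t i = 0; i < a.len; ++i)  /* the bound is always at hand */
        if (a.ptr[i] == ' ')
            ++n;
    return n;
}
```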
The paper mentions fat pointers in passing, not putting the term in quotes, not defining it, and not giving a citation -- which makes it clear that the term was already well established at the time.
Fat pointers were part of Pascal (and derivatives), although I'm sure the concept has existed in one form or another going back to the beginning.
edit: Pascal pointers were just a location and size, not a slice-type fat pointer; still, I have always heard any pointer containing more information than a memory address referred to as a fat pointer (except tagged pointers). YMMV.
Sure, I never said that Walter Bright created the concept. What I’m saying is that the link from Cello that I posted on HN is actually using that definition from Walter Bright’s article.
Is this blog post confused or am I confused? It keeps talking about fat pointers but the description looks much more like "arrays with their length stored before their first element," which is a massive difference.
It's just using "fat pointer" to refer to the concept of passing around a pointer with extra information concerning the data it points to. I agree that generally people would expect "fat pointer" to imply a larger pointer itself, but I don't think the label is misused egregiously enough to warrant picking at this.
I understand their desire to use a library, but there's a faster and safer way to do this that's more C-like if you have access to the compiler:
Just locate anything declared as an array in a particular linker section so the pointer manipulation can be checked with two comparisons (or one, if the section is at the top of memory), possibly even against a constant.
If you do this you can even forbid pointer arithmetic except in actual []-declared memory, and can do transparent bounds checking (&array-1 can hold the array length or, possibly faster, the address of the location after the end of the array).
An advantage of this over the library route is that you can prevent pointer/array punning but otherwise allow any C program to work fine. And apart from a few corner cases (there are legit non-array uses of pointer arithmetic, though very few), any noncompliant program can be changed to use [] and still work perfectly fine without this option being used.
"This proposal wasn't accepted into the C standard..."
Walter often shows up on HN, so I'll ask: was this proposal merely in the Dr. Dobbs article, or did it actually go to a committee for review? If the latter, why wasn't it accepted?
Should C reconsider this? Especially now that C++ has std::span and std::string_view?
Also, this is a huge red flag to me:
https://github.com/orangeduck/Cello/blob/master/include/Cell...
1: https://youtu.be/bVxfwsgO00o
If it's good enough for C greybeard Ken Thompson and Unix hacker Rob Pike, it's good enough for me.
In fact, I've looked for a port of Go-style slices to C and haven't found one. Maybe people think sds is good enough?
A slice is stored in memory as a `reflect.SliceHeader` https://golang.org/pkg/reflect/#SliceHeader ; the pointer does come first.
https://github.com/jgbaldwinbrown/slice
It's just a proof-of-concept, though, and would need a lot of work to be used in anything serious.
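For illustration, the Go slice header layout described above mirrors to C roughly like this (field names borrowed from reflect.SliceHeader; slice_append_int is my own hypothetical helper):

```c
#include <stddef.h>

/* A C mirror of Go's runtime slice header: pointer first,
   then length, then capacity. */
typedef struct {
    void  *Data;
    size_t Len;
    size_t Cap;
} go_slice;

/* Appending within capacity just bumps Len, exactly as in Go;
   a real implementation would reallocate and grow Cap when full. */
static go_slice slice_append_int(go_slice s, int v) {
    if (s.Len < s.Cap) {
        ((int *)s.Data)[s.Len] = v;
        s.Len++;
    }
    return s;
}
```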
Puke! Who would ever want to create that kind of macro abomination.
And I'm not even joking :)
var is a typedef for void* and no & appears in the function.
A pointer to an array of unknown length can't be used without the length being passed around next to it wherever it's passed.
You can't deref ptr+i without knowing that it's under length, etc.
Two bits of data that belong together seems like they would be convenient to pass (or return!) as one argument or return value.
It’s especially terrible with functions that e.g. take two lists and return a third. That should be two arguments and a return value, not six arguments.
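To illustrate the argument-count difference, a sketch with hypothetical names: the separate pointer/length style versus a span-style signature.

```c
#include <stddef.h>

typedef struct { const int *ptr; size_t len; } ispan;

/* Separate-argument style: every array needs a companion length,
   and the result needs an out-pointer plus a capacity - six
   parameters where three values would do. (Declaration only.) */
size_t concat6(const int *a, size_t alen,
               const int *b, size_t blen,
               int *out, size_t cap);

/* Span style: two arguments in, one value out. */
static ispan pick_longer(ispan a, ispan b) {
    return a.len >= b.len ? a : b;
}
```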
They even link to it.
Contrary to common HN wisdom, most C and C++ related surveys show that only up to 50% actually use some kind of analysis tooling.