Fat pointers in C using libcello

In my personal experience, "a bit more than a pointer" works best as a pair of (start, end) pointers (where "end" points to just beyond the last element.) The most obvious reasons for this are:

- slices become a total non-issue since a pair of (start, end) already is a slice and you can just move start and end.

- comparing against an end pointer is generally easier than adding up a length value first, particularly if you're slicing at the same time.

- the end pointer value is independent of the array element type, so if you e.g. cast to uint8_t * (which arguably you shouldn't in most cases) it stays exactly the same. If you store a count you need to adjust a multiplier. If you store a byte length, you need to do a lot of divides or casts to deal with pointer arithmetics.

Also, this is a huge red flag to me:

https://github.com/orangeduck/Cello/blob/master/include/Cell...

  #define is ==
  #define isnt !=
  #define not !
  #define and &&
  #define or ||
  #define in ,

P.S.: This also is a "try to invent a new programming language without inventing a new programming language" thing. Have your cake and eat it... either it's C or it isn't, and this library is leaving the space of "normal" C.

shawa_a_a · 6 years ago

I don’t think you’re too far off with ‘leaving the space of normal C’, but I think it may help to see the context from which the author was coming when writing Cello [1] and evaluating it in that context.

It’s been a while since I watched the talk but I believe his intention was to do just that, to push the bounds of what could be done in a header file purely for the fun of it. The second half of the talk specifically addresses the “why are you doing this?” in quite a charming way.

1: https://youtu.be/bVxfwsgO00o

dllu · 6 years ago

huh. Interestingly, for "and", "or" and "not", iso646.h already defines C alternative tokens: https://en.wikipedia.org/wiki/C_alternative_tokens

eqvinox · 6 years ago

I know :) ... but have you seen anyone actually use those? I'd be curious to get a pointer or two.

aaron-lebo · 6 years ago

I'll admit I've never written more than small programs in C, but the criticism that this "isn't C" isn't a fair criticism to me. He's not doing anything more than any other library can do, were he to write a compiler that mapped directly to really boring C, would it be more or less C than this? I don't feel like those questions are useful. We need experiments like this, and for me personally, cello was a revelation when it was posted here years ago. There are no rules that say you can't do it.

I know he's doing a little hackiness by placing the size before the start of the object, but they also take that approach here (https://www.piumarta.com/software/cola/objmodel2.pdf) so maybe it's not that uncommon. How would you do the dual pointer setup? Is there any overhead from that, or is it small enough to not worry about?

eqvinox · 6 years ago

The "isn't C" argument is more about the library as a whole, not specifically about the fat pointer suggestions. (Please look at the github repo, IMHO it's obvious.)

Placing a length "before" a pointer is perfectly fine on a _technical_ level. It's also how glibc's malloc works, it has its own data before any allocation it returns. However, hackiness is not the question here - it's whether it's the "best" approach. I simply believe, based on my own experiences, that twin pointers cover/win out in a much larger subset of applicable scenarios.

As for how to implement it - declare a struct with 2 pointers in it. Or just pass 2 pointers around.

artemonster · 6 years ago

Do you, by chance, know where else object model of Piumarta is used? I think the Per6 VM "Potion" used it too as a base, but I am unsure

gumby · 6 years ago

> In my personal experience, "a bit more than a pointer" works best as a pair of (start, end) pointers (where "end" points to just beyond the last element.)

There’s a PARC paper from the Cedar team benchmarking base-and bounds vs marker-terminated (e.g. null terminated) strings and base-and-bounds won hands down on a variety of use cases. Won on speed, not just the safety grounds. In those days some people actually worried about the space taken up by the extra bounds variable.

pjmlp · 6 years ago

While people did worry about the space taken up by the bounds checking information, systems 10 years older than the PDP-11 had enough hardware resources to support high level systems programming languages with bounds checking data structures, let alone the beefy PDP-11 (by comparison with those older models).

kortex · 6 years ago

This is how Go does slices, though they are 3 fields:

    type slice struct {
        zerothElement *type
        len int
        cap int
    }

Though I suspect under the hood the len and cap are ordered first.

If it's good enough for C greybeard Ken Thompson and unix hacker Rob Pike, its good enough for me.

In fact I've looked for a port of go-style slices to C and haven't found one. Maybe people think sds is good enough?

LukeShu · 6 years ago

> Though I suspect under the hood the len and cap are ordered first.

A slice is stored in memory as a `reflect.SliceHeader` https://golang.org/pkg/reflect/#SliceHeader ; the pointer does come first.

jgbaldwinbrown · 6 years ago

I actually wrote a (sort of) Go-style slice library. It's a little heavier than Go slices because it allows for dynamic array resizing and tracking the parentage of slices:

https://github.com/jgbaldwinbrown/slice

It's just a proof-of-concept, though, and would need a lot of work to be used in anything serious.

int_19h · 6 years ago

Go has a rather idiosyncratic take on arrays and how they're used, which is reflected in its slices. I can't think of any other language or framework that did it this way.

hannibalhorn · 6 years ago

Start and end pointers is the approach used by the C++ STL, too.

vardump · 6 years ago

> Also, this is a huge red flag to me:

Puke! Who would ever want to create that kind of macro abomination.

dgellow · 6 years ago

Bjarne Stroustrup! https://en.wikipedia.org/wiki/C_alternative_tokens

And I'm not even joking :)

garaetjjte · 6 years ago

You can do worse: http://oldhome.schmorp.de/marc/bournegol.html

vardump · 6 years ago

Pretty curious: someone seriously disagrees this header should not exist? I'd really love to hear the arguments!

typedef void* var; struct Header { var type; }; // ... #define alloc_stack(T) header_init( \ (char[sizeof(struct Header) + sizeof(struct T)]){0}, T) var header_init(var head, var type) { struct Header* self = head; self->type = type; return ((char*)self) + sizeof(struct Header); }

-fpcc-struct-return Return “short” struct and union values in memory like longer ones, rather than in registers. -freg-struct-return Return struct and union values in registers when possible.

The concept of "fat pointer" the article is about has been described by Walter Bright (D creator) as "C's Biggest Mistake": https://www.drdobbs.com/architecture-and-design/cs-biggest-m.... It's also an interesting read.

The summary version (from Walter Bright's article) is:

> C can still be fixed. All it needs is a little new syntax:

> void foo(char a[..])

> meaning an array is passed as a so-called "fat pointer", i.e. a pair consisting of a pointer to the start of the array, and a size_t of the array dimension.

tom_mellior · 6 years ago

It's worth noting that fat pointers didn't originate with Walter Bright or that 2009 article. The oldest C-with-fat-pointers I can think of off the top of my head is CCured from 2002: https://people.eecs.berkeley.edu/~necula/Papers/ccured_popl0...

The paper mentions fat pointers in passing, not putting the term in quotes, not defining it, and not giving a citation -- which makes it clear that the term was already well established at the time.

caspper69 · 6 years ago

Fat pointers were part of Pascal (and derivatives), although I'm sure the concept has existed in one form or another going back to the beginning.

edit: Pascal pointers were just a location and size, however, not a slice-type fat pointer; however, I have always heard of any pointer containing more information than a memory address referred to as a fat pointer (except tagged pointers). YMMV.

dgellow · 6 years ago

Sure, I never said that Walter Bright created the concept. What I’m saying is that the link from Cello that I posted on HN is actually using that definition from Walter Bright’s article.

They even link to it.

joejev · 6 years ago

Interesting idea, but this implementation has UB:

The section "struct Header* self = head" is UB. The alignement requirement of the local char array is 1 but the alignment requirement of struct Header is that of void* which is probably 8.

tomp · 6 years ago

That's just what I was wondering, are "magic" libraries like this still safe to use considering modern compiler's UB shenanigans?

asveikau · 6 years ago

Not only that but you have a pointer to a parameter returned back and used outside its scope ...

nitrogen · 6 years ago

It's not a pointer to a parameter, it is just the parameter itself.

var is a typedef for void* and no & appears in the function.

alkonaut · 6 years ago

Perhaps a stupid questoon: Why isn't a vector type similar to { ptr, count } a normal thing to pass around in C? It's what you reach for in any other language, why did it become idiomatic to pass pointers and lengths separately in C?

A C standard library has a header file for complex math but it doesn't define a simple fixed size array struct? Why is that? Is it because they become pointless when there is no generics to deal with the stride?

matheusmoreira · 6 years ago

> why did it become idiomatic to pass pointers and lengths separately in C?

I've read that it's because there used to be binary interface issues with structures. They can be returned from functions and passed as parameters but it isn't immediately clear how that happens: is it on the stack, in one register or in several registers? Even today there are compiler options that affect the generated code in those cases:

https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#ind...

https://gcc.gnu.org/onlinedocs/gcc/Incompatibilities.html#in...

andrepd · 6 years ago

>They can be returned from functions and passed as parameters but it isn't immediately clear how that happens: is it on the stack, in one register or in several registers?

Why does it have to be clear? It can be unspecified and the compiler will do what it thinks is best given the struct, e.g. return `struct {int x,y};` in registers, return `struct int[80] x}` as pointer to memory or write in-place to the caller's stack, via RVO.

It doesn't have to be a return value though. You could pass pointers to it as parameters.

To answer the question though, a number of people do define structures containing a buffer and a length (and potentially capacity), there just isn't such a structure standardized so everybody who wants to do this has to bring their own.

Some examples from Unix: iovec, sendmsg/recvmsg. Surely there are others I'm just not thinking of right now.

In the Windows world you have UNICODE_STRING and similar structures. SChannel has "PSecBufferDesc". Again, surely there are others.

And prominent libraries might also have their own.

kps · 6 years ago

Originally C didn't have the ability to pass or return structures. That was added in the V7 period, after some habits were well established.

glxxyz · 6 years ago

Because C is a thin layer above assembly language. A pointer fits in one register, { ptr, count } would require two registers. Also if the count is being passed around, it should surely be checked when doing ptr + i, which slows things down further and is unnecessary if the caller knows what they are doing. If you start trying to make C safe and idiot-proof you also make it slower.

Isn’t ptr+count always two registers if passed as two parameters too? It seems like just a syntactic difference.

A ptr to an array of unknown length can’t be used without the length being passed around next to it whereever it’s passed.

You cant deref ptr+i without knowing that it’s under length etc.

Two bits of data that belong together seems like they would be convenient to pass (or return!) as one argument or return value.

It’s especially terrible with methods that e.g take two lists and return a third. That should be two arguments and a return value, not six arguments.

Yet plenty of Assembly languages have opcodes for bounded checking memory accesses, some of them in computer systems developed in the early 60's, 10 years before C was born.

adrianmonk · 6 years ago

> Why isn't a vector type similar to { ptr, count } a normal thing to pass around in C?

For one thing, as I recall in the original K&R version of C (before ANSI C89), the language didn't support passing a struct as a function argument or return value.

That means if you did make a struct, then every time you wanted to pass one of these pairs around, you'd have to pass a pointer to the struct, and you'd have to dereference that on every use. Which is arguably just as cumbersome as just passing two arguments, at least in terms of how much code you have to type. Plus it was probably slower.

From there it's no surprise if using separate arguments becomes the normal, idiomatic way to do it.

That’s a good reason. And a weird restriction (with 2000’s glasses on designing an ergonomic language).

Matthias247 · 6 years ago

It's indeed a weird thing. It should have been an easy addition to the standard library. But instead people either pass around pointer/length pairs all the time - or even worse: They rely on null-terminated strings / arrays.

I had discussions with people who claimed that null terminated strings are the only idiomatic thing to do in C - because that is how C does strings. They assumed that since the standard library only provided methods which acted on those kinds of strings it was a preferred way to do things. Even though that is a lot less efficient than the string/array types that other languages use as defaults.

pjc50 · 6 years ago

On the "weird pointer solutions" tangent, there's the ARM authenticated pointers: https://lwn.net/Articles/718888/

Given that years ago we added a mandatory piece of hardware to most systems to implement virtual memory, I'm now starting to wonder what security and/or performance benefits could be achieved by delegating memory allocation to (or through) hardware.

That is quite easy to validate.

Since years, Oracle has shipped SPARC Solaris with ADI turned on.

Since iPhone X iOS makes use of memory tagging for pointers.

Starting with Android 11, hardware memory tagging is a required feature on ARM platforms on the CPUs that support it, while on other CPUs the kernel will randomly attach GWP-ASan to user processes and is enabled by default on all system processes during the ongoing preview releases.

arethuza · 6 years ago

That's a fascinating approach - how widely used is it? Would love to know whether it causes any problems for 'existing' code.

saagarjha · 6 years ago

Every new (post-2018) iPhone ships with this. iOS developers can build code for the architecture but I believe Apple currently strips it out before distribution, so its use is limited to the OS for now. I would assume at some point they’ll flip the switch to allow it; until then developers can use the toolchain to test if their code still works (generally it does, but messing with function pointers in ways unspecified by the standard can occasionally cause problems). ‘pjmlp is fairly interested in this topic so they might be able to share some more examples of it being used if they drop by the thread.

Besides iOS, Solaris SPARC, and Android 11 onwards.

clarry · 6 years ago

Is this blog post confused or am I confused? It keeps talking about fat pointers but the description looks much more like "arrays with their length stored before their first element," which is a massive difference.

mehrdadn · 6 years ago

I think the latter. Fat pointers were supposed to be 2 pointers wide, not 1...

It's just using "fat pointer" to refer to the concept of passing around a pointer with extra information concerning the data it points to. I agree that generally people would expect "fat pointer" to imply a larger pointer itself, but I don't think the label is misused egregiously enough to warrant picking at this.

I understand their desire to use a library, but there's a faster and safer way to do this that's more C-like if you have access to the compiler:

Just locate anything declared as an array in a particular linker section so the pointer manipulation can be done with two (or one if it's at the top of memory) comparison, possibly even to a constant.

If you do this you can even forbid pointer arithmetic except in actual []-declared memory, and can do transparent bounds checking (&array-1 can hold the array length or, possibly faster, the address of the location after the end of the array).

An advantage of this over the library route is you can prevent pointer/array punning but otherwise allow any C program to work fine. And apart from a few corner cases (there are legit non-array uses of pointer arithmetic, though very few) and noncompliant program can be changed to use [] and still work perfectly fine without this option being used.

wyldfire · 6 years ago

"This proposal wasn't accepted into the C standard..."

Walter often shows up on HN, so I'll ask: was this proposal merely on the Dr Dobbs article or did it actually go to a committee for review? If the latter, why wasn't it accepted?

Should C reconsider this? Especially now that C++ has std::span<> and std::string_view<>?

Currently Checked C seems to be only attempt left, and a mentality shift to at very least use the static analysis tools that come with the compilers.

Contrary to common HN wisdom, most C and C++ related surveys show that only up to 50% actually use some kind of analysis tooling.