C "clockwise/spiral" rule to understand declarations

palotasb · a year ago

The spiral rule works only if there is no pointer to pointer or array of array in the type. In other words it is an incorrect rule. But take this for example:

        +----------------------------+
        | +-----------------------+  |
        | | +------------------+  |  |
        | | | +-------------+  |  |  |
        | | | | +--------+  |  |  |  |
        | | | | |  +--+  |  |  |  |  |
        | | | | |  ^  |  |  |  |  |  |
    int * * ¦ ¦ ¦ VAR[1][2][3] |  |  |
     ^  | | | | |     |  |  |  |  |  |
     |  | | | | +-----+  |  |  |  |  |
     |  | | | +----------+  |  |  |  |
     |  | | +---------------+  |  |  |
     |  | ---------------------+  |  |
     |  +-------------------------+  |
     +-------------------------------+

The type of VAR is a [1-element] array of [2-element] array of [3-element] array of pointer to pointer to ints. I drew a spiral that passes through each specifier in the correct order. To make the spiral correct it has to skip the pointer specifiers in the first three loops. This is marked by ¦.

The Right-Left Rule is quoted less frequently on HN but it's a correct algorithm for deciphering C types: http://cseweb.ucsd.edu/~ricko/rt_lt.rule.html

The spiral rule can be modified to process all array specifiers before all pointer specifiers, but then you'd have to specify that the order to do so is right and then left. At that point it's just the Right-Left Rule.

rcxdude · a year ago

I've found that 'declaration follows use' is actually the easiest way to understand C type declarations. The only annoying to remember thing is which way the various type specifiers like const bind.

wruza · a year ago

In physics that usually means that we are missing something in the ¦'s. Like these hypothetical/imaginary "arr" keywords, look:

  int ** arr arr arr VAR[1][2][3];

By introducing these virtual storage kind specifiers, the spiral model works! It becomes apparent that these specifiers are actually real, cause the model wouldn't be consistent otherwise, and there's so much evidence for it all over the C codebases. We just tend to see the "real" part of it (sigh, stuck forever with that legacy terminology), while this is what actually happens:

  long double _Complex ** arr arr arr VAR[1][-1][sqrtl(-1)];

which is naturally homomorphic to most C compilations.

(spoiler for those who need it: jk)

tpoacher · a year ago

My own interpretation (which effectively matches the right-left rule, at least algorithmically) is that, (), [], and * are operators, which have different semantics at the declaration vs the 'usage/expression' level, where:

() and [] are right-associative unary type-operators, with high fixity, (or equivalently, binary type-operators when () or [] have arguments)

and * is a binary type-operator, with low fixity (i.e. lower than () or [] ), where the 'left' argument is effectively the context that's left after removing the asterisk and the 'right' argument (rather than what's to the left of the asterisk in a purely 'anatomical 'sense')

(whereas, at the expression level, * is a unary value-operator instead, while () and [] behave the same as in their type-form, except acting as value-operators instead)

pwdisswordfishz · a year ago

If you're going to adjust willy-nilly on which loop of the spiral a given element appears, then it's not much of a rule, isn't it? I mean, why not write this?

        +----------------------------+
        | +-----------------------+  |
        | | +------------------+  |  |
        | | | +-------------+  |  |  |
        | | | | +--------+  |  |  |  |
        | | | | |  +--+  |  |  |  |  |
        | | | | |  ^  |  |  |  |  |  |
    int ¦ ¦ * * ¦ VAR ¦ [1] ¦ [2][3] ¦
     ^  | | | | |     |  |  |  |  |  |
     |  | | | | +-----+  |  |  |  |  |
     |  | | | +----------+  |  |  |  |
     |  | | +---------------+  |  |  |
     |  | ---------------------+  |  |
     |  +-------------------------+  |
     +-------------------------------+

256_ · a year ago

I think that's their point. I read it as explaining why the rule doesn't work, rather than defending it.

LysPJ · a year ago

As a C programmer, declarations in Go appeared "backwards" to me when I first saw them.

IMO the Go syntax is a vast improvement as it's much simpler and avoids the clockwise/spiral issue: https://appliedgo.com/blog/go-declaration-syntax

rollcat · a year ago

It's worth to remember that C "happened" to stick around because of UNIX, but it's been just another iteration of what started as BCPL; with Go coming from the same people who made Plan 9 (the spiritual successor to Research UNIX) and Alef/Limbo. These guys are equally interested in pushing both PL/OS research and practice.

(I also have no doubt that just like Go fixed C's type declarations, in another 20-30 years, Go's successor will finally fix error handling.)

unrealhoang · a year ago

hence why every recent programming languages are type after name (go included).

gf000 · a year ago

I think a proper type system that doesn't special-cases arrays are better, e.g. Array<String>. Pointers may also be part of this uniform structure, e.g. Array<Ref<String>>, and there is zero question on how to parse it even if you are not familiar.

atiedebee · a year ago

I like the way Dlang does it. It keeps the types on the left like C, but are actually readable.

tpoacher · a year ago

even better, they should have it above or below!

or, declare the type separately, like in fortran or haskell (or in fact, pre-C99 c)

HexDecOctBin · a year ago

I just spent a week trying to write a parser for C declarations. Recursive Descent didn't work, Top Down Operator Precedence didn't work, I finally found an simple implementation of cdecl in K&R2, but it printed out textual description instead of creating an AST. Converting it to create an AST was a whole different challenge. In the end, I got it working by adding the idea of a blank-type and doing some tree inversions (almost using each AST as a program that executes over the nested chunk of AST).

It was only about 200 lines of code, and yet never have I been happier to finish a solution and never having to think about it again.

mananaysiempre · a year ago

Rec-descent absolutely does work (source: wrote a working parser), but it's a bit annoying to make it build a syntax tree. Making it build a prefix encoding of the declaration is much easier.

If you have to, you can make a syntax tree work, too, but you'll have to thread through the declarator parser a double (triple?) pointer for the hole in the tree where the next part goes, and at the end plug it with the specifiers-and-qualifiers part you parsed before that. Or at least that's what I had working before I gave up and switched to a prefix code. It'd probably pay to check the LCC book (Fraser & Hanson, A Retargetable C Compiler: Design and Implementation) to see what they do there, because their type representation is in fact a hash-consed tree. (The LuaJIT FFI parser needs some sort of magic fix-up pass at the end that I didn't much like, but I don't remember the specifics.)

foldr · a year ago

I think the most straightforward recursive-descent approach is just to parse the expression part of the type as an expression and then turn it inside out to get the type. So if you have e.g. 'int (*(p[5]))(char)', then your parse tree is something like

    declaration
      int
      function call
        *
          []
            p
            5
        char

and you need to turn that inside out to get something like

    array
      pointer to
        function
          int
          (char)

This way you have a simple expression parser combined with a simple recursive tree transformation function. (The tree transformation took about 50 LOC when I implemented it for a toy parser.)

HexDecOctBin · a year ago

> Rec-descent absolutely does work (source: wrote a working parser), but it's a bit annoying to make it build a syntax tree.

Yeah, it required too much backtracking and state snap-shotting and resets, and I couldn't figure out a decent way of reporting good errors.

Thanks for the references. The code I have now is pretty elegant and functional, so I'm not in the mood of diving back into it. But if I ever need to change it, I'll take a look.

keyle · a year ago

Reading this make me happy. People still spend time on fun & nerdy stuff like this!

dataflow · a year ago

How can recursive descent not work? Did you not do backtracking? It might be slow or annoying but it should work.

HexDecOctBin · a year ago

I mean, with enough hacking around, anything can work. But the code had gotten extremely complicated and brittle, and it was still not handling all the edge cases. Readability was important for this project, since I want to be able to modify the parser later to add new extensions to the language.

not-my-account · a year ago

What are you working on?

HexDecOctBin · a year ago

A C parser for a meta-programming layer (automatically generating serialisation routines, etc.)

userbinator · a year ago

I finally found an simple implementation of cdecl in K&R2

Perhaps you should've started learning C with K&R. ;-)

HexDecOctBin · a year ago

I did, 10 years ago. Didn't remember every piece of code.

wruza · a year ago

The proper way is to use /usr/bin/cdecl or https://cdecl.org and extract as many typedefs as reasonable from the gibberish, because in C most of the times you’ll need these anyway to address lifetimes and ownership/borrowing points.

fc417fc802 · a year ago

It's unfortunate that we're stuck with syntax that so many people struggle to accurately decipher.

userbinator · a year ago

It's unfortunate that so many people never read K&R.

A very small yet readable book, written by the original authors of C, that very clearly covers the language and even presents a C program to parse the declarations.

xigoi · a year ago

If you need a program to help you read C programs, that suggests a serious flaw in C.

gf000 · a year ago

There are books on hieroglyphs as well, but there is a good reason we don't use them as types in programming languages.

masklinn · a year ago

You’re only “stuck” with it to the extent that you’re stuck with C(++). Free yourself from that and the rest follows.

immibis · a year ago

C++ is better, actually, because a template type has a straightforward syntax.

f1shy · a year ago

I find this description way better: http://unixwiz.net/techtips/reading-cdecl.html

sylware · a year ago

People should stick to the specifications...

That said, that makes me think of perl5. I though the perl5 coders were doing some kind of sick competition: who is going to use the most implicit code, namely to read the code you would need a complete/perfect/permanent understanding of the full perl5 syntax parser to understand what some code is actually doing. I hate perl5 for that, but ultra complex syntax computer language like c++ and similar (java, rust, etc), are worse. In advanced OOP, you have no idea of what's going on if you don't embrace the full OOP model of a complex program, not to mention it does exponentially increase the syntax complexity, which is a liability in order to get a sane spectrum of alternative "real-life" compilers since those become insane to implement correctly.

If implicit there is in a computer language, it must be very little and very simple, but should be avoided as much as possible. Does it mean more verbose code, well, most of the time yes, and this is usually much better on the long run. For instance in C, I try to avoid complex expression (often I fail, because I am too used to some operators like '++' '--'), many operators should not be around, not pertinent enough (like a ? b : c) only increasing compiler complexity.

immibis · a year ago

The actual rule is "declaration follows use".

int p[5] means the type of p[5] is int (but you still have to remember valid elements are 0-4).

void (signal(void()(int))(int) means the type of (signal(something that is a void()(int))(42) is void. And void(p)(int) means the type of (p)(42) is void.

If you can remember the precedence of these operators, you automatically remember the precedence of their "type operators" as well.

nialv7 · a year ago

yeah, this is so simple, i don't understand why people keep invent more complex and worse rules to understand cdecl...