sparkie commented on Thoughts on Generating C   wingolog.org/archives/202... · Posted by u/ingve
yxhuvud · 21 hours ago
"static inline", the best way of getting people doing bindings in other languages to dislike your library (macros are just as bad, FWIW).

I really wish someone on the C language/compiler/linker level took a real look at the problem and actually tried to solve it in a way that isn't a pain to deal with for people that integrate with the code.

sparkie · 12 hours ago
Compile using `-fkeep-inline-functions`.
sparkie commented on Thoughts on Generating C   wingolog.org/archives/202... · Posted by u/ingve
bjourne · 17 hours ago
Last I checked static inline was merely a hint that compilers need not take. They all do, but by definition it's not a zero cost abstraction.
sparkie · 12 hours ago
`inline` is a hint, but he defines `static_inline` in the preprocessor to include `__attribute__((__always_inline__))`, which is more than just a hint. However, even `always_inline` may be troublesome across translation units. We can still inline things from different translation units if using `-flto`, though I believe there are occasional bugs. For libraries we'd also want to use `-ffat-lto-objects`.
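A macro along these lines might look like the following. This is a minimal sketch rather than the article's exact definition: the compiler check and the `add_one` helper are assumptions made for illustration.

```c
/* Hypothetical sketch of a `static_inline` macro; the article's own
   definition may differ. On GCC/Clang it adds always_inline, elsewhere it
   falls back to plain `static inline`, which is only a hint. */
#if defined(__GNUC__) || defined(__clang__)
#  define static_inline static inline __attribute__((__always_inline__))
#else
#  define static_inline static inline
#endif

/* Illustrative helper, not taken from the article. */
static_inline int add_one(int x) {
    return x + 1;
}
```

Because the function is `static`, no external symbol is emitted for it, which is what makes such helpers awkward for bindings from other languages.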
sparkie commented on Thoughts on Generating C   wingolog.org/archives/202... · Posted by u/ingve
troad · 14 hours ago
> Has anyone defined a strict subset of C to be used as target for compilers? Or ideally a more regular and simpler language, as writing a C compiler itself is fraught with pitfalls.

The main reason you'd target C is for portability and free compiler optimisations. If you start inventing new intermediate languages or C dialects, what's the benefit of transpiling in the first place? You might as well just write your own compiler backends and output the machine code directly, with optimisations around your own language's semantics rather than C.

Imho, C89 is the strict subset that a compiler ought to target, assuming they want C's portability and free compiler optimisations. It's well understood, not overly complex, and will compile to fast, sensible machine code on any architecture from the past half century.

sparkie · 12 hours ago
A subset of C could still use existing C compilers and get the optimizations. The front-end would just restrict what can be expressed in it.
sparkie commented on Thoughts on Generating C   wingolog.org/archives/202... · Posted by u/ingve
20k · 21 hours ago
Static inline functions can sometimes serve as an optimisation barrier to compilers. It's very annoying. I've run into a lot of cases when targeting C as a compilation target where swapping something out into an always-inline function results in worse code generation, because compilers have bugs, sadly.

There's also the issue in that the following two things don't have the same semantics in C:

    float v = a * b + c;
vs

    static_inline float get_thing(float a, float b) {
        return a*b;
    }

    float v = get_thing(a, b) + c;
This is just a C-ism (floating point contraction) that can make extracting things into always inlined functions still be a big net performance negative. The C spec mandates it sadly!
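A small sketch of why the rounding order matters, assuming IEEE 754 `float` and `double`. The function names are made up for illustration, and the `double` version only approximates what a fused multiply-add would produce.

```c
#include <float.h>

/* Product is rounded to float before the add, as happens when it is
   returned from a separate function. `volatile` forces that rounding. */
static float mul_add_separate(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

/* Product is kept exact (a float*float product fits in a double) and only
   the final sum is rounded, approximating a contracted multiply-add. */
static float mul_add_wide(float a, float b, float c) {
    return (float)((double)a * (double)b + (double)c);
}

/* Returns 1 if the two strategies disagree for one chosen input. */
static int rounding_order_matters(void) {
    float a = 1.0f + FLT_EPSILON;
    float c = -(1.0f + 2 * FLT_EPSILON);
    return mul_add_separate(a, a, c) != mul_add_wide(a, a, c);
}
```

For this input the separately rounded version cancels to zero while the wide version keeps the small residual term, so the two forms are not interchangeable.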

uintptr_t's don't actually have the same semantics as pointers either. Eg if you write:

    void my_func(strong_type1* a, strong_type2* b);
a =/= b, and we can pull the underlying type out. However, if you write:

    void my_func(some_type_that_has_a_uintptr_t1 ap, some_type_that_has_a_uintptr_t2 bp) {
        float* a = get(ap);
        float* b = get(bp);
    }
a could equal b. Semantically the uintptr_t version doesn't provide any aliasing semantics. Which may or may not be what you want depending on your higher level language semantics, but it's worth keeping the distinction in mind because the compiler won't be able to optimise as well.

sparkie · 12 hours ago
`uintptr_t` and `intptr_t` are integer types large enough to hold a pointer. They're not pointer types (They're also optional in the standard).

In the first `my_func`, there is the possibility that `a` and `b` are equal if their struct layouts are equivalent (or one has a proper subset of the other's fields in the same order). To tell the compiler they don't overlap we would use `(strong_type1 *restrict a, strong_type2 *restrict b)`.
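As a rough illustration of the `restrict` form (the `scale_into` function and its parameters are invented for this example), the qualifier is a promise made by the caller that the buffers don't overlap, which lets the compiler avoid reloading `src` after each store to `dst`:

```c
#include <stddef.h>

/* Hypothetical example: the caller promises dst and src do not overlap.
   Passing overlapping buffers is undefined behaviour; nothing checks it. */
static void scale_into(float *restrict dst, const float *restrict src,
                       size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```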

There's also the possibility that the pointers could point to the same address but be non-equal - eg if LAM/UAI/TBI are enabled, a simple pointer equality comparison is not sufficient because the high bits may not be equal. Or on platforms where memory access is always aligned, the low bits may not be equal. These bits are sometimes used to tag pointers with additional information.
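A sketch of the low-bit case, assuming objects are at least 8-byte aligned so three low bits are free, and assuming the usual flat pointer-to-`uintptr_t` conversion. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumes at least 8-byte alignment, leaving the 3 low bits unused. */
#define TAG_MASK ((uintptr_t)7)

/* Store a small tag in the low bits of an aligned pointer. */
static inline void *ptr_with_tag(void *p, unsigned tag) {
    return (void *)(((uintptr_t)p & ~TAG_MASK) | ((uintptr_t)tag & TAG_MASK));
}

/* Recover the real address by clearing the tag bits. */
static inline void *ptr_strip_tag(const void *p) {
    return (void *)((uintptr_t)p & ~TAG_MASK);
}

/* Read the tag back out. */
static inline unsigned ptr_tag(const void *p) {
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

/* Compare addresses while ignoring tags; plain `==` would see the tags. */
static inline bool ptr_same_address(const void *a, const void *b) {
    return ptr_strip_tag(a) == ptr_strip_tag(b);
}
```

Two differently tagged pointers to the same object compare unequal with `==`, yet `ptr_same_address` reports them as the same address.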

sparkie commented on Building Your Own Efficient uint128 in C++   solidean.com/blog/2026/bu... · Posted by u/PaulHoule
wheybags · 8 days ago
> though he incorrectly states that `uint64_t` is `unsigned long`

It probably is, he's just probably using macOS, where both long and long long are 64 bit. https://www.intel.com/content/www/us/en/developer/articles/t...

(that's the best linkable reference I could find, unfortunately).

I've run into a similar problem where an overload resolution for uint64_t was not being used when calling with a size_t because one was unsigned long and the other was unsigned long long, which are both 64 bit uints, but according to the compiler, they're different types.

This was a while ago so the details may be off, but the silly shape of the issue is correct.

sparkie · 8 days ago
> It probably is

This was my point. It may be `unsigned long` on his machine (or any that use LP64), but that isn't what `uint64_t` means. `uint64_t` means a type that is exactly 64 bits wide, whereas `unsigned long` is simply a type that is at least as large as `unsigned int` and at least 32 bits, and `unsigned long long` is a type that is at least as large as `unsigned long` and at least 64 bits.

I was not aware of compilers rejecting the equivalence of `long` and `long long` on LP64. GCC on Linux certainly doesn't. On Windows it would be the case because it uses LLP64 where `long` is 32-bits and `long long` is 64-bits.

An intrinsic like `_addcarry_u64` should be using the `uint64_t` type, since its behavior depends on it being precisely 64-bits, which neither `long` nor `long long` guarantee. Intel's intrinsics spec defines it as using the type `unsigned __int64`, but since `__int64` is not a standard type, it is probably implemented as a typedef or `#define __int64 long long` by the compiler or `<immintrin.h>` he is using.
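On the overload problem mentioned above: `unsigned long` and `unsigned long long` are distinct types to the type system even where both are 64 bits wide, and `uint64_t` is a typedef for whichever one the implementation picks. A minimal C11 sketch that makes this visible with `_Generic` (the `U64_KIND` macro and `uint64_kind` function are hypothetical names):

```c
#include <limits.h>
#include <stdint.h>

/* 1 for unsigned long, 2 for unsigned long long, 0 for anything else.
   Listing both types is only legal because they are distinct types. */
#define U64_KIND(x) \
    _Generic((x), unsigned long: 1, unsigned long long: 2, default: 0)

/* Reports which type the implementation chose for uint64_t. */
static inline int uint64_kind(void) {
    return U64_KIND((uint64_t)0);
}

/* Holds wherever uint64_t exists at all. */
_Static_assert(sizeof(uint64_t) * CHAR_BIT == 64,
               "uint64_t is exactly 64 bits");
```

On a typical LP64 Linux target `uint64_kind()` would be expected to report `unsigned long`, and on LLP64 Windows `unsigned long long`, but that is a property of the platform headers rather than of `uint64_t` itself.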

sparkie commented on Actors: A Model of Concurrent Computation [pdf] (1985)   apps.dtic.mil/sti/tr/pdf/... · Posted by u/kioku
BatteryMountain · 8 days ago
Orleans is pretty cool! The project has matured nicely over the years (been something like 10 years?) and they have some research papers attached to it if you like reading up on the details. The nuget stats indicate a healthy amount of downloads too, more than one might expect.

One of the single most important things I've done in my career was going down the Actor Model framework rabbit hole about 8 or 9 years ago. I read a bunch of books on the topic that contained a ton of hidden philosophy, amazing reasoning, and conversations about real-time vs eventual consistency and the Two Generals' Problem - just a ton of enriching stuff, ways to think about data flows, the direction of the flow, immutability, event-logged systems and on and on. At the time CQS/CQRS was making heavy waves and everyone tried to implement DDD & event-based systems (and/or service buses - tons of nasty queues...), and the Actor Model (and F# for that matter) was such a breath of fresh air after all the enterprise complexity.

Would highly recommend going down this path for anyone with time on their hands, it's time well spent. I still call on that knowledge frequently even when doing OOP.

sparkie · 8 days ago
I was disappointed when MS discontinued Axum. I found it pleasant to use and thought the language-based approach was nicer than a library-based solution like Orleans.

The Axum language had `domain` types, which could contain one or more `agent` and some state. Agents could have multiple functions and could share domain state, but not access state in other domains directly. The programming model was passing messages between agents over a typed `channel` using directional infix operators, which could also be used to build process pipelines. The channels could contain `schema` types and a state-machine like protocol spec for message ordering.

It didn't have "classes", but Axum files could live in the same projects as regular C# files and call into them. The C# compiler that came with it was modified to introduce an `isolated` keyword for classes, which prevented them from accessing `static` fields, which was key to ensuring state didn't escape the domain.

The software and most of the information were scrubbed from MS's own website, but you can find an archived copy of the manual[1]. I still have a copy of the software installer somewhere, but I doubt it would work on any recent Windows.

Sadly this project was axed before MS had embraced open source. It would've been nice if they had released the source when they decided to discontinue working on it.

[1]:https://web.archive.org/web/20110629202213/http://download.m...

sparkie commented on Building Your Own Efficient uint128 in C++   solidean.com/blog/2026/bu... · Posted by u/PaulHoule
ThatGuyRaion · 8 days ago
I suppose that makes sense -- though SIMD seems more useful for accelerating a lot of crypto?
sparkie · 8 days ago
SIMD is for performing parallel operations on many smaller types. It can help with some cryptography, but it doesn't necessarily help when performing single arithmetic operations on larger types. Though it does help when performing logic and shift operations on larger types.

If we were performing 128-bit arithmetic in parallel over many values, then a SIMD implementation may help, but without a SIMD equivalent of `addcarry`, there's a limit to how much it can help.

Something like this could potentially be added to AVX-512 for example by utilizing the `k` mask registers for the carries.

The best we have currently is `adcx` and `adox` which let us use two interleaved addcarry chains, where one utilizes the carry flag and the other utilizes the overflow flag, which improves ILP. These instructions are quite niche but are used in bigint libraries to improve performance.
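A single carry chain for a 128-bit add might be sketched as below. It assumes GCC or Clang on x86-64 for the `_addcarry_u64` path and falls back to plain C elsewhere; the `u128` struct and `u128_add` name are illustrative, not the article's code. `adcx`/`adox` would allow a second, independent chain to be interleaved with this one.

```c
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#  include <immintrin.h>
#  define HAVE_ADDCARRY_U64 1
#endif

/* Illustrative 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

static inline u128 u128_add(u128 a, u128 b) {
    u128 r;
#ifdef HAVE_ADDCARRY_U64
    /* The intrinsic takes `unsigned long long *`, so use temporaries of
       that type rather than passing a `uint64_t *` directly. */
    unsigned long long lo, hi;
    unsigned char carry = _addcarry_u64(0, a.lo, b.lo, &lo);
    (void)_addcarry_u64(carry, a.hi, b.hi, &hi);
    r.lo = lo;
    r.hi = hi;
#else
    /* Portable fallback: the low half wrapped if it is now below a.lo. */
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);
#endif
    return r;
}
```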

sparkie commented on Building Your Own Efficient uint128 in C++   solidean.com/blog/2026/bu... · Posted by u/PaulHoule
b1temy · 8 days ago
I understand why a non-standard compiler-specific implementation of int128 was not used (Besides being compiler specific, the point of the article is to walk through an implementation of it), but why use

> using u64 = unsigned long long;

? Although in practice, this is _usually_ an unsigned 64 bit integer, the C++ Standard does not technically guarantee this; all it says is that the type needs to be _at least_ 64 bits. [0]

I would use std::uint64_t which guarantees a type of that size, provided it is supported. [1]

Re: Multiplication: regrouping our u64 digits

I am aware more advanced and faster algorithms exist, but I wonder if something simple like Karatsuba's Algorithm [2], which uses 3 multiplications instead of 4, could be a quick win for performance over the naive method used in the article. Though since it was mentioned that the compiler-specific unsigned 128 integers more closely resemble the ones created in the article, I suppose there must be a reason for that method to be used instead, or something I missed that makes this method unsuitable here.
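For context, the four-multiplication schoolbook method on 32-bit halves can be sketched as follows. This is a generic illustration, not the article's code, and `u128` and `mul_64x64` are hypothetical names. Karatsuba would replace the two cross products with one multiplication of sums, but those sums need 33 bits each, so their product no longer fits in 64 bits and the saved multiply is paid for with extra carry handling.

```c
#include <stdint.h>

/* Illustrative 128-bit result as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

/* Full 64x64 -> 128 bit product from four 32x32 -> 64 bit products. */
static inline u128 mul_64x64(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;   /* weight 2^64 */

    /* Sum of three values below 2^32, so it cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    u128 r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```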

Speaking of which, I would be interested to see how all these operations fare against compiler-specific implementations (as well as the comparisons between different compilers) [3]. The article only briefly mentioned that their multiplication method is similar to that of the builtin `__uint128_t` [4], but did not go into detail or mention similarities/differences with their implementation of the other arithmetic operations.

[0] https://en.cppreference.com/w/cpp/language/types.html The official standard needs to be purchased, which is why I did not reference that. But it should be under the section basic.fundamental

[1] https://en.cppreference.com/w/cpp/types/integer.html

[2] https://en.wikipedia.org/wiki/Karatsuba_algorithm

[3] I suppose I could see for myself using godbolt, but I would like to see some commentary/discussion on this.

[4] And did not state for which compiler, though by context, I suppose it would be MSVC?

sparkie · 8 days ago
> I would use std::uint64_t which guarantees a type of that size, provided it is supported.

The comment on the typedef points out that the signature of intrinsics uses `unsigned long long`, though he incorrectly states that `uint64_t` is `unsigned long` - which isn't true, as `long` is only guaranteed to be at least 32-bits and at least as large as `int`. In ILP32 and LLP64 for example, `long` is only 32-bits.

I don't think this really matters anyway. `long long` is 64-bits on pretty much everything that matters, and he is using architecture-specific intrinsics in the code so it is not going to be portable anyway.

If some future arch had 128-bit hardware integers and a data model where `long long` is 128-bits, we wouldn't need this code at all, as we would just use the hardware support for 128-bits.

But I agree that `uint64_t` is the correct type to use for the definition of `u128`, if we wanted to guarantee it occupies the same storage. The width-specific intrinsics should also use this type.

> I would be interested to see how all these operations fare against compiler-specific implementations

There's a godbolt link at the top of the article which has the comparison. The resulting assembly is basically equivalent to the built-in support.

sparkie commented on Building Your Own Efficient uint128 in C++   solidean.com/blog/2026/bu... · Posted by u/PaulHoule
ThatGuyRaion · 8 days ago
Question for those smarter than me: What is an application for an int128 type anyways? I've never personally needed it, and I laughed at RISC-V for emphasizing that early on rather than... standardizing packed SIMD.
sparkie · 8 days ago
Cryptography would be one application. Many crypto libraries use an arbitrary size `bigint` type, but the algorithms typically use modular arithmetic on some fixed width types (128-bit, 256-bit, 512-bit, or some in-between like 384-bits).

They're typically implemented with arrays of 64-bit or 32-bit unsigned integers, but if 128-bits were available in hardware, we could get a performance boost. Any arbitrary precision integer library would benefit from 128-bit hardware integers.
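The array-of-limbs representation can be sketched in portable C as below. The `limbs_add` name and the little-endian limb order are assumptions for illustration; real bigint libraries typically use carry intrinsics or assembly for this loop.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n little-endian 64-bit limbs; returns the final carry.
   The carry is propagated by hand because C has no add-with-carry. */
static inline uint64_t limbs_add(uint64_t *r, const uint64_t *a,
                                 const uint64_t *b, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = a[i] + carry;
        uint64_t c1 = sum < carry;   /* wrapped when adding the carry */
        sum += b[i];
        uint64_t c2 = sum < b[i];    /* wrapped when adding b[i] */
        r[i] = sum;
        carry = c1 | c2;             /* at most one of the two can be set */
    }
    return carry;
}
```

With native 128-bit integers each iteration could process twice as many bits, which is where the performance boost would come from.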

sparkie commented on Some C habits I employ for the modern day   unix.dog/~yosh/blog/c-hab... · Posted by u/signa11
sparkie · 16 days ago
Sometimes you want the struct to be defined in a header so it can be passed and returned by value rather than pointer.

A technique I use is to leverage GCC's `poison` pragma to cause an error if attempting to access the struct's fields directly. I give the fields names that won't collide with anything, use macros to access them within the header and then `#undef` the macros at the end of the header.

Example - an immutable, pass-by-value string which couples the `char*` with the length of the string:

    #ifndef FOO_STRING_H
    #define FOO_STRING_H
    
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #include "config.h"
    
    typedef size_t string_length_t;
    #define STRING_LENGTH_MAX CONFIG_STRING_LENGTH_MAX
    
    typedef struct {
        string_length_t _internal_string_length;
        char *_internal_string_chars;
    } string_t;
    
    #define STRING_LENGTH(s) (s._internal_string_length)
    #define STRING_CHARS(s) (s._internal_string_chars)
    
    #pragma GCC poison _internal_string_length _internal_string_chars
    
    constexpr string_t error_string = { 0, nullptr };
    constexpr string_t empty_string = { 0, "" };
    
    inline static string_t string_alloc_from_chars(const char *chars) {
        if (chars == nullptr) return error_string;
        size_t len = strnlen(chars, STRING_LENGTH_MAX);
        if (len == 0) return empty_string;
        if (len < STRING_LENGTH_MAX) {
            char *mem = malloc(len + 1);
            if (mem == nullptr) return error_string; /* allocation failed */
            strncpy(mem, chars, len);
            mem[len] = '\0';
            return (string_t){ len, mem };
        } else return error_string;
    }
    
    inline static char * string_to_chars(string_t string) {
        return STRING_CHARS(string);
    }

    inline static string_length_t string_length(string_t string) {
        return STRING_LENGTH(string);
    }

    inline static void string_free(string_t s) {
        free(STRING_CHARS(s));
    }
    
    inline static bool string_is_valid(string_t string) {
        return STRING_CHARS(string) != nullptr
            && strnlen(STRING_CHARS(string), STRING_LENGTH_MAX) == STRING_LENGTH(string);
    }
    

    ...

    
    #undef STRING_LENGTH
    #undef STRING_CHARS
    
    #endif /* FOO_STRING_H */
It just wraps `<string.h>` functions in a way that is slightly less error prone to use, and adds zero cost. We can pass the string everywhere by value rather than needing an opaque pointer. It's equivalent on SYSV (64-bit) to passing them as two separate arguments:

    void foo(string_t str);
    //vs
    void foo(size_t length, char *chars); 
These have the exact same calling convention: length passed in `rdi` and `chars` passed in `rsi`. (Or equivalently, `r0:r1` on other architectures).

The main advantage is that we can also return by value without an "out parameter".

    string_t bar();
    //vs
    size_t bar(char **out_chars);
These DO NOT have the same calling convention. The latter is less efficient because it needs to dereference a pointer to return the out parameter. The former just returns length in `rax` and chars in `rdx` (`r0:r1`).

So returning a fat pointer is actually more efficient than returning a size and passing an out parameter on SYSV! (Though only marginally because in the latter case the pointer will be in cache).

Perhaps it's unfair to say "zero-cost" - it's slightly less than zero - cheaper than the conventional idiom of using an out parameter.

But it only works if the struct is <= 16-bytes and contains only INTEGER types. Any larger and the whole struct gets put on the stack for both arguments and returns. In that case it's probably better to use an opaque pointer.
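The size condition can be checked at compile time. A sketch, with `str_view`, `view_of`, and `view_of_out` as made-up names, and with the register assignments in the comments assuming the SYSV x86-64 ABI:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fat pointer: two INTEGER-class fields. */
typedef struct {
    size_t length;
    const char *chars;
} str_view;

/* Larger structs are passed and returned through memory instead. */
_Static_assert(sizeof(str_view) <= 16, "str_view must fit in two registers");

/* Returned by value: length in rax, chars in rdx on SYSV x86-64. */
static str_view view_of(const char *s) {
    str_view v = { strlen(s), s };
    return v;
}

/* Conventional form: length in rax, chars stored through a pointer. */
static size_t view_of_out(const char *s, const char **out_chars) {
    *out_chars = s;
    return strlen(s);
}
```

Both functions carry the same information back to the caller; only the second needs a memory write to do it.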

That aside, when we define the struct in the header we can also `inline` most functions, so that avoids unnecessary branching overhead that we might have when using opaque pointers.

`#pragma GCC poison` is not portable, but it will be ignored wherever it isn't supported, so this won't prevent the code being compiled for other platforms - it just won't get the benefits we get from GCC & SYSV.

The biggest downside to this approach is we can't prevent the library user from using a struct initializer and creating an invalid structure (eg, length and actual string length not matching). It would be nice if there were some similar trick to prevent using compound initializers with the type, then we could have full encapsulation without resorting to opaque pointers.

sparkie · 16 days ago
> The biggest downside to this approach is we can't prevent the library user from using a struct initializer and creating an invalid structure (eg, length and actual string length not matching). It would be nice if there were some similar trick to prevent using compound initializers with the type, then we could have full encapsulation without resorting to opaque pointers.

Hmm, I found a solution and it was easier than expected. GCC has `__attribute__((designated_init))` we can stick on the struct which prevents positional initializers and requires the field names to be used (assuming -Werror). Since those names are poisoned, we won't be able to initialize except through functions defined in our library. We can similarly use a macro and #undef it.

Full encapsulation of a struct defined in a header:

    #ifndef FOO_STRING_H
    #define FOO_STRING_H

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #if defined __has_include
    # if __has_include("config.h")
    #  include "config.h"
    # endif
    #endif

    typedef size_t string_length_t;
    #ifdef CONFIG_STRING_LENGTH_MAX
    #define STRING_LENGTH_MAX CONFIG_STRING_LENGTH_MAX
    #else
    #define STRING_LENGTH_MAX (1 << 24)
    #endif

    typedef struct __attribute__((designated_init)) {
        const string_length_t _internal_string_length;
        const char *const _internal_string_chars;
    } string_t;

    #define STRING_CREATE(len, ptr) (string_t){ ._internal_string_length = (len), ._internal_string_chars = (ptr) }
    #define STRING_LENGTH(s) (s._internal_string_length)
    #define STRING_CHARS(s) (s._internal_string_chars)
    #pragma GCC poison _internal_string_length _internal_string_chars


    constexpr string_t error_string = STRING_CREATE(0, nullptr);
    constexpr string_t empty_string = STRING_CREATE(0, "");

    inline static string_t string_alloc_from_chars(const char *chars) {
        if (__builtin_expect(chars == nullptr, false)) return error_string;
        size_t len = strnlen(chars, STRING_LENGTH_MAX);
        if (__builtin_expect(len == 0, false)) return empty_string;
        if (__builtin_expect(len < STRING_LENGTH_MAX, true)) {
            char *mem = malloc(len + 1);
            if (mem == nullptr) return error_string; /* allocation failed */
            strncpy(mem, chars, len);
            mem[len] = '\0';
            return STRING_CREATE(len, mem);
        } else return error_string;
    }

    inline static const char *string_to_chars(string_t string) {
        return STRING_CHARS(string);
    }

    inline static string_length_t string_length(string_t string) {
        return STRING_LENGTH(string);
    }

    inline static void string_free(string_t s) {
        free((char*)STRING_CHARS(s));
    }

    inline static bool string_is_valid(string_t string) {
        return STRING_CHARS(string) != nullptr;
    }

    // ... other string functions

    #undef STRING_LENGTH
    #undef STRING_CHARS
    #undef STRING_CREATE

    #endif /* FOO_STRING_H */
Aside from horrible pointer aliasing tricks, the only way to create a `string_t` is via `string_alloc_from_chars` or other functions defined in the library which return `string_t`.

    #include <stdio.h>
    int main() {
        string_t s = string_alloc_from_chars("Hello World!");
        if (string_is_valid(s)) 
            puts(string_to_chars(s));
        string_free(s);
        return 0;
    }
