lieks · a year ago
For context: the author (see his other posts) is exploring the possibilities of writing C with no C runtime to avoid having to deal with it on Windows. He began to kind of treat it as a new language, with the string type, arenas and such, which help avoid memory bugs (and from my experience, are very useful).

This is a pretty cool hack. Makes me want to write a regex library again.
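For readers who haven't seen the style, here is a minimal sketch of what a pointer-plus-length string type can look like. The names Str, STR, str_equal and str_slice are made up for illustration and are not taken from the author's code; memcmp and assert are borrowed from libc only to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical pointer+length string; not NUL-terminated by contract.
typedef struct {
    const char *data;
    size_t      len;
} Str;

// Wrap a string literal; the length comes from sizeof, not strlen.
#define STR(lit) ((Str){ (lit), sizeof(lit) - 1 })

// Equal when the lengths match and all bytes match.
static int str_equal(Str a, Str b)
{
    return a.len == b.len &&
           (a.len == 0 || memcmp(a.data, b.data, a.len) == 0);
}

// View of s[start, end) that shares storage with s (no copy).
static Str str_slice(Str s, size_t start, size_t end)
{
    assert(start <= end && end <= s.len);
    return (Str){ s.data + start, end - start };
}
```

Because a slice is just another (pointer, length) pair, substrings need no allocation and no terminator, which is part of why this style tends to avoid off-by-one bugs.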

flohofwoe · a year ago
TBH, most of the C stdlib is quite useless anyway because the APIs are firmly stuck in the 80's and have never been updated for the new C99 language features and more recent common OS features (like non-blocking IO) - and that's coming from a C die hard ;)
chipdart · a year ago
> TBH, most of the C stdlib is quite useless anyway because the APIs are firmly stuck in the 80's (...)

This. A big reason behind Rust managing to get some traction from the outset was how Rust presented itself as an alternative to C for systems programming that offered a modern set of libraries designed with the benefit of having decades of usability research.

ryandrake · a year ago
I don't think everything has to be "modernized" and "updated." When I look at software from the 80s that is still with us, I think: "This is robust, keeps working, and has withstood the test of time" not "This must be changed." I still use C and the standard C library because I know how it worked in the past, I know it works today, and I know it will work for decades to come.

(minus the known foot-guns like strcpy() that we learned long ago were not great)

whiterknight · a year ago
Blocking IO is usually good though. The entire Unix kernel is designed to manage complexity so you can write “if then else”.

What is grep going to do while it waits for data?

jandrese · a year ago
You can do nonblocking IO from C using the system's libc. poll and select have been in there for decades. They are specified by POSIX rather than by ISO C.
marssaxman · a year ago
Thanks for that explanation! I have occasionally fantasized about a similar project - what could C be like, if one abandoned its ancient stdlib and replaced it with something suited to current purposes? - so I'm looking forward now to reading more of this author's writing.
VancouverMan · a year ago
Something like that would probably end up similar to GLib or the Apache Portable Runtime.

https://gitlab.gnome.org/GNOME/glib/

https://apr.apache.org/

pk-protect-ai · a year ago
Thank you for the context. I wouldn't have read the article without it. I mean, it's a pretty good idea for "no runtime," but when I saw the article title, I thought at first "Why????" Honestly, I'm glad I read it.
pjmlp · a year ago
Being able to write C without the C standard library on Windows is something we have been doing for as long as Windows has existed, nothing special there.

As proven by early editions of Petzold's famous book.

dwattttt · a year ago
NODEFAULTLIB is quite a rite of passage
nox101 · a year ago
what's special about Windows for a regex library?
dfox · a year ago
On POSIX systems the OS (well, libc) already provides a C regex library: https://pubs.opengroup.org/onlinepubs/9699919799/functions/r...

Whether you want to use that is another question.
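For reference, a minimal sketch of that POSIX interface (regcomp, regexec, regfree from <regex.h>). The wrapper name posix_match is made up for illustration.

```c
#include <regex.h>
#include <stddef.h>

// Returns 1 if `text` matches the POSIX extended regex `pattern`,
// 0 if it does not, and -1 if the pattern fails to compile.
static int posix_match(const char *pattern, const char *text)
{
    regex_t re;
    // REG_NOSUB: we only want a yes/no answer, not capture offsets.
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);   // regcomp allocates internally, so release it here
    return rc == 0;
}
```

Note that regcomp allocates behind your back and expects a NUL-terminated string, which is the kind of interface the article is trying to get away from.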

judah · a year ago
The article was interesting, but even more so was his link to arena allocation in C: https://www.rfleury.com/p/untangling-lifetimes-the-arena-all...

This comprehensive article goes over the problems of memory allocation, how programmers and educators have been trained to think about the problem wrongly, and how the concept of arenas solves it.

As someone who spends most of his time in garbage collected languages, this was wildly fascinating to me.
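To make the idea concrete, here is a minimal bump-allocator sketch over a fixed, caller-supplied buffer. It is a simplification under that assumption, not the linked article's implementation, and the names Arena, arena_init, arena_alloc and arena_reset are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned char *base;  // start of the backing buffer
    size_t         cap;   // total bytes available
    size_t         used;  // bytes handed out so far
} Arena;

static Arena arena_init(void *buf, size_t cap)
{
    Arena a = { (unsigned char *)buf, cap, 0 };
    return a;
}

// Bump-allocate `size` bytes aligned to `align` (a power of two).
// Returns NULL when the arena cannot fit the request.
static void *arena_alloc(Arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t cur = (uintptr_t)a->base + a->used;
    size_t pad = (size_t)(-cur & (align - 1));  // bytes to next boundary
    size_t left = a->cap - a->used;
    if (pad > left || size > left - pad) {
        return NULL;
    }
    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}

// Free everything at once: the whole point of grouping lifetimes.
static void arena_reset(Arena *a)
{
    a->used = 0;
}
```

There is no per-object free: everything allocated for one task shares one lifetime and is released with a single reset.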

jklowden · a year ago
So bad is the performance of gcc std::regex that I reimplemented part of it using regex(3). Of course, I didn’t discover the problem until I’d committed to the interface, so I put mine in namespace dts, just in case one day the supplied implementation becomes useful.

As it stands, std::regex should come with a warning label. It’s fine for occasional use. As part of a parser, it’s not. Slow is better than broken, until slow is broken.

bregma · a year ago
To be fair, the GNU implementation of std::regex has to conform to the API defined by ISO/IEC 14882 (The C++ Programming Language). If you don't have to provide that API purely in a header file, it gets pretty easy to write something bespoke that is faster, or smaller, or conforms to some special esoteric requirement, or does something completely different from what the C++ standard library specification requires.

The purpose of the C++ standard library is to provide well-tested, well-documented general functionality. If you have specific requirements and have an implementation or API that meets your requirements better than what the C++ standard library supplies, that's great. You're encouraged to use that instead.

If you have an implementation of std::regex that meets all the documented requirements and is provably faster under all or most circumstances than my implementation is, then submit it upstream. It's Free software and it wouldn't be the first time improved implementations of library code have been suggested and accepted by that project. Funny how no one has done that for std::regex in over a decade though, despite the complaints.

christianqchung · a year ago
I've always heard that it's a backwards compatibility problem with ABI, not API. Is that not true?
yosefk · a year ago
Around 30 years ago, STL introduced an allocator template parameter everywhere to let you control allocation. Here in 2024 we read about making use of the, erm, strange semantics of dynamic linking to force standard C++ code to allocate your way
Rucadi · a year ago
I like the newest* introduction of allocators, PMR, I use it quite a lot.
lelanthran · a year ago
I can't say that I like this very much.

Problematic macro in the header, custom string type compatible with nothing else in C, and I have no idea where the arena type comes from.

Having it magically deallocate memory is nice, but will confuse C programmers reading the caller.

Honestly, adding -lre to the linker is just much easier, and that library comes with docs too.

Brian_K_White · a year ago
TFA links to an explanation of what arenas are and where they come from, notes that some bits included here would not really be part of this library but are assumed to be part of the project using these techniques, explains the general point of the exercise, and says that this isn't even strictly a suggestion for a library but a "potpourri of techniques".

They are fully aware of -lre and assume that everyone else is too. This isn't about just achieving regex somehow. It's about avoiding the crt and gc and c++ in general while using an environment that normally includes all that by default.

You don't redefine new just to get regex. Obviously there must be some larger point and this regex is just some zoomed-in detail example of existing and operating within that larger point.

D4ckard · a year ago
Read his other stuff, it’s rather well thought out. The assumption is you don’t use libc or at least you use different interfaces to it.
unwind · a year ago
This is fun and impressive, but I feel the author kind of misses out on explaining in the intro why it would be wrong to just ... use C's regex library [1]?

I guess the entire post could be seen as an exercise in wrapping C++ to C with nice memory-handling properties and so on, but it would also be fine to be open and upfront about that, in my opinion.

1: https://www.man7.org/linux/man-pages/man3/regex.3.html

portaltonowhere · a year ago
Probably because that’s not part of the C standard library, but a POSIX offering. Author does cross-platform work including Windows.
unwind · a year ago
Ah, duh. Good point. That's what I get for mostly writing stuff like that on Linux, I guess. Thanks.
malkia · a year ago
Back in the old days of console game programming, most SDKs would come with something like:

my_audio_sdk_init(&arena, sizeof(arena)); // char arena[65536]; // or something like this

WalterBright · a year ago
> The regex engine allocates everything in the arena, including all temporary working memory used while compiling, matching, etc.

I do something quite different. I design the API so any data returned by the library function is allocated by the caller. This means the caller has full control over what style of memory management works best.

For example, you can then choose to use stack allocation, RAII, malloc/free, the GC, static allocation, etc.

For a primitive example, snprintf.
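A small sketch of that snprintf-style contract: the caller owns the buffer, and the function reports how many characters the full result needs. The function format_point is hypothetical and only wraps snprintf.

```c
#include <stdio.h>
#include <stddef.h>

// Writes "(x, y)" into the caller's buffer, truncating if needed.
// Returns the length the full text requires (excluding the NUL),
// exactly as snprintf does, so the caller can size a buffer and retry.
static int format_point(char *out, size_t cap, int x, int y)
{
    return snprintf(out, cap, "(%d, %d)", x, y);
}
```

Calling it with a NULL buffer and a capacity of 0 just measures the output, after which the caller can pick stack, heap, arena or static storage and call it again.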

connicpu · a year ago
Isn't giving the caller control over the memory exactly what this API does? The caller just passes in a block of memory that will be used for all of the internal allocations as well as the strings returned by the API.
WalterBright · a year ago
If I misunderstood it, then you are correct.