Readit News
parkmycar commented on Small Strings in Rust: smolstr vs. smartstring (2020) · fasterthanli.me/articles/... · Posted by u/airstrike
tialaramex · a year ago
Thanks! And the explanation of 217 makes sense too.

Since I have you here, wouldn't it be better to name that type "LastByte" or something? It's not a (Rust) char, and it's not necessarily UTF-8, whereas it is definitely the last byte.

parkmycar · a year ago
Ha, naming is hard! You’re totally right: it used to hold just the values of the last byte of a UTF-8 char (and before that it was NonMaxU8), but now it represents more. I’ll update it once I’m back at my computer, thanks!
conaclos · a year ago
compact_str is a fantastic crate, used by many projects. Do you know the byteyarn crate [0]? It could be nice to add it to the `Similar Crates` section if it makes sense.

[0] https://docs.rs/byteyarn/latest/byteyarn/

parkmycar · a year ago
I do, mcyoung wrote a great blog post[1] about it! Good idea. I’m AFK at the moment but will add it to the ‘Similar Crates’ section once I’m back.

[1] https://mcyoung.xyz/2023/08/09/yarns/

tialaramex · a year ago
This probably isn't helpful anyway: what's actually going on is more complicated, and it's explained later at a high level, but I'll try now:

Rust has "niches": bit patterns that are never used by a type and can therefore be occupied by something else in a sum type (Rust's enum) that wraps that type. But stable Rust doesn't allow third parties to declare that arbitrary niches exist for a type they made.
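A small illustration of niches using standard-library types. This is a sketch: the `Option<&T>` and `Option<NonZeroU8>` sizes are documented guarantees, while the `Option<u8>` size is simply what rustc does today.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

fn main() {
    // A reference can never be null, so Option<&T> encodes None as the null pointer.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // NonZeroU8 never holds 0, so Option<NonZeroU8> uses 0 for None and stays 1 byte.
    assert_eq!(size_of::<Option<NonZeroU8>>(), 1);

    // A plain u8 uses all 256 bit patterns: there is no niche,
    // so Option<u8> needs a separate tag byte.
    assert_eq!(size_of::<Option<u8>>(), 2);

    println!("niche checks passed");
}
```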

However, if you make a simple enumeration of N possibilities, it automatically has a niche made up of all the M - N bit patterns your enumeration doesn't need in the M-value machine integer chosen to store it (M will typically be 256 or 65536, depending on how many things you enumerated).
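To see this with a fieldless enum, here's a sketch with a hypothetical three-variant `Color` type (N = 3, stored in a u8, so M = 256). The sizes reflect current rustc behaviour rather than a documented layout guarantee.

```rust
use std::mem::size_of;

// Hypothetical enumeration with N = 3 variants. rustc stores it in one byte,
// leaving 253 bit patterns that no valid Color ever uses.
#[allow(dead_code)]
enum Color {
    Red,
    Green,
    Blue,
}

fn main() {
    assert_eq!(size_of::<Color>(), 1);
    // None takes one of the spare bit patterns, so no extra tag byte is needed.
    assert_eq!(size_of::<Option<Color>>(), 1);
    // Nesting still fits: each wrapper consumes another unused pattern.
    assert_eq!(size_of::<Option<Option<Color>>>(), 1);
    println!("enum niche checks passed");
}
```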

So, CompactString has a custom enum type, LastUtf8Char, which it uses for the last byte of its data structure. It has values V0 through V191, corresponding to the 192 possible last bytes of a UTF-8 string (0 through 127 for ASCII, 128 through 191 for continuation bytes). That leaves 64 bit patterns unused. Then L0 through L23 represent lengths: inline strings of length 0 to 23 inclusive, which don't need this last byte for string data (a 24-byte string stores its final byte there, as one of V0 through V191). Now we've got 40 bit patterns left.
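A rough sketch of how the final byte of a 24-byte inline buffer could be chosen under the scheme above. The names `inline_last_byte`, `MAX_INLINE` and `LENGTH_BASE` are hypothetical and this is not compact_str's actual code; the constants just follow the V0..V191 / L0..L23 description.

```rust
/// Hypothetical constants mirroring the layout described above.
const MAX_INLINE: usize = 24;
const LENGTH_BASE: u8 = 192; // L0 is the first pattern after V0..=V191

/// Returns the byte that would occupy the final slot of a 24-byte inline buffer,
/// or None if the string is too long to be stored inline.
fn inline_last_byte(s: &str) -> Option<u8> {
    let bytes = s.as_bytes();
    match bytes.len() {
        // Full buffer: the last slot holds real string data. For valid UTF-8 this is
        // ASCII (0..=127) or a continuation byte (128..=191), i.e. one of V0..=V191.
        MAX_INLINE => Some(bytes[MAX_INLINE - 1]),
        // Shorter string: the last slot is free, so store the length as L0..=L23.
        n if n < MAX_INLINE => Some(LENGTH_BASE + n as u8),
        // Too long for inline storage.
        _ => None,
    }
}

fn main() {
    for s in ["", "hello", "abcdefghijklmnopqrstuvwx"] {
        println!("{:?} -> {:?}", s, inline_last_byte(s));
    }
}
```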

Then one bit pattern (the one equivalent to the unsigned integer 216) signifies that the string data lives on the heap and the rest of the structure should be interpreted accordingly, and another (217) signifies that it's a weird static allocation (I do not know why you'd do this).
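Going the other way, a reader only needs the last byte to know how to interpret the rest. This is a hypothetical decoder (the `Repr` enum and `classify` function are illustrative, not compact_str's API) using the values from the description; counting the patterns nothing maps to gives the leftover niche.

```rust
/// What the final byte says about the other 23 bytes (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
enum Repr {
    /// V0..=V191: all 24 bytes are inline string data.
    InlineFull,
    /// L0..=L23: inline string of the given length.
    InlineLen(usize),
    /// 216: the string data lives on the heap.
    Heap,
    /// 217: the "static allocation" case.
    Static,
    /// 218..=255: never produced, so left free as a niche.
    Unused,
}

fn classify(last: u8) -> Repr {
    match last {
        0..=191 => Repr::InlineFull,
        192..=215 => Repr::InlineLen(usize::from(last - 192)),
        216 => Repr::Heap,
        217 => Repr::Static,
        218..=255 => Repr::Unused,
    }
}

fn main() {
    let unused = (0..=255u8).filter(|&b| classify(b) == Repr::Unused).count();
    println!("unused patterns: {unused}"); // 38
}
```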

That leaves 38 bit patterns unused once the type has taken all the ones it wanted, so there's still a niche for Option<CompactString> or MyCustomType<CompactString>.
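A scaled-down sketch of why the leftover patterns matter: a 24-byte struct whose final byte is a fieldless enum still leaves room for `None`. The `LastByte` enum here has only three hypothetical variants instead of the real scheme's 218, and `Packed` and `MyCustomType` are illustrative; the sizes reflect current rustc niche-filling rather than a guaranteed layout.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the last-byte enum. Any fieldless enum that doesn't
// use all 256 patterns of its byte leaves a niche.
#[allow(dead_code)]
enum LastByte {
    Inline,
    Heap,
    Static,
}

// Shaped like a small-string buffer: 23 raw bytes plus the tag byte = 24 bytes.
#[allow(dead_code)]
struct Packed {
    data: [u8; 23],
    last: LastByte,
}

// Hypothetical user-defined wrapper, like MyCustomType<CompactString> above.
#[allow(dead_code)]
enum MyCustomType<T> {
    Empty,
    Value(T),
}

fn main() {
    println!("Packed: {} bytes", size_of::<Packed>());
    // Both wrappers store their empty case in an unused LastByte pattern.
    println!("Option<Packed>: {} bytes", size_of::<Option<Packed>>());
    println!("MyCustomType<Packed>: {} bytes", size_of::<MyCustomType<Packed>>());
}
```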

parkmycar · a year ago
Author of compact_str here, you hit the nail on the head, great explanation!

> ... and another (217) signifies that it's a weird static allocation (I do not know why you'd do this)

In addition to String, Rust also has str[1], which is an entirely different type. It's common to represent string literals known at compile time as `&'static str`, but they can't be used in all of the same places a String can. For example, you can't put a &'static str into a Vec<String> without first heap-allocating a String from it. We added the extra variant (217) so users of CompactString could abstract over both string literals and strings created at runtime, which solves cases like this example.
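As a rough std-only analogy (this is not the CompactString API), `std::borrow::Cow<'static, str>` abstracts over a borrowed literal and an owned runtime `String` in a similar way, whereas `Vec<String>` forces every literal onto the heap first. The functions `owned_only` and `mixed` are hypothetical.

```rust
use std::borrow::Cow;

// Vec<String>: the literal must be copied into a heap-allocated String first.
fn owned_only() -> Vec<String> {
    vec!["literal".to_string(), format!("built at {}", "runtime")]
}

// Cow<'static, str>: the literal is stored as a borrow (no allocation),
// while the runtime string is owned, behind one element type.
fn mixed() -> Vec<Cow<'static, str>> {
    vec![
        Cow::Borrowed("literal"),
        Cow::Owned(format!("built at {}", "runtime")),
    ]
}

fn main() {
    println!("{:?}", owned_only());
    for s in mixed() {
        let kind = match &s {
            Cow::Borrowed(_) => "borrowed",
            Cow::Owned(_) => "owned",
        };
        println!("{s} ({kind})");
    }
}
```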

[1]: https://doc.rust-lang.org/std/primitive.str.html

unshavedyak · a year ago
On the note of small strings, Compact String[1] was, I believe, released after this article and has a nifty trick. Where Smol and Smart can fit 22 and 23 bytes, CompactStr can fit 24! Which is kinda nutty imo: that's the full size of a normal String on the stack, but packed with actual string data.
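For reference, a quick way to check the size of `String` itself. A `String` holds a pointer, a capacity and a length, so on a 64-bit target it comes to 24 bytes; the exact layout isn't formally guaranteed, so treat this as a sanity check rather than a spec.

```rust
use std::mem::size_of;

fn main() {
    // Three machine words: pointer, capacity, length (order unspecified).
    let words = size_of::<String>() / size_of::<usize>();
    println!("String is {} bytes ({} words)", size_of::<String>(), words);

    // String has a niche of its own, so Option<String> is the same size.
    println!("Option<String> is {} bytes", size_of::<Option<String>>());

    // A 24-byte string: on 64-bit, exactly the bytes a String header occupies.
    let s = "abcdefghijklmnopqrstuvwx";
    println!("{:?} is {} bytes", s, s.len());
}
```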

There's a nice explanation in their README[2]. Love tricks like this.

[1]: https://github.com/ParkMyCar/compact_str

[2]: https://github.com/ParkMyCar/compact_str?tab=readme-ov-file#...

parkmycar · a year ago
Hey I'm the author of compact_str, thanks for the kind words!

Fun fact, it was this fasterthanlime post that originally inspired me to play around with small strings and led to the creation of compact_str.

u/parkmycar

Karma: 22 · Cake day: May 15, 2020