Lock down the prefix string now before it’s too late and document it. I see in Go that it’s lowercase ascii, which seems fine except for compound types (like “article-comment”). May be worth looking at allowing a single separator given that many complex projects (and ORMs) can’t avoid them.
The Go implementation has no tests. This is very unit-testable. Add tests goddammit!
For Go, I’d align with Google’s UUID implementation, with proper parse functions and an internal byte array instead of strings. Strings are for rendering (and in your case, the prefix). Right now, it looks like the parsing is too permissive, and goes into generation mode if the suffix is empty. And the SplitN+index thing will panic if there are no underscores, no? Anyway, tests will tell.
As for the actual design decisions, I tried to poke holes but I fold! I think this strikes the sweet spot between the different tradeoffs. Well done!
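To make the parsing concern concrete, here is a minimal sketch of a stricter split step in Go. It is not the library's code: `splitTypeID`, the error values, and the rule that a prefix is required and is lowercase ASCII letters only are all assumptions for illustration. The point is that a missing underscore or an empty suffix comes back as an error instead of a panic or a silently generated ID.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical error values for this sketch.
var (
	errNoSeparator = errors.New("typeid: missing '_' separator")
	errBadPrefix   = errors.New("typeid: prefix must be one or more lowercase ASCII letters")
	errBadSuffix   = errors.New("typeid: suffix must be exactly 26 characters")
)

// splitTypeID splits "prefix_suffix" without indexing into a slice that
// may be too short. Assumption: a non-empty prefix is required here.
func splitTypeID(s string) (prefix, suffix string, err error) {
	i := strings.LastIndexByte(s, '_')
	if i < 0 {
		return "", "", errNoSeparator
	}
	prefix, suffix = s[:i], s[i+1:]
	if prefix == "" {
		return "", "", errBadPrefix
	}
	for j := 0; j < len(prefix); j++ {
		if c := prefix[j]; c < 'a' || c > 'z' {
			return "", "", errBadPrefix
		}
	}
	// An empty or short suffix is an error, never a request to generate.
	if len(suffix) != 26 {
		return "", "", errBadSuffix
	}
	return prefix, suffix, nil
}

func main() {
	for _, in := range []string{"user_01h455vb4pex5vsknk084sn02q", "nounderscore", "user_"} {
		p, s, err := splitTypeID(in)
		fmt.Printf("%q -> prefix=%q suffix=%q err=%v\n", in, p, s, err)
	}
}
```

Decoding the suffix into a `[16]byte` would follow this step; it is left out to keep the sketch short.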
1. We've now implemented pretty thorough testing: https://github.com/jetpack-io/typeid-go/blob/main/typeid_tes...
2. I clarified the prefix in the spec
Thanks for the feedback!
There is just a single test. Which only tests the decoding of a single known value. No encoding test.
Go has infrastructure for benchmarking and fuzzing. Use it!
Also, you took code from https://github.com/oklog/ulid/blob/main/ulid.go which has "Copyright 2016 The Oklog Authors" but this is not mentioned in your base32.go.
Thanks for the feedback!
Note that people generally do not type in object identifiers, but they do frequently cut-and-paste them between applications and chat/forum interfaces, forward them by email, search for them in log files. Verbal transmission is rare to non-existent. Under these conditions, pronunciation proves irrelevant, and case-insensitivity becomes an impediment, but consistency and paste/break resilience become necessary.
Base 58 offers a bijective encoding that fits these concerns much more effectively and is more compact to boot. Similarly inspired by Stripe, I've been using type-prefixed base58-encoded UUIDs for object identifiers for some years. user_1BzGURpnHGn6oNru84B3Ri etc.
Edit to add: to be fair to Douglas Crockford, his encoding of base 32 was designed two decades ago, when the usage landscape looked quite different.
Ultimately I ended up leaning towards a base32 encoding, because I didn't want to pre-suppose case sensitivity. For example, you might want to use the id as a filename, and you might be in an environment where you're stuck with a case insensitive filesystem.
Note that TypeID is using the Crockford alphabet and always in lowercase – *not* the full rules of Crockford's encoding. There's no hyphens allowed in TypeIDs, nor multiple encodings of the same ID with different variations of the ambiguous characters.
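As a rough illustration of "Crockford alphabet, lowercase only, one spelling per ID", a strict suffix check in Go could look like the sketch below. The function name `validateSuffix` is hypothetical. The overflow rule is an assumption that follows from the arithmetic: 26 characters carry 130 bits, so for a 128-bit value the first character can be at most `7`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Lowercase Crockford alphabet: digits plus letters without i, l, o, u.
const crockfordLower = "0123456789abcdefghjkmnpqrstvwxyz"

// validateSuffix accepts only the canonical spelling: no uppercase,
// no hyphens, and no aliases such as 'o' for '0' or 'l' for '1'.
func validateSuffix(s string) error {
	if len(s) != 26 {
		return errors.New("suffix: want exactly 26 characters")
	}
	for i := 0; i < len(s); i++ {
		if strings.IndexByte(crockfordLower, s[i]) < 0 {
			return fmt.Errorf("suffix: invalid character %q at position %d", s[i], i)
		}
	}
	// 26 * 5 = 130 bits; the top 2 bits must be zero for a 128-bit value.
	if s[0] > '7' {
		return errors.New("suffix: value overflows 128 bits")
	}
	return nil
}

func main() {
	for _, s := range []string{
		"01h455vb4pex5vsknk084sn02q", // canonical
		"01H455VB4PEX5VSKNK084SN02Q", // uppercase: rejected
		"o1h455vb4pex5vsknk084sn02q", // 'o' alias: rejected
		"8zzzzzzzzzzzzzzzzzzzzzzzzz", // overflow: rejected
	} {
		fmt.Printf("%s -> %v\n", s, validateSuffix(s))
	}
}
```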
We have tests for the base32 encoding which is the most complicated part of the implementation (https://github.com/jetpack-io/typeid-go/blob/main/base32/bas...) but your point stands. We'll add a more rigorous test suite (particularly as the number of implementations across different languages grows, and we want to make sure all the implementations are compatible with each other)
Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?
1. I don't believe people actually type these values in by hand, so I'm not really concerned about the 'l' vs '1' issue. I do base 32 without `eiou` (vowels) to reduce the likelihood of words (profanity) sneaking in.
2. I add two base-32 characters as a checksum (salted of course). This avoids having to go look at the datastore when the value is bogus, whether by accident or malice. I'm unsure why other implementations don't do this.
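A two-character salted checksum along these lines might be sketched as follows in Go. The comment above does not say which algorithm is used, so everything here is an assumption: HMAC-SHA256 keyed with the salt, the top 10 bits of the MAC as the check value, and the names `checksum`, `withChecksum`, and `verify`. The alphabet is the digits plus the letters without `e`, `i`, `o`, `u`, which comes to exactly 32 symbols.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// 10 digits + 22 letters (alphabet minus e, i, o, u) = 32 symbols.
const noVowels = "0123456789abcdfghjklmnpqrstvwxyz"

// checksum derives two base-32 characters (10 bits) from a keyed hash
// of the ID. Assumption: HMAC-SHA256 with the salt as the key.
func checksum(salt []byte, id string) string {
	mac := hmac.New(sha256.New, salt)
	mac.Write([]byte(id))
	sum := mac.Sum(nil)
	v := uint16(sum[0])<<2 | uint16(sum[1])>>6 // top 10 bits of the MAC
	return string([]byte{noVowels[v>>5], noVowels[v&31]})
}

// withChecksum appends the two check characters to the ID.
func withChecksum(salt []byte, id string) string {
	return id + checksum(salt, id)
}

// verify splits off the last two characters and recomputes them, so a
// bogus value can be rejected before touching the datastore.
func verify(salt []byte, s string) (id string, ok bool) {
	if len(s) < 3 {
		return "", false
	}
	id, got := s[:len(s)-2], s[len(s)-2:]
	want := checksum(salt, id)
	return id, hmac.Equal([]byte(got), []byte(want))
}

func main() {
	salt := []byte("example-salt") // placeholder; keep the real one secret
	s := withChecksum(salt, "k3v9x7c2")
	id, ok := verify(salt, s)
	fmt.Println(s, id, ok)
}
```

With only 10 bits, roughly 1 in 1024 random strings will still pass, so this filters typos and casual guessing rather than replacing authorization checks.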
We might add a warning in the future if you decode/encode something that is not v7, but if it suits your use-case to encode UUIDv4 in this way, go for it. Just keep in mind that you'll lose the locality property.
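The kind of check such a warning would need is small. As a sketch (the helper names `parseUUID` and `uuidVersion` are made up for this example, not part of the library): the UUID version lives in the high nibble of byte 6.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// parseUUID reads the usual hyphenated hex form into 16 bytes.
func parseUUID(s string) ([16]byte, error) {
	var id [16]byte
	raw, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil {
		return id, err
	}
	if len(raw) != 16 {
		return id, fmt.Errorf("uuid: want 16 bytes, got %d", len(raw))
	}
	copy(id[:], raw)
	return id, nil
}

// uuidVersion returns the version field: the high 4 bits of byte 6.
func uuidVersion(id [16]byte) int {
	return int(id[6] >> 4)
}

func main() {
	for _, s := range []string{
		"018f3c5e-7b2a-7c3d-9e4f-0123456789ab", // version 7 layout
		"f47ac10b-58cc-4372-a567-0e02b2c3d479", // version 4 layout
	} {
		id, err := parseUUID(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		if v := uuidVersion(id); v != 7 {
			fmt.Printf("warning: %s is UUIDv%d, not v7; IDs will not sort by creation time\n", s, v)
		} else {
			fmt.Printf("%s is UUIDv7\n", s)
		}
	}
}
```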
- We need to get rid of YAML. Not only because it's a horrible file format but also because it lacks proper variables, proper type safety, proper imports, proper anything. To this day, usage & declaration search in YAML-defined infrastructure still often amounts to a repo-wide string search. Why are we putting up with this?
- The purely declarative approach to infrastructure feels wrong. For instance, if you've ever had to work on Gitlab pipelines, chances are that already on day 1 you started banging your head against the wall because you realized that what you wanted to implement is not currently possible (at least not without jumping through a ton of hoops), and there's already an open ticket from 2020 in Gitlab's issue tracker. I used to think, how could the Gitlab devs possibly forget to think of that one really obvious use case?! But I've come to realize that it's not really their fault: If you create any declarative language, you as the language creator will have to define what all those declarations are supposed to mean and what the machine is supposed to do when it encounters them. Behind every declaration lies a piece of imperative code. Unfortunately, this means you'll need to think of all potential use cases of your language and your declarations, including combinations and permutations thereof. (There's a reason why it's taken so long for CSS to solve even the most basic use cases.) Meanwhile, imperative languages simply let the user decide what they want. They are much more flexible and powerful. I realize I'm not saying anything new here but it often feels as if DevOps people have forgotten about the benefits of high-level programming languages. Now this is not to say we should start defining all our infrastructure in Java but let's at least allow for a little bit of imperativeness and expressiveness!
My personal take is that at some point you are better off just using a full programming language like TypeScript. We created TySON https://github.com/jetpack-io/tyson to experiment with that idea.