Typical: Data interchange with algebraic data types

habitue · 2 years ago

This seems like the thing I wish protocol buffers were, but learning from them and removing the warts. Very promising.

Also, implementing in rust and typescript as the first languages was a smart choice

revskill · 2 years ago

Protocol buffer lacks of good documentation for beginners which fits in a README.md. It's not a community project i guess.

Basically enterprisey toolings bored me to death.

Nowadays i trust toolings whose documentation could fit in a README file. It means it's concise, simplicity.

kubanczyk · 2 years ago

This comes close https://protobuf.dev/programming-guides/encoding/

It's long, but dev-friendly. Once I got through, I could understand 99% of benefits and limitations of protobufs.

anitil · 2 years ago

I don't know a lot about protocol buffers. What are the warts? I think they don't have a canonical format, is that one of them?

stepchowfun · 2 years ago

Typical creator here. I'm pleasantly surprised to find this on HN today! Happy to answer any questions about it.

andyferris · 2 years ago

I was wondering if you've considered having an alternative, human readable encoding (either your own syntax or a JSON-based schema)?

I find it quite useful to be able to inspect data by eye and even hand-editing payloads occassionally, and having a standard syntax for doing so would be nice.

(More generally, it's a shame JSON doesn't support sum types "natively" and I think a human readable format with Typical's data model would be really cool).

stepchowfun · 2 years ago

It's a good question! The binary format is completely inscrutable to human eyes and is not designed for manual inspection/editing. However:

1) For Rust, the generated types implement the `Debug` trait, so they can be printed in a textual format that Rust programmers are accustomed to reading.

2) For JavaScript/TypeScript, deserialized messages are simple passive data objects that can be logged or inspected in the developer console.

So it's easy to log/inspect deserialized messages in a human-readable format, but there's currently no way to read/edit encoded messages directly. In the future, we may add an option to encode messages as JSON which would match the representation currently used for decoded messages in JavaScript/TypeScript, with sums being encoded as tagged unions.

magicalhippo · 2 years ago

> it's a shame JSON doesn't support sum types "natively"

You can describe it using JSON Schema though[1], using "oneOf" and "const". Though I prefer the more explicit way of using "oneOf" combined with "required" to select one of a number of keys[2].

[1]: https://www.jsonschemavalidator.net/s/6SCuYNBe

[2]: https://www.jsonschemavalidator.net/s/tNnQmsTd

skybrian · 2 years ago

It seems like the safety rules are buggy because some assumptions are missing?

The safety rules say that adding an asymmetric field is safe, and converting asymmetric to required is safe. If you do both steps, then this implies that adding a required field is safe. But it’s not. As you say, it’s not transitive.

But lack of transitivity also means that a pull request that converts a field from asymmetric to required is not safe, in general. You need to know the history of that field. If you know that the field has always been asymmetric (unlikely) or all the older binaries are gone, then it’s safe. A reviewer can’t determine this by reading a pull request alone.

Maybe waiting until old binaries are gone is what you mean by “a single change” but it seems like that should be made explicit?

stepchowfun · 2 years ago

No IDL that supports required fields can offer the transitivity property you're referring to.

Typical has no notion of commits or pull requests in your codebase. The only meaningful notion of a "change" from Typical's perspective is a deployment which updates the live version of the code.

When promoting an asymmetric field to required (for example), you need to be sure the asymmetric field has been rolled out first. If you were using any other IDL framework (like Protocol Buffers) and you wanted to promote an optional field to required, you'd be faced with the same situation: you first need to make sure that the code running in production is always setting that field before you do the promotion. Typical just helps more than other IDL frameworks by making it a compile-time guarantee in the meantime.

We should be more careful about how we use the overloaded word "change", so I'm grateful you pointed this out. Another comment also helped me realize how confusing the word "update" can be.

magicalhippo · 2 years ago

From the description:

Thus, asymmetric fields in choices behave like optional fields for writers and like required fields for readers—the opposite of their behavior in structs.

So if you have a schema change which adds an asymmetric field to both a struct and a choice, it seems both writers and readers needs to be updated in order to successfully transmit to each other?

Or am I missing something fundamental?

stepchowfun · 2 years ago

If you add an asymmetric field to a struct, writers need to be updated to set the field for the code to compile.

If you also add an asymmetric field to a choice, readers need to be updated to be able to handle the new case for the code to compile.

You can do both in the same change. The new code can be deployed to the writers and readers in any order. Messages generated from the old code can be read by the new code and vice versa, so it's fine for both versions of the code to coexist during the rollout.

After that first change is rolled out, you can promote the new fields to required. This change can also be deployed to writers and readers in any order. Since writers are already setting the new field in the struct, it's fine for readers to start relying on it. And since readers can already handle the new case in the choice, it's fine for writers to start using it.

texuf · 2 years ago

Do you think you could ever generate types for go? The protobuf implementation of oneof in go is pretty rough to look at, and not fun to type over and over.

stepchowfun · 2 years ago

I'd love for Typical to support Go! We'd need someone with enough time to implement it.

If anyone is interested in contributing any code generators, you can start by copying the Rust or TypeScript generator and modifying it appropriately. See the contributing guide here: https://github.com/stepchowfun/typical/blob/main/CONTRIBUTIN...

throwaway290 · 2 years ago

Really cool! Does it work in browser if I want to compile .t spec using JS/TS?

stepchowfun · 2 years ago

Yes! We have comprehensive integration tests that run in the browser to ensure the generated code only uses browser-compatible APIs. Also, the generated code never uses reflection or dynamic code evaluation, so it works in Content Security Policy-restricted environments.

See this section of the README for more info: https://github.com/stepchowfun/typical#javascript-and-typesc...

naasking · 2 years ago

Nice that algebraic types are getting more love. Would be nice if these could be imported into existing systems, like Cap 'n Proto.

parhamn · 2 years ago

Curious how this will look when they get to implementations with less expressive type systems. Typescript & Rust are particularly good. Making a usable library for this in golang won't be easy.

And now that I think about it, Protobuf/Thrift/etc type tools are heavily constrained by finding lowest-common-demoninator of language features to allow for cross serialization. Maybe in the next generation of these tools, languages like golang don't get a seat at the table for the sake of progress -- I could be fine with it.

stepchowfun · 2 years ago

You're exactly right about other frameworks appealing to the lowest common denominator, whereas Typical isn't willing to make such compromises.

Languages without proper sum types are at a disadvantage here, but it's possible to encode sum types with exhaustive pattern matching in such languages using the visitor pattern. That approach requires some ergonomic sacrifices (e.g., having to use a reified eliminator rather than the built-in `switch` statement), and people using those languages may prefer convenience over strong guarantees. It's an unfortunate impedance mismatch.

jcparkyn · 2 years ago

I'd imagine that most people voluntarily using go or similar languages wouldn't be too bothered by just having all the checks occur at runtime in the generated code, rather than encoding them in the type system.

Sum types are still awkward, but most languages can at least approximate them, minus some compile-time checks.

tiziano88 · 2 years ago

Brilliant, I have been thinking of doing exactly this for a while now, glad I waited for someone else to do it in a better way :)

munificent · 2 years ago

Asymmetric fields is a really clever idea.

Lacaranian · 2 years ago

I love everything about this! I think a lot of code could benefit from restructuring via ADTs, and ser/deser is an important piece of that story. But I suppose I do have one nitpick.

Using a fallback for asymmetric fields in sum types seems off to me, albeit pragmatic. If the asymmetric fields for product types use an Option<T>, and Option is basically a sum of a T and a Unit, a close dual is a product (struct/tuple) of a T and the dual of Unit (a Nothing type, eg. one with no instantiable values, such as an empty enum).

I think this would provide similar safety guarantees, as a writer couldn't produce a value of an added asymmetric sum type variant, but a reader could write handling for it (including all subfields besides the Nothing typed one)?

stepchowfun · 2 years ago

You're considering an alternative behavior for asymmetric fields in choices, but you need to consider the behavior of optional fields in choices too.

In particular, the following duality is the lynchpin that ties everything together: "asymmetric" behaves like optional for struct readers and choice writers, but required for struct writers and choice readers.

From that duality, the behavior of asymmetric fields is completely determined by the behavior of optional fields. It isn't up to us to decide arbitrarily.

So the question becomes: what is the expected behavior of optional choice fields?

Along the lines you proposed, one could try to make optional choice fields behave as if they had an uninhabited type, so that writers would be unable to instantiate them—then you get exactly the behavior you described for asymmetric choice fields. Optional fields are symmetric, so both readers and writers would treat them as the empty type. This satisfies the safety requirement, but only in a degenerate way: optional fields would then be completely useless.

So this is not the way to go.

It's important to take a step back and consider what "optional" ought to mean: optionality for a struct relaxes the burden on writers (they don't have to set the field), whereas for a choice the burden is relaxed on readers (they don't have to handle the field). So how do you allow readers to ignore a choice field? Not by refusing to construct it (which would make it useless), but rather by providing a fallback. So the insight is not to think of optional as a type operator (1 + T) that should be dualised in some way (0 * T), but rather to think about the requirements imposed on writers and readers.

You're right to note the duality between sums and products and initial and terminal objects, and indeed category theory had a strong influence on Typical's design.

iso8859-1 · 2 years ago

It seems that there are two approaches to schema encoding:

* Writing OpenAPI/Avro and then generating deserializing/serializing from that (like e.g. Avro[0] or Tie[1] or Typical)

* Writing the schema using an in-language DSL (e.g. Autodocodec[2])

If I have a single language codebase, why should I prefer the first approach? You can always make your in-language DSL serialize out to a dedicated language at a later point.

Typical isn't focused on JSON, so it doesn't seem like it is optimized for web. Not doing web makes it more likely that you don't need multiple language support.

You can limit the metaprogramming also: You don't strictly need GHC.Generics for the in-language DSL. But if you're generating code, it's always going to be opaque and hard to debug.

If you keep the DSL in-language, you don't need to generate stubs since you can use the languages own type system to enforce the mapping to the native records[2].

I have heard the argument that everything should be 'documentation first', which was given as an argument for using Tie. But I don't see why an in-language DSL can't provide enough detail. There is so much manually written OpenAPI out there, any of these approaches is vastly better than that.

I have been reading Designing Data Intensive Applications by Martin Kleppmann but it doesn't cover this trade-off. Which makes sense, since it isn't really a book on programming using DSLs.

[0]: https://hackage.haskell.org/package/avro#generating-code-fro...

[1]: https://github.com/scarf-sh/tie

[2]: https://github.com/NorfairKing/autodocodec#fully-featured-ex...

stepchowfun · 2 years ago

> If I have a single language codebase, why should I prefer the first approach?

Probably the most compelling reason is that a single language codebase might not be a single language codebase forever. But, as you suggested, the switch to a language-agnostic framework can be deferred until it becomes necessary.

However, there's a reason to use Typical specifically: asymmetric fields. This feature allows you to change your schema over time without breaking compatibility and without sacrificing type safety.

If you ever expect to have newer versions of the codebase reading messages that were generated by older versions of the codebase (or vice versa), this is a concern that will need to be addressed. This can happen when you have a system that isn't deployed atomically (e.g., microservices, web/mobile applications, etc.) or when you have persisted messages that can outlive a single version of the codebase (e.g., files or database records).

An embedded DSL could in principle provide asymmetric fields, but I'm not aware of any that do.

> Typical isn't focused on JSON, so it doesn't seem like it is optimized for web.

It just makes different trade-offs than most web-based systems, but that doesn't make it unsuitable for web use. We have comprehensive integration tests that run in the browser. Deserialized messages are simple passive data objects [1] that can be logged or inspected in the developer console.

[1] https://en.wikipedia.org/wiki/Passive_data_structure