Readit News
bakemawaytoys · 6 months ago
To me, this article misses the mark.

The database is not your domain model, it is the storage representation of your domain model on disk. Your REST/grpc/whatever API also isn’t your domain model, but the representation of your domain model on the wire.

These tools (database, protocols) are not the place to enforce making invalid states un-representable for reasons the article mentions. You translate your domain model into and out of these tools, so code your domain model as separately and as purely as you can and reject invalid states during translation.

Zanfa · 6 months ago
I disagree completely.

You’re already paying the cost of abstraction for using a certain database or protocol, so get the most bang for your buck. If you can encode rules in a schema or a type, it’s basically free, compared to having to enforce it in code and hoping that future developers (yourself or others) will remember to do the same. It just eliminates an entire universe of problems you’d otherwise have to deal with.
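A minimal sketch of that "basically free" enforcement, using SQLite and a hypothetical `app.status` column (table and value names are invented for illustration):

```python
import sqlite3

# Encode the rule in the schema itself: status can only ever be one of three values.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE app (
        id     INTEGER PRIMARY KEY,
        status TEXT NOT NULL CHECK (status IN ('draft', 'in_review', 'published'))
    )
""")

conn.execute("INSERT INTO app (status) VALUES ('draft')")  # accepted

try:
    conn.execute("INSERT INTO app (status) VALUES ('cat')")
except sqlite3.IntegrityError:
    # Rejected by the schema; no application code had to remember the rule.
    pass
```

No future developer can forget the check, because the database applies it on every write path, including ad-hoc scripts and manual fixes.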

Also, while relaxing constraints is usually easy or at least doable, enforcing new constraints on existing data is impossible in practice. Never seen it done successfully.

The only exception to this rule I typically make is around data state transitions. Meaning that even when business rules dictate a unidirectional transition, it should be bidirectional in code, just because people will click the wrong button and will need a way to undo “irreversible” actions.

bccdee · 6 months ago
You can't gloss over the difference between "schema" and "type." Schemas exist at the edges between pieces of software. They govern how you talk to databases or APIs; they need to be forwards- and backwards-compatible. A type, conversely, exists within one program. You can update types and type invariants without needing to migrate state or make a breaking API change.

"Make invalid states unrepresentable" is much better advice for the internals of a program than it is for the sql or protobuf schemas at that program's margins. The point of "parse, don't validate" is to transform flexibly-represented external data into strongly-typed internal data that respects invariants; since you can update your internal data model much more easily than the models of external data, keeping external data representations flexible is sometimes just as important as keeping internal representations strict.

lock1 · 6 months ago
+1

In cases like electronics & protocols, it's very often a good idea to add an extra "reserved & unused" section for compatibility reasons.

These representations need not be 1:1 with the domain model. Different versions of the model might reject a previously accepted representation in the case of breaking changes. It's up to the dev to decide which conflict-reconciliation strategy to take (fallback values, reject, computed value, etc.).

Working with a precise domain model (as in, no representable invalid states) is way more pleasant than a stringly-typed/primitives mess. We can just focus on domain logic without continuously second-guessing whether a string contains valid user.role values or contains "cat".
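A small sketch of what that buys downstream code (the `Role` enum and `can_publish` are invented for illustration):

```python
from enum import Enum

class Role(Enum):
    ADMIN = "admin"
    REVIEWER = "reviewer"
    USER = "user"

def can_publish(role: Role) -> bool:
    # No "but what if it's 'cat'?" branch needed: a Role value is valid by construction.
    return role in (Role.ADMIN, Role.REVIEWER)

role = Role("reviewer")  # the only place a raw string is ever checked
```

Every function taking a `Role` is freed from defensive validity checks; the one conversion point (`Role(...)`) raises `ValueError` for anything else.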

mamcx · 6 months ago
No, this is wrong, and the example of the database is the best example.

Very regrettably, RDBMSs (traditionally) have no support for the complete relational model, nor a way to represent nested relations, let alone what we have today with algebraic types.

This makes it impossible to TRULY model the domain (plus other factors related to SQL and such).

It's the same as old OOP languages, which were also incapable of doing so.

The INABILITY to TRULY model the domain is what is harmful. You hit the impedance mismatch all the time, everywhere.

The second thing: What is the DOMAIN?

And here is where you are onto something: the DOMAIN of the database, the DOMAIN of the transport protocols, the DOMAIN of the GUI, etc. are DISTINCT domains!

It was VERY common in the past for the DBA (who actually understood RDBMSs, unlike most current devs) to know that he must model the DB in ways that support not only N apps (with different domains), hopefully providing an abstraction for them (in terms of VIEWS and functions), but also the operators.

The article points to this, but incorrectly says it's a problem of trying to make the invalid unrepresentable, when, if you have been doing DBs for decades, it's the total opposite.

For example, being unable to eliminate the mistake of NULL is a headache without end. Or having to use `1`/`0` as bools, or needing to fill the table with a lot of NULLable columns because you can't represent an algebraic OR.
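A sketch of the algebraic OR that SQL struggles with, expressed in Python (the `Email`/`Phone` names are hypothetical): a tagged union makes "exactly one of the two" the only representable shape, where the relational encoding usually degrades to two NULLable columns plus a convention:

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Email:
    address: str

@dataclass(frozen=True)
class Phone:
    number: str

# A contact method is email XOR phone: never both, never neither.
Contact = Union[Email, Phone]

def render(c: Contact) -> str:
    if isinstance(c, Email):
        return f"mailto:{c.address}"
    return f"tel:{c.number}"
```

The table version (`email NULLABLE, phone NULLABLE`) admits four states, two of which are invalid; the union admits exactly the two valid ones.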

rich_sasha · 6 months ago
I'm not sure I agree.

My perfunctory reading is thus: first you couple your state representation to the business logic and make some states unrepresentable (say, every client is category A, B, or C). Maybe you allow yourself some flexibility, like being able to add more client types.

Then the business people come and tell you some new client is both A and B, and maybe Z as well, quickly now, type type type.

And that there's a tradeoff between:

- eliminating invalid states, leading to fewer errors, and

- inflexibility in changes to the business logic down the road.

Maybe I misunderstood, but if this is right, then it's a good point. And I would add: when modelling some business logic, ask yourself how likely it is to change. If it's something pretty concrete and immovable, feel free to make the representation rigid. But if not, and even if the business people insist otherwise, look for ways to retain flexibility down the line, even if it means some invalid states are indeed representable in code.

lock1 · 6 months ago
IMO, rather than focusing on flexibility vs inflexibility when deciding "tight domain model" or not, it's much better to think about whether your program requirement can tolerate some bugs or not.

Say we have a perfect "make illegal states unrepresentable" model. Like you said, it's kind of inflexible when there are requirement changes. We need to change the affected code before we can even compile & run.

On the other hand, an untyped system is flexible. Just look at the JavaScript & Python ecosystems: a function might contain a somewhat insane, gibberish statement, yet your program might still run and only throw an error at runtime.

Some bugs, in programs like games or the average webapp, don't matter that much. We can fix them later when users report them.

Whereas it's probably better to catch at compile time whether a user can withdraw a negative balance, as we don't want to introduce an "infinite money glitch" bug.
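A sketch of the balance example (names invented; in Python the invariant is enforced at construction time rather than compile time, but the principle is the same):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Balance:
    """A balance that cannot be negative by construction."""
    cents: int

    def __post_init__(self):
        if self.cents < 0:
            raise ValueError("balance cannot go negative")

    def withdraw(self, amount: int) -> "Balance":
        # Constructing the new Balance re-checks the invariant,
        # so an overdraw fails here rather than corrupting state.
        return Balance(self.cents - amount)

b = Balance(1000).withdraw(300)
# Balance(1000).withdraw(2000) raises ValueError: no infinite money glitch.
```

In a language with a stronger type system the same idea can be pushed to compile time; here it at least guarantees no `Balance` value with a negative amount ever exists.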

lmm · 6 months ago
Yep. "Make invalid states unrepresentable" pairs well with "parse, don't validate"; the states that are valid for your business domain are (maybe) not the same as the states that are valid for your storage or wire format, so have different representations for these things.
motorest · 6 months ago
> To me, this article misses the mark.

Yes, I agree. The blogger shows a fundamental misunderstanding of what it means to "make invalid states unrepresentable". I'll add that the state machine example is also pretty awful. The blogger lists use cases the hypothetical implementation does not support, and the rationale for not implementing them was that "this can dramatically complicate the design". Which is baffling, as the scenario was built on complicating the design with "edge cases", but it's even more baffling when the blogger has the epiphany that "you need to remain flexible enough to allow some arbitrary transitions". As if the whole point were not to allow some transitions and reject all others that would be invalid.

The foreign key example takes the cake, though. Allowing non-normalized data to be stored in databases has absolutely no relation to the domain model.

I stopped reading after that point. The blog post is a waste of bandwidth.

lock1 · 6 months ago
To be fair to Sean (post author), it does kind of make sense if you view "make invalid states unrepresentable" from a distributed systems perspective (Sean's blog tends to cover this topic), as it's way more painful to enforce there.
kqr · 6 months ago
The thing about making invalid states unrepresentable is that we are often overconfident in what counts as invalid. What counts as valid and invalid behaviour is given by the requirements specification, but if there's anything that changes frequently throughout development, it's the requirements. What's invalid today might be desirable tomorrow.

Thus, we have to be careful at which level we make invalid states unrepresentable.

If we follow where Parnas and Dijkstra suggested we go[1], we'll build software with an onion architecture. There, the innermost layers are still quite primitive and will certainly be capable of representing states considered invalid by the requirements specification! Then as we stack more layers on top, we impose more constraints on the functionality that can be expressed, until the outermost layer almost by accident solves the problems that needed to be solved. The outermost layers are where invalid states should be unrepresentable.

What often happens in practice when people try to make invalid states unrepresentable is they encode assumptions from the requirements specification into the innermost layers of the onion, into the fundamental building blocks of the software. That results in a lot of rework when the requirements inevitably change. The goal of good abstraction should be to write the innermost layers such that they're usable across all possible variations of the requirements – even those we haven't learned yet. Overspecifying the innermost layers runs counter to that.

In the example of the app market place, any state transitions that are well-defined should be allowed at the inner layer of the abstraction that manages state transitions, but the high-level API in which normal transitions are commanded should only allow the ones currently considered valid by the requirements.

[1]: https://entropicthoughts.com/deliberate-abstraction
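The marketplace example above can be sketched in two layers (all names hypothetical): an inner mechanism that can represent any well-defined transition, and an outer API that only exposes the ones the current requirements consider valid:

```python
# Inner layer: generic and unopinionated about the business rules.
class StateMachine:
    def __init__(self, state: str):
        self.state = state

    def force_transition(self, new_state: str) -> None:
        # Repair/admin tooling may call this directly to make arbitrary changes.
        self.state = new_state

# Outer layer: only currently-valid transitions are expressible here.
ALLOWED = {("draft", "in_review"), ("in_review", "published")}

class AppListing:
    def __init__(self):
        self._sm = StateMachine("draft")

    def submit_for_review(self) -> None:
        self._transition("in_review")

    def publish(self) -> None:
        self._transition("published")

    def _transition(self, new_state: str) -> None:
        if (self._sm.state, new_state) not in ALLOWED:
            raise ValueError(f"invalid transition {self._sm.state} -> {new_state}")
        self._sm.force_transition(new_state)
```

When requirements change (say, "official" apps skip review), only the outer layer's `ALLOWED` set and methods move; the inner mechanism is already capable of the new transitions.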

rupertlssmith · 6 months ago
I think you got your onion the wrong way around: inner layers more constrained, outer layers less, working out from Entities in the middle.

An inner layer that permits lots of invalid behaviours is akin to building a house on sand. Even Jesus had a parable about that!

3np · 6 months ago
Good insight and goes along with keeping abstraction layers from leaking in any direction best you can.
slowking2 · 6 months ago
Making invalid states unrepresentable may be a great idea or terrible idea depending on what you are doing. My experience is all in scientific simulation, data analysis, and embedded software in medical devices.

For scientific simulations, I almost always want invalid state to immediately result in a program crash. Invalid state is usually due to a bug. And invalid state is often the type of bug which may invalidate any conclusions you'd want to draw from the simulation.

For data analysis, things are looser. I'll split data up into data which is successfully cleaned to where invalid state is unrepresentable and dirty data which I then inspect manually to see if I am wrong about what is "invalid" or if I'm missing a cleaning step.

I don't write embedded software (although I've written control algorithms to be deployed on it and have been involved in testing that the design and implementation are equivalent), but while you can't exactly make every invalid state unrepresentable, you definitely don't punch giant holes in your state machine. A good design has clean state machines, never has an uncovered case, and should pretty much only reach a failure state due to outside physical events or hardware failure. Even then, if possible, the software should be able to provide information to intervene and fix certain physical issues. I've seen devices RMA'd where the root cause was that the FPU failed; when your software detects the sort of error that might be a hardware failure, sometimes the best you can do is bail out very carefully. But you want to make these unknown failures a once-per-thousands-or-millions-of-device-years event.

Sean is writing mostly about distributed systems, where it sounds like it's not a big deal if certain things are wrong, or there's not a single well-defined problem being solved. That's very different from the domains I'm used to, so the correct engineering in that situation may more often be to allow invalid state. (EDIT: and it also seems very relevant that there may be multiple live systems updated independently, so you can't just force-upgrade everything at once. You have to handle more software incompatibilities gracefully.)

shepherdjerred · 6 months ago
> For scientific simulations, I almost always want invalid state to immediately result in a program crash.

If you have actually made invalid states unrepresentable, then it is _impossible_ for your program to transition into an invalid state at runtime.

Otherwise, you're just talking about failing fast.

cherryteastain · 6 months ago
> _impossible_ for your program to transition into an invalid state at runtime

Not the case for scientific computing/HPC. Often HPC codebases will use numerical schemes which are mathematically proven to 'blow up' (produce infs/nans) under certain conditions even with a perfect implementation - see for instance the CFL condition [1].

The solution to that is typically changing to a numerical scheme more suited for your problem or tweaking the current scheme's parameters (temporal step size, mesh, formula coefficients...). It is not trivial to find what the correct settings will be before starting. Encountering situations like a job which runs fine for 2 days and then suddenly blows up is not particularly rare.

[1] https://en.m.wikipedia.org/wiki/Courant%E2%80%93Friedrichs%E...

neRok · 6 months ago
> Invalid state is usually due to a bug

I don't think the article is referring to that sort of issue, which sounds fundamental to the task at hand (calculations etc.). To me it's about making the code flexible with regard to future changes/requirements/adaptations/etc. I guess you could consider Y2K an example of this issue, because the problem with 6-digit date codes wasn't their practicality at handling dates in the '80s/'90s, but dates that "spanned" beyond 991231, i.e. 000101.

danpalmer · 6 months ago
One of the main pushbacks in this article is on the difficulty of later edits once the domain changes. The "make invalid states unrepresentable" mantra really came out of the strongly typed functional programming crowd – Elm, F#, Haskell, and now adopted by Rust. These all have exceptionally strong compilers, a main advantage of which is _easier refactoring_.

Which side of the argument one falls on is likely to be heavily influenced by which language they're writing. The mantra is likely worth sticking to heavily in, say, Haskell or Rust, and I've had plenty of success with it in Swift. Go or Java on the other hand? You'd probably want to err on the side of flexibility because that suits the language more and you can rely on the compiler less during development.

b_e_n_t_o_n · 6 months ago
Perhaps it's not really language but the type of programs developers use these languages for? Open vs closed systems, heavy/light interactions with the outside world, how long they're maintained, how much they change, etc.
frizlab · 6 months ago
I can tell you the language is a huge part of it. I used to code in ObjC and now I’m using Swift; refactoring is easy in Swift and was a pain in ObjC.

I know I can trust my Swift code. Usually when it compiles it works because I try to, and often can, make invalid states unrepresentable. My ObjC code was always full of holes because doing so was not so easy (or not possible at all)…

pyrale · 6 months ago
The gist of that advice is that closed systems are that much easier to work with.

The tradeoff open-system users have to evaluate is whether they’d rather have data rejected and not evaluated if it doesn’t fit their model of it (extreme case: your program silently drops key instructions), or have their program work with data it wasn’t built to understand (extreme case: SQL injections).

Proponents of this mantra argue that it’s easier to make a process to monitor rejected inputs than to fix issues created by rogue data.
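A minimal sketch of that "monitor rejected inputs" process (names invented; the list stands in for a real dead-letter queue or metrics sink):

```python
from typing import Optional

rejected: list = []  # stand-in for a dead-letter queue / monitoring sink

def process(raw: dict) -> Optional[int]:
    """Closed-model processing: data that doesn't fit is parked, not guessed at."""
    try:
        return int(raw["amount"])
    except (KeyError, ValueError, TypeError):
        # Don't ingest rogue data; park it where a human (or alert) can look.
        rejected.append(raw)
        return None
```

The rejected pile is observable and fixable after the fact; data silently coerced into the model usually isn't.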

danpalmer · 6 months ago
That might be another factor. You could say that a program in a stable ecosystem doesn't need changing, so it should prioritize strictness over flexibility. However, even in a changing ecosystem, rather than building in flexibility that allows for incorrect states, you can raise the level of abstraction and build in extension points that retain nearly the same strictness while still giving you the flexibility to change in the future in the ways that you will need.
NooneAtAll3 · 6 months ago
Easier refactoring, at the cost of having much more of it.

flexibility saves the effort and allows doing more of the Actual Things

taberiand · 6 months ago
The Actual Things being mostly fixing technical debt introduced over the years by not making invalid states unrepresentable
yrand · 6 months ago
The problem with making invalid states representable is that the app logic must be able to handle it, everywhere. Which means you have to always take it into account when reasoning about your app and implementing it, otherwise it will blow up in your face when the invalid state actually occurs. Which means your state machine is inherently larger and harder to reason about.

To illustrate what I mean, take the null pointer. It's essentially an Optional<T> and in a lot of languages it's involuntary, meaning any value is implicitly optional. What happens when something returns null, but in some place in your logic you don't take it into account? Null pointer dereference.
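A sketch of the voluntary alternative in Python (names invented): the signature admits absence explicitly, and a type checker such as mypy flags any use of the result that ignores the missing case:

```python
from typing import Optional

users = {1: "alice"}

def find_user(user_id: int) -> Optional[str]:
    # Absence is part of the declared type instead of an implicit null.
    return users.get(user_id)

name = find_user(2)
# A checker would reject `name.upper()` here: the Optional hasn't been
# narrowed yet, so the None case must be handled first.
if name is None:
    name = "<unknown>"
```

The dereference-without-checking bug becomes a static error rather than a runtime crash in some rarely exercised code path.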

jraph · 6 months ago
Yep. That's what I was going to comment.

I'm not convinced, because you have to deal with invalid state in some way or another in any case.

If you have to delete some record referenced by other records using foreign keys, you'll have to handle the complexity in any case, except that if this is enforced by the database, you'll have no choice but to think about it upfront, which is good. It might lead you to handle it differently, for instance by not deleting the row but marking it as deleted, or by changing the id to some "ghost" user.

If you don't do this, all the complexity you dodged while creating the inconsistencies will have to be encoded in the code that reads the data, with null or exception checks everywhere.

Constraints encode the safe assumptions your code makes in order to be simpler. If an assumption needs to be relaxed, it's going to be hard to change things, as you'll have to update all the code; but the alternative is to make no assumptions at all anywhere, which is hard too and leads to code that's more complicated than necessary.

And then what does an invalid reviewer_id mean? Was this caused by a user deletion? Or a bug somewhere? Or some corruption?

Bonus: some commenters here write that it can depend on which programming language is used; I don't think it matters at all. You'll have to handle the null or dangling values in any case, whether the language guides you to or not.
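The foreign-key scenario above can be sketched with SQLite (table names invented; note SQLite only enforces foreign keys when the pragma is on): the delete is rejected, which forces the upfront decision between soft-delete, a "ghost" user, or cascading:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE reviews (
        id INTEGER PRIMARY KEY,
        reviewer_id INTEGER NOT NULL REFERENCES users(id)
    )
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO reviews VALUES (1, 1)")

try:
    conn.execute("DELETE FROM users WHERE id = 1")  # would dangle reviewer_id
except sqlite3.IntegrityError:
    # The database forces the decision now: here, a soft-delete-style rename;
    # never a silently dangling reference that readers discover later.
    conn.execute("UPDATE users SET name = '[deleted]' WHERE id = 1")
```

Without the constraint, the same decision still exists, but it is deferred to every piece of code that ever joins reviews to users.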

Mikhail_Edoshin · 6 months ago
The accounting practice in Russia (and, I guess, in other countries) has a concept of a correcting transaction ("storno"): it is an entry in the books made specifically to undo a mistake. To stand out, these entries are made in red ink. (Obviously, it is an old rule.) So yes, a data model needs a way to make an arbitrary change to correct a mistake, that is right. But that is about it. To expand it to "let's place no restrictions" goes too far.

A type by definition is a set of restrictions, and normally we go by making these restrictions more and more elaborate. The "Parsing Techniques" book has a nice analogy about the expressive power of different grammars: more powerful ones do not describe larger sets of sequences, but carve out more and more specific subsets of valid sequences out of the pool of all possible sequences (which itself is trivial to define).

A type by definition is a set of possible states; if we "allow invalid states", then we've just defined a different type, a wider one. Whether it is what you need depends on the situation. E.g. we add entries to a database and place restrictions on them. But the user may take a long time to compose an entry. Fine; let's add a new type, "draft", that is similar to the primary entry type but does not have its restrictions. These drafts are stored separately so the user may keep editing them until they are ready. But they do not go into the part of the system that needs proper entries.
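The draft/entry split can be sketched like this (the `Draft`/`Entry` fields are invented for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    """Wide type: everything optional while the user is still editing."""
    title: Optional[str] = None
    amount: Optional[int] = None

@dataclass(frozen=True)
class Entry:
    """Narrow type: the only thing the core system accepts."""
    title: str
    amount: int

def finalize(d: Draft) -> Entry:
    # The single gate between the two types; incomplete drafts stay drafts.
    if d.title is None or d.amount is None:
        raise ValueError("draft is not complete")
    return Entry(d.title, d.amount)
```

Both types are precise about what they allow; the flexibility lives in `Draft`, and `Entry` keeps its restrictions intact.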

pyrale · 6 months ago
Aside from the flaws of the article, one provided example annoys me:

> What happens when you need to account for “official” apps, which are developed internally and shouldn’t go through the normal review process?

There is a reason devs are advised to eat their own dogfood. Building a process-bypass for the users that are also the ones responsible for fixing the process is the easiest way to get a broken process.

000ooo000 · 6 months ago
Yep. It's an example contrived to the point of silliness anyway. Like most problems, you would decompose it, not hamfist a bunch of new shit in, throw your hands in the air because it's complex, then write a confused blog.
karmakaze · 6 months ago
The updated design of that state machine looks so bad I stopped reading there. Either the author isn't the best person to be making these arguments, or is being disingenuous to support one.

I do agree with the overall take, "it depends", which is where you end up when you get enough experience and can also articulate when/where and why. As others mention, I find that being able to rule out invalid states (and function calls, for that matter) in static types is well worth it, up to a point.

The other thing I want to say is that you can have a state machine without a thing called a state machine; it's basically what the app is doing regardless, so there's really no way of getting away from it. It's a matter of whether you build it with that clarity or not. For the record, I prefer implementations that are state machines over ones that use an abstraction called a state machine (the former is a 'zero cognitive-load' abstraction).

jeschiefer · 6 months ago
"some invalid states" - what does this mean, please? How do I constrain the "some" part? If you can't, you might as well say "mostly invalid states", which is what tends to happen in practice.

The whole point of state machines/type constraints/foreign key constraints/protocol definitions is that there is a clear boundary between what is allowed and what is not. And I would argue that this is what makes life easier, because you can tell one from the other. And with the right tooling, the compiler or some conformance tool will tell you that the change you just introduced breaks your implementation in 412 places. Because this allows me to stop and think whether I can make a different change that only creates 117 problems. Or estimate the cost of the changes, compare it with the business benefit, and have appropriate conversations with stakeholders. Or find out that my domain model was completely wrong to begin with and start refactoring my way out of it. And all of this before the first successful compile.

For me, this gives me maximum flexibility, because I have a good idea of the impact of any changes on the first day of development. This does require appropriate tooling though, like you would find in Ocaml, F#, Rust, State Machine Compilers, ... <insert your favourite tool here>.