Readit News
Cyphase · a year ago
Leslie Lamport gave the closing keynote at SCaLE 22x this year, talking about formal methods and TLA+. He mentioned some previous work Amazon has done in that area.

https://www.youtube.com/watch?v=tsSDvflzJbc

> Coding isn't Programming - Closing Keynote with Leslie Lamport - SCaLE 22x

kuleshovsoup · a year ago
Thanks for sharing!
jlcases · a year ago
Very interesting video. Many thanks
csbartus · a year ago
I've recently created a likely-correct piece of software based on these principles.

https://www.osequi.com/studies/list/list.html

The structure (ontology, taxonomy) is created with ologs, a formal method from category theory. The behavior (choreography) is created with a semi-formal implementation (XState) of a formal method (finite state machines).
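To illustrate the behavioral side, here is a minimal sketch of the idea (in Python rather than XState, with hypothetical state and event names): a list component's choreography expressed as an explicit finite state machine, so every state and transition is enumerable and checkable.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    """A tiny explicit finite state machine: (state, event) -> next state."""
    state: str
    transitions: dict = field(default_factory=dict)

    def send(self, event: str) -> str:
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"no transition for {event!r} in state {self.state!r}")
        self.state = self.transitions[key]
        return self.state

# Hypothetical "choreography" for a list component: idle, loading,
# loaded, and error states, with the events that move between them.
list_machine = Machine(
    state="idle",
    transitions={
        ("idle", "FETCH"): "loading",
        ("loading", "RESOLVE"): "loaded",
        ("loading", "REJECT"): "error",
        ("error", "RETRY"): "loading",
    },
)

list_machine.send("FETCH")    # -> "loading"
list_machine.send("RESOLVE")  # -> "loaded"
```

Because the transition table is just data, an invalid event in a given state fails loudly instead of silently doing nothing, which is much of the practical value of the FSM discipline.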

The user-facing aspect of the software is designed with Concept Design, a semi-formal method from MIT CSAIL.

Creating software with these methods is refreshing and fun. Maybe one day we can reach Tonsky's "Diagrams are code" vision.

https://tonsky.me/blog/diagrams/

matu3ba · a year ago
"Diagrams are code" already exists as ladder diagrams for PLCs, from which Structured Text can be derived; these are typically used by PICs for simple-to-estimate timing behavior in very time-sensitive/critical (real-time) control applications.

Modeling the stack, dynamic memory, and other system resources would need proper models; memory-lifetime visualization in particular is an open research problem, and resource tracking remains unsolved (Valgrind offers no flexible API, has license restrictions, is incomplete, and platforms other than Linux are less well supported, etc.).

Reads and writes conditioned on arithmetic inherit all the issues of complex arithmetic, and I am unaware of (timing) models for speculation or instruction re-ordering, let alone for applied compiler optimizations.

jlcases · a year ago
Fascinating approach with ologs and FSMs. I've been working on PAELLADOC (https://github.com/jlcases/paelladoc), which applies MECE principles to documentation structure. Similar philosophy, but focused on knowledge-representation correctness rather than software behavior.

The results align with your experience - structured conceptual foundations make development more intuitive and documentation more valuable for both humans and AI tools.

pera · a year ago
> we also sought a language that would allow us to model check (and later prove) key aspects of systems designs while being more approachable to programmers.

I find it a bit surprising that TLA+ with PlusCal can be challenging to learn for your average software engineer. Could anyone with experience in P show an example of something that is difficult to express in TLA+ but significantly easier in P?

ngruhn · a year ago
It's probably just the syntax
goostavos · a year ago
I think it goes beyond that.

TLA+ requires you to think about problems very differently. It is very much not programming.

It took me a long time (and many false starts) to be able to think in TLA. To quote Lamport, the real challenge of TLA+ is learning to think abstractly.

To that point, I think PlusCal and the teaching materials around it do a disservice to TLA+. The familiar syntax hides the wildly different semantics.
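The core mental shift is from code to a state space: a system is an initial-state set plus a next-state relation, and "checking" means exploring every reachable state for violations. Here is a toy illustration of that way of thinking in Python (not TLA+), modeling the classic lost-update race between two processes that increment a shared counter non-atomically:

```python
# State = (counter, program counters, per-process temp registers).
# Each process does: read counter into temp, then write temp + 1 back.
def next_states(state):
    counter, pcs, temps = state
    for i in (0, 1):
        if pcs[i] == "read":      # read shared counter into local temp
            t = list(temps); t[i] = counter
            p = list(pcs); p[i] = "write"
            yield (counter, tuple(p), tuple(t))
        elif pcs[i] == "write":   # write back temp + 1
            p = list(pcs); p[i] = "done"
            yield (temps[i] + 1, tuple(p), temps)

# Explore every reachable state (what a model checker does for you).
init = (0, ("read", "read"), (0, 0))
seen, frontier = {init}, [init]
while frontier:
    s = frontier.pop()
    for nxt in next_states(s):
        if nxt not in seen:
            seen.add(nxt)
            frontier.append(nxt)

# The lost-update bug shows up as a reachable final state with counter == 1.
finals = {s[0] for s in seen if s[1] == ("done", "done")}
print(finals)  # {1, 2}
```

The point is that you never "run" the program; you enumerate all interleavings at once, which is what makes the TLA+ mindset feel so unlike ordinary programming.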

gqgs · a year ago
A key concern I've consistently had regarding formal verification systems is: how does one confirm the accuracy of the verifier itself?

This issue appears to present an intrinsically unsolvable problem, implying that a formally verified system could still contain bugs due to potential issues in the verification software.

While this perspective doesn't necessarily render formal verification impractical, it does introduce certain caveats that, in my experience, are not frequently addressed in discussions about these systems.

alexisread · a year ago
I'd guess the best approach is similar to security - minimal TCB (trusted computing base) that you can verify by hand, and then construct your proof checker on top, and have it verify itself. Proof and type checkers have a lot of overlap, so you can gain coverage via the language types.
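A toy sketch of the LCF-style idea in Python (hypothetical names, not any real prover's API): the `Theorem` type can only be constructed through a small set of trusted rules, so auditing the whole system's correctness reduces to auditing that tiny kernel by hand.

```python
class Theorem:
    """A proven formula; constructible only via the kernel's trusted rules."""
    def __init__(self, formula, _token=None):
        if _token is not _KERNEL:
            raise PermissionError("theorems can only be made by kernel rules")
        self.formula = formula

_KERNEL = object()  # private capability shared only by the rules below

def axiom_refl(x):
    # Trusted rule: reflexivity of equality, x = x.
    return Theorem(("eq", x, x), _token=_KERNEL)

def rule_sym(thm):
    # Trusted rule: from x = y, derive y = x.
    tag, a, b = thm.formula
    assert tag == "eq"
    return Theorem(("eq", b, a), _token=_KERNEL)

t = rule_sym(axiom_refl("a"))
print(t.formula)  # ('eq', 'a', 'a')
```

Everything outside the kernel (tactics, automation, search) can be arbitrarily buggy: at worst it fails to produce a `Theorem`, but it can never produce a false one, because the only constructors are the trusted rules.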

I can't recall exactly, but I think CakeML (https://cakeml.org/jfp14.pdf) had a precursor lisp prover, written in a verified lisp, so would be amenable to this approach.

EDIT: think it was this one https://www.schemeworkshop.org/2016/verified-lisp-2016-schem...

guerrilla · a year ago
This is correct. That's why you have core type theories: very small, easy to understand, and implementable in a canonical way. Other languages can be compiled to them.
newAccount2025 · a year ago
A deeply considered take: this is theoretically interesting but in practice matters very little.

It’s like worrying about compiler bugs instead of semantic programming errors. If you are getting to the point where you have machine checked proofs, you are already doing great. It’s way more likely that you have a specification bug or modeling error than an error in the solver.

If you are worried about it, many people have solutions that are very compelling. There are LCF style theorem provers like HOL Light with tiny kernels. But for any real world systems, none of this matters, use any automated solvers you like, it will be fine.

ninetyninenine · a year ago
This is not just a problem with formal verification systems. It's a problem intrinsic to math, science and reality.

We call these things axioms. The verifier must be assumed to be true.

For all the axioms in mathematics... we don't know if any of those axioms are actually fully true. They seem to be true because we haven't witnessed a case where they aren't. This is similar to how we verify programs with unit tests. For the verifier it's the same thing: you treat it like an axiom and "verify" it with tests.

gqgs · a year ago
I believe there is a distinction between those concepts.

Axioms are true by their own definition. Therefore, discovering an axiom to be false is a concept that is inherently illogical.

Discovering that a formal verification system produced an incorrect output due to a bug in its implementation is a perfectly well-defined concept and doesn't lead to any logical contradictions, unless you axiomatically define the output of formal verification systems to be necessarily correct.

I believe that definition doesn't make sense in the general case, considering the number of vectors that can introduce errors into software.

trenchgun · a year ago
Verifiers can be based on a small proven kernel. That is not really the issue.

The issue is writing the formal specification. Formal verification is, in a nutshell, just proving the equivalence of two different formal constructs: one called the spec and one called the system (or program). Writing specs is hard.
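A toy illustration of that framing in Python: the spec is an obviously-correct but slow definition, the system is an optimized implementation, and verification means showing they agree. Here the check is exhaustive over a small input domain, where a model checker or prover would work symbolically:

```python
import itertools

def spec_max_subarray(xs):
    # Spec: maximum sum over all contiguous subarrays. O(n^2), but
    # obviously correct: it literally restates the problem definition.
    return max(sum(xs[i:j])
               for i in range(len(xs))
               for j in range(i + 1, len(xs) + 1))

def impl_max_subarray(xs):
    # System: Kadane's algorithm. O(n), but much less obviously correct.
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

# "Verification": check spec and system agree on every input in a small domain.
for n in range(1, 5):
    for xs in itertools.product([-2, -1, 0, 1, 2], repeat=n):
        assert impl_max_subarray(list(xs)) == spec_max_subarray(list(xs))
print("spec and system agree on the whole small domain")
```

Notice where the hard part is: the spec only helps because it is simple enough to trust by inspection. If the spec itself were as intricate as the implementation, the proof of equivalence would tell you much less.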

csbartus · a year ago
Perhaps there is no such thing as absolute truth.

In category theory / ologs, a formal method for knowledge representation, the result is always mathematically sound, yet ologs are prefixed with "According to the author's world view ..."

On the other hand, truth, even if not absolute, is very expensive.

Lately, AWS has been advocating a middle ground, lightweight formal methods, which are much cheaper than full formal methods yet deliver good-enough correctness for their business case.

In the same way, MIT CSAIL's Daniel Jackson advocates a semi-formal approach to designing likely-correct apps (Concept Design).

It seems there is a push for better correctness in software, without the aim of perfectionism.

solidsnack9000 · a year ago
> ...it does introduce certain caveats that, in my experience, are not frequently addressed in discussions about these systems.

What are those?

As far as I can tell, people don't deploy verified or high assurance systems without testing or without online monitoring and independent fail safes.

Ar-Curunir · a year ago
There are many things people try to do here: multiple implementations of the verifier, formally verified implementations of the verifier in other verification tools, by-hand verification, etc.
deterministic · a year ago
This is a solved problem. See CakeML for example.
OhMeadhbh · a year ago
I find this highly unlikely. My first day at Amazon I encountered an engineer puzzling over a perfect sine wave in a graph. After looking at the scale I made the comment "oh. you're using 5 minute metrics." Their response was "how could you figure that out just by looking at the graph?" When I replied "Queuing theory and control theory," their response was "what's that?"

Since then, Amazon's hiring practices have only gotten worse. Can you invert a tree? Can you respond "tree" or "hash map" when you're asked what is the best data structure for a specific situation? Can you solve a riddle or code an ill-explained l33tcode problem? Are you sure you can parse HTML with regexes? You're Amazon material.

Did you pay attention to the lecture about formal proofs, TLA+, or Coq/Kami? That's great, but it won't help you get a job at Amazon.

The idea that formal proofs are used anywhere but the most obscure corners of AWS is laughable.

Although... it is a nice paper. Props to Amazon for supporting Ph.D.s doing pure research that will never impact AWS's systems or processes.

scubbo · a year ago
They specifically mention "the P team at AWS". The following two things can both be true simultaneously:

* The average Amazon engineer is not expected to have awareness of CS fundamentals that go beyond LeetCode-y challenges

* The average Amazon engineer builds using tools which are each developed by a small core team who _are_ expected to know those fundamentals, and who package up those concepts to be usefully usable without full understanding

(I did 10 years on CDO, and my experience matches yours when interacting with my peers, but every time I interacted with the actual Builder Tools teams I was very aware that we lived in different worlds.)

nextos · a year ago
> The average Amazon engineer is not expected to have awareness of CS fundamentals that go beyond LeetCode-y challenges

I find this a bit unsettling. There are dozens of great CS schools in the US. Even non-elite BSc programs in EU sometimes teach formal methods.

There are also some good introductory books now, e.g. [1]. Perhaps it's time to interview more on concepts and less on the algorithmic tricks favored by LeetCode?

I doubt current undergrads can't go beyond LeetCode-like challenges.

[1] Formal Methods, An Appetizer. https://link.springer.com/book/10.1007/978-3-030-05156-3

Dichlorodiphen · a year ago
We recently did a proof of concept with P for our system, and the reception was warm enough that I expect adoption within the year. I wouldn’t exactly call us obscure, at least in the sense used above—greenfield big data system with a bog-standard mix of AWS services.

I will say that time to productivity with P is low, and, in my experience, it emphasizes practical applicability more so than traditional modeling languages like TLA+ (this point is perhaps somewhat unsubstantiated, but the specific tooling we used is still internal). That is to say, it is fairly easy to sell to management, and I can see momentum already starting to pick up internally. No Ph.D. in formal methods required.

And re: the hiring bar, I agree that the bulk of the distribution lies a bit further left than one would hope/imagine, but there is a long tail, and it only takes a handful of engineers to spearhead a project like this. For maintainability, we are betting on the approachable learning curve, but time will tell.

matu3ba · a year ago
Thanks for the interesting insight.

1. What would you recommend to model multi-thread or multi-process problems on the same system? Is there a good read with recommendations?

2. Is there a good overview of the trade-offs between CSP, the Actor model, and other methods for this?

3. Are you aware of any process-group models?
Taikonerd · a year ago
The problem (in my mind) with using P is that only a small % of problems are formalizable as communicating state machines.

In your greenfield big data system, what were the state machines? What states could they be in?

gqgs · a year ago
Cool anecdote, but I'm not sure how reasonable it is to expect every person to have expert domain knowledge and recall of every single computer-science field just because they got a job at Amazon or another MAANG company.

Deleted Comment

atdt · a year ago
OK, I will be the useful idiot: I don't fully understand your anecdote. Could you explain what exactly it is that you perceived and that the other engineer failed to see?
sethammons · a year ago
Periodic, regular cycles (a sine wave) suggest that there is over-correction happening in the system, preventing a stable point. More frequent measurements or tempered corrections may be called for.

Concrete example: the system sees its queue utilization is high, so it throttles incoming requests, but it throttles for too long; by the next check the queue looks super healthy, so the throttle is removed, but the next check is again too far away, and by then utilization is too high again.
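To make that concrete, here is a toy simulation in Python (all numbers hypothetical): a queue with fixed demand and service rates, and a throttle that is only reconsidered at long check intervals. Checking rarely produces a wide oscillation; checking every step keeps utilization in a tight band.

```python
def simulate(check_interval, steps=60):
    depth, throttled, history = 50, False, []
    for t in range(steps):
        arrivals = 0 if throttled else 10   # throttle blocks all arrivals
        depth = max(0, depth + arrivals - 6)  # constant service rate of 6/step
        if t % check_interval == 0:           # the loop only "looks" occasionally
            throttled = depth > 40
        history.append(depth)
    return history

slow = simulate(check_interval=10)  # infrequent checks: large swings
fast = simulate(check_interval=1)   # frequent checks: tight band
print(max(slow) - min(slow), max(fast) - min(fast))
```

With infrequent checks the queue repeatedly drains to empty and then overshoots well past the threshold, which is exactly the sine-wave pattern in the metric; more frequent checks (or a proportional throttle instead of on/off) damp the cycle.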

georgeburdell · a year ago
A graph of what?

Deleted Comment

deterministic · a year ago
I have used simulators to develop hard-to-make-correct distributed code.

It helped me find all kinds of subtle non-intuitive bugs that would have gone unnoticed had I only used hand coded automatic tests.

Highly recommended.

nullorempty · a year ago
And what teams use these methods exactly?
riknos314 · a year ago
The article directly mentions several.

> Teams across AWS that build some of its flagship products—from storage (e.g., Amazon S3, EBS), to databases (e.g., Amazon DynamoDB, MemoryDB, Aurora), to compute (e.g., EC2, IoT)—have been using P to reason about the correctness of their system designs.

Later it more specifically mentions these (I probably missed a few):

S3 index, S3 ShardStore, Firecracker VMM, Aurora DB engine commit protocol, The RSA implementation on Graviton2

(EDIT: formatting)

yarapavan · a year ago
If you are wondering "why do formal methods at all?" or "how is AWS using P to gain confidence in the correctness of its services?", the following re:Invent 2023 talk answers these questions, provides an overview of P, and describes its impact inside AWS: Gain confidence in system correctness & resilience with Formal Methods (Finding Critical Bugs Early!!) - https://www.youtube.com/watch?v=FdXZXnkMDxs

Github: https://github.com/p-org/P

jmccarthy · a year ago
https://aws.amazon.com/verified-permissions/ (AVP) is a team and product which uses the formally verified Cedar language. https://arxiv.org/abs/2403.04651

Deleted Comment