Leslie Lamport: Video course on TLA+

Great to see this here! I act as a TA for Dr. Lamport's TLA+ courses at Microsoft, and can answer any questions y'all have.

TLA+ in one sentence: it is a language used to write specifications, same as you might write a spec in English/your chosen informal language, except here you write your spec in basic mathematics; benefits of a formal specification language include freedom from ambiguity, model-checking, and even machine-checked proofs of correctness.

This language is a joy to use and I've found it really affects the way I think about system design.

anateus · 8 years ago

Kind of an out of left field question: Lamport appears in a little square of video occasionally overlaid on the slides and sometimes full screen. As far as I can tell, he never recycles an outfit and seems to switch head gear regularly. The switch is often between segments that are quite short.

It feels almost like an intentional joke. Do you think that is the case? Or just even short segments were recorded on different days and Lamport like to wear all sorts of hats? That one where he's wearing a beanie and indoor sunglasses seems particularly intentional.

I've never met him, and the delivery of the lecture seems fairly dry and serious, so I can't tell :)

ahelwer · 8 years ago

Haha yep, it's intentional. He has a sense of humor like that. Famously, he first presented his paper on the Paxos algorithm dressed like Indiana Jones with the fictional backstory of the algorithm being an archeological discovery of the ancient parliamentary systems of the Greek island of Paxos.

djb_hackernews · 8 years ago

I haven't watched the videos yet but I have been keeping my eye on TLA+ ever since seeing the whitepaper from Amazon and how they use TLA+ to spec out their distributed systems. As someone that works on distributed systems that could use some formality I think TLA+ could help, however, I have a hard time really understanding it.

Do you know of any straight forward non-trivial examples? Something a little more complex than "hello world" and something that isn't abstract?

ahelwer · 8 years ago

There's a few I can think of! First, the TLA+ GitHub repo has a bunch of examples[0]. The best one to start with is the Die Hard puzzle[1], where the heroes must obtain four gallons of water given only three and five gallon jugs. In the spec[2] we set up the system and specify all possible actions; then we model-check the system to see whether it can spit out an execution trace of actions to reach a state where we have four gallons of water in a jug.

The Wikipedia article also has some good example specs[3].

For your own first specification, I personally cut my teeth on a river-crossing puzzle with a farmer, wolf, and sheep.

[0] https://github.com/tlaplus/Examples/tree/master/specificatio...

[1] https://www.youtube.com/watch?v=6cAbgAaEOVE

[2] https://github.com/tlaplus/Examples/blob/master/specificatio...

[3]https://en.wikipedia.org/wiki/TLA+

hwayne · 8 years ago

I wrote an essay[0] about a non-trivial example I did at my company. We had to work with several finicky APIs and the TLA+ spec caught several critical bugs with it. Plus, it's purely a business logic system, so it's pretty concrete.

If you're interested in learning more, I also wrote a beginner's guide[1] to the language, which contains concrete examples and exercises.

[0] https://medium.com/espark-engineering-blog/formal-methods-in...

[1] https://learntla.com/introduction/

jdubray · 8 years ago

I have created the SAM Pattern [1] (State/Action/Model) based on my interpretation of TLA+. The goal is to make the semantics available to developers. Why not writing code as close as possible to the way it would be specified?

SAM can also be used for stateful API/Microservice orchestrations [2]

If you want a slightly more formal introduction to SAM and its relationship to TLA+ (again, based on my own interpretation) I gave this lecture last month [3].

[1] http://sam.js.org

[2] http://www.ebpml.org/blog15/2015/06/designing-a-reliable-api...

[3] http://cloudsentinel.com/sam-state-machines-and-computation....

agentultra · 8 years ago

As someone who worked with distributed systems for many years before using TLA+ to find bugs and validate work: It's possible to understand it.

I think it would help to have a good mentor, as I did, or a lecture series from the language inventor. I ran through the trial version of these videos when Lamport was developing them and it's quite approachable even for someone who's more of a liberal arts kind of person.

colanderman · 8 years ago

I work on distributed systems and have started pushing for us to use TLA+/PlusCal to model them. I have a simple PlusCal model of a system with two threads which juggles network and disk I/O which I can e-mail you if you like. It's not at all a complex system, but I think it captures the basics of modeling concurrency and I/O well.

gghh · 8 years ago

I have a passing familiarity with proof assistants such as Cow and HOL. How would you compare TLA+ with those?

https://en.m.wikipedia.org/wiki/Coq

https://en.m.wikipedia.org/wiki/HOL_(proof_assistant)

ahelwer · 8 years ago

Great question! So, TLA+ does have a machine-checked proof system called TLAPS, but its intended use is quite a bit different than Coq. While Coq is very well-suited for doing deep reasoning about mathematics and proving all sorts of stuff about whatever mathematical construct you care to define, TLAPS is focused much more on "mile wide, inch deep" type proofs required to demonstrate correctness of algorithms. Most algorithms use very simple mathematical properties, but have a bunch of edge cases and cyclomatic complexity which TLAPS is well-equipped to handle.

There might come a day where TLAPS is as powerful as Coq for writing general mathematical proofs (indeed, better ways of writing proofs and mathematics seem to be a passion of Dr. Lamport's, see his paper How to Write a 21st Century Proof[0]), but for now it's far, far easier to prove even simple statements about the Natural numbers in Coq.

[0] http://lamport.azurewebsites.net/pubs/proof.pdf

pron · 8 years ago

This is a complicated question. I am currently writing a blog post series on the theory of TLA+ that I will publish in May that, among other things, touches precisely on this point. But I will try to quickly point out some areas of difference, but the most important thing to remember is that both Coq and TLA+ -- as far as specifying and verifying software is concerned (not general math) -- are universal formalisms. Ultimately, anything you can do in TLA+ you can do in Coq and vice-versa. Also, when using mechanical proof to verify software, the proof techniques and the actual work is pretty much the same.

1. Difference in theory -- the theory of computation in Coq and HOL is based on functional programming; especially in Coq (I know very little about HOL), algorithms are functions in the FP sense (lambda expressions). In TLA, algorithms are state machines, and there's a richer notion of relationship between algorithms. This makes some things easier in TLA: Instead of programs being equal extensionally or definitionally, they can be equal or similar in many ways. In fact, in TLA there's a partial order relation on algorithms that reflects the refinement/abstraction relation. Also, it means that different kinds of algorithms -- sequential, concurrent, parallel, realtime etc., are all defined in the same way using the exact same ideas. This also makes it very easy to specify some properties -- such as worst-case complexity -- that are tricky in Coq. Having said that, people working on realtime systems in Coq implemented a simple version of TLA in Coq.

2. Differences in goals and audience -- Coq and HOL are ultimately general mathematical proof assistants. TLA+ is a tool for reasoning about digital systems. Coq and HOL can obviously reason about programs and TLA+ can prove general math theorems, but the focus is different. For example, Coq is used by logicians to explore new logics and mathematical foundations; you don't want to use TLA+ for that. On the other hand, TLA+ is used by "ordinary" engineers in industry working on "ordinary" software, while Coq and HOL are virtually unused in industry, and in the few cases they are, it's almost always in conjunction with academics and academic projects.

3. Differences in tools/abilities -- Coq can generate runnable code and be used for end-to-end verification. TLA+ cannot generate code, and while it has been used by academics for end-to-end verification, that is most certainly not its intended use. It is intended to be used in modeling large, complex systems at an abstraction level that's significantly higher than the code to find errors in design. As far as general math proofs are concerned, Coq is more capable than TLAPS, the TLA+ proof system. On the other hand, TLA+ has a model-checker that makes its proof checker almost irrelevant for large, complex specifications. As the users of TLA+ usually don't require a 100% watertight end-to-end specification, they're content to get the less than 100% guarantee of the model checker in exchange for what is at least a one order of magnitude reduction in effort.

In short, I would say that tools like Coq have a more academic focus, and are in general more interesting to academics and less to engineers, while TLA+ has a more industrial focus, and is of more interest to engineers and less to academics.

But I will repeat and say that while there are big difference in theory, goals and usage, ultimately when working on verifying software, the challenge is in thinking about your system mathematically and understanding how and why it works (or not). This work is similar both in effort and technique no matter what formalism you choose to use.

tmccrmck · 8 years ago

Outside of distributed systems what is it used for?

I've only ever heard of it's use in the formalization of distributed systems.

ahelwer · 8 years ago

You are correct the language really shines when there is concurrency (not necessarily just distributed systems), because validating system invariants with the model-checker is just magical. I've also found the language to hold great value for specifications as such; Dr. Lamport seems to believe that most technical problems just disappear when you precisely identify what it is you want to do, and writing a TLA+ spec is an efficient way of accomplishing that. So, I've found it useful to write specs for solutions to very technical problems simply to clarify my thinking and root out areas of sloppiness.

pron · 8 years ago

Indeed, TLA+ is mostly known for formalizing and verifying distributed systems and concurrent algorithms. This has a few reasons:

1. Lamport's algorithmic work is in concurrent and distributed algorithms, and as he uses TLA for his own algorithms -- and he's TLA+'s first user -- that's how it's known.

2. Few other general software verification formalisms are able to handle concurrency as easily as TLA, so that is where it shines in comparison. Of course, this is intentional because Lamport designed TLA+ to work well for the kinds of algorithms he's interested in.

3. When engineers write software systems that are too complex or subtle to be obviously correct, and could therefore benefit from formal verification, it is usually the case that a concurrent or distributed algorithm is involved.

Having said that, there is nothing specific about TLA+ that makes it any less suitable for sequential algorithms. If you write one that you think is complicated or subtle enough to require help in reasoning, you could certainly benefit from TLA+.

But if you're interested just in sequential algorithms, you do have more options, like Why3/WhyML, which I think is very nice (although I prefer TLA+). Whether WhyML or TLA+ would be better suited to verify your sequential algorithm depends largely on personal aesthetic preferences, as well as some details of requirements. For example, WhyML can generate OCaml code, while TLA+ can't. On the other hand, TLA+ has both SMT solvers and a model checker, which makes it more likely that you can verify the entire algorithm automatically, while WhyML requires manual proof in Isabelle or Coq for properties the automatic solvers can't verify (I hear they're working on a proof language directly in Why; I hope it's as nice as TLA+'s proof language). Also, TLA+ lets you describe your algorithm at various levels of abstraction, and show that the more detailed ones implement the more abstract ones. WhyML is more focused on the abstraction level of actual program code.

hwayne · 8 years ago

I've personally found it absolutely fantastic for modeling business logic. It's also really good for any kind of concurrent system, which covers pretty much every webtech startup.

a-saleh · 8 years ago

I understand that I wouldn't be able to actually run the TLA+ specs, but is there some way how would I be able to run i.e. some of the traces against a live system?

I do QA work and for quite some time I have been intrigued by property based testing. But we are usually testing lot of interconnected micro-services and when I tried to define test-scenario in i.e. quick-check, the end-result was always too complicated to maintain.

I basically tried to do a similar thing to John Hughes "Mysteries of Dropbox" paper.

It seems that with temporal logic it would have been much easier to specify the expected behavior of the program and then use the generated trace to instrument the live service and check that all of the properties still hold.

ahelwer · 8 years ago

This is a really good idea! It's been pitched inside where I work in Azure, but not yet implemented anywhere (to my knowledge). The basic idea is you have a TLA+ spec of your system, then you run the model-checker in "simulation" mode to generate random execution traces. You take these generated execution traces, and, according to some predefined correspondence between the actions in your TLA+ spec and some API call in your code (a node becoming inaccessible or a message being sent, for example), run the execution traces on your real-world system.

I'd love to see you implement this and write about your experiences.

sadgit · 8 years ago

I havn't looked at the new video series, but when I was trying to get started with TLA+ using the Eclipse toolbox, i got stuck trying to define a behaviour spec for the model checker. Do you know of any good documentation for this? Thanks

cerebraltangent · 8 years ago

Kindly add transcripts to this for the benefit of the hearing impaired people ?

ahelwer · 8 years ago

Excellent suggestion!

maerF0x0 · 8 years ago

What role/job title usually writes the TLA+ specs at an organization?

unboxed_type · 8 years ago

I hold a "senior researcher" position at research dep. of one of security-related IT companies. I am developing a TLA model to specify one of companies product core protocols. My previous job title was "senior distributed systems developer"; I was doing a similar thing there.

ahelwer · 8 years ago

I've yet to come across a TLA+-specific job title, although would love to be surprised. Usually it's the same people writing the spec, or engineers who produce a TLA+ spec from a provided informal-language spec.

Deleted Comment

qznc · 8 years ago

Why not YouTube? Clickable links are possible there as well.

albertocsm · 8 years ago

plz check -> https://groups.google.com/d/msg/tlaplus/lAeWdCE9MLw/v03t3UEv...

justanotheratom · 8 years ago

Dr. Lamport does courses at Microsoft? More details please.

ahelwer · 8 years ago

Only for Microsoft employees, unfortunately! If you work at Microsoft, send me an email (alias is same as username here) and I'll get you added to the TLA+ Outlook group for future course announcements and such.

UK-AL · 8 years ago

He worked for Microsoft research, so not surprising.

seahckr · 8 years ago

attended a course where @ahewler TA'd. can attest, he is awesome! disclaimer - msft engineer.

ahelwer · 8 years ago

Aw, thanks!