Learn TLA+ (learntla.com)
254 points by yarapavan on Sept 28, 2022 | 64 comments


I haven't done any work on this since July due to needing to prepare a new TLA+ workshop from scratch, but now that that's over I plan to spend a day each week doing updates. On the docket:

- Moving some of the examples from Practical TLA+ over to the site

- Adding more exercises to the core

- Adding the new modules and TLC options showcased at this year's TLAConf

- Way, way more examples and advanced topics. I have half a dozen pure TLA+ specifications on my computer I need to write up.


Just wanted to say that I think you single-handedly have made formal methods approachable to an order of magnitude more people (at least) than before.

TLA+ is much easier to get into than, say, Coq or Isabelle or something, but a lot of the "pre-learntla" content out there is still really mathey and scary for engineers to jump into. I think you did a really good job making TLA+ and PlusCal a useful tool for non-scientists. Thank you.


It would be quite easy for Coq or Isabelle to express the logic in a TLA spec, you're basically just adding a few modalities (time, state, non-determinism) over plain old boolean logic. Then you're free to use any of a number of tools (usually SAT/SMT solvers) to try and find a refutation of your spec. There really isn't a lot of complex math involved.
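For concreteness, here's a minimal made-up example (not from any real spec) of those modalities layered over boolean logic:

    ---- MODULE Flag ----
    VARIABLE flag

    \* non-determinism: start in either state
    Init == flag \in BOOLEAN

    \* time: the prime means "flag in the next state"
    Next == flag' = ~flag

    \* [] is "always": every step of every behavior satisfies Next
    Spec == Init /\ [][Next]_flag
    ====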


Sure, but that's somewhat orthogonal to my point.

Yes, you could basically do anything that TLA+ does inside of Isabelle (more familiar with that one than Coq), but it's harder. I think what makes TLA+ interesting is that an engineer with basically no understanding of anything more complex than "I basically know what a set is" can be useful within an afternoon or two.

I love Isabelle but it's got a substantially steeper learning curve than TLA+. I'm not saying it's unnecessarily steep, because Isabelle is a lot more versatile than TLA+, but as a result it's a fairly difficult sell for employers, at least in my experience.


> needing to prepare a new TLA+ workshop from scratch,

I didn't attend your workshop at Strange Loop but I talked to a few people that did and all of them had good things to say about it.


This is incredible. You have been a big help in both TLA+ and Alloy 6 — thanks so much for everything you do :)


I just really really really like teaching stuff :)


Love "Computer Things", and, tbh, forget sometimes that you and the "LearnTLA+ guy" are the same person; great work on both fronts, and thank you so much for putting posts/resources out there and contributing to these corners of the web; I'll be sure to check out LTLA+ when it & I are ready c:


One thing I really appreciate about your site is that you have a setup guide, with photos! This is really helpful for folks unfamiliar with how to run a spec.

https://learntla.com/core/setup.html


I used your guide earlier this very week to get a handle on the basics of pcal/tla for some stuff at work. Thanks!


I've invested a fair amount of time into learning TLA+, have bought Hillel Wayne's book (Practical TLA+), and found his stuff super valuable. However, if I could do it again, I would have started in the following order:

- 1. Watch https://www.youtube.com/watch?v=-4Yp3j_jk8Q

- 2. Watch all of https://www.youtube.com/watch?v=p54W-XOIEF8&list=PLWAv2Etpa7...

- 3. Work through examples on https://learntla.com

I'm normally not a huge fan of YouTube over books / web sites, but in this case the best content comes from Lamport himself, and he's presented it as a YouTube Course. Also, the costume changes are hilarious.


I watched Lamport's entire video series once and only just felt I was starting to get it. I will have to watch the entire thing again.

I also found a (very minor, inconsequential) bug in the materials and emailed him, and he wrote me back and said thanks, so that felt like a career highlight to me ;)


As a contribution to further increasing the accessibility of TLA+, I've created a VSCode Dev Container environment supporting it. I believe it's the shortest publicly available route as of this writing to a fully working environment, with TLA+, LaTeX formatting, Graphviz visualization of state machines, etc. Just go to the link below and follow the instructions. It's 10-15 minutes to a working environment, and you can then go through the TLA+ tutorial in it for a nice introduction to TLA+ (albeit with sadly low-resolution video) and a guided tour of the environment. After the quietest of roll-outs, it's at six forks and six stars. Hope you'll agree it's useful! https://github.com/kevinsullivan/TLAPlusDocker


I'm a bit skeptical of TLA - while I'm not so full of myself that I believe there's nothing I could learn that would make me a better programmer, TLA has always seemed... awfully dubious. In fairness, I haven't devoted more than a few minutes scanning over it, but I've also never seen or heard of anybody else using it (but maybe I don't travel in the right circles?), much less getting any benefit from it. The examples always seem awfully contrived and prone to exactly the same sorts of problems that code itself has. The promise, if I understand it correctly, is to have a way to nail down the design before you start coding so that you don't have a whole class of surprises (bugs), but this extra non-debuggable step actually seems like it would be worse.


I used it to debug the design of a bidirectional message-passing system with checksums, acknowledgements, retransmissions and message-order-maintenance guarantees.

Here is the good:

* A high level of confidence in the design - useful for systems you need to be reliable, or where a bug could create hard-to-diagnose problems.

* The satisfaction of having done a really good job for once, like the great programmers of yore who couldn't patch after release and had to get it right the first time.

* Interesting in an academic sense.

Here is the bad:

* You can still make a mistake when translating the design into code.

* The tools are sometimes baffling, both in their design decisions and their performance. Be prepared for some frustrations.

* With the tools complex and the design proven, your colleagues might not take over the TLA portion of your work and carry it forward.


There was a great talk at TLAConf this year by a PE at Nike, where they used it to find bugs in their in-store payment apps. He said there's a 40/60 chance legal will give permission to share the recording, though.


I have used it in production at small companies (i.e. not Amazon, as some other commenters mention) for things that needed very high reliability. It helped us find a few race conditions that took > 20 steps to reproduce, that I would probably never have found otherwise, and also helped us be very confident in our fixes. I strongly recommend it if you work in scenarios where race conditions can have very bad consequences. OTOH, I also had 2-3 failed efforts to learn the language and tooling before it really stuck, and I still mostly just use the pluscal syntax.


Maybe give this a scan, to understand the impact it had had on AWS as of 2015: https://lamport.azurewebsites.net/tla/amazon-excerpt.html


Formal verification is not new; it's been around for at least 40 years. The founders of model checking even got the Turing Award back in 2007. All major chip makers use formal verification to verify their circuit designs. Microsoft published papers more than 10 years ago on how they use model checking to verify the Windows kernel. It was a pretty big thing when people formally verified the IEEE cache coherence protocol. Universities have been teaching formal verification and model checking for over 20 years. The list goes on. What TLA+ offers, though, is amazing usability for engineers who have no interest in studying temporal logic in depth, or mathematical logic in general. The previous generation of engineers who wanted to use model checking had to deal with atrocities like this: https://matthewbdwyer.github.io/psp/patterns/ctl.html#Univer.... Yeah, I'm not kidding: the simplest specs would take engineers days if not weeks to master, if they could master them at all.
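To make the contrast concrete: that link is a menu of CTL template formulas for "universality", while the everyday TLA+ equivalent is a one-liner (property and variable names made up for illustration):

    \* "P holds in every reachable state" (CTL's AG p) is just "always":
    Safe == [](pending <= Limit)

    \* checked by asking TLC whether  Spec => Safe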

> but this extra non-debuggable step actually seems like it would be worse.

Not really. Some types of bugs are just too hard to be spotted by mere mortals, or too expensive to catch in production. Case in point: do you know there's a subtle bug, or at least an ambiguity, in the description of Paxos Made Simple? I don't know how many hundreds of people have read the paper, but I doubt more than 100 of them spotted it. Similarly, Amazon hired about 20 experts in formal verification to help them catch elusive flaws in specifications. After all, if S3 corrupts customer data, the consequences for the S3 team can be devastating, no?


I've heard it is used at AWS. I've seen it personally used in crypto on the Cardano blockchain. Leslie Lamport gave a talk at my old workplace and mentioned several big companies in tech and finance that use it but I'm not sure if that is public info.


The FAQ [0] clarifies some of what TLA+ is good for

[0] https://learntla.com/intro/faq.html


I'm surprised I had to navigate to the glossary to learn what the acronym 'TLA' actually meant. I'd recommend at least promoting that to the home page.


Temporal Logic of Actions, for other curious readers.


I just assumed it was Three Letter Acronym.


Yes, I noticed that. When I looked up the term and saw what it meant, it helped my understanding of the topic.


I've only ever used TLA+ in anger once but by god it was a good once.


What I don't get about TLA+: as far as I understand it, it'll "only" tell you whether or not some invariants can be violated, given the current design of your program.

While this does sound useful, it also sounds like now the problem is just shifted to coming up with proper invariants, isn't it?


Sometimes the invariants are obvious.

For example, some sorting algorithms are very complex in the details of their implementation - but they all have the quality that given an input, the output should be the same set of elements, in order.


That's a post-condition, not an invariant.


The invariant form would be something like: from any program state, a state will eventually be reached that contains the input's elements in order.

My phrasing is rough, but it is a property that can be expressed in TLA+.
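For the curious, here's roughly how that reads in TLA+, as a hedged sketch assuming a spec with an output sequence and a done flag (all names hypothetical):

    \* sortedness of a sequence (assumes the module EXTENDS Sequences)
    IsSorted(s) == \A i, j \in 1..Len(s) : i < j => s[i] <= s[j]

    \* the parent's "a sorted state is eventually reached", as a
    \* temporal property TLC can check against the spec:
    EventuallySorted == <>(done /\ IsSorted(output))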


Aren't invariants essentially just a very specific "what do you want"? TLA+ doesn't know what you want the code to do. It can only tell you if it's doing it properly.


If you're working on problems where TLA+ can help you, then you damn well make sure you know your state invariants.


Perhaps I'm missing the intent, but would it be feasible to have formal verification specifications like TLA+'s act as a blueprint for generating machine code?

It would be fairly cool to be able to design a system using tools like this and have a way to translate it without the possibility of introducing errors during translation to code.

Perhaps what I'm thinking of is more automatic generation of test cases? But I'm not quite sure what my brain is trying to latch onto here.


The universal problem is that generating code from a formal spec is extremely, maddeningly, pull-your-hair-outingly hard. A slightly easier thing is, as you said, generating test cases, which is what the Mongo people have been advancing: https://arxiv.org/abs/2006.00915

(Another interesting approach in this space is P, which is lower-level than TLA+ but also can compile to C#: https://github.com/p-org/P)


Check out Coyote, which is an evolution of P and allows you to "model check" .NET code to find hard-to-find race-condition-y bugs in your implementation. We have used it very successfully on a number of projects at Microsoft. I should probably do more public talks on it to get the word out. Finding subtle bugs in your implementation using tools like Coyote is easier than most people realize.


TLA+ and Coyote pair quite well, actually: verify the high-level design in TLA+ while checking for subtle bugs as you translate that design into an implementation.


Aren't F* and Dafny kinda also solving this problem of bridging the gap between formal spec and executable code? Another interesting question is whether there are gaps between the abstract systems-level specs of TLA+ and the relatively low-level specs of F* that might need bridging.


TLA+ is not an end-to-end system. You're not necessarily verifying your final code, most of the time you're only building a "toy" model of your high-level design and verifying that. So generating code from the model would not be helpful.


I don't remember the details, but I remember some people were comparing traces of execution between the program and the TLA+ spec.

If the traces don't match, you've got a problem.
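That technique is often called trace validation. A hedged sketch of the core idea (Trace, x, and the layout are all hypothetical): encode the recorded program states as a constant sequence and let TLC check that every recorded step is one the design allows:

    \* Trace is the recorded run, e.g. <<[x |-> 0], [x |-> 1], ...>>
    VARIABLES i, x
    TraceInit == i = 1 /\ x = Trace[1].x
    TraceNext == i < Len(Trace) /\ i' = i + 1 /\ x' = Trace[i + 1].x
    TraceSpec == TraceInit /\ [][TraceNext]_<<i, x>>

    \* Checking  TraceSpec => DesignSpec  fails exactly when the program
    \* took a step the design never permits.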


Programmers will be even more productive when someone comes up with TLA++, which will help you check if your TLA+ code has design flaws.


I know you're being tongue-in-cheek... but this is actually something we care about a lot! There's a technique in formal specification called "Refinement", where you show a detailed spec successfully implements a more abstract one.
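As a hedged sketch of the mechanics (module and variable names invented for illustration): the detailed spec instantiates the abstract one under a mapping of its state, and "implements" becomes an implication the tools can check:

    ---- MODULE Detailed ----
    VARIABLES queue, lock    \* low-level state
    \* ... Init, Next, Spec of the detailed design ...

    \* map low-level state onto the abstract spec's variable
    Abs == INSTANCE Abstract WITH items <- queue

    \* refinement: every behavior of Detailed is a behavior of Abstract
    THEOREM Spec => Abs!Spec
    ====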

On the tooling side, Informal Systems has been doing some good work on static-typechecking TLA+ specifications: https://apalache.informal.systems/docs/tutorials/snowcat-tut...

(Disclosure, I've done consulting work for IS.)


TLA+ is not code but mathematics because that's what allows it to be arbitrarily expressive [1] so that (we hope) you can write a short and clear high-level description that is easy enough (with some study) to fully grasp and see if it is what you intended.

[1]: To the point where it can easily describe systems that can't be implemented in reality.
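For instance (a deliberately silly made-up one-liner), nothing stops a spec from demanding unbounded state:

    \* every step picks a strictly larger natural number, forever;
    \* trivially expressible, but no finite machine implements it faithfully
    Next == \E n \in Nat : n > x /\ x' = n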


This is so snarky.

People are putting work into solving classes of bugs that cost us billions and billions in GDP per year.

This is one solution, and sure, like TDD, has tradeoffs and problems. The details, circumstances, and implementation all play a role.

But it's a potential way to go. Snark is not.

What's your solution?


It's HN, have some fun.


It's HN, don't.


I tried to use TLA+, but what annoys me the most is the disconnect between the spec and the actual implementation. I think the P language has a much better future just because it can generate code that works: https://github.com/p-org/P


I've been writing TLA+ for like 10 years; it's been a great tool, and I've found many bugs in protocols thanks to it.


Anyone know of some good free-software TLA+ model checkers? The "Other Tooling" page mentions one alternative checker, https://apalache.informal.systems/, but that's all I could find. Thanks.


Is there any way to do performance testing with TLA+? I always had the impression that TLA+ could help you validate the design of what you intend to build, but was less helpful if you wanted to see the performance implications of the design.


The problem with TLA+ is that you still have to hand-translate your solution to C++/C#/Java/…

Using Dafny (for example) gives you the advantages of TLA+ combined with proven correct running code.


Dafny is cool, but I believe they tackle different problem spaces. For example, how would you verify a distributed algorithm with Dafny?



Thanks, Hillel. I brought a speculation to a citation fight -- I'm glad you won :) Seriously, this is a very interesting read.


You absolutely can. TLA+ is basically a state transition system, something you can write in Dafny and prove correct using its refinement support. And the end result is running software you can drop right into production.


In the world of CRUD microservices with browser-based front ends, what does TLA+ give you over something like unit tests + integration tests (Selenium, cypress)?


Multiple services talking to each other via message passing or RPC - an ideal use case for TLA+ to make sure you don't end up in an undesired state (deadlock, endless loop, ...).
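A minimal PlusCal sketch of the classic failure mode (everything here made up, inside a module that EXTENDS Sequences): two services that each wait for the other to send first, which TLC reports as a deadlock out of the box:

    --algorithm handshake
    variables inbox = [s \in {"a", "b"} |-> <<>>];
    process serv \in {"a", "b"}
    variable peer = IF self = "a" THEN "b" ELSE "a";
    begin
      Recv: await inbox[self] /= <<>>;  \* each waits for the other to go first
      Send: inbox[peer] := Append(inbox[peer], "hello");
    end process;
    end algorithm;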


Related:

Learn TLA+ - https://news.ycombinator.com/item?id=31952643 - July 2022 (74 comments)

Learn TLA+ (2018) - https://news.ycombinator.com/item?id=22393653 - Feb 2020 (58 comments)

Learn TLA+ (2018) - https://news.ycombinator.com/item?id=19661329 - April 2019 (92 comments)

Also related:

Ask HN: Do you use TLA+? - https://news.ycombinator.com/item?id=30193431 - Feb 2022 (24 comments)

TLA+ is hard to learn (2018) - https://news.ycombinator.com/item?id=28256643 - Aug 2021 (44 comments)

TLA+ Action Properties - https://news.ycombinator.com/item?id=26649273 - March 2021 (36 comments)

TLA+ - https://news.ycombinator.com/item?id=26385075 - March 2021 (69 comments)

Applying TLA+ in cloud systems [video] - https://news.ycombinator.com/item?id=25426030 - Dec 2020 (14 comments)

Using TLA+ in the Real World to Understand a Glibc Bug - https://news.ycombinator.com/item?id=24958504 - Nov 2020 (76 comments)

Finding Goroutine Bugs with TLA+ - https://news.ycombinator.com/item?id=24591131 - Sept 2020 (40 comments)

A walkthrough tutorial of TLA+ and its tools: analyzing a blocking queue - https://news.ycombinator.com/item?id=22496287 - March 2020 (6 comments)

TLA+ model checking made symbolic - https://news.ycombinator.com/item?id=21662484 - Nov 2019 (51 comments)

Using TLA+ for fun and profit in the development of ElasticSearch [video] - https://news.ycombinator.com/item?id=21003470 - Sept 2019 (15 comments)

Modeling Adversaries with TLA+ - https://news.ycombinator.com/item?id=19839388 - May 2019 (13 comments)

TLA+: design, model, document, and verify concurrent systems - https://news.ycombinator.com/item?id=19821272 - May 2019 (32 comments)

Using TLA+ to Model Cascading Failures - https://news.ycombinator.com/item?id=19623634 - April 2019 (24 comments)

Using TLA+ to Understand Xen Vchan - https://news.ycombinator.com/item?id=18814350 - Jan 2019 (12 comments)

Modeling Message Queues in TLA+ - https://news.ycombinator.com/item?id=18357550 - Nov 2018 (46 comments)

Practical TLA+ - https://news.ycombinator.com/item?id=18249841 - Oct 2018 (6 comments)

The TLA+ Video Course by Leslie Lamport - https://news.ycombinator.com/item?id=16956778 - April 2018 (17 comments)

Modeling Redux with TLA+ - https://news.ycombinator.com/item?id=16569653 - March 2018 (33 comments)

TLA+ in Practice and Theory, Part 3: The Temporal Logic of Actions - https://news.ycombinator.com/item?id=14528072 - June 2017 (7 comments)

TLA+ in Practice and Theory, Part 2: The + in TLA+ - https://news.ycombinator.com/item?id=14475791 - June 2017 (8 comments)

Principles of TLA+ - https://news.ycombinator.com/item?id=14432754 - May 2017 (23 comments)

Formal Methods in Practice: Using TLA+ at ESpark - https://news.ycombinator.com/item?id=14221848 - April 2017 (19 comments)

Leslie Lamport: Video course on TLA+ - https://news.ycombinator.com/item?id=13918648 - March 2017 (74 comments)

My experience with using TLA+ in distributed systems class - https://news.ycombinator.com/item?id=10220264 - Sept 2015 (12 comments)

TLA+ - https://news.ycombinator.com/item?id=9601770 - May 2015 (21 comments)


I started learning pluscal on my own. The biggest insight for me was that we have to test uh - everything.


Does TLA+ support double-primed variables? Like a second-derivative sort of thing.


Unfortunately it does not.
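The logic only ever relates two adjacent states, so there's no x''. The usual workaround (sketched here with made-up names) is an auxiliary variable that drags the previous value along, turning "two states ago" into a plain unprimed expression:

    VARIABLES x, prevx
    Init == x = 0 /\ prevx = 0
    Next == prevx' = x /\ x' = x + 1  \* prevx always trails x by one step

    \* a "second derivative"-flavored action property, no double prime needed:
    NonDecelerating == [][(x' - x) >= (x - prevx)]_<<x, prevx>>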


Can anybody explain to me what this is? I've never heard of this.


Does anyone have recommended video-based ways of learning about TLA+?


There's a video series by Leslie Lamport himself: https://lamport.azurewebsites.net/video/videos.html

There's also the Dr. TLA+ series by Microsoft Research at https://www.youtube.com/watch?v=ao58xine3jM&list=PLD7HFcN7LX...


You can watch Leslie Lamport himself explaining it at https://lamport.azurewebsites.net/video/videos.html . It doesn't cover Pluscal, but goes deep into TLA+.

(BTW count the different shirts).



