Does anything like the Antithesis hypervisor exist as open source?
The closest I've seen is Qemu record/replay, but that's very slow (no KVM acceleration, no multicore), and broken in current Qemu versions (replayed system just gets stuck).
There are languages and tools that support time-travel debugging, like rr with GDB, or Smalltalk, but no open-source, system-wide equivalent of Antithesis that I know of yet.
rr can record process trees, i.e. basically any part/descendant of a process you spawn will be recorded and can be replayed (userspace CPU and memory, that is); it won't record the entire OS, though.
Would love to hear a technical comparison between this and King et al.'s classic paper on time-traveling VMs from USENIX ATC 2005: "Debugging operating systems with time-traveling virtual machines" (https://www.usenix.org/legacy/events/usenix05/tech/general/k..., 505 citations).
Seguing to talks regarding literal time traveling VMs, I'm reminded of Damian Conway's "Temporally Quaquaversal Virtual Nanomachine Programming In Multiple Topologically Connected Quantum-Relativistic Parallel Spacetimes... Made Easy!" presentation.
Essentially, it introduces a series of sci-fi concepts and then shows the kind of program (in modified Perl) that someone might write to take advantage of those capabilities.
It is really interesting to me that this sort of thing didn’t come from programming language folks like I’d expect. You’d think PLs are in the absolute perfect spot to implement things, because they define the semantics and runtime. And there are a few PLs that have time-travel demos, but they’ve never really been seen as more than a cool tech demo.
Perhaps the language is too small a vantage point to really get into what’s happening when debugging.
From the little I have seen, most programming language folks don't seem to care much about debugging. They care a lot about bugs not happening in the first place, which is good, and testability is sometimes taken into consideration, but not much thought goes into what to do after a bug has happened.
No language will prevent you from misimplementing the specs, but languages can be designed in such a way that it is easy to trace back why the button is green and not red.
It seems like those who are the most serious about debugging are from the video game industry. They get all the cool stuff: time travel, hot reload, etc. So much so that I expected to see something about video games, and was surprised there wasn't.
Coming out of the games industry, I am constantly amazed by how rarely people outside of games use debuggers. And, how slow they are to debug everything because of that...
> Perhaps the language is too small a vantage point to really get into what’s happening when debugging
A little bit. The big thing that others are missing is that it's basically impossible for a PL to accomplish this. Antithesis is basically recording all the state, including I/O, network I/O, all RNGs (including the OS's), and the big one everyone has trouble with: time. So basically you don't need to set up your code and how it interfaces with its environment to be deterministic; you can run it within a deterministic container instead, which flips the problem on its head and makes it much easier. I'm sure there are tradeoffs. A notable one is how expensive and slow this approach is vs. making your code deterministic. But given that basically no one bothers to make their code deterministic, and this is a drop-in solution for scenarios like that, it's really worth it. Additionally, unlike approaches like rr, which offer similar capabilities, this is even more generic and not dependent on adding support for every OS interface (e.g. rr doesn't support io_uring yet, but I believe Antithesis would, since it's running at the VM level).
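For contrast, here is roughly what "making your code deterministic" involves when done by hand: every source of entropy mentioned above (the clock and RNGs) has to be routed through something the test controls. A minimal Python sketch, assuming a hypothetical `Env` handle that application code receives instead of calling `time.time()` or the module-level `random` functions directly:

```python
import random
import time
from typing import Optional


class Env:
    """Hypothetical environment handle: all time and randomness go through it."""

    def __init__(self, seed: int, fake_clock: Optional[float] = None) -> None:
        self.rng = random.Random(seed)  # private RNG, seeded explicitly
        self.fake_clock = fake_clock    # None means "use the real wall clock"

    def now(self) -> float:
        if self.fake_clock is None:
            return time.time()
        # Simulated clock: advances by a fixed 1 ms on every read.
        self.fake_clock += 0.001
        return self.fake_clock


def jittered_retry(env: Env) -> tuple[float, float]:
    # Application code: a timestamp plus a randomized backoff delay in seconds.
    return env.now(), env.rng.uniform(0.1, 1.0)


first = jittered_retry(Env(seed=42, fake_clock=0.0))
second = jittered_retry(Env(seed=42, fake_clock=0.0))
assert first == second  # same seed and simulated clock, same result
```

The catch is that every library, syscall, and background thread has to be disciplined the same way, which is the work the comment says running inside a deterministic container avoids.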
I know time-travel debugging is very very close to Gilad Bracha's heart and something he was really hoping would make its way into Dart.
I don't know to what degree this is true for other language teams but one thing I've observed is that language designers, compiler people, VM people, and IDE/debugger people have more distinct cultures than you might expect. That can make it hard to ship features that cut across those domains. I think we've gotten a lot better at doing that kind of holistic design on the Dart team, but it took years of team-building to get there.
There's reverse debugging, and then there's what Antithesis does, which is a deterministic guarantee of the state. So for example, if you rewind, you'll get the exact same disk and network I/O happening across each call. And it supports arbitrary OS operations, whereas at the PL level you'll typically be at the mercy of whichever OS APIs the PL chooses to support for recording (i.e. similar to rr in terms of what it'll be able to do). Oftentimes, PLs don't even bother with recording state across OS calls, since they don't actually know which calls are OS calls and which are normal function calls.
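To illustrate the limitation described above: a language- or library-level recorder can only capture the calls it knows to wrap. A minimal sketch with a hypothetical `Recorder` class; anything that doesn't go through `recorder.call(...)` is invisible to it and would run live again during replay.

```python
import os
import time
from typing import Any, Callable, Optional


class Recorder:
    """Hypothetical call-level recorder: logs results in 'record' mode and hands
    back logged results in 'replay' mode instead of calling the function."""

    def __init__(self, mode: str, log: Optional[list] = None) -> None:
        assert mode in ("record", "replay")
        self.mode = mode
        self.log: list = [] if log is None else list(log)
        self.pos = 0

    def call(self, fn: Callable[..., Any], *args: Any) -> Any:
        if self.mode == "record":
            result = fn(*args)
            self.log.append(result)
            return result
        # Replay: return the next recorded result; fn is never executed.
        result = self.log[self.pos]
        self.pos += 1
        return result


def handler(rec: Recorder) -> tuple:
    t = rec.call(time.time)          # wrapped: recorded, then replayed
    nonce = rec.call(os.urandom, 8)  # wrapped: recorded, then replayed
    return t, nonce


rec = Recorder("record")
original = handler(rec)
replayed = handler(Recorder("replay", rec.log))
assert original == replayed
```

A call the recorder doesn't know about (say, a socket read inside a C extension) would still hit the real OS on replay; the comment's point is that working below the OS removes that gap.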
Yeah, I can believe this. I've been working in static analysis for C++ for only 2 years or so, but balancing soundness, precision, and reasonable analysis time really does a number on being more holistic about how to approach these problems. Very much feels like your brain just cannot see other ways to reason about programs because it is so sunk into the current way.
I sometimes wonder if this sort of determinism is something that either has to be designed in from the start of a system/PL, or requires near-hardware-level control (like Antithesis).
Yeah it's fun that we get to do this at the hypervisor level. This opens up time-traveling in systems where there's cross-machine or inter-process communication, which really widens what we're able to do.
(I work at Antithesis; if you're interested in chatting more once this thread has gone cold, come join discord.gg/antithesis)
I've enjoyed reading many of the blog posts by Antithesis, really cool work.
I don't really see a fit for the automated testing product in our stack at the moment, but I would love to use a time traveling hypervisor that I can hop into whenever I'd like.
Currently, it seems your pricing is pretty focused on the automated testing service. Do you have pricing or plans that offer just the deterministic dev environment?
(antithesis employee here) We don't currently just offer the deterministic dev environment, but we do offer extended 30 day demos for prospects interested in trying out the tech and seeing how it works. If you're interested contact us directly! contact@antithesis.com
How do you handle side effects that interact with third party systems? In my own tests, I use network request mocks. Do you need to provide a test mode flag to indicate that mocks should be used?
Any third party service does need to be mocked or stubbed out. We have a partnership with Localstack that lets us provide very polished AWS mocks that require zero configuration on your part (https://antithesis.com/docs/using_antithesis/environment.htm...).
If you need something else, reach out and ask us about it, because we have a few of them in the pipeline.
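On the "test mode flag" question: at the code level, a common alternative to a flag is dependency injection, where business logic depends on an interface and the test passes in a stub. This is a generic sketch with a hypothetical `PaymentGateway` service, not how Antithesis or Localstack work; Localstack operates at a different layer, standing in for the AWS endpoints so the code talks to an emulated service rather than having an in-process object swapped.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Hypothetical interface for a third-party service."""

    def charge(self, customer_id: str, cents: int) -> str: ...


class StubGateway:
    """In-memory stand-in for tests: records calls, never touches the network."""

    def __init__(self) -> None:
        self.charges: list[tuple[str, int]] = []

    def charge(self, customer_id: str, cents: int) -> str:
        self.charges.append((customer_id, cents))
        return f"stub-{len(self.charges)}"


def checkout(gateway: PaymentGateway, customer_id: str, cents: int) -> str:
    # Depends only on the interface, so no test-mode flag is needed:
    # the caller decides which implementation to pass in.
    if cents <= 0:
        raise ValueError("amount must be positive")
    return gateway.charge(customer_id, cents)


stub = StubGateway()
receipt = checkout(stub, "cust-1", 500)
```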
I was once working in a company producing software / operating systems for smart cards (such as the chips on your credit cards). We developed a simulator for the hardware that logged all changes to registers, memory and other states in a very large ring buffer, allowing us to undo / step backwards through code. With RAM being large, those chips being slow, and some snapshotting, we were usually able to undo back to the reset of the card. That was a game changer regarding debugging the OS.
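The undo mechanism described above can be sketched as a bounded log of (address, old value) pairs: each write records what it overwrote, and stepping back pops and restores. A toy Python sketch with a hypothetical `Sim` class; it omits the snapshotting the original also used, so undo stops once the ring buffer's oldest entry has been dropped.

```python
from collections import deque


class Sim:
    """Hypothetical toy simulator: memory writes go into a bounded ring buffer
    of (address, old_value) pairs so they can be undone in reverse order."""

    def __init__(self, size: int, history: int) -> None:
        self.mem = [0] * size
        self.undo: deque = deque(maxlen=history)  # oldest entries drop when full

    def write(self, addr: int, value: int) -> None:
        self.undo.append((addr, self.mem[addr]))  # remember what we overwrite
        self.mem[addr] = value

    def step_back(self) -> bool:
        if not self.undo:
            return False  # ran off the start of the ring buffer
        addr, old = self.undo.pop()
        self.mem[addr] = old
        return True


sim = Sim(size=4, history=3)
for v in (1, 2, 3, 4):
    sim.write(0, v)
# mem[0] is 4; the buffer holds only the last 3 writes.
sim.step_back()  # mem[0] is now 3
```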
Is the hypervisor multicore? How do you handle shared-memory nondeterminism? What is the runtime slowdown for shared-memory multicore execution (let's say 16 cores, if you need a concrete example)?
Found the answer in a different post [1]. The hypervisor and virtual machines are single-core only. The talk also indicates that all I/O operations need to be manually rewritten to use the instrumented mechanism, so it demands a highly paravirtualized guest OS. Logically, that means there are probably no cross-VM shared memory interfaces either. So, no shared memory and thus no need to deal with shared memory non-determinism.
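Why single-core sidesteps the problem: with only one thing running at a time, the only nondeterminism left is *when* the scheduler switches, and that choice can be driven by a seed. A toy sketch using Python generators as cooperative threads; `run`, `racy_incrementer`, and `trial` are hypothetical names, and this is a generic deterministic-simulation technique, not a description of how the Antithesis hypervisor schedules work.

```python
import random
from typing import Generator

Task = Generator[None, None, None]


def run(tasks: list, seed: int) -> None:
    """Run cooperative 'threads' on one simulated core. The next task is chosen
    by a seeded RNG, so the interleaving depends only on the seed."""
    rng = random.Random(seed)
    live = list(tasks)
    while live:
        t = rng.choice(live)
        try:
            next(t)  # run until the task's next yield point
        except StopIteration:
            live.remove(t)


def racy_incrementer(state: dict) -> Task:
    for _ in range(3):
        tmp = state["counter"]      # read
        yield                       # preemption point between read and write
        state["counter"] = tmp + 1  # write (may clobber another task's update)


def trial(seed: int) -> int:
    state = {"counter": 0}
    run([racy_incrementer(state), racy_incrementer(state)], seed)
    return state["counter"]


assert trial(123) == trial(123)  # same seed, same interleaving, same result
```

Different seeds can expose different lost-update outcomes, but any given seed reproduces its outcome exactly.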
This is just a standard replay engine from what I can tell.
> Let’s get more concrete. Let’s use this to solve a real problem. My server has crashed and its process has exited! No worries, I’ll just rewind time, attach a debugger to the process, and set a breakpoint or capture a thread dump:
Is this kind of stuff only possible in an Antithesis Environment?
The intro mentions that ordinarily, we have to pay a high upfront cost to record info that we might need to debug later.
> When we succeed at this, we collect huge volumes of logs “just in case” they provide some crucial clue, incurring equally huge storage costs.
The 'packets from the past' section says we can just retroactively decide what we should have recorded.
Doesn't that mean we're effectively recording everything always? What's the cost of this?
Or is all of this under the assumption that we never have to debug something that happened outside of the simulation environment, e.g. in response to an actual in-bound request from a customer? If this is just saying we can afford to save everything in our development environment ... well in that context recording the logs probably wasn't a "huge storage cost" either, right? Or am I missing something basic here?
You're right that if you tried to do something like this using record/replay, you would pay an enormous cost. Antithesis does not use record/replay, but rather a deterministic hypervisor (https://antithesis.com/blog/deterministic_hypervisor/). So all we have to remember is the set of inputs/changes to entropy that got us somewhere, not the result of every system operation.
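A toy illustration of the difference described above: if a system is deterministic, its state at any point is a pure function of its starting seed and the inputs it received, so storing those is enough to "rewind" by re-executing, and logging that wasn't there originally can be added during the re-run. This is a sketch under that assumption with a hypothetical `simulate` function, not how the Antithesis hypervisor is implemented; a real system would presumably also keep snapshots rather than always re-running from the start.

```python
import random
from typing import Callable, Optional


def simulate(seed: int, inputs: list, stop_at: int,
             observe: Optional[Callable[[int, dict], None]] = None) -> dict:
    """Hypothetical deterministic system: state after `stop_at` steps is a pure
    function of (seed, inputs), so nothing else needs to be stored."""
    rng = random.Random(seed)
    state: dict = {"queue": [], "dropped": 0}
    for step, msg in enumerate(inputs[:stop_at]):
        if rng.random() < 0.2:   # simulated fault, e.g. a dropped packet
            state["dropped"] += 1
        else:
            state["queue"].append(msg)
        if observe is not None:  # instrumentation added after the fact
            observe(step, state)
    return state


seed, inputs = 7, ["a", "b", "c", "d", "e"]
final = simulate(seed, inputs, stop_at=len(inputs))

# Later: "rewind" to step 3 and retroactively log the queue length at each step.
trace: list = []
simulate(seed, inputs, stop_at=3, observe=lambda i, s: trace.append(len(s["queue"])))
```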
> No language will prevent you from misimplementing the specs, but languages can be designed in such a way that it is easy to trace back why the button is green and not red.
If the spec is written in the language itself, then some languages certainly will.
See Lean, Rocq, Isabelle, etc
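For example, in Lean 4 a spec can be stated as a theorem about the function, and the file does not check unless the proof goes through. A minimal sketch with a hypothetical `double` function:

```lean
-- Implementation under test (hypothetical example).
def double (n : Nat) : Nat := n + n

-- The spec, written in the language itself: `double` must return 2 * n.
-- With a wrong implementation such as `n + 1`, this proof would fail to check.
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```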
[1] https://news.ycombinator.com/item?id=41501577