bonzini · a year ago
> QEMU is often the subject of bugs affecting its reliability and security.

{{citation needed}}?

When I ran the numbers in 2019, there hadn't been guest exploitable vulnerabilities that affected devices normally used for IaaS for 3 years. Pretty much every cloud outside the big three (AWS, GCE, Azure) runs on QEMU.

Here's a talk I gave about it that includes that analysis:

slides - https://kvm-forum.qemu.org/2019/kvmforum19-bloat.pdf

video - https://youtu.be/5TY7m1AneRY?si=Sj0DFpRav7PAzQ0Y

TimTheTinker · a year ago
> When I ran the numbers in 2019, there hadn't been guest exploitable vulnerabilities that affected devices normally used for IaaS for 3 years.

So there existed known guest-exploitable vulnerabilities as recently as 8 years ago. Maybe that, combined with the fact that QEMU is not written in Rust, is what is causing Oxide to decide against QEMU.

I think it's fair to say that any sufficiently large codebase originally written in C or C++ has memory safety bugs. Yes, the Oxide RFD author may be phrasing this using weasel words; and memory safety bugs may not be exploitable at a given point in a codebase's history. But I don't think that makes Oxide's decision invalid.

bonzini · a year ago
That would be a damn good record though, wouldn't it? (I am fairly sure that more have been found since, but the point is that these are pretty rare.) Firecracker, which is written in Rust, had one in 2019: https://www.cve.org/CVERecord?id=CVE-2019-18960

Also, QEMU's fuzzing is very sophisticated. Most recent vulnerabilities were found that way rather than by security researchers, which I don't think is the case for "competitors".

hinkley · a year ago
If they are being precise, then “reliability and security” means something different than “security and reliability”.

How many reliability bugs has QEMU experienced in this time?

The manpower to go on-site and deal with in-the-field problems could be crippling. You often pick the boring problems for this reason. High-touch is super expensive. Just look at Ferrari.

anonfordays · a year ago
>Pretty much every cloud outside the big three (AWS, GCE, Azure) runs on QEMU.

QEMU typically uses KVM for the hypervisor, so the vulnerabilities will be KVM anyway. The big three all use KVM now. Oxide decided to go with bhyve instead of KVM.

bonzini · a year ago
No, QEMU is a huge C program which can have its own vulnerabilities.

Usually QEMU runs heavily confined, but remote code execution in QEMU (remote = "from the guest") can be a first step towards exploiting a more serious local escalation via a kernel vulnerability. This second vulnerability can be in KVM or in any other part of the kernel.

cmeacham98 · a year ago
> The big three all use KVM now.

This isn't true - Azure uses Hyper-V (https://learn.microsoft.com/en-us/azure/security/fundamental...), and AWS uses an in-house hypervisor called Nitro (https://aws.amazon.com/ec2/nitro/).

6c696e7578 · a year ago
Azure uses Hyper-V; unless things have changed massively, the Linux they run for infra and customers is on Hyper-V.
_rs · a year ago
I thought AWS uses KVM, which is the same VM that QEMU would use? Or am I mistaken?
bonzini · a year ago
AWS uses KVM in the kernel but they have a different, non-open source userspace stack for EC2; plus Firecracker which is open source but is only used for Lambda, and runs on EC2 bare metal instances.

Google also uses KVM with a variety of userspace stacks: a proprietary one (tied to a lot of internal Google infrastructure but overall a lot more similar to QEMU than Amazon's) for GCE, gVisor for AppEngine or whatever it is called these days, crosvm for ChromeOS, and QEMU for Android Emulator.

daneel_w · a year ago
QEMU can use a number of different hypervisors, KVM and Xen being the two most common. Additionally, it can emulate any architecture if one would want/need that.

dvdbloc · a year ago
What do the big three use?
paxys · a year ago
AWS – Nitro (based on KVM)

Google – "KVM-based hypervisor"

Azure – Hyper-V

You can of course assume that all of them heavily customize the underlying implementation for their own needs and for their own hardware. And then they have stuff like Firecracker, gVisor etc. layered on top depending on the product line.

daneel_w · a year ago
Some more data:

Oracle Cloud - QEMU/KVM

Scaleway - QEMU/KVM

ReleaseCandidat · a year ago
Instead of stating more or less irrelevant reasons, I'd prefer to read something like "I am (or have been?) one of the core maintainers and know Illumos and Bhyve, so even if there were 'objectively' better choices, our familiarity with the OS and hypervisor trumps that". An "I like $A, always use $A and have experience using $A" is almost always a better argument than "$A is better than $B because $BLA", because the latter doesn't tell me anything about the depth of knowledge of using $A and $B, or the knowledge of the subject of the decision - there is a reason half of Google's results are some kind of "comparison" spam.
actionfromafar · a year ago
But everyone at Oxide already knows that back story. At least if you list some other reasons, you can have a discussion about technical merits if you want to.
ReleaseCandidat · a year ago
But that doesn't make sense if you have specialists for $A that also like to work with $A. Why should I as a customer trust Illumos/Bhyve developers that are using Linux/KVM instead of "real" Linux/KVM developers? The only thing that such a decision would tell me is to not even think about using Illumos or Bhyve.

The difference between

    "Buy our Illumos/Bhyve solution! Why? I have been an Illumos/Bhyve Maintainer!"
and

    "Buy our Linux/KVM solution! Why? I have been an Illumos/Bhyve Maintainer!"
should make my point a bit clearer.

panick21_ · a year ago
But Bryan also ported KVM to Illumos. And Joyent used KVM and supported it there for years; I assume Bryan knows more about KVM than Bhyve, as he seemed very hands-on in the implementation (there is a nice talk on YouTube). So the idea that he isn't familiar with KVM isn't the case. Based on that, between KVM and Bhyve on Illumos, KVM would suggest itself.

In the long term, if $A is actually better than $B, then it makes sense to start with $A even if you don't know $A. Because if you are trying to build a company that is hopefully making billions in revenue in the future, then the long term matters a great deal.

Now the question is can you objectively figure out if $A or $B is better. And how much time does it take to figure out. Familiarity of the team is one consideration but not the most important one.

Trying to be objective about this, instead of just saying 'I know $A' seems quite like a smart thing to do. And writing it down also seems smart.

In a few years you can look back and ask: was our analysis correct, and if not, what did we misjudge? And then you can learn from that.

If you just go with familiarity you are basically saying 'our failure was predetermined so we did nothing wrong', when you clearly did go wrong.

jclulow · a year ago
For what it's worth, we at _Joyent_ were seriously investing in bhyve as our next generation of hypervisor for quite a while. We had been diverging from upstream KVM, and most especially upstream QEMU, for a long time, and bhyve was a better fit for us for a variety of reasons. We adopted a port that had begun at Pluribus, another company that was doing things with OpenSolaris and eventually illumos, and Bryan led us through that period as well.
specialist · a year ago
> Trying to be objective about this... And writing it down also seems smart.

Mosdef.

IIRC, these RFDs are part of Oxide's commitment to FOSS and radical openness.

Whatever decision is ultimately made, for better or worse, having that written record allows the future team(s) to pick up the discussion where it previously left off.

Working on a team that didn't have sacred cows, an inscrutable backstory ("hmmm, I dunno why, that's just how it is. if it ain't broke, don't fix it."), and gatekeepers would be so great.

sausagefeet · a year ago
While it's fair to say this does describe why Illumos was chosen, the actual RFD title is not presented, and it is about the host OS + virtualization software choice.

Even if you think it's a foregone conclusion given the history of bcantrill and other founders of Oxide, there absolutely is value in putting the decision to paper and trying to provide a rationale, because then it can be challenged.

The company I co-founded does an RFD process as well and even if there is 99% chance that we're going to use the thing we've always used, if you're a serious person, the act of expressing it is useful and sometimes you even change your own mind thanks to the process.

taspeotis · a year ago
I kagi’d Illumos and apparently Bryan Cantrill was a maintainer.

Bryan Cantrill is CTO of Oxide [1].

I assume that has no bearing on the choice, otherwise it would be mentioned in the discussion.

[1] https://bcantrill.dtrace.org/2019/12/02/the-soul-of-a-new-co...

sausagefeet · a year ago
Early Oxide founders came from Joyent which was an illumos shop and Cantrill is quite vocal about the history of Solaris, OpenSolaris, and illumos.
codetrotter · a year ago
> Joyent which was an illumos shop

And before that, they used to run FreeBSD.

Mentioned for example in this comment by Bryan Cantrill a decade ago:

https://news.ycombinator.com/item?id=6254092

> […] Speaking only for us (I work for Joyent), we have deployed hundreds of thousands of zones into production over the years -- and Joyent was running with FreeBSD jails before that […]

And I’ve seen some other primary sources (people who worked at Joyent) write that online too.

And Bryan Cantrill, and several other people, came from Sun Microsystems to Joyent. Though I’ve never seen it mentioned which order that happened in; was it people from Sun that joined Joyent and then Joyent switched from FreeBSD to Illumos and creating SmartOS? Or had Joyent already switched to Illumos before the people that came from Sun joined?

I would actually really enjoy a long documentary or talk from some people that worked at Joyent about the history of the company, how they were using FreeBSD and when they switched to Illumos and so on.

panick21_ · a year ago
Bryan Cantrill also ported KVM to Illumos. At Joyent they had plenty of experience with KVM. See:

https://www.youtube.com/watch?v=cwAfJywzk8o

As far as I know, Bryan didn't personally work on the porting of bhyve (this might be wrong).

So if anything, that would point to KVM as the 'familiar' thing given how many former Joyent people were there.

bonzini · a year ago
KVM got more and more integrated with the rest of Linux as more virtualization features became general system features (e.g. posted interrupts). Also Google and Amazon are working more upstream and the pace of development increased a lot.

Keeping a KVM port up to date is a huge effort compared to bhyve, and they probably had learnt that in the years between the porting of KVM and the founding of Oxide.

elijahwright · a year ago
Where is Max Bruning these days?
mzi · a year ago
@bcantrill is the CTO of Oxide.
taspeotis · a year ago
Yup, thanks
gyre007 · a year ago
Yeah, I came here to say that Bryan worked at Sun, so why do they even need to write this post (yes, I appreciate the technical reasons, just wanted to highlight the fact via a subtle dig :-))
sausagefeet · a year ago
This isn't a blog post from an Oxide, it's a link to their internal RFD which they use to make decisions.

tcdent · a year ago
Linux has a rich ecosystem, but the toolkit is haphazard and a little shaky. Sure, everyone uses it, because when we last evaluated our options (in like 2009) it was still the most robust solution. That may no longer be the case.

Given all of that, and taking into account building a product on top of it (and thus needing to support it and stand behind it), Linux wasn't the best choice. Looking ahead (in terms of decades) and not just shipping a product now, they found that an alternate ecosystem existed to support that.

Culture of the community, design principles, maintainability are all things to consider beyond just "is it popular".

Exciting times in computing once again!

transpute · a year ago
> Xen: Large and complicated (by dom0) codebase, discarded for KVM by AMZN

  1. Xen Type-1 hypervisor is smaller than KVM/QEMU.
  2. Xen "dom0" = Linux/FreeBSD/OpenSolaris. KVM/bhyve also need host OS.
  3. AMZN KVM-subset: x86 cpu/mem virt, blk/net via Arm Nitro hardware.
  4. bhyve is Type-2.
  5. Xen has Type-2 (uXen).
  6. Xen dom0/host can be disaggregated (Hyperlaunch), unlike KVM.
  7. pKVM (Arm/Android) is smaller than KVM/Xen.
> The Service Management Facility (SMF) is responsible for the supervision of services under illumos... a [Linux] robust infrastructure product would likely end up using few if any of the components provided by the systemd project, despite there now being something like a hundred of them. Instead, more traditional components would need to be revived, or thoroughly bespoke software would need to be developed, in order to avoid the technological and political issues with this increasingly dominant force in the Linux ecosystem.

Is this an argument for Illumos over Linux, or for translating SMF to Linux?

cthalupa · a year ago
> Is this an argument for Illumos over Linux, or for translating SMF to Linux?

I'd certainly like that! I had spent some time working with Solaris a lifetime ago, and ran a good amount of SmartOS infrastructure slightly more recently. I really enjoyed working with SMF. I really do not enjoy working with the systemd sprawl.

I will note that the distinction between type-1/type-2 hypervisors never really made sense, and makes even less sense today. http://blog.codemonkey.ws/2007/10/myth-of-type-i-and-type-ii...

evandrofisico · a year ago
I've been using Xen in production for at least 18 years, and although there has been some development, it is extremely hard to get actual documentation on how to do things with it.

There is no place documenting how to integrate Dom0less/Hyperlaunch into a distribution or how to build infrastructure with it; at best you will find a GitHub repo, with the last commit dated 4 years ago, and little to no information on what to do with the code.

transpute · a year ago
> github repo, with the last commit dated 4 years ago

Some preparatory work shipped in Xen 4.19.

The Aug 2024 v4 patch series [1] and Feb 2024 repo [2] have recent dev work.

> hard to get actual documentation

Hyperlaunch: this [3] repo looks promising, but it's probably easier to ask for help on xen-devel and/or trenchboot-devel [4]. Upstream acceptance is delayed by competing boot requirements for Arm, x86, RISC-V and Power.

dom0less: ELC2022 slides [5] and video [6].

[1] https://lists.xenproject.org/archives/html/xen-devel/2024-08...

[2] https://github.com/FidelisPlatform/xen

[3] https://github.com/apertussolutions/initrd-builder

[4] https://groups.google.com/g/trenchboot-devel

[5] https://www.slideshare.net/StefanoStabellini/static-partitio...

[6] https://www.youtube.com/watch?v=CiELAJCuHJg

bonzini · a year ago
Talking about "technological and political issues" without mentioning any, or without mentioning which components would need to be revived, sounds a lot like FUD unfortunately. Mixing and matching traditional and systemd components is super common, for example Fedora and RHEL use chrony instead of timesyncd, and NetworkManager instead of networkd.
netbsdusers · a year ago
> Talking about "technological and political issues" without mentioning any

I don't know why you think none were mentioned - to name one, they link a GitHub issue created against the systemd repository by a Googler complaining that systemd is inappropriately using Google's NTP servers, which at the time were not a public service, and kindly asking for systemd to stop using them.

This request was refused and the issue was closed and locked.

Behaviour like this from the systemd maintainers can only appear bizarre, childish, and unreasonable to any unprejudiced observer, putting their character and integrity into question and casting doubt on whether they should be trusted with the maintenance of software so integral to at least a reasonably large minority of modern Linux systems.

packetlost · a year ago
The Oxide folks are rather vocal about their distaste for the Linux Foundation. FWIW, I think they went with the right choice for them, considering they'd rather sign up for maintaining the entire thing themselves than saddle themselves with the baggage of a Linux fork or upstreaming.
actionfromafar · a year ago
I read it as "we can sit in this more quiet room where people don't rave about systemd all day long".
dijit · a year ago
Honestly, SMF is superior to SystemD and it's ironic that it came earlier (and that shows, given that it uses XML as its configuration language... ick).

However, two things are an issue:

1) The CDDL license of SMF makes it difficult to use, or at least that’s what I was told when I asked someone why SMF wasn’t ported to Linux in 2009.

2) SystemD is it now. It’s too complicated to replace and software has become hopelessly dependent on its existence, which is what I mentioned was my largest worry with a monoculture and I was routinely dismissed.

So, to answer your question. The argument must be: IllumOS over Linux.
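For readers who have never seen the XML in question: a minimal SMF manifest looks roughly like the sketch below. The service name, paths, and timeouts here are made up for illustration; real manifests are imported with `svccfg import` and the service is then enabled with `svcadm enable`.

```xml
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!-- Hypothetical service; name and exec paths are illustrative only. -->
<service_bundle type="manifest" name="myapp">
  <service name="site/myapp" type="service" version="1">
    <!-- Create an enabled default instance; allow only one at a time. -->
    <create_default_instance enabled="true"/>
    <single_instance/>
    <!-- Don't start until basic networking is up. -->
    <dependency name="network" grouping="require_all"
                restart_on="error" type="service">
      <service_fmri value="svc:/milestone/network:default"/>
    </dependency>
    <!-- Start/stop methods; SMF tracks the processes via contracts
         and restarts the service if they die. -->
    <exec_method type="method" name="start"
                 exec="/opt/myapp/bin/myapp" timeout_seconds="60"/>
    <exec_method type="method" name="stop"
                 exec=":kill" timeout_seconds="60"/>
  </service>
</service_bundle>
```

The contract-based restart and dependency graph cover much the same ground as a systemd unit's `Restart=` and `After=`/`Requires=` directives, just expressed in XML.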

transpute · a year ago
> software has become hopelessly dependent on its existence

With some effort, Devuan has managed to support multiple init systems, at least for the software packaged by Devuan/Debian.

> SMF is superior to SystemD ... [CDDL]

OSS workalike opportunity, as new Devuan init system?

> The argument must be: IllumOS over Linux.

Thanks :)

anonfordays · a year ago
>Honestly, SMF is superior to SystemD

Maybe 15 years ago, not by a mile now. systemd surpassed SMF years ago and it's not even close now. No one in their right mind would pick SMF over systemd in 2024.

rtpg · a year ago
> There is not a significant difference in functionality between the illumos and FreeBSD implementations, since pulling patches downstream has not been a significant burden. Conversely, the more advanced OS primitives in illumos have resulted in certain bugs being fixed only there, having been difficult to upstream to FreeBSD.

Curious about what bugs are being thought of there. Sounds like a very interesting situation to be in.

alberth · a year ago
Isn’t it simply that the Oxide founders are old Sun engineers, and Illumos is the open-source spinoff of their old work?
sophacles · a year ago
According to the founders and early engineers on their podcast - no, they tried to fairly evaluate all the OSes and were willing to go with other options.

Practically speaking, it's hard to do it completely objectively, and the in-house expertise probably colored the decision.

pclmulqdq · a year ago
Tried to, sure, but when you evaluate other products strictly against the criteria under which you built your own version, you know what the conclusion will be. Never mind that you are carrying your blind spots with you. I would say that there was an attempt to evaluate other products, but not so much an attempt to be objective in that evaluation.

In general, being on your own private tech island is a tough thing to do, but many engineers would rather do that than swallow their pride.