macOS 14.4 causes JVM crashes

“The Java Virtual Machine […] leverages the protected memory access signal mechanism both for correctness (e.g., to handle the truncation of memory mapped files) and for performance.”

Where by “protected memory access signal mechanism”, they mean SIGBUS/SIGSEGV, i.e., a segfault.

This is probably because the JVM is doing “zero cost access checks”, which is where you do the moral equivalent of:

    try {
      writeToFile()
    } catch(err) {
      if (err == SYSTEM_CRASH_IMMINENT) {
        changeFilePermissions()
        retry
      }
    }

…because it’s faster than checking file permissions before every write. (It’s a common pattern in systems programming, so it’s not quite as crazy as it sounds.)

I guess my opinion on this is that if you write your program to intentionally trigger and ignore kill(10) / kill(11) from the host OS, for the sake of a speed boost, you can’t really get too mad when the host OS gets fed up and starts sending kill(9) instead.

I also wonder what happens in the (extremely rare) case where the signal the JVM is trapping is a real segfault, and not an operating system signal.

dzaima · a year ago

This isn't about files, this is about plain pages of RAM[0]. It is a basic CPU operation to trap on unmapped pages, and OSes rightfully expose this useful feature (in addition to using it themselves), allowing processes to do many things, from lazily-computed memory regions to removing significant amounts of overhead doing a thing the CPU will inevitably do itself anyway.

I believe the "the truncation of memory mapped files" section is for when the Java process memory-maps a file (as Java provides memory-mapping operations in its standard library, and probably also uses them itself), and afterwards some other unrelated process truncates the file, resulting in the OS quietly making (parts of) the mappings inaccessible. Here the process couldn't even check the permissions before reading (never mind how utterly hilariously inefficient that would be, defeating the purpose of memory-mapping) as the mappings could change between the check and subsequent read anyway.

[0]: https://bugs.java.com/bugdatabase/view_bug?bug_id=8327860, "I've managed to narrow this down to this small reproducer:" section
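
To make the truncation case concrete, here is a minimal C sketch, not the JVM's actual code. It assumes a POSIX system where touching a file-backed page past the new end of file raises SIGBUS (Linux behaves this way). The names `read_after_truncate`, `on_fault`, and `recovery_point` are hypothetical, and the same process does the truncating only to keep the example short.

```c
#define _DEFAULT_SOURCE          /* assumption: glibc; harmless elsewhere */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recovery_point;
static volatile sig_atomic_t expecting_fault;

/* If the fault was anticipated, jump back to the recovery point.
   Otherwise restore the default action so the retried access terminates us. */
static void on_fault(int sig) {
    if (expecting_fault) {
        expecting_fault = 0;
        siglongjmp(recovery_point, 1);
    }
    signal(sig, SIG_DFL);
}

/* Hypothetical helper: returns 1 if reading the mapping after the file was
   truncated raised a fault, 0 if the read went through. */
static int read_after_truncate(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, NULL);
    sigaction(SIGSEGV, &sa, NULL);

    char path[] = "/tmp/mmap_truncate_XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int rc = ftruncate(fd, (off_t)len);          /* file is one page long */
    assert(rc == 0);

    volatile char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    assert(map != MAP_FAILED);
    (void)map[0];                                /* fine: page is backed by the file */

    rc = ftruncate(fd, 0);                       /* stands in for "another process truncates" */
    assert(rc == 0);
    (void)rc;

    int faulted;
    if (sigsetjmp(recovery_point, 1) == 0) {
        expecting_fault = 1;
        (void)map[0];                            /* no check is possible beforehand */
        expecting_fault = 0;
        faulted = 0;
    } else {
        faulted = 1;                             /* we arrived here from the handler */
    }

    munmap((void *)map, len);
    close(fd);
    return faulted;
}
```

There is no permission check to run before the second read: the mapping looked valid right up to the moment it was touched, so the signal is the only notification the process gets.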

Jtsummers · a year ago

And it's worth noting that while man mmap on macOS doesn't indicate what happens when the protections are violated (that is, if you try to read, write, or execute in violation of the set protections) the related function mprotect has this to say in macOS 14.3 (what I have available):

> When a program violates the protections of a page, it gets a SIGBUS or SIGSEGV signal.

(The Linux man pages for mmap and mprotect indicate SIGSEGV would be signaled.)

So the past use and assumption (SIGSEGV or SIGBUS) are consistent with the expectations of mmap and mprotect given the documentation provided.

fwlr · a year ago

You are of course completely correct.

However, I still stand by my pseudocode - I claim that it will give a fairly accurate impression of the basic concept of zero-cost access checks to a reader who isn’t familiar with low-level systems programming. (That said, I have updated my comment to make it clear it’s more of a metaphor than a literal description.)

mrlsph · a year ago

A talk at FOSDEM this year [0] describes how the OpenJDK JVM relies on triggering SIGSEGVs in order to efficiently implement thread-local safepoint checks - I wonder if that would also be affected?

[0]: https://mostlynerdless.de/blog/2023/07/31/the-inner-workings...

kaba0 · a year ago

> I also wonder what happens in the (extremely rare) case where the signal the JVM is trapping is a real segfault, and not an operating system signal.

Just an educated guess, but the JVM knows if a thread may expect a segfault at a given point or not. If no thread expects one, then I assume the segfault handler just writes out that a segfault happened with some useful info, and terminates the program. I mean, I’m sure about the effect as I have caused a JVM to segfault a couple of times with native memory, so it handles it as expected.

> "As a normal part of the just-in-time compile and execute cycle, processes running on macOS may access memory in protected memory regions."

I'm just a lowly JavaScript/TypeScript/PHP programmer, but what is the Very Good Reason that Java is trying to access other processes' memory?

mayoff · a year ago

I don’t think the article claims that a Java process tries to access some other process’s memory.

In a typical modern operating system, a memory page can be non-writable and non-executable, writable and non-executable, or non-writable and executable, but not simultaneously writable AND executable.

If you generate executable code at runtime, then you need write access to a page to write the executable code into that page. Then you need to tell the operating system to change the page from writable to executable.

If you then try to write to the page, you’ll get a signal (SIGSEGV or SIGBUS, according to the article).

Oracle’s JVM apparently relies on this behavior: a Java process sometimes tries to write to a page (in its own memory space) that is not marked writable. The JVM then catches the SIGSEGV and recovers (perhaps by asking the operating system to change the page back from executable to writable, or by arranging to write to a different page, or to abort the write operation altogether).
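
A rough C sketch of the first recovery option (catch the signal, flip the page back to writable, let the write retry). This is an illustration under assumptions, not HotSpot's code: it uses a plain read-only page instead of an executable one, the names `write_to_protected_page` and `on_fault` are made up, and it calls `mprotect` from a signal handler, which works in practice on Linux and macOS but is not on POSIX's list of async-signal-safe functions.

```c
#define _DEFAULT_SOURCE          /* assumption: glibc; harmless elsewhere */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;                   /* page we deliberately write-protect */
static size_t page_size;
static volatile sig_atomic_t faults_handled;

/* If the fault hit our page, make it writable and return. Returning from the
   handler re-runs the faulting store, which now succeeds. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    char *addr = (char *)info->si_addr;
    if (guarded_page && addr >= guarded_page && addr < guarded_page + page_size &&
        mprotect(guarded_page, page_size, PROT_READ | PROT_WRITE) == 0) {
        faults_handled++;
        return;
    }
    /* Not a fault we planned for: restore the default action so the retried
       access terminates the process as usual. */
    signal(sig, SIG_DFL);
}

/* Hypothetical helper: returns the number of faults recovered from,
   or -1 if the write did not land. */
static int write_to_protected_page(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;                /* needed to receive si_addr */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);           /* Linux reports SIGSEGV here */
    sigaction(SIGBUS, &sa, NULL);            /* macOS may report SIGBUS */

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, page_size, PROT_READ,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(guarded_page != MAP_FAILED);

    faults_handled = 0;
    volatile char *p = guarded_page;
    p[0] = 'x';                              /* faults once, handler fixes it, store retries */
    int ok = (p[0] == 'x');

    munmap(guarded_page, page_size);
    guarded_page = NULL;
    return ok ? (int)faults_handled : -1;
}
```

The store itself carries no check at all; the cost is paid only on the one access that actually faults.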

Traubenfuchs · a year ago

Thank you, that explained it way better than the original link.

scialex · a year ago

It's not. It's trying to access unmapped or protected memory in its own process.

Basically what it's used for is to implement an 'if' that's super fast on the most likely path but super slow on the less likely path.

It's not super clear what it's being used for (this is often used for the GC but the fact that Graal isn't affected means that likely still works). Possibly they are using this to detect attempts to use inline-cache entries that have been deleted.

moonchild · a year ago

object.field is implemented as a direct load from the object; if the object turned out to be null, then the resultant signal is caught and turned into a NullPointerException
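
A minimal C sketch of that idea, with the caveat that it is only an analogy: a real JVM maps the faulting instruction back to the Java frame, whereas this uses `sigsetjmp`/`siglongjmp`. All names (`load_field`, `expecting_fault`, `null_check_demo`) are hypothetical, and it assumes a POSIX system where address 0 is unmapped.

```c
#define _DEFAULT_SOURCE          /* assumption: glibc; harmless elsewhere */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

struct object { int field; };

static sigjmp_buf recovery_point;
static volatile sig_atomic_t expecting_fault;

/* Only faults that were anticipated are recovered from. Anything else gets
   the default action back, so a genuine crash still crashes. */
static void on_fault(int sig) {
    if (expecting_fault) {
        expecting_fault = 0;
        siglongjmp(recovery_point, 1);
    }
    signal(sig, SIG_DFL);
}

static void install_fault_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);
}

/* Returns 0 and stores the field on success, -1 if obj turned out to be null
   (the stand-in for throwing NullPointerException). */
static int load_field(struct object *obj, int *out) {
    /* volatile keeps the compiler from reasoning about the null dereference */
    volatile struct object *volatile ref = obj;
    if (sigsetjmp(recovery_point, 1) != 0)
        return -1;                           /* arrived here from the handler */
    expecting_fault = 1;
    int value = ref->field;                  /* direct load, no "if (obj == NULL)" */
    expecting_fault = 0;
    *out = value;
    return 0;
}

/* Usage: a valid reference loads normally, a null one reports -1. */
static void null_check_demo(void) {
    struct object o = { 42 };
    int v = 0;
    install_fault_handlers();
    int ok = load_field(&o, &v);
    assert(ok == 0 && v == 42);
    int bad = load_field(NULL, &v);
    assert(bad == -1);
    (void)ok; (void)bad;
}
```

The `expecting_fault` flag plays the role of the runtime knowing whether a fault at this point is expected; with it cleared, the handler steps aside and the process dies the normal way.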

toast0 · a year ago

In a virtual memory operating system, every program has its own address space. Accessing an unmapped address is not the same as trying to access another process's memory.

It's also pretty common to use memory protection to autoextend stacks... Allocate the stack size you need, ask the OS to mark the page(s) after the stack as protected, catch the signal when you hit the protection, allocate some more stack and a new protected page unless the stack is too big. Works for heaps too.

Let the MMU hardware check accesses, so you don't have to check everything in software all the time.
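
A simplified C sketch of the auto-extend trick, under stated assumptions: instead of one guard page after a stack, it reserves a fixed address range with no access (the "too big" limit) and commits pages one at a time as they are first touched. The names `fill_growable_region` and `on_fault` are hypothetical, and `mprotect` is called from the signal handler, which works in practice on Linux and macOS but is not guaranteed async-signal-safe by POSIX.

```c
#define _DEFAULT_SOURCE          /* assumption: glibc; harmless elsewhere */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *region;                         /* reserved but initially inaccessible */
static size_t region_size;
static size_t page_size;
static volatile sig_atomic_t pages_committed;

/* On a fault inside the reservation, make just that page usable and return,
   which retries the faulting access. Outside it, fall back to the default. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    char *addr = (char *)info->si_addr;
    if (region && addr >= region && addr < region + region_size) {
        char *page = region + ((size_t)(addr - region) / page_size) * page_size;
        if (mprotect(page, page_size, PROT_READ | PROT_WRITE) == 0) {
            pages_committed++;
            return;
        }
    }
    signal(sig, SIG_DFL);
}

/* Hypothetical helper: writes every byte of a `pages`-page region and returns
   how many pages had to be committed on demand. */
static int fill_growable_region(size_t pages) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region_size = pages * page_size;
    region = mmap(NULL, region_size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(region != MAP_FAILED);

    pages_committed = 0;
    volatile char *p = region;
    for (size_t i = 0; i < region_size; i++)
        p[i] = (char)i;                      /* no bounds or "is it mapped?" check */

    int committed = (int)pages_committed;
    munmap(region, region_size);
    region = NULL;
    return committed;
}
```

The write loop never asks whether the next page exists; the MMU raises the question only at page boundaries, which is the whole point of the technique.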

olliej · a year ago

It depends on exactly what is being done.

A fairly common idiom is to use memory protection to provide zero cost access checks, as you can generally catch the signals produced by most memory faults, and then work out where things went wrong and convert the memory access error into a catchable exception, or to lazily construct data structures or code.

So you want the trap, but the trap itself can be handled. It sounds like there’s been a semantic change when the trap occurs for execution of an address or an access to an executable page.

There are also a bunch of poorly documented Mac APIs to inform the memory manager and linker about JIT regions and I wonder if it’s related to those. It really depends on exactly what oracle’s jvm is trying to do, and what the subsequent cause of the fault is.

Certainly it’s a less than optimal failure though :-/

royjacobs · a year ago

The reasons are literally spelled out in the following paragraphs.

CharlesW · a year ago

I’m asking because the reasons seem dumb to me, which is why I’m asking people smarter than I am about low-level memory management if they’re legitimate.

samus · a year ago

Accessing such areas is sometimes done deliberately since programmers could rely on the OS telling them what just happened using signals instead of nuking the process wholesale. Doing it without signals is usually slow and/or clunky (null-pointer checks, read/write permissions, existence of pages), or straight out impossible.

Accessing other processes' memory is not the concern since virtual memory provides each process the illusion of having the entire address space for itself.

riscy · a year ago

> macOS on Apple silicon processors (M1, M2, and M3) includes a feature which controls how and when dynamically generated code can be either produced (written) or executed on a per-thread basis. […] With macOS 14.4, when a thread is operating in the write mode, if a memory access to a protected memory region is attempted, macOS will send the signal SIGKILL instead.

This isn’t just any old thread triggering SIGKILL, it’s the JIT thread privileged to write to executable pages that is performing illegal memory accesses. That’s typically a sign of a bug, and allowing a thread with write access to executable pages to continue executing after that is a security risk.

But I know of other language runtimes that take advantage of installing signal handlers for SIGBUS/SIGSEGV to detect when they overflow a page so they can allocate more memory, etc. This saves from having to do an explicit overflow check on every allocation. Those threads aren’t given privilege to write to executable memory, so they’re not seeing this issue…

So this sounds like a narrow design problem the JVM is facing with their JIT thread. This blog doesn’t explain why their JIT thread needs to make illegal memory accesses instead of an explicit check.

Reason077 · a year ago

> "This blog doesn’t explain why their JIT thread needs to make illegal memory accesses instead of an explicit check."

Because explicit checks on every memory access (pointer dereference) make Java significantly slower, even with compiler optimisations to remove redundant checks[1]. Memory protection is a fundamental, very useful, hardware feature and it's perfectly reasonable for user space language runtimes to take advantage of it.

Or, to put it another way, SIGSEGV has been a part of Unix-family OSes for decades. It works perfectly fine on Linux and Windows and there's no reason it shouldn't work on macOS.

[1] (Many years ago I worked on a cross-platform implementation of the Java runtime and wrote much of the threads and signal handling code. We had an option to enable explicit memory checks, which got us up and running faster on new platforms where the SIGSEGV handlers hadn't been written yet. From memory this made everything something like 30-50% slower, so it was definitely worthwhile to implement SIGSEGV handling. In our case SIGSEGV handlers were used both as part of the garbage collector/memory management and to implement Java's NullPointerException)

destring · a year ago

As Linus famously said: Shut. Up. Don’t break userspace and then blame the user.

https://lkml.org/lkml/2012/12/23/75

extraduder_ire · a year ago

Linux did break Adobe Flash when it used memmove like memcpy after fixing a kernel bug. Can't think of any other examples though.

stephenr · a year ago

I feel like preventing illegal writes to protected memory is less "breaking user space" and more "protecting all space".

This is like arguing to allow the guy who can't drive and just pin-balls his way down the freeway bouncing off other cars, because to prevent him from driving would be to take away his personal freedoms.

amelius · a year ago

So macOS is trying to be smart, changes its API, and now we're blaming the JVM for doing something we don't understand?

At least they could have provided a path back to the old behavior.

chaostheory · a year ago

That’s how MS works which leads to compatibility, but less stability. Historically with Apple, it’s their way or the highway. Less compatibility, but the OS is more stable.

hbbio · a year ago

Sorry, that's not how security works.

sunshinerag · a year ago

macOS is trying to keep its systems safe. Can’t leave the back door open for the few who were used to it.

LadyCailin · a year ago

It said it affected back to Java 8, so seems like this design has been there for a while, and since older versions are EOL, any Java level fix would not be patched back.

flohofwoe · a year ago

I wonder what that means for the Android SDK, which AFAIK requires an ancient Java 8 runtime for the command line SDK tools on macOS.

beeboobaa · a year ago

If something works on OS release version 1 then it should still work on OS release version 2.

Or in Apple vernacular, it should just work.

w10-1 · a year ago

"The issue was not present in the early access releases for macOS 14.4, so it was discovered only after Apple released the update."

I wonder if Oracle really didn't know beforehand.

Apple has long been telling people (writing JITs) that to write to executable memory, they need the correct entitlements (com.apple.security.cs.allow-jit, allow-unsigned-executable-memory, and/or disable-executable-page-protection). I wonder if Oracle has been ignoring them, satisfied with the signal-handler workaround, and Apple finally enforced their policy.

Apple also expects that developers deploying apps on macOS that use Java have these entitlements configured on a per-app basis. Oracle likely objects that this is not really for the application developer to certify, since it's pretty much out of their control.

In any case, I'm doubting Oracle's release is the whole truth.

> Apple has long been telling people (writing JITs) that to write to executable memory, they need the correct entitlements (com.apple.security.cs.allow-jit, allow-unsigned-executable-memory, and/or disable-executable-page-protection). I wonder if Oracle has been ignoring them, satisfied with the signal-handler workaround, and Apple finally enforced their policy.

As far as I understand, that’s not the issue, the JIT itself works just fine. The JVM just uses the (quite common) trick that it doesn’t actually bounds-check everything, but lets the hardware trigger an interrupt, expecting that to “bubble up” to the program at hand, so it can handle certain cases “for free”. This behavior was changed by Apple, which causes issues.

exabrial · a year ago

Why not just let it bubble up from the hardware? Seems like a redundant thing to build into the kernel

vips7L · a year ago

This is honestly a wild and out there claim. The OpenJDK team would never want to see this happen to their user base. They’re some of the most professional programmers I’ve ever seen.

The whole truth is that the Apple kernel team broke user space.

zx8080 · a year ago

The main question now is why this wasn't exposed in the 14.4 pre-release. This could mean some very urgent and risky change got its way into the 14.4 release, or that the whole macOS release process is broken and unstable.

pier25 · a year ago

Amazing that Apple introduced a breaking change in a .4 release. Probably a mistake?

Also amazing it wasn't caught during the beta period.

empthought · a year ago

Apple has never been a follower of semantic versioning.

lloeki · a year ago

nitpick: Apple doesn't follow SemVer 2.0, but they do have a semantic versioning scheme, that is, the version components carry a certain semantic; it's just that this semantic differs from the one defined by the SemVer 2.0 specification.

One can have any sort of semantic versioning that is not SemVer 2.0 compliant and still be useful, see e.g. Rails or Ruby.

Even .NET assemblies are not SemVer 2.0 compliant: their pattern is maj.min.patch.build but SemVer 2.0 specifies that there can only be three components and build info must be behind a plus, like maj.min.patch+build

goosedragons · a year ago

It wasn't in the public beta according to Oracle.

mvdtnz · a year ago

This kind of behaviour is very common from Apple.

8crazyideas · a year ago

I just bought a MacBook Pro with the M3 Max chip and installed MATLAB R2023b. Sonoma 14.3 is in place. As a requirement, I had to also install Corretto 8. MathWorks only supports the Java 8 JRE included with Amazon Corretto 8. I am already having several problems in MATLAB with this new setup. Can I assume that updating to Sonoma 14.4 might very well cause even more problems? I really don't understand any of this.

xcv123 · a year ago

".. is affecting all Java versions from Java 8 to the early access builds of JDK 22. There is no workaround available .."

Do not update until Apple fixes the issue.

e40 · a year ago

Isn’t this something Oracle will be fixing? Seems like it from other comments here.

ls612 · a year ago

Is it Apple or Oracle who should rightly be fixing this issue?

dimask · a year ago

I would not update for the time being. If it works it will probably break, if it is broken it may break more.

Btw what sort of problems are you facing? I have had problems with closing figures, but figured it out eventually with a workaround [0].

[0] https://se.mathworks.com/matlabcentral/answers/2027964-matla...

tebruno99 · a year ago

It is always funny to me when Apple zealots come into threads blaming everyone but Apple that software broke. Complaining Java doesn’t follow Apple standards or some crap. Then 9 days later Apple issues a fix because they did indeed break it.

Yes, you mean: https://support.apple.com/en-us/109035

Can you tell from this or any other Oracle bug whether Apple is bending its rules for Java? I can't tell either way.