The gains seem not to have been high enough to sustain that project. Nowadays CPUs fuse, reorder and speculate over so many micro-ops that even lower-level languages can sort of be considered virtual as well.
But Java and similar languages transfer more freedom of operation from the programmer to the runtime: no memory-address shenanigans, richer types, and to some extent immutability and sealed chunks of code. All of these could be picked up and turned into more performance by the hardware, with some help from the compiler. Sort of like SQL being a 4th-gen language, letting the runtime collect statistics and choose the best course of execution (if you squint at it in the dark with colored glasses).
More recent work in this direction can be found in the RISC-V J extension [1], still to be formalized and picked up by the industry. Three features could help dynamic languages:
* Pointer masking: you can fit a lot in the unused higher bits of an address. Some GCs use them to annotate memory (referred-to/visited/unvisited/etc.), but you have to mask them off before use. A hardware-assisted mask could help a lot.
* Memory tagging: helps with security and with bounds checking
* More control over instruction caches
It is sort of stale at the moment, and if you track down the people working on it, you'll find they've been reassigned to the AI-accelerator craze. But it's going to come back, as Moore's law continues to end and Java's TCO climbs back to the top of the bean-counters' stack.
[1] https://github.com/riscv/riscv-j-extension
The Java ecosystem initially started with optimizing Java compilers. That setup could have benefited from direct hardware support for Java bytecode. Later, it turned out to be more beneficial to remove the optimizations from javac in order to give the JIT compiler more context, which enables better optimizations at runtime. By directly running Java bytecode you would lose so many optimizations done by HotSpot that it is hard to get on par just by interpreting bytecode in hardware. The story may be different for restricted JVMs that don't have a sophisticated JIT.
As free-as-in-beer AOT compilers for Java are commonly available, and as Android has shown since version 5, I doubt special opcodes will ever matter again.
Ironically, when one dives into computer archaeology, old assembly languages are occasionally referred to as bytecodes, the reason being that in CISC designs with microcoded CPUs they were already seen that way by the hardware teams.
I'm still undecided on whether AOT or JIT is the endgame.
In theory JIT should deliver higher performance, because it benefits from statistics taken at actual runtime, given a smart enough compiler. But as a piece of code matures and stabilizes, its envelope of executions becomes better known and programmers can encode that at compile time. That's the tradeoff Rust takes: ask for more proofs from the programmer, and Rust keeps picking up speed.
That's also what the Leyden project and its condensers [1] are about, if I understand correctly: pick up proofs and guarantees as early as possible and transform the program, for example by constant-propagating a configuration file read at build time.
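A minimal sketch of that constant-propagation idea in plain Java, not Leyden's actual machinery; the class and field names are made up for illustration:

```java
// Hypothetical illustration only: pretend this constant was produced by reading a config
// file during the build, so the value is fixed before the program ever runs.
final class BuildTimeConfig {
    static final boolean FEATURE_X_ENABLED = false; // baked in at build time
}

final class RequestHandler {
    int handle(int request) {
        // Because FEATURE_X_ENABLED is a compile-time constant, the compiler/JIT can
        // prove this branch dead and drop it, along with everything it pulls in.
        if (BuildTimeConfig.FEATURE_X_ENABLED) {
            return expensiveFeatureX(request);
        }
        return request * 2; // only this path survives in the condensed/compiled program
    }

    private int expensiveFeatureX(int r) { return r + 42; }
}
```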
Something I've pondered over the years: a programmer's job is not to produce code. It is to produce proofs and guarantees (yet another digression/rant: generating code was never a problem. Before LLMs we could copy-paste code from StackOverflow just fine)
In the end it's only about marginal improvements, though. These could be superseded by changes of paradigm, like RAM gaining some compute capability, or programs being split across a myriad of specialized units: filters, rules and parsing going inside the network card; SQL projections and filters going into the SSD controller; or matrix multiplication going into integrated GPUs/TPUs/etc., just like now.
[1] https://openjdk.org/projects/leyden/notes/03-toward-condense...
> Pointer masking: you can fit a lot in the unused higher bits of an address. Some GCs use them to annotate memory (referred-to/visited/unvisited/etc.), but you have to mask them off before use. A hardware-assisted mask could help a lot.
If you're building hardware masking, it should be viable for the low bits too. If you define all your objects to be n-byte aligned, that frees up low bits for tags as well, and it might not be much of an imposition, since things like to be aligned anyway.
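A small sketch of what that tagging looks like in software today, with addresses simulated as longs since Java has no raw pointers; the 8-byte alignment and tag layout are assumptions for illustration:

```java
// Assumes objects are 8-byte aligned, so the low 3 bits of an address are always zero
// and can carry tags (e.g. a GC mark bit). Every access must strip the tag with a mask --
// exactly the mask that hardware pointer masking would apply for free.
final class TaggedPointer {
    static final long TAG_MASK = 0b111; // low 3 bits freed by 8-byte alignment
    static final long GC_MARK  = 0b001; // hypothetical "visited by the GC" bit

    static long tag(long addr, long bits) { return addr | bits; }
    static long untag(long tagged)        { return tagged & ~TAG_MASK; } // the per-access mask
    static boolean isMarked(long tagged)  { return (tagged & GC_MARK) != 0; }

    public static void main(String[] args) {
        long addr = 0x7f00_0000_1000L;               // pretend: an 8-byte-aligned object address
        long tagged = tag(addr, GC_MARK);
        System.out.println(isMarked(tagged));        // true
        System.out.println(untag(tagged) == addr);   // true: tag stripped before dereference
    }
}
```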
Remember when Azul tried for a while to sell custom CPUs to support features in their JVM (e.g. some garbage-collector features that required hardware interrupts and some other extra instructions)? They dropped it pretty quickly in favor of just working on software.
My brainfog claims some blurry memories of this... for one, documentation was lacking so much that an open-source JVM using Jazelle never happened; if you wanted to develop a JVM on top of it, you'd pay ARM for docs, professional services, and unit licenses. And second, once things got to the ARM11-series cores, software JITs beat the cr* out of Jazelle. I don't remember any early Android device ever using it.
I find Java Card pretty puzzling. You go from high-level interpreted languages on powerful servers, to Java and C++ on less powerful devices (like old phones, for example), to almost exclusively C on microcontrollers, and then back to Java again on cards. If it makes sense to write Java code for a device small enough to draw power from radio waves, why aren't we doing that on microcontrollers?
It executes Java bytecode. Whether the Dalvik VM was/is a "Java" VM is hardly relevant there (not least because "Java" is so much more than Java bytecode, and Jazelle does nothing to help with anything on top of the latter).
The performance of Dalvik was far below J2ME on Nokia and Sony Ericsson feature phones for a very long time, and Android relied on pushing a lot to C libraries to compensate.
IIRC Jazelle left it to the implementers which bytecodes to handle in hardware and which to trap to software. Since software JITs beat the Jazelle implementations by the ARM11 era, CPU implementers would just leave everything to the software traps... So while the original Raspberry Pi was an ARM1176J, and the J meant Jazelle support, it was all already hollowed out.
There was a successor ThumbEE ("Execution Environment") that was comprehensively documented. But it didn't get much attention either and later chips removed it.
The trend of posting a page here based on some detail from a comment posted in another thread a few days ago ("What's that touchscreen in my room?" in this case) has become quite frequent and a bit annoying.
To everyone who wants to write "but I didn't read that thread and I find this quite interesting": you are free to find it interesting, but I did read about it two days ago, and to me it looks like karma farming.
To me it seems like Hacker News, as a whole, goes off on the same kinds of thought-tangents as I do, and that makes the site more interesting. And I was one of the commenters about Jazelle on the thread you mentioned.
Because this is a community and I care about what goes on in it. I think this is precisely not a thought-tangent; it is taking the tangent that occurred elsewhere and posting an article about it to score internet points. If we keep doing that, this becomes even more of an echo chamber and a very boring place.
Not everyone spends so much time on this site that they can easily spot a post as an extension of a related discussion elsewhere on the site. Someone posted this page, it got upvoted by others who found it interesting, and now it's on the front page. What's wrong with that?
Popular content is not necessarily good content (very boring to say this, but just look at Reddit). And posting articles just to get upvotes (which I'm not saying this post is necessarily doing, but at least _some_ are) leads to lower quality. HN barely has any mechanisms for maintaining the overall quality of the website, and it will automatically degrade as it gets larger.
To simply allow these posts and having them hit the front page when they get upvotes is a valid position. But I think it contributes to a website that is less interesting.
I don't think these posts should be removed, but they should at least be frowned upon, and/or linked to the original comment thread.
Regarding the quality of posts in general, the problem is not what gets posted; there is a ton of junk in the "new" queue and most of it never makes it to the front page. The problem is what gets upvoted.
The M1 emulation with Rosetta is actually dynamic recompilation, so if you're measuring only that specific small section, it's not surprising that Rosetta could have emitted optimal code for that instruction sequence.
To me that seems very similar to ARM introducing an instruction set that can be used to efficiently implement common Java idioms, just that x86 bytecode wasn't explicitly conceived as a VM target.
Running bytecode instructions in hardware essentially means a hardware-based interpreter. It likely would have been the best-performing interpreter for the hardware, but JIT compilation to native code would still run circles around it.
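As a rough sketch (a toy instruction set, not real JVM bytecodes or anything Jazelle-shaped), this is the per-instruction dispatch loop that hardware execution speeds up and that a JIT removes entirely by compiling the whole method to native code:

```java
// Minimal software interpreter for a made-up stack machine. The switch in the loop is
// the dispatch overhead: hardware bytecode execution makes each dispatch cheaper, while
// a JIT compiles the whole sequence to native code and skips dispatch altogether.
final class ToyInterpreter {
    static final byte PUSH = 0, ADD = 1, RET = 2; // hypothetical opcodes

    static int run(byte[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {
                case PUSH -> stack[sp++] = code[pc++];            // push immediate operand
                case ADD  -> { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a + b; }
                case RET  -> { return stack[--sp]; }
                default   -> throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        // push 2, push 3, add, ret  ->  5
        System.out.println(run(new byte[]{PUSH, 2, PUSH, 3, ADD, RET}));
    }
}
```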
During the years when this instruction set was relevant (though apparently unutilized), Oracle still had very limited ARM support for Java SE, so having a fast interpreter could have been desirable, but it makes no sense on beefier ARM systems that can run the decent JIT or AOT compilers available nowadays.
I remember reading about Jazelle many years ago - before the release of the iPhone and suchlike. This was the age when people were coming up with things like 'Java Card' - smartcards programmed directly in Java.
I never heard of anyone actually using Jazelle, though - I assume JIT ended up working better.
I'm a little in the realm of speculation here. Part of the issue with Java for embedded devices was "a bad fit". What made Java thrive in the server or even applet spaces wasn't the instruction set but the rich ecosystem around Java. Yet threading, as "inherent" to Java as it is, is provided by the OS and "only" used/wrapped by the JVM. All the libraries... megabytes of (useful) software, yet not implemented (nor even helped) by hardware acceleration. The "equivalent" of static linking to minimize the footprint never quite existed.
So on a smartcard... write software in an uncommon low-level instruction set (especially when compared with ARM, which is a very "rich" assembly language), and pay both Sun and ARM top $ for the privilege, never mind the likely "runtime" footprint far exceeding the 256 kB of RAM you planned for that $5 card: why? Writing small single-threaded software in anything that compiles down to a static ARM binary has been easy and quick enough that going off the ARM instruction set looked pointless to most. And learning which parts of "Java" actually worked in such an environment was hard, even (or especially?) for developers who knew (the strengths of) Java well. Because developers and specifiers expected "rich Java", and couldn't care less about the bytecode. JITs later only hoovered up the ashes.
IIRC people didn't "really believe" that Java could actually be performant because they assumed that since it has a JIT layer, it would never even get close to native code.
But the reality was that JIT allows code to get faster over time, as the JIT improves.
Things like Jazelle let chip manufacturers paper over a paper objection.
> But the reality was that JIT allows code to get faster over time, as the JIT improves.
Ehh... PGO is only somewhat better for JIT than for AOT. More often, for purely numerical code, the win is because the AOT build doesn't do per-machine `-march=native`. It's the memory model that kills JVM performance for any nontrivial app, though.
Specialized hardware has been losing out to general-purpose hardware for years.
There were those "LISP machines" in the early 1980s but when Common Lisp was designed they made sure it could be implemented efficiently on emerging 32-bit machines.
Java Card is still around! There's a very high chance it's running on more than one chip inside your phone right now, and on at least one card in your wallet every time you use it.
I think many smartcards, including SIMs, are still programmed in Java. It seems like it is the only standard for programming smartcards that was ever developed.
A little related, back in the day Sun Microsystems came up with picoJava, https://en.wikipedia.org/wiki/PicoJava, a full microprocessor specification dedicated to native execution of java bytecode. It never really went anywhere, other than a few engineering experiments, as far as I remember.
For a while Linus Torvalds, of Linux kernel fame, worked for a company called Transmeta, https://en.wikipedia.org/wiki/Transmeta, who were doing some really interesting things. They were aiming to make a highly efficient processor that could handle x86 through a special software translation layer. One of the languages they could support was picoJava. IIRC, the processor was never designed to run operating systems etc. natively; the intent was always to have it work through the translation layer, something that could easily be patched and updated to add support for any x86 extensions that Intel or AMD might introduce.
Java itself got very good. Though Oracle was blocked from leeching money, or from getting a return on their investment, depending on your viewpoint.
https://www.cpushack.com/2016/05/21/azul-systems-vega-3-54-c...
More like Wirth's law still proving itself.
ARM is quite capable of vapourware generation. 64-bit ARM was press-released (https://www.zdnet.com/article/arm-to-unleash-64-bit-jaguar-f...) a decade before ARMv8 / aarch64 became a thing.
(I'd love to learn more)
Relegated to the dustbin of history.
More from Ars (1999) https://archive.arstechnica.com/cpu/4q99/majc/majc-1.html
[0] https://en.wikipedia.org/wiki/MAJC
It couldn't have, as Dalvik VM is distinct from JVM.
ART is another matter, though.
Retaining and releasing an NSObject took ~6.5 nanoseconds on the M1 when it came out, compared with ~30 nanoseconds on the equivalent-generation Intel.
In fact, the M1 _emulated_ an Intel retaining and releasing an NSObject faster than an Intel could!
One source: https://daringfireball.net/2020/11/the_m1_macs
https://www.ptc.com/en/products/developer-tools/perc
https://www.aicas.com/wp/
https://www.microej.com/
https://en.wikipedia.org/wiki/BD-J
https://www.thalesgroup.com/en/markets/digital-identity-and-...