The gains seem not to have been high enough to sustain that project. Nowadays CPUs fuse, reorder and speculate over so many micro-ops that even lower-level languages can sort of be considered virtual as well.
But Java and similar languages transfer more freedom of operation from the programmer to the runtime: no memory-address shenanigans, richer types, and to some extent immutability and sealed chunks of code. All of these could be picked up and turned into more performance by the hardware, with some help from the compiler. Sort of like SQL being a 4th-gen language, letting the runtime collect statistics and choose the best course of execution (if you squint at it in the dark with colored glasses).
More recent work in this direction can be found in the RISC-V J extension [1], still to be formalized and picked up by the industry. Three features could help dynamic languages:
* Pointer masking: you can fit a lot in the unused higher bits of an address. Some GCs use them to annotate memory (referred-to/visited/unvisited/etc.), but you have to mask them off before use. A hardware-assisted mask could help a lot.
* Memory tagging: helps with security and with bounds checking
* More control over instruction caches
It is sort of stale at the moment, and if you track down the people working on it, you'll find they've been reassigned to the AI-accelerator craze. But it's going to come back, as Moore's law continues to end and Java's TCO climbs back to the top of the bean-counters' stack.
[1] https://github.com/riscv/riscv-j-extension
The Java ecosystem initially started with optimizing Java compilers. That setup could have benefited from direct hardware support for Java bytecode. Later, it turned out to be more beneficial to remove the optimizations from javac in order to give the JIT compiler more context, which enables better optimizations at runtime. By directly running Java bytecode you would lose so many optimizations done by HotSpot that it is hard to get on par just by interpreting bytecode in hardware. The story may be different for restricted JVMs that don't have a sophisticated JIT.
As free-as-in-beer AOT compilers for Java are commonly available, and as Android has shown since version 5, I doubt special opcodes will ever matter again.
Ironically, when one dives into computer archaeology, old assembly languages are occasionally referred to as bytecodes, the reason being that in CISC designs with microcoded CPUs they were already seen that way by the hardware teams.
I'm still undecided on whether AOT or JIT is the endgame.
In theory JIT should deliver higher performance, because it benefits from statistics taken at actual runtime, given a smart enough compiler. But as a piece of code matures and stabilizes, its envelope of executions becomes better known and programmers can encode that at compile time. That's the tradeoff Rust takes: ask for more proofs from the programmer, and Rust keeps picking up speed.
That's also what the Leyden project and its condensers [1] are about, if I understand correctly: pick up proofs and guarantees as early as possible and transform the program, for example by constant-propagating a configuration file read at build time.
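A minimal sketch of that constant-propagation idea in plain Java, not Leyden's actual machinery; the class and field names are made up for illustration:

```java
// Hypothetical illustration only: pretend this constant was produced by reading a config
// file during the build, so the value is fixed before the program ever runs.
final class BuildTimeConfig {
    static final boolean FEATURE_X_ENABLED = false; // baked in at build time
}

final class RequestHandler {
    int handle(int request) {
        // Because FEATURE_X_ENABLED is a compile-time constant, the compiler/JIT can
        // prove this branch dead and drop it, along with everything it pulls in.
        if (BuildTimeConfig.FEATURE_X_ENABLED) {
            return expensiveFeatureX(request);
        }
        return request * 2; // only this path survives in the condensed/compiled program
    }

    private int expensiveFeatureX(int r) { return r + 42; }
}
```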
Something I've pondered over the years: a programmer's job is not to produce code. It is to produce proofs and guarantees (yet another digression/rant: generating code was never a problem. Before LLMs we could copy-paste code from StackOverflow just fine)
In the end it's only about marginal improvements, though. These could be superseded by changes of paradigm, like RAM gaining some compute capability, or programs being split across a myriad of specialized units: filters, rules and parsing going inside the network card; SQL projections and filters going into the SSD controller; or matrix multiplication going into integrated GPUs/TPUs/etc., just like now.
[1] https://openjdk.org/projects/leyden/notes/03-toward-condense...
> Pointer masking: you can fit a lot in the unused higher bits of an address. Some GCs use them to annotate memory (referred-to/visited/unvisited/etc.), but you have to mask them off before use. A hardware-assisted mask could help a lot.
If you're building hardware masking, it should be viable for the low bits too. If you define all your objects to be n-byte aligned, that frees up low bits for tags as well, and it might not be much of an imposition, since things like to be aligned anyway.
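A small sketch of what that tagging looks like in software today, with addresses simulated as longs since Java has no raw pointers; the 8-byte alignment and tag layout are assumptions for illustration:

```java
// Assumes objects are 8-byte aligned, so the low 3 bits of an address are always zero
// and can carry tags (e.g. a GC mark bit). Every access must strip the tag with a mask --
// exactly the mask that hardware pointer masking would apply for free.
final class TaggedPointer {
    static final long TAG_MASK = 0b111; // low 3 bits freed by 8-byte alignment
    static final long GC_MARK  = 0b001; // hypothetical "visited by the GC" bit

    static long tag(long addr, long bits) { return addr | bits; }
    static long untag(long tagged)        { return tagged & ~TAG_MASK; } // the per-access mask
    static boolean isMarked(long tagged)  { return (tagged & GC_MARK) != 0; }

    public static void main(String[] args) {
        long addr = 0x7f00_0000_1000L;               // pretend: an 8-byte-aligned object address
        long tagged = tag(addr, GC_MARK);
        System.out.println(isMarked(tagged));        // true
        System.out.println(untag(tagged) == addr);   // true: tag stripped before dereference
    }
}
```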
Remember when Azul tried for a while to sell custom CPUs to support features in their JVM (e.g. some garbage-collector features that required hardware interrupts and some other extra instructions)? They dropped it pretty quickly in favor of just working on software.
My brainfog claims some blurry memories of this... for one, documentation was lacking so much that an open-source JVM using Jazelle never happened; if you wanted to develop a JVM on top of it, you'd pay ARM for docs, professional services, and unit licenses. And second, once things got to the ARM11-series cores, software JITs beat the cr* out of Jazelle. I don't remember any early Android device ever using it.
I find Java Card pretty puzzling. You go from high-level interpreted languages on powerful servers, to Java and C++ on less powerful devices (like old phones, for example), to almost exclusively C on microcontrollers, and then back to Java again on cards. If it makes sense to write Java code for a device small enough to draw power from radio waves, why aren't we doing that on microcontrollers?
It executes Java bytecode. Whether the Dalvik VM was/is a "Java" VM is hardly relevant there (not least because "Java" is so much more than Java bytecode, and Jazelle does nothing to help with anything on top of the latter).
The performance of Dalvik was far below J2ME on Nokia and Sony Ericsson feature phones for a very long time, and Android relied on pushing a lot to C libraries to compensate.
IIRC Jazelle left it to the implementers which bytecodes to handle in hardware and which to trap to software. Since software JITs beat the Jazelle implementations by the ARM11 era, CPU implementers would just leave everything to the software traps... So while the original Raspberry Pi was an ARM1176J, and the J meant Jazelle support, it was all already hollowed out.
There was a successor ThumbEE ("Execution Environment") that was comprehensively documented. But it didn't get much attention either and later chips removed it.
The trend of posting a page here based on some detail from a comment posted in another thread a few days ago ("What's that touchscreen in my room?" in this case) has become quite frequent and a bit annoying.
To everyone who wants to write "but I didn't read that thread and I find this quite interesting": you are free to find it interesting, but I did read about it two days ago, and to me it looks like karma farming.
To me it seems like Hacker News, as a whole, goes off on the same kinds of thought-tangents as I do, and that makes the site more interesting. And I was one of the commenters about Jazelle on the thread you mentioned.
Because this is a community and I care about what goes on in it. I think this is precisely not a thought-tangent; it is taking the tangent that occurred elsewhere and posting an article about it to score internet points. If we keep doing that, this becomes even more of an echo chamber and a very boring place.
Not everyone spends so much time on this site that they can easily spot a post as an extension of a related discussion elsewhere on the site. Someone posted this page, it got upvoted by others who found it interesting, and now it's on the front page. What's wrong with that?
Popular content is not necessarily good content (very boring to say this, but just look at Reddit). And posting articles just to get upvotes (which I'm not saying this post is necessarily doing, but at least _some_ are) leads to lower quality. HN barely has any mechanisms for maintaining the overall quality of the website, and it will automatically degrade as it gets larger.
To simply allow these posts and having them hit the front page when they get upvotes is a valid position. But I think it contributes to a website that is less interesting.
I don't think these posts should be removed, but they should at least be frowned upon, and/or linked to the original comment thread.
Regarding the quality of posts in general, the problem is not what gets posted; there is a ton of junk in the "new" queue and most of it never makes it to the front page. The problem is what gets upvoted.
The M1 emulation with Rosetta is actually dynamic recompilation, so if you're measuring only that specific small section, it's not surprising that Rosetta could have emitted optimal code for that instruction sequence.
To me that seems very similar to ARM introducing an instruction set that can be used to efficiently implement common Java idioms, just that x86 bytecode wasn't explicitly conceived as a VM target.
Running bytecode instructions in hardware essentially means a hardware-based interpreter. It likely would have been the best-performing interpreter for the hardware, but JIT compilation to native code would still run circles around it.
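As a rough sketch (a toy instruction set, not real JVM bytecodes or anything Jazelle-shaped), this is the per-instruction dispatch loop that hardware execution speeds up and that a JIT removes entirely by compiling the whole method to native code:

```java
// Minimal software interpreter for a made-up stack machine. The switch in the loop is
// the dispatch overhead: hardware bytecode execution makes each dispatch cheaper, while
// a JIT compiles the whole sequence to native code and skips dispatch altogether.
final class ToyInterpreter {
    static final byte PUSH = 0, ADD = 1, RET = 2; // hypothetical opcodes

    static int run(byte[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {
                case PUSH -> stack[sp++] = code[pc++];            // push immediate operand
                case ADD  -> { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a + b; }
                case RET  -> { return stack[--sp]; }
                default   -> throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        // push 2, push 3, add, ret  ->  5
        System.out.println(run(new byte[]{PUSH, 2, PUSH, 3, ADD, RET}));
    }
}
```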
During the years when this instruction set was relevant (though apparently unutilized), Oracle still had very limited ARM support for Java SE, so having a fast interpreter could have been desirable, but it makes no sense on beefier ARM systems that can run the decent JIT or AOT compilers available nowadays.
I remember reading about Jazelle many years ago - before the release of the iPhone and suchlike. This was the age when people were coming up with things like 'Java Card' - smartcards programmed directly in Java.
I never heard of anyone actually using Jazelle, though - I assume JIT ended up working better.
I'm a little in the realm of speculation here. Part of the issue with Java for embedded devices was "a bad fit". What made Java thrive in the server or even applet spaces wasn't the instruction set but the rich ecosystem around Java. Yet threading, as "inherent" to Java as it is, is provided by the OS and "only" used/wrapped by the JVM. All the libraries... megabytes of (useful) software, yet not implemented (nor even helped) by hardware acceleration. The "equivalent" of static linking to minimize the footprint never quite existed.
So on a smartcard... write software in an uncommon low-level instruction set (especially when compared with ARM, which is a very "rich" assembly language), and pay both Sun and ARM top $ for the privilege, never mind the likely "runtime" footprint far exceeding the 256 kB of RAM you planned for that $5 card: why? Writing small single-threaded software in anything that compiles down to a static ARM binary has been easy and quick enough that going off the ARM instruction set looked pointless to most. And learning which parts of "Java" actually worked in such an environment was hard, even (or especially?) for developers who knew (the strengths of) Java well. Because developers and specifiers expected "rich Java", and couldn't care less about the bytecode. JITs later only hoovered up the ashes.
IIRC people didn't "really believe" that Java could actually be performant because they assumed that since it has a JIT layer, it would never even get close to native code.
But the reality was that JIT allows code to get faster over time, as the JIT improves.
Things like Jazelle let chip manufacturers paper over a paper objection.
> But the reality was that JIT allows code to get faster over time, as the JIT improves.
Ehh... PGO is only somewhat better for JIT than for AOT. More often, for purely numerical code, the win is because the AOT build doesn't do per-machine `-march=native`. It's the memory model that kills JVM performance for any nontrivial app, though.
Specialized hardware has been losing out to general-purpose hardware for years.
There were those "LISP machines" in the early 1980s but when Common Lisp was designed they made sure it could be implemented efficiently on emerging 32-bit machines.
Java Card is still around! There's a very high chance it's running on more than one chip inside your phone right now, and on at least one card in your wallet every time you use it.
I think many smartcards, including SIMs, are still programmed in Java. It seems like it is the only standard for programming smartcards that was ever developed.
A little related, back in the day Sun Microsystems came up with picoJava, https://en.wikipedia.org/wiki/PicoJava, a full microprocessor specification dedicated to native execution of java bytecode. It never really went anywhere, other than a few engineering experiments, as far as I remember.
For a while Linus Torvalds, of Linux kernel fame, worked for a company called Transmeta, https://en.wikipedia.org/wiki/Transmeta, who were doing some really interesting things. They were aiming to make a highly efficient processor that could handle x86 through a special software translation layer. One of the languages they could support was picoJava. IIRC, the processor was never designed to run operating systems etc. natively; the intent was always to have it work through the translation layer, something that could easily be patched and updated to add support for any x86 extensions that Intel or AMD might introduce.
Java itself got very good. Though Oracle was blocked from leeching money, or from getting a return on their investment, depending on your viewpoint.
https://www.cpushack.com/2016/05/21/azul-systems-vega-3-54-c...
More like Wirth's law still proving itself.
ARM is quite capable of vapourware generation. 64-bit ARM was press-released (https://www.zdnet.com/article/arm-to-unleash-64-bit-jaguar-f...) a decade before ARMv8 / aarch64 became a thing.
(I'd love to learn more)
Relegated to the dustbin of history.
More from Ars (1999) https://archive.arstechnica.com/cpu/4q99/majc/majc-1.html
[0] https://en.wikipedia.org/wiki/MAJC
It couldn't have, as Dalvik VM is distinct from JVM.
ART is another matter, though.
Retaining and releasing an NSObject took ~6.5 nanoseconds on the M1 when it came out, compared with ~30 nanoseconds on the equivalent-generation Intel.
In fact, the M1 _emulated_ an Intel retaining and releasing an NSObject faster than an Intel could!
One source: https://daringfireball.net/2020/11/the_m1_macs
https://www.ptc.com/en/products/developer-tools/perc
https://www.aicas.com/wp/
https://www.microej.com/
https://en.wikipedia.org/wiki/BD-J
https://www.thalesgroup.com/en/markets/digital-identity-and-...