I have written a JVM in Rust

I have a few questions about the garbage collection. One of the hard parts of implementing a garbage collector is making sure everything is properly rooted (especially with a moving collector). You have the `do_garbage_collection` method marked unsafe[1], but don't explain what the calling code needs to do to ensure it is safe to call. How do you ensure all references to the heap are rooted? This is not a trivial problem[2][3][4].

Also note that I cloned the repo and tried to run `cargo test`, and every test fails with 'should be able to add entries to the classpath: InvalidEntry(".../vm/rt.jar")' vm/tests/integration/real_code_tests.rs:15:10

[1] https://github.com/andreabergia/rjvm/blob/be9c54066c64a82879...

[2] https://manishearth.github.io/blog/2021/04/05/a-tour-of-safe...

[3] https://without.boats/blog/shifgrethor-iii/

[4] https://coredumped.dev/2022/04/11/implementing-a-safe-garbag...

munificent · 2 years ago

It's pretty straightforward. Their VM maintains its own notion of a callstack instead of using the native callstack. That lets them iterate over it and find all of the parameters and locals on the VM's callstack and use them as roots.

There is a performance cost for a VM having its own virtual callstacks like this, but it makes GC tracing much simpler. (It also makes implementing interesting concurrency and control flow primitives like coroutines or continuations much easier too.)
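To make the idea concrete, here is a minimal sketch of a VM-managed call stack whose frames can be walked to find roots. All names (`ObjectRef`, `Value`, `Frame`, `CallStack`, `example_roots`) are hypothetical and are not taken from rjvm; the only assumption is that object references are plain values stored in per-frame locals and operand stacks.

```rust
/// Hypothetical handle to a heap object: an index into the VM's heap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ObjectRef(pub usize);

/// A VM value is either a primitive or a reference to a heap object.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Value {
    Int(i32),
    Object(ObjectRef),
    Null,
}

/// One frame of the VM-managed call stack (not the native stack).
#[derive(Debug, Default)]
pub struct Frame {
    pub locals: Vec<Value>,        // parameters and local variables
    pub operand_stack: Vec<Value>, // temporaries of the running method
}

/// The VM's own call stack: plain data that a collector can iterate over.
#[derive(Debug, Default)]
pub struct CallStack {
    pub frames: Vec<Frame>,
}

impl CallStack {
    /// Collects every object reference held directly by any frame.
    /// Duplicates are kept; a real collector would mark each object once.
    pub fn roots(&self) -> Vec<ObjectRef> {
        self.frames
            .iter()
            .flat_map(|f| f.locals.iter().chain(f.operand_stack.iter()))
            .filter_map(|v| match v {
                Value::Object(r) => Some(*r),
                _ => None,
            })
            .collect()
    }
}

/// Builds a two-frame stack and returns the roots found in it.
pub fn example_roots() -> Vec<ObjectRef> {
    let stack = CallStack {
        frames: vec![
            Frame {
                locals: vec![Value::Int(7), Value::Object(ObjectRef(3))],
                operand_stack: vec![Value::Null],
            },
            Frame {
                locals: vec![Value::Object(ObjectRef(5))],
                operand_stack: vec![Value::Object(ObjectRef(3))],
            },
        ],
    };
    stack.roots()
}
```

Because the frames are ordinary data owned by the VM, root enumeration is just iteration; nothing has to inspect the native stack.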

celeritascelery · 2 years ago

Seems like that would take care of roots for the bytecode itself, but not for "native" functions[1]. Allocating a new object could call gc[2], and native functions are using the native callstack. It seems like it would be easy to allocate in a native function and any unrooted references would be invalidated. In fact I see a case like that here[3]. That method creates a reference with `expect_concrete_object_at` and then calls gc with `new_java_lang_class_object`. It avoids UB by not using `arg` after the call that gc's, but there is nothing stopping you from using `arg` again (and having an invalid reference).

[1] https://github.com/andreabergia/rjvm/blob/main/vm/src/native...

[2] https://github.com/andreabergia/rjvm/blob/be9c54066c64a82879...

[3] https://github.com/andreabergia/rjvm/blob/be9c54066c64a82879...
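The hazard can be illustrated with a toy compacting heap. This is a sketch under stated assumptions, not rjvm's collector: references are modelled as indices into a `Vec`, and `Heap`, `collect`, and `stale_reference_demo` are hypothetical names. A copy of a reference that the collector does not know about is left pointing at the old location after objects move.

```rust
/// Toy compacting heap: objects live in a Vec and a "reference" is an index.
#[derive(Debug, Default)]
pub struct Heap {
    objects: Vec<String>,
}

impl Heap {
    /// Allocates an object and returns its current index.
    pub fn alloc(&mut self, value: &str) -> usize {
        self.objects.push(value.to_string());
        self.objects.len() - 1
    }

    /// Looks up an object; a stale index may miss or hit the wrong object.
    pub fn get(&self, index: usize) -> Option<&str> {
        self.objects.get(index).map(String::as_str)
    }

    /// Keeps only rooted objects, compacts them, and rewrites each root
    /// so that it points at the object's new index.
    pub fn collect(&mut self, roots: &mut [usize]) {
        let old = std::mem::take(&mut self.objects);
        let mut forwarded: Vec<Option<usize>> = vec![None; old.len()];
        let mut old_slots: Vec<Option<String>> = old.into_iter().map(Some).collect();
        for root in roots.iter_mut() {
            let existing = forwarded[*root];
            let new_index = match existing {
                Some(i) => i, // already moved via another root
                None => {
                    let obj = old_slots[*root].take().expect("root must be live");
                    self.objects.push(obj);
                    let i = self.objects.len() - 1;
                    forwarded[*root] = Some(i);
                    i
                }
            };
            *root = new_index;
        }
    }
}

/// Simulates a native function that keeps an unrooted index across a GC.
/// Returns (what the rooted reference sees, what the stale copy sees).
pub fn stale_reference_demo() -> (Option<String>, Option<String>) {
    let mut heap = Heap::default();
    let _garbage = heap.alloc("garbage");
    let arg = heap.alloc("arg");
    let stale = arg;          // copy on the "native stack", unknown to the GC
    let mut roots = [arg];    // copy registered as a root
    heap.collect(&mut roots); // "arg" moves from index 1 to index 0
    (
        heap.get(roots[0]).map(str::to_string),
        heap.get(stale).map(str::to_string),
    )
}
```

In this toy the stale index merely fails a bounds check. With raw pointers into a moved semispace, the same mistake would be a dangling reference, which is why the rooting discipline for native code matters.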

amelius · 2 years ago

GCs are pretty easy, and just a matter of good accounting. That is, until you start doing concurrent GC; then it becomes hellishly difficult.

    import java.util.ArrayList;
    import java.util.List;

    public class Generic {
        public static void main(String[] args) {
            List<String> strings = new ArrayList<String>(10);
            strings.add("hey");
            strings.add("hackernews");
            for (String s : strings) {
                tempPrint(s);
            }
        }

        // Native method supplied by the VM
        private static native void tempPrint(String value);
    }

When I see such cool projects, I feel very overwhelmed. How do you get started with Rust and master basics to even attempt doing such a thing? Can OP explain?

nop_slide · 2 years ago

Likewise. Not to go off on too much of a tangent, but on a more personal note I've been generally struggling with this feeling a lot lately.

I've been a professional software developer for almost 10 years, and I _know_ I'm competent (and not an impostor) as demonstrated by my current position and ability to ship things.

However, lately after viewing developer blogs I become overwhelmed that I actually don't know enough and am not a "real" developer. I seem to have formed a notion of an ideal developer in my head and I compare myself against this imagined construct which leads to these feelings. I admire how these people have so much deep knowledge and can express themselves so clearly and concisely, then wonder why I am not like that.

I barely have the energy after work after taking care of my family to do anything further, and I know programming isn't everything but I do have a desire to learn more and improve myself.

I recognize this isn't healthy nor is it rational, but it's just a feeling I can't shake lately.

dist1ll · 2 years ago

What you're describing is very common amongst developers. So common in fact, that I've written a post about this https://alic.dev/blog/comparisons

In short: recognizing your insecurities is the first step. The next step is figuring out what's important to you, shedding impossible to achieve and irrational ambitions, prioritizing your goals in life, and articulating concrete steps to further them.

theLiminator · 2 years ago

Well, you're probably comparing yourself against the top 1% of developers. It's okay to not be the very best; being in the top 30% of this field is already very rewarding.

p91paul · 2 years ago

I happen to personally know the author, and I'm not really surprised he pulled this off. Using him as a baseline of who is a real developer is extremely unhealthy. Please don't :)

bingemaker · 2 years ago

Thanks for sharing

andreabergia · 2 years ago

Well, _I_ feel impostor syndrome half the time I open HN, honestly!

I did have a bit of experience with VMs before, I wrote many years ago a short series of posts about it on my blog, and at my previous job I dabbled a bit in JVM byte code to solve one very unusual problem we had for a customer. I also read the _amazing_ https://craftinginterpreters.com/ years ago and that gave me some ideas.

But this project was definitely big and complex. It took me a lot of time, and it got abandoned a couple of times, like many of my side projects. But I'm happy I finished it. :-)

naltun · 2 years ago

Not OP nor am I a Rust expert. I can speak regarding another technology: sockets.

I've been deep-diving into sockets recently. 2 weeks ago I had only a high-level understanding of sockets (learned from casually reading manpages, docs, blog posts, etc.). I decided to read as much as possible because I wanted to understand networking fundamentals, and after a week I learned enough to write some sockets code in Python and C. I know Python quite well, so reviewing the `socket` library made more sense after my deep dive.

If you want to get better at technology A using language X, I suggest reading/watching as much as you can about tech A and building stuff with it in language Y. Then you can circle back to learning language X, having already mastered many of the concepts around technology A.

e: spelling

aardvark179 · 2 years ago

Break things down. A simple language VM is going to have a way to represent objects in memory, a byte code interpreter, a simple garbage collector, and a way to load things.

A byte code interpreter is a stack, some way to represent functions on that stack, and then a loop to interpret each byte code and move the program counter.
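That breakdown can be sketched in a few dozen lines. The instruction set below (`Op`) is hypothetical and much smaller than real JVM bytecode, and function calls are left out; it only shows the operand stack, the dispatch loop, and the program counter.

```rust
/// Hypothetical instruction set for a tiny stack machine (not JVM bytecode).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
    /// Pops a value and jumps to the given instruction index if it is zero.
    JumpIfZero(usize),
    Halt,
}

#[derive(Debug, PartialEq)]
pub enum VmError {
    StackUnderflow,
    PcOutOfBounds,
}

/// Runs `code` until `Halt` and returns the value on top of the stack, if any.
pub fn run(code: &[Op]) -> Result<Option<i64>, VmError> {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0usize; // program counter: index of the next instruction
    loop {
        // Fetch the instruction, then advance the program counter.
        let op = *code.get(pc).ok_or(VmError::PcOutOfBounds)?;
        pc += 1;
        // Decode and execute.
        match op {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop().ok_or(VmError::StackUnderflow)?;
                let a = stack.pop().ok_or(VmError::StackUnderflow)?;
                stack.push(a.wrapping_add(b));
            }
            Op::Mul => {
                let b = stack.pop().ok_or(VmError::StackUnderflow)?;
                let a = stack.pop().ok_or(VmError::StackUnderflow)?;
                stack.push(a.wrapping_mul(b));
            }
            Op::JumpIfZero(target) => {
                if stack.pop().ok_or(VmError::StackUnderflow)? == 0 {
                    pc = target;
                }
            }
            Op::Halt => return Ok(stack.last().copied()),
        }
    }
}
```

For example, `run(&[Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul, Op::Halt])` evaluates (2 + 3) * 4. A real VM adds call frames, a heap, and class loading on top of this loop.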

sn9 · 2 years ago

How much do you code in your free time? Like average hours per week?

If it's zero (and no judgement from me if it is; plenty of other things to focus on), then it shouldn't be surprising that someone for whom that number is (speculatively) 10-20 hours per week on average for years has impressive side projects.

15155 · 2 years ago

Do some embedded work: implement a bare metal program on an ARM microcontroller in C or Rust. Make an LED blink. Then, make the same LED blink in pure ASM. The RP2040 is easy to bring up.

The magic will fade away quickly.

haspok · 2 years ago

Nice project, congrats!

One thing struck me as a bit odd:

> In particular, it does not support: generics

What kind of support is there for generics in the JVM? Maybe I'm too naive to assume that due to type erasure at the bytecode level everything is just an Object, i.e. a reference type? Or do you mean the class definition parser? But then, you don't really have any checks in place to see if the class file is valid (other than the basic syntax)?

Thanks!

About the generics - some people have pointed out the same on reddit, and yeah, you are correct. The only thing that should be done is to read the Signature attribute that encodes the generic information about classes, methods, and fields (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.ht...)

As a matter of fact, I just did a test and the following code works! :-)

newmana · 2 years ago

They might be talking about the checkcast operation: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.ht...

This is generated when you do something like: final Main value = list.get(0);

http://henrikeichenhardt.blogspot.com/2013/05/how-are-java-g...

xxs · 2 years ago

The cast is added by javac, so it just needs to verify the object on the stack to be compatible w/ the provided class. That part is very simple.

Pretty much this: generics have (rare) implications for reflection (which is unsupported as well), but overall they are replaced with the nearest class/interface when compiled.

OTOH lack of string interning is super strange [it's trivial to implement], and w/o it JVM is not a thing. String being equal by reference is important, and part of JLS.

Lack of thread makes the entire endeavor a toy project.

whizzter · 2 years ago

Not entirely correct. Last I checked, string interning was ONLY guaranteed for strings defined in source and read in during class loading. Strings created via the String constructor (e.g. via StringBuilder) CAN duplicate strings that you hardcoded in your sources; to get the "canonical" string in those cases you have to invoke String.intern(), if memory serves me correctly.

https://docs.oracle.com/en/java/javase/11/docs/api/java.base...()

Also interning strings to optimize equality checks to be able to use pointer comparison is dangerous for external inputs since iirc at some point interned strings could permanently be stored (unless implemented by a WeakSet) and attackers could fill up your heap (or cause other GC issues since the entire interning functionality is a cache) by filling up your interning lists with crap.
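For a toy VM written in Rust, an intern pool could look roughly like the sketch below. `StringPool` and `interning_demo` are hypothetical names, and `Rc::ptr_eq` stands in for Java's reference equality; this is not how rjvm or HotSpot implement it.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Toy string pool: equal contents map to one shared allocation.
#[derive(Default)]
pub struct StringPool {
    pool: HashMap<String, Rc<str>>,
}

impl StringPool {
    /// Returns the canonical instance for `s`, creating it on first use.
    pub fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.pool.get(s) {
            return Rc::clone(existing);
        }
        let canonical: Rc<str> = Rc::from(s);
        self.pool.insert(s.to_string(), Rc::clone(&canonical));
        canonical
    }
}

/// Returns three identity checks against an interned literal:
/// (another literal, a freshly built string, that string after intern()).
pub fn interning_demo() -> (bool, bool, bool) {
    let mut pool = StringPool::default();
    // Literals loaded from a class file would be interned by the VM.
    let lit_a = pool.intern("hey");
    let lit_b = pool.intern("hey");
    // A string built at runtime is a separate allocation until interned.
    let built: Rc<str> = Rc::from(String::from("he") + "y");
    let canonical = pool.intern(&built);
    (
        Rc::ptr_eq(&lit_a, &lit_b),
        Rc::ptr_eq(&lit_a, &built),
        Rc::ptr_eq(&lit_a, &canonical),
    )
}
```

Note that this pool holds a strong reference to every entry, so nothing interned is ever freed. That is exactly the growth problem described above; a pool of weak references would be needed to let unused entries be collected.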

senorrib · 2 years ago

“I want to stress that this is a toy JVM, built for learning purposes and not a serious implementation.”

3cats-in-a-coat · 2 years ago

Java strings are compared by reference first; if the references do not match, they're compared by value. There's no guarantee every single string has a single instance. That would hurt performance.

znpy · 2 years ago

> Lack of thread makes the entire endeavor a toy project.

yeah, as stated by the author in the line that says "I want to stress that this is a toy JVM, built for learning purposes and not a serious implementation."

ChuckMcM · 2 years ago

That is pretty awesome! When I joined the Java effort in '92 (called Oak at the time) the group I was with was looking at writing a full OS in Java. The idea being that if you could get down to just the minimal set of things needed as "machine code" (aka native methods), you could reduce the attack surface of an embedded OS. (Originally Java was targeted to run in things like TVs and other appliances.) We were, of course, working in C rather than Rust for the native methods. The JVM in Rust though adds a solid level of memory safety to the entire process.

mshockwave · 2 years ago

> writing a full OS in Java

IMAO, Android kind of achieves that... kind of. They write a lot of OS logic in Java (or Kotlin) while mixing in lots of system services written in native code, interconnected by the famous (or infamous?) Binder IPC.

ActorNightly · 2 years ago

Android is mostly things that run java, not java itself. You can look at the source code, there is a relatively small amount of java in there.

tgtweak · 2 years ago

Embedded JVM is actually huge/pervasive and runs things as benign as the chip on your credit card.

grishka · 2 years ago

Android isn't conventional Java. For starters, its runtime uses its own bytecode (dex) that's based on registers instead of a stack. But then, also, many things that aren't related to GUI are C++ with a thin Java wrapper on top.

When I think about a "Java OS", I imagine a JVM running in kernel mode, providing minimal OS functionality (scheduler, access to hardware I/O ports) and there not being any kind of userspace.

soperj · 2 years ago

It's a modified version of the linux kernel no? That'd be mostly C.

bpye · 2 years ago

This idea has definitely been tried a few times [0, 1].

[0] - https://en.wikipedia.org/wiki/Singularity_(operating_system)

[1] - https://en.wikipedia.org/wiki/Midori_(operating_system)

spullara · 2 years ago

There was one for a while, though it wasn't really targeted at users:

https://en.wikipedia.org/wiki/JavaOS

It still exists and is thriving today as javacard.

pjmlp · 2 years ago

Besides the sibling comments,

- SavageJE

- microEJ

- PTC and Aonix bare metal Java runtimes

- SunSPOT with SquawkVM

techn00 · 2 years ago

See also https://jacobin.org/ for JVM 17 written in Go.

xmcqdpt2 · 2 years ago

Also https://github.com/lihaoyi/Metascala for a JVM implemented in Scala running on the JVM.

dimgl · 2 years ago

Seems… redundant, no?

leshow · 2 years ago

That is a very interesting name for a programming project lol. The Jacobins were a revolutionary political club during the French Revolution in the 1790s. It's also the name of a magazine at https://jacobin.com

snordgren · 2 years ago

It starts with the letters "ja", that's all that matters for a Java-related project.

I am curious if you ran into limitations due to the lifetimes on this signature

    fn execute_instruction(
        &mut self,
        vm: &mut Vm<'a>,
        call_stack: &mut CallStack<'a>,
        instruction: Instruction,
    ) -> Result<InstructionCompleted<'a>, MethodCallFailed<'a>>

When I try to add a lifetime to the `Err` variant of a `Result` and that lifetime is invariant (which it is due to `vm` and `call_stack`) it usually means that I can't use the question mark operator or have early returns in the code[1]. This makes error handling more verbose and less readable. Is that your experience as well?

[1] https://users.rust-lang.org/t/nll-and-early-return-not-allow...

EDIT: Looks like this is not an issue because the invariant lifetime 'a is not used for the mutable reference of vm or call_stack. So it's not the invariance that is the problem, but rather how Rust reasons about the lifetime of mutable references, which this avoids.

In that case I don't understand what the point of 'a is on VM and CallStack. You can create[1][2] those with any unbounded lifetime (including 'static[3]), which means it is not constraining anything. What is the lifetime 'a doing here? Why not remove it?
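A small sketch of what "unbounded" means here. The `Vm` type below is a hypothetical stand-in, not rjvm's definition: it carries a lifetime parameter but its constructor takes no borrowed input, so the caller may choose any lifetime, including 'static.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a VM type that carries a lifetime parameter
/// but stores no borrowed data tied to that lifetime.
pub struct Vm<'a> {
    heap_size: usize,
    _marker: PhantomData<&'a ()>,
}

impl<'a> Vm<'a> {
    /// No argument mentions 'a, so the caller picks it freely.
    pub fn new(heap_size: usize) -> Vm<'a> {
        Vm {
            heap_size,
            _marker: PhantomData,
        }
    }

    pub fn heap_size(&self) -> usize {
        self.heap_size
    }
}

/// Compiles with 'static: the lifetime is unbounded, so it does not
/// tie the value to any actual borrow.
pub fn make_static_vm() -> Vm<'static> {
    Vm::new(1024)
}
```

If the lifetime were derived from a real borrow (for example a constructor taking `&'a Arena`), the compiler would reject any use of the VM after that arena is gone; without such an input the parameter constrains nothing.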

I wanted to express the fact that everything that gets allocated (call stack, frames, classes, and objects) is alive and valid for as long as the "root" VM is, thus I used 'a more or less everywhere.

I also struggled with, and got a ton of errors from, the borrow checker initially. I fixed many of those with a lot of explicit lifetimes, but it's not impossible that in some places they are unnecessary.

cmrdporcupine · 2 years ago

Great learning project, I'm glad the author is having fun. Implementing a VM from scratch is a blast, and I have learned so much in the past doing that kind of thing.

If they're interested in bolting on a GC, it couldn't hurt to look at MMTk (https://www.mmtk.io/). Some high quality collection algorithms, written to be pluggable to various VMs, and written in Rust.

Note that MMTK is x86 only. I was going to use it for a toy project but I have a Mac.

That's a bummer -- I guess I never noticed that, when I played with it before it was on an M1 Mac, but compiled into an x86 Julia executable & running through Rosetta (which surprisingly did not suck).

Haven't read, but I bet it's likely related to expectations around the x86_64 memory model & atomics. In the long run I see no reason why it couldn't be made portable, but I imagine the authors' efforts are elsewhere for now.

sbt567 · 2 years ago

Uh, first time hearing mmtk. Thanks for the link!

I only became aware of it because a former employer (RelationalAI) was heavily interested in replacing Julia's GC with it (for some workloads): https://pretalx.com/juliacon2023/talk/BMBEGY/

Very well done. Building VMs is always fun, and I’m sure it was an interesting learning experience when combined with Rust’s type system.

If you’re looking for a job then ping me on Twitter, Mastodon or my work email, I’m sure you can figure them out from my user id here.