I have a few questions about the garbage collection. One of the hard parts of implementing a garbage collector is making sure everything is properly rooted (especially with a moving collector). You have the `do_garbage_collection` method marked unsafe[1], but you don't explain what the calling code needs to do to ensure it is safe to call. How do you ensure all references to the heap are rooted? This is not a trivial problem[2][3][4].
Also note that I cloned the repo and tried to run `cargo test`, and every test fails with 'should be able to add entries to the classpath: InvalidEntry(".../vm/rt.jar")' at vm/tests/integration/real_code_tests.rs:15:10
It's pretty straightforward. Their VM maintains its own notion of a callstack instead of using the native callstack. That lets them iterate over it and find all of the parameters and locals on the VM's callstack and use them as roots.
There is a performance cost for a VM having its own virtual callstacks like this, but it makes GC tracing much simpler. (It also makes implementing interesting concurrency and control flow primitives like coroutines or continuations much easier too.)
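A minimal sketch of that idea, assuming a hypothetical value representation (heap references as plain indices) rather than rjvm's actual types: because frames live in a VM-owned vector instead of on the native stack, the collector can enumerate every local and operand-stack slot as a root and trace from there. Only the mark phase is shown.

```rust
use std::collections::HashSet;

/// A slot on the VM's own stack: a primitive or a reference into the heap.
/// (Hypothetical representation: references are just heap indices.)
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Value {
    Int(i32),
    Ref(usize),
    Null,
}

/// A heap object, reduced to its fields for tracing purposes.
pub struct Object {
    pub fields: Vec<Value>,
}

/// One activation record, owned by the VM rather than the native stack.
pub struct Frame {
    pub locals: Vec<Value>,
    pub operand_stack: Vec<Value>,
}

/// Mark phase: every reference in any frame's locals or operand stack is a
/// root; everything reachable from a root through object fields is live.
pub fn mark(frames: &[Frame], heap: &[Object]) -> HashSet<usize> {
    let mut marked = HashSet::new();
    // Seed the worklist with every reference found on the VM call stack.
    let mut worklist: Vec<usize> = frames
        .iter()
        .flat_map(|f| f.locals.iter().chain(f.operand_stack.iter()))
        .filter_map(|v| match v {
            Value::Ref(idx) => Some(*idx),
            _ => None,
        })
        .collect();
    while let Some(idx) = worklist.pop() {
        // `insert` returns false if already marked, which also handles cycles.
        if !marked.insert(idx) {
            continue;
        }
        for field in &heap[idx].fields {
            if let Value::Ref(child) = field {
                worklist.push(*child);
            }
        }
    }
    marked
}
```

Note that this only sees references stored in VM frames; a reference held in a local variable of native Rust code is invisible to it.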
That seems like it would take care of roots for the bytecode itself, but not for "native" functions[1]. Allocating a new object can trigger a GC[2], and native functions use the native callstack. It seems like it would be easy to allocate in a native function and invalidate any unrooted references. In fact, I see a case like that here[3]. That method creates a reference with `expect_concrete_object_at` and then triggers a GC via `new_java_lang_class_object`. It avoids UB by not using `arg` after the call that can GC, but nothing stops you from using `arg` again (and holding an invalid reference).
What kind of support is there for generics in the JVM? Maybe I'm naive to assume that, due to type erasure, at the bytecode level everything is just an Object, i.e., a reference type? Or do you mean the class definition parser? But then, you don't really have any checks in place to see whether the class file is valid (other than the basic syntax).
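One common way to close that gap is to have native code register the references it holds in a root (handle) table that the collector knows about and rewrites when it moves objects. The sketch below uses hypothetical `Heap` and `Handle` types, not rjvm's API, and a toy compacting collector that runs whenever allocation reaches a threshold, to show both the hazard and the fix: a raw index kept across an allocation goes stale, while a handle still resolves to the right object.

```rust
/// A heap object: a payload plus outgoing references (heap indices).
#[derive(Clone, Debug)]
pub struct Obj {
    pub payload: String,
    pub refs: Vec<usize>,
}

/// A stable name for a rooted object: an index into the root table,
/// which the collector rewrites whenever it moves objects.
#[derive(Clone, Copy, Debug)]
pub struct Handle(usize);

pub struct Heap {
    pub objects: Vec<Obj>,
    roots: Vec<usize>, // current heap index of each rooted object
    threshold: usize,  // collect when this many objects exist
}

impl Heap {
    pub fn new(threshold: usize) -> Self {
        Heap { objects: Vec::new(), roots: Vec::new(), threshold }
    }

    /// Allocation may trigger a collection, which may move objects.
    pub fn alloc(&mut self, payload: &str) -> usize {
        if self.objects.len() >= self.threshold {
            self.collect();
        }
        self.objects.push(Obj { payload: payload.to_string(), refs: Vec::new() });
        self.objects.len() - 1
    }

    /// Registers a raw index as a root and returns a handle to it.
    pub fn root(&mut self, idx: usize) -> Handle {
        self.roots.push(idx);
        Handle(self.roots.len() - 1)
    }

    /// Resolves a handle to wherever the object lives now.
    pub fn get(&self, h: Handle) -> &Obj {
        &self.objects[self.roots[h.0]]
    }

    /// Toy compacting collector: mark from the roots, slide live objects
    /// to the front, then rewrite inner references and the root table.
    pub fn collect(&mut self) {
        let n = self.objects.len();
        let mut live = vec![false; n];
        let mut worklist = self.roots.clone();
        while let Some(i) = worklist.pop() {
            if !live[i] {
                live[i] = true;
                worklist.extend(self.objects[i].refs.iter().copied());
            }
        }
        // forwarding[old] = new index; only meaningful for live objects.
        let mut forwarding = vec![usize::MAX; n];
        let mut next = 0;
        for (old, &is_live) in live.iter().enumerate() {
            if is_live {
                forwarding[old] = next;
                next += 1;
            }
        }
        let old_objects = std::mem::take(&mut self.objects);
        for (old, mut obj) in old_objects.into_iter().enumerate() {
            if live[old] {
                for r in obj.refs.iter_mut() {
                    *r = forwarding[*r];
                }
                self.objects.push(obj);
            }
        }
        for r in self.roots.iter_mut() {
            *r = forwarding[*r];
        }
    }
}
```

In a real VM the handle table would typically also need scoping (releasing the handles a native call created when it returns), otherwise it turns into a leak of its own.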
About generics: some people pointed out the same thing on Reddit, and yeah, you are correct. The only thing that should be done is to read the Signature attribute, which encodes the generic information about classes, methods, and fields (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.ht...)
As a matter of fact, I just did a test and the following code works! :-)
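import java.util.ArrayList;
import java.util.List;

public class Generic {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>(10);
        strings.add("hey");
        strings.add("hackernews");
        for (String s : strings) {
            tempPrint(s);
        }
    }

    // Native method: its implementation must be supplied by the VM running this code.
    private static native void tempPrint(String value);
}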
Pretty much this: generics have (rare) implications for reflection (which is unsupported as well), but overall, type parameters are replaced with their nearest bound (class or interface) at compile time.
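To make the erasure point concrete: for a field declared as `List<String>`, the descriptor the interpreter resolves against is `Ljava/util/List;`, while the generic form `Ljava/util/List<Ljava/lang/String;>;` lives only in the Signature attribute. The hypothetical helper below recovers the erased form from a simple class-type signature by dropping the type arguments. Type variables such as `TT;` erase to their bound, which needs information this sketch does not have, so they (and inner-class signatures) are not handled.

```rust
/// Strips generic type arguments from a simple JVM class-type signature,
/// e.g. `Ljava/util/List<Ljava/lang/String;>;` -> `Ljava/util/List;`.
/// Hypothetical helper: does NOT handle type variables (`TT;`) or
/// inner-class signatures (`LOuter<TT;>.Inner;`).
pub fn erase_type_arguments(signature: &str) -> String {
    let mut out = String::with_capacity(signature.len());
    let mut depth = 0usize; // nesting level of `<...>`
    for c in signature.chars() {
        match c {
            '<' => depth += 1,
            // saturating_sub avoids a panic on malformed input like "A>"
            '>' => depth = depth.saturating_sub(1),
            _ if depth == 0 => out.push(c),
            _ => {} // inside type arguments: drop the character
        }
    }
    out
}
```

Nested arguments are handled by the depth counter, so `Ljava/util/Map<Ljava/lang/String;Ljava/util/List<Ljava/lang/Integer;>;>;` also erases to `Ljava/util/Map;`.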
OTOH, the lack of string interning is super strange (it's trivial to implement), and without it, it isn't really a JVM. Strings being equal by reference is important, and part of the JLS.
Lack of threads makes the entire endeavor a toy project.
Not entirely correct. Last I checked, string interning was ONLY guaranteed for strings defined in source (literals) and loaded during class loading. Strings created at runtime, e.g. via the String constructor or a StringBuilder, CAN duplicate strings that you hardcoded in your sources; to get the "canonical" string in those cases you have to call String.intern(), if memory serves me correctly.
Also, interning strings so that equality checks can use pointer comparison is dangerous for external inputs: IIRC, at some point interned strings could be stored permanently (unless the table is implemented as a weak set), so attackers could fill up your heap (or cause other GC issues, since the entire interning facility is a cache) by stuffing your intern table with junk.
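A minimal sketch of an intern table that addresses the concern above, assuming strings are modeled as `Rc<str>` (a hypothetical representation, not rjvm's): entries hold `Weak` pointers, so the table alone never keeps a string alive, and dead entries can be purged. Equal contents map to the same allocation, which is what makes reference comparison of interned strings valid.

```rust
use std::collections::HashMap;
use std::rc::{Rc, Weak};

/// Intern table that does not keep its strings alive: it stores Weak
/// pointers, so entries die once every strong reference is dropped.
#[derive(Default)]
pub struct InternTable {
    entries: HashMap<String, Weak<str>>,
}

impl InternTable {
    /// Returns the canonical `Rc<str>` for `s`, creating it if needed.
    pub fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.entries.get(s).and_then(|w| w.upgrade()) {
            return existing;
        }
        // Missing or dead entry: create a fresh canonical string.
        let fresh: Rc<str> = Rc::from(s);
        self.entries.insert(s.to_string(), Rc::downgrade(&fresh));
        fresh
    }

    /// Removes entries whose strings have already been freed.
    pub fn purge(&mut self) {
        self.entries.retain(|_, weak| weak.strong_count() > 0);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

In a garbage-collected VM the same idea would be expressed as a weak root: the collector clears table entries whose strings are otherwise unreachable.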
Java strings are compared by reference first (`String.equals` checks identity), and if the references don't match, they're compared by value. There's no guarantee that each string value has exactly one instance; that would hurt performance.
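The reference-then-value check can be sketched like this, again assuming VM strings are modeled as `Rc<str>` (a hypothetical representation); `java_string_equals` is an illustrative name, not a real API.

```rust
use std::rc::Rc;

/// Sketch of a `String.equals`-style comparison: identical references
/// short-circuit to true; otherwise fall back to comparing contents.
pub fn java_string_equals(a: &Rc<str>, b: &Rc<str>) -> bool {
    Rc::ptr_eq(a, b) || a == b
}
```

The pointer check is only a fast path: two distinct allocations with the same contents still compare equal.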
> Lack of threads makes the entire endeavor a toy project.
Yeah, as stated by the author in the line that says "I want to stress that this is a toy JVM, built for learning purposes and not a serious implementation."
That is pretty awesome! When I joined the Java effort in '92 (called Oak at the time), the group I was with was looking at writing a full OS in Java. The idea was that if you could get down to just the minimal set of things needed as "machine code" (aka native methods), you could reduce the attack surface of an embedded OS. (Originally, Java was targeted to run in things like TVs and other appliances.) We were, of course, working in C rather than Rust for the native methods. Writing the JVM in Rust, though, adds a solid level of memory safety to the entire process.
IMAO, Android kind of achieves that... kind of. They write lots of OS logic in Java (or Kotlin) but mix in lots of system services written in native code at the same time, interconnected by the famous (or infamous?) Binder IPC.
Android isn't conventional Java. For starters, its runtime uses its own bytecode (dex), which is based on registers instead of a stack. Beyond that, many things that aren't GUI-related are C++ with a thin Java wrapper on top.
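A toy illustration of the stack versus register distinction, with made-up instruction sets rather than real JVM or dex opcodes: computing `locals[2] = locals[0] + locals[1]` takes four stack instructions (roughly what `iload_0`, `iload_1`, `iadd`, `istore_2` do on the JVM) but a single register instruction that names its operands (roughly dex's `add-int v2, v0, v1`).

```rust
/// Stack-machine style (JVM-like): operands flow through an operand stack.
#[derive(Clone, Copy)]
pub enum StackOp {
    Load(usize),  // push locals[i]
    Add,          // pop two values, push their sum
    Store(usize), // pop into locals[i]
}

/// Register-machine style (dex-like): each instruction names its registers.
#[derive(Clone, Copy)]
pub enum RegOp {
    Add { dst: usize, a: usize, b: usize }, // regs[dst] = regs[a] + regs[b]
}

pub fn run_stack(code: &[StackOp], locals: &mut [i32]) {
    let mut stack = Vec::new();
    for op in code {
        match *op {
            StackOp::Load(i) => stack.push(locals[i]),
            StackOp::Add => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a + b);
            }
            StackOp::Store(i) => locals[i] = stack.pop().expect("stack underflow"),
        }
    }
}

pub fn run_registers(code: &[RegOp], regs: &mut [i32]) {
    for op in code {
        match *op {
            RegOp::Add { dst, a, b } => regs[dst] = regs[a] + regs[b],
        }
    }
}
```

The trade-off is fewer, larger instructions (operands encoded inline) for the register form versus more, smaller instructions for the stack form.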
When I think about a "Java OS", I imagine a JVM running in kernel mode, providing minimal OS functionality (scheduler, access to hardware I/O ports) and there not being any kind of userspace.
That is a very interesting name for a programming project lol. The Jacobins were a revolutionary political club during the French Revolution in the 1790s. It's also the name of a magazine at https://jacobin.com
When I try to add a lifetime to the `Err` variant of a `Result` and that lifetime is invariant (which it is due to `vm` and `call_stack`), it usually means that I can't use the question mark operator or have early returns in the code[1]. This makes error handling more verbose and less readable. Is that your experience as well?
EDIT: Looks like this is not an issue because the invariant lifetime 'a is not used for the mutable reference of vm or call_stack. So it's not the invariance that is the problem, but rather how Rust reasons about the lifetime of mutable references, which this avoids.
In that case, I don't understand what the point of 'a on `Vm` and `CallStack` is. You can create[1][2] those with any unbounded lifetime (including 'static[3]), which means it is not constraining anything. What is the lifetime 'a doing here? Why not remove it?
I wanted to express the fact that everything that gets allocated (call stack, frames, classes, and objects) is alive and valid for as long as the "root" VM is, so I used 'a more or less everywhere.
I also struggled and got a ton of errors from the borrow checker initially. I fixed many of those with a lot of explicit lifetimes, but it's possible that in some places they are unnecessary.
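A minimal sketch of the reviewer's point, using hypothetical `LooseVm`, `BoundVm` and `Arena` types rather than rjvm's: a lifetime parameter only constrains anything when a constructor ties it to an actual borrow. If nothing passed to `new` mentions `'a`, the caller can pick any lifetime, including `'static`; if `'a` comes from borrowing an owning arena, values carrying `'a` provably cannot outlive that arena.

```rust
use std::marker::PhantomData;

/// Unconstrained: nothing passed to `new` mentions 'a, so the caller may
/// choose any lifetime, including 'static. The parameter constrains nothing.
pub struct LooseVm<'a> {
    _marker: PhantomData<&'a ()>,
}

impl<'a> LooseVm<'a> {
    pub fn new() -> Self {
        LooseVm { _marker: PhantomData }
    }
}

/// Compiles: an unconstrained 'a can simply be instantiated as 'static.
pub fn make_static_vm() -> LooseVm<'static> {
    LooseVm::new()
}

/// Owner of the long-lived data (hypothetical stand-in for loaded classes).
pub struct Arena {
    pub class_names: Vec<String>,
}

/// Constrained: 'a is the lifetime of a borrow of the arena.
pub struct BoundVm<'a> {
    arena: &'a Arena,
}

impl<'a> BoundVm<'a> {
    pub fn new(arena: &'a Arena) -> Self {
        BoundVm { arena }
    }

    /// The returned &str lives as long as the arena, not as long as `self`.
    pub fn class_name(&self, i: usize) -> &'a str {
        let arena: &'a Arena = self.arena; // copy the &'a out of self
        &arena.class_names[i]
    }
}
```

With `BoundVm`, a class name obtained from the VM stays usable after the VM value itself is dropped, because its lifetime is tied to the arena rather than to the VM.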
Great learning project, I'm glad the author is having fun. Implementing a VM from scratch is a blast, and I have learned so much in the past doing that kind of thing.
If they're interested in bolting on a GC, it couldn't hurt to look at MMTk (https://www.mmtk.io/): some high-quality collection algorithms, written to be pluggable into various VMs, and written in Rust.
That's a bummer. I guess I never noticed that: when I played with it before, it was on an M1 Mac, but compiled into an x86 Julia executable and running through Rosetta (which surprisingly did not suck).
Haven't read, but I bet it's likely related to expectations around the x86_64 memory model and atomics. In the long run I see no reason why it couldn't be made portable, but I imagine the authors' efforts are elsewhere for now.
I only became aware of it because a former employer (RelationalAI) was heavily interested in replacing Julia's GC with it (for some workloads): https://pretalx.com/juliacon2023/talk/BMBEGY/
When I see such cool projects, I feel very overwhelmed. How do you get started with Rust and master basics to even attempt doing such a thing? Can OP explain?
Likewise. Not to go off on too much of a tangent, but on a more personal note, I've been struggling with this feeling a lot lately.
I've been a professional software developer for almost 10 years, and I _know_ I'm competent (and not an impostor) as demonstrated by my current position and ability to ship things.
However, lately after viewing developer blogs I become overwhelmed that I actually don't know enough and am not a "real" developer. I seem to have formed a notion of an ideal developer in my head and I compare myself against this imagined construct which leads to these feelings. I admire how these people have so much deep knowledge and can express themselves so clearly and concisely, then wonder why I am not like that.
After work and taking care of my family, I barely have the energy to do anything further. I know programming isn't everything, but I do have a desire to learn more and improve myself.
I recognize this isn't healthy nor is it rational, but it's just a feeling I can't shake lately.
What you're describing is very common amongst developers. So common in fact, that I've written a post about this https://alic.dev/blog/comparisons
In short: recognizing your insecurities is the first step. The next step is figuring out what's important to you, shedding impossible-to-achieve and irrational ambitions, prioritizing your goals in life, and articulating concrete steps to further them.
Well, you're probably comparing yourself against the top 1% of developers. It's okay to not be the very best, being in the top 30% of this field already is very rewarding.
I happen to personally know the author, and I'm not really surprised he pulled this off. Using him as a baseline for who is a real developer is extremely unhealthy. Please don't :)
Well, _I_ feel impostor syndrome half the time I open HN, honestly!
I did have a bit of experience with VMs before: many years ago I wrote a short series of posts about them on my blog, and at my previous job I dabbled a bit in JVM bytecode to solve one very unusual problem we had for a customer. I also read the _amazing_ https://craftinginterpreters.com/ years ago, and that gave me some ideas.
But this project was definitely big and complex. It took me a lot of time, and it got abandoned a couple of times, like many of my side projects. But I'm happy I finished it. :-)
Not OP nor am I a Rust expert. I can speak regarding another technology: sockets.
I've been deep-diving into sockets recently. Two weeks ago I had only a high-level understanding of sockets (learned from casually reading manpages, docs, blog posts, etc.). I decided to read as much as possible because I wanted to understand networking fundamentals, and after a week I had learned enough to write some sockets code in Python and C. I know Python quite well, so reviewing the `socket` library made more sense after my deep dive.
If you want to get better at technology A using language X, I suggest reading/watching as much as you can about technology A and building stuff with it in a language Y that you already know. Then you can circle back to learning language X, having already mastered many of the concepts around technology A.
Break things down. A simple language VM is going to have a way to represent objects in memory, a byte code interpreter, a simple garbage collector, and a way to load things.
A bytecode interpreter is a stack, some way to represent functions on that stack, and a loop that interprets each bytecode instruction and moves the program counter.
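A minimal sketch of that description, with a made-up instruction set (not JVM bytecode): a stack of frames, each with its own program counter and locals, and a loop that fetches the instruction at the current program counter, advances it, and executes it. For brevity all frames share one operand stack, whereas the JVM gives each frame its own.

```rust
/// A tiny instruction set for a toy stack VM (not JVM opcodes).
#[derive(Clone, Copy, Debug)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
    LoadLocal(usize),
    Call { func: usize, argc: usize }, // call functions[func] with argc args
    Return,                            // return the top of the stack
}

pub struct Function {
    pub code: Vec<Op>,
    pub num_locals: usize,
}

/// One activation: which function, where we are in it, and its locals.
struct Frame {
    func: usize,
    pc: usize,
    locals: Vec<i64>,
}

/// The fetch/decode/execute loop.
pub fn run(functions: &[Function], entry: usize) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut frames = vec![Frame { func: entry, pc: 0, locals: vec![0; functions[entry].num_locals] }];
    loop {
        let frame = frames.last_mut().expect("no active frame");
        let op = functions[frame.func].code[frame.pc];
        frame.pc += 1; // advance first, so a call resumes after itself
        match op {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a * b);
            }
            Op::LoadLocal(i) => stack.push(frame.locals[i]),
            Op::Call { func, argc } => {
                let mut locals = vec![0; functions[func].num_locals];
                // Arguments were pushed left to right; pop them in reverse.
                for i in (0..argc).rev() {
                    locals[i] = stack.pop().expect("missing argument");
                }
                frames.push(Frame { func, pc: 0, locals });
            }
            Op::Return => {
                frames.pop();
                if frames.is_empty() {
                    return stack.pop().expect("no return value");
                }
                // Otherwise the result stays on the stack for the caller.
            }
        }
    }
}
```

For example, a `main` that pushes 3, calls a one-argument `square` function, pushes 1, adds and returns evaluates to 10.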
How much do you code in your free time? Like average hours per week?
If it's zero (and no judgement from me if it is; plenty of other things to focus on), then it shouldn't be surprising that someone for whom that number is (speculatively) 10-20 hours per week on average for years has impressive side projects.
Do some embedded work, implement a bare metal program on an ARM microcontroller in C or Rust. Make a LED blink. Then, make the same LED blink in pure ASM. The RP2040 is easy to bring up.
[1] https://github.com/andreabergia/rjvm/blob/be9c54066c64a82879...
[2] https://manishearth.github.io/blog/2021/04/05/a-tour-of-safe...
[3] https://without.boats/blog/shifgrethor-iii/
[4] https://coredumped.dev/2022/04/11/implementing-a-safe-garbag...
[1] https://github.com/andreabergia/rjvm/blob/main/vm/src/native...
[2] https://github.com/andreabergia/rjvm/blob/be9c54066c64a82879...
[3] https://github.com/andreabergia/rjvm/blob/be9c54066c64a82879...
One thing struck me as a bit odd:
> In particular, it does not support: generics
This is generated when you do something like: `final Main value = list.get(0);`
http://henrikeichenhardt.blogspot.com/2013/05/how-are-java-g...
https://docs.oracle.com/en/java/javase/11/docs/api/java.base...()
https://en.wikipedia.org/wiki/JavaOS
- SavageJE
- microEJ
- PTC and Aonix bare-metal Java runtimes
- SunSPOT with SquawkVM
fn execute_instruction(
    &mut self,
    vm: &mut Vm<'a>,
    call_stack: &mut CallStack<'a>,
    instruction: Instruction,
) -> Result<InstructionCompleted<'a>, MethodCallFailed<'a>>
[1] https://users.rust-lang.org/t/nll-and-early-return-not-allow...
[1] https://github.com/andreabergia/rjvm/blob/be9c54066c64a82879...
[2] https://github.com/andreabergia/rjvm/blob/be9c54066c64a82879...
[3] https://github.com/andreabergia/rjvm/blob/be9c54066c64a82879...
If you’re looking for a job, ping me on Twitter, Mastodon or my work email; I’m sure you can figure them out from my user id here.
The magic will fade away quickly.