My favorite thing about this kind of code is that people are constantly inventing new techniques to do time integration. It's the sort of thing you'd think was a solved problem, but then when you read about time integration strategies you realize how rich the space of solutions is. And they are more like engineering problems than pure math, with tradeoffs and better or worse fits depending on the kind of system.
Decades ago ... I think it was the Computer Recreations column in "Scientific American" ... the strategy, since computers were less capable then, was to advance all the bodies by some amount of time, call it a "tick". When bodies got close, the "tick" got smaller, so the calculations became more nuanced and precise. When they were farther apart, you could run the solar system on generalities.
One way of doing that is, at each time step, to perform two integrations with different accuracies, one giving you x_1 and the other x_2. The error is estimated by the difference |x_1 - x_2|, and you make the error match your tolerance by adjusting the time step.
Naturally, this difference becomes big when two objects are close together, since the acceleration will induce a large change of velocity, and a lower time step is needed to keep the error under control.
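A minimal sketch of that idea in Python (step doubling with plain Euler as the integrator; the function and variable names are mine, purely for illustration):

```python
import numpy as np

def adaptive_step(f, x, t, dt, tol=1e-8):
    """One adaptive step via step doubling: compare one full Euler step (x_1)
    against two half steps (x_2) and use |x_1 - x_2| as the error estimate."""
    x_1 = x + dt * f(t, x)                               # low-accuracy estimate
    x_half = x + 0.5 * dt * f(t, x)
    x_2 = x_half + 0.5 * dt * f(t + 0.5 * dt, x_half)    # higher-accuracy estimate
    err = np.max(np.abs(x_1 - x_2))
    if err > tol:
        return None, 0.5 * dt                            # reject: retry with a smaller dt
    # accept; grow dt if the error leaves headroom (Euler's local error ~ dt^2)
    dt_next = dt * min(2.0, 0.9 * (tol / max(err, 1e-300)) ** 0.5)
    return x_2, dt_next
```

A driver just loops: call `adaptive_step`, and if the result is None, retry the same step with the returned smaller dt. Real codes do the same thing with an embedded Runge-Kutta pair (e.g. RKF45) instead of Euler, so the two estimates come almost for free.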
Every time I see anything on the N-body problem I am reminded of my final high school project... I had 2-3 weeks to write an n-body simulation. Back then I used C++ and my hardware was really bad (a 2 GHz single-core CPU or so…). The result was not really impressive because it did not really work. :D But I was able to show that my code correctly predicted that the Moon and the Earth would eventually collide if neither body was given any initial velocity. I went into this totally naive and blind, but it was a ton of fun.
> my hardware was really bad (2 GHz single core CPU or so…)
laughs in 32kb of RAM
Sounds like a great project, though. There's a lot of fundamental concepts involved in something like this, so it's a great learning exercise (and fun as well!) :)
Once you have the matrix implementation in Step 2 (Implementation 3) it's rather straightforward to extend your N-body simulator to run on a GPU with JAX --- you can just add `import jax.numpy as jnp` and replace all the `np.`s with `jnp.`s.
For a few-body system (e.g., the Solar System) this probably won't provide any speedup. But once you get to ~100 bodies you should start to see substantial speedups by running the simulator on a GPU.
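For example, a pairwise ("matrix") acceleration kernel along those lines might look like this. This is my own sketch, not the article's code, and names like `accelerations` and the softening `eps` are mine:

```python
import jax
import jax.numpy as jnp

@jax.jit
def accelerations(pos, mass, G=1.0, eps=1e-12):
    """Gravitational accelerations for all bodies at once.
    pos: (N, 3) positions, mass: (N,) masses."""
    r = pos[None, :, :] - pos[:, None, :]                    # r[i, j] = pos[j] - pos[i]
    d2 = jnp.sum(r * r, axis=-1)                             # squared pairwise distances
    inv_d3 = jnp.where(d2 > 0.0, (d2 + eps) ** -1.5, 0.0)    # zero out self-interaction
    return G * jnp.sum(mass[None, :, None] * r * inv_d3[:, :, None], axis=1)
```

On a machine with a CUDA-enabled jaxlib this runs on the GPU with no further changes; on a CPU-only install it still works, just via XLA on the CPU.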
Oh for sure there is a ton that is going on behind the scenes when you substitute `np` --> `jnp`. But as someone who worked on gravitational dynamics a decade ago, it's really incredible that so much work has been done to make running numerical calculations on a GPU so straightforward for the programmer.
Obviously if you want maximum performance you'll probably still have to roll up your sleeves and write CUDA yourself, but there are a lot of situations where you can still get most of the benefit of using a GPU and never have to worry yourself over how to write CUDA.
People think high-level libraries are magic fairy dust - "abstractions! abstractions! abstractions!". But there's no such thing as an abstraction in real life, only heuristics.
Not true for regular GPUs like the RTX 5090: they have atrocious float64 performance compared to a CPU. You need a special GPU designed for scientific computation (one with many float64 cores).
This is confusing: GPUs seem great for scientific computations. You often want f64 for scientific computation. GPUs aren't good with f64.
I'm trying to evaluate if I can get away with f32 for GPU use for my molecular docking software. Might be OK, but I've hit cases broadly where f64 is fine, but f32 is not.
I suppose this is because the dominant uses, games and AI/ML, use f32, or for AI even lower precision‽
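A toy illustration of the gap (nothing docking-specific, just the usual absorption problem): float32 has a relative spacing of about 1e-7 versus about 1e-16 for float64, so small contributions to a large accumulator simply vanish in f32.

```python
import numpy as np

big, tiny = np.float32(1.0e6), np.float32(0.01)
print(big + tiny - big)            # 0.0   -- the 0.01 is absorbed in f32

big64, tiny64 = np.float64(1.0e6), np.float64(0.01)
print(big64 + tiny64 - big64)      # ~0.01 -- preserved in f64
```

Long time integrations and energy bookkeeping hit exactly this pattern, which is part of why f64 (or compensated summation / mixed precision) tends to show up in scientific GPU code.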
Next logical optimization: Barnes-Hut? It groups source bodies using a recursive tree of cubes and gives huge speedups at high body counts. FMM is a step beyond that, which also groups target bodies, but it is much more complicated to implement.
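For anyone curious what "a recursive tree of cubes" looks like in practice, here is a bare-bones Barnes-Hut sketch (entirely my own, not the article's code, and unoptimized): build an octree, store each node's total mass and center of mass, and treat a whole node as a single pseudo-particle whenever it looks small from the target body (opening angle s/d < theta). Coincident bodies and performance are not handled.

```python
import numpy as np

class Node:
    """Octree node covering a cube of side `size` centered at `center`."""
    def __init__(self, center, size):
        self.center, self.size = center, size
        self.mass, self.com = 0.0, np.zeros(3)   # total mass, center of mass
        self.children = None                     # None => leaf
        self.body = None                         # single body stored in a leaf

    def insert(self, pos, m):
        if self.children is None and self.body is None:    # empty leaf
            self.body, self.mass, self.com = pos, m, pos.copy()
            return
        if self.children is None:                           # occupied leaf: split it
            self._subdivide()
            old_pos, old_m, self.body = self.body, self.mass, None
            self._child_for(old_pos).insert(old_pos, old_m)
        self.com = (self.com * self.mass + pos * m) / (self.mass + m)
        self.mass += m
        self._child_for(pos).insert(pos, m)

    def _subdivide(self):
        q = self.size / 4.0
        offs = [np.array([sx, sy, sz]) for sx in (-q, q) for sy in (-q, q) for sz in (-q, q)]
        self.children = [Node(self.center + o, self.size / 2.0) for o in offs]

    def _child_for(self, pos):
        x, y, z = pos > self.center
        return self.children[int(x) * 4 + int(y) * 2 + int(z)]

    def acceleration(self, pos, G=1.0, theta=0.5, eps=1e-4):
        if self.mass == 0.0:
            return np.zeros(3)
        d_vec = self.com - pos
        d = np.sqrt(d_vec @ d_vec)
        far_enough = self.children is None or (d > 0.0 and self.size / d < theta)
        if far_enough:
            if d == 0.0:                          # the target body itself
                return np.zeros(3)
            return G * self.mass * d_vec / (d * d + eps * eps) ** 1.5
        return sum(c.acceleration(pos, G, theta, eps) for c in self.children)


def barnes_hut_accelerations(pos, mass, G=1.0, theta=0.5):
    center = (pos.min(axis=0) + pos.max(axis=0)) / 2.0
    size = float((pos.max(axis=0) - pos.min(axis=0)).max()) * 1.01 + 1e-9
    root = Node(center, size)
    for p, m in zip(pos, mass):
        root.insert(np.asarray(p, dtype=float), float(m))
    return np.array([root.acceleration(p, G, theta) for p in pos])
```

The force evaluation drops from O(N^2) to roughly O(N log N); the accuracy knob is theta (0 reproduces the direct sum, larger values are faster and coarser).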
That's mentioned on the "Conclusions" page of TFA:
> Large-scale simulation: So far, we have only focused on systems with a few objects. What about large-scale systems with thousands or millions of objects? Turns out it is not so easy because the computation of gravity scales as O(N^2). Have a look at Barnes-Hut algorithm to see how to speed up the simulation. In fact, we have documentations about it on this website as well. You may try to implement it in some low-level language like C or C++.
C or C++? Ha! I've implemented FMM in Fortran 77. (Not my choice; it was a summer internship and the "boss" wanted it that way.) It was a little painful.
Supercomputers will simulate trillions of masses. The HACC code, commonly used to verify the performance of these machines, uses a uniform grid (interpolation and a 3D FFT) and local corrections to compute the motion of ~8 trillion bodies.
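HACC itself is far more sophisticated, but the grid-plus-FFT part of that description can be sketched in a few lines of numpy. This is a toy periodic particle-mesh solver of my own, not HACC's algorithm, and it omits the short-range corrections entirely:

```python
import numpy as np

def pm_accelerations(pos, mass, box=1.0, ngrid=64, G=1.0):
    """Toy periodic particle-mesh gravity: nearest-grid-point deposit,
    FFT Poisson solve, finite-difference gradient, interpolation back."""
    cell = box / ngrid
    idx = np.floor(pos / cell).astype(int) % ngrid

    # Deposit masses onto a uniform grid (nearest grid point) to get density.
    rho = np.zeros((ngrid, ngrid, ngrid))
    np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]), mass)
    rho /= cell**3

    # Solve nabla^2 phi = 4 pi G rho in Fourier space: phi_k = -4 pi G rho_k / k^2.
    k = 2 * np.pi * np.fft.fftfreq(ngrid, d=cell)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                     # avoid division by zero at the mean mode
    phi_k = -4 * np.pi * G * np.fft.fftn(rho) / k2
    phi_k[0, 0, 0] = 0.0                  # remove the mean (DC) mode
    phi = np.real(np.fft.ifftn(phi_k))

    # Acceleration a = -grad(phi), central differences on the periodic grid.
    ax = -(np.roll(phi, -1, 0) - np.roll(phi, 1, 0)) / (2 * cell)
    ay = -(np.roll(phi, -1, 1) - np.roll(phi, 1, 1)) / (2 * cell)
    az = -(np.roll(phi, -1, 2) - np.roll(phi, 1, 2)) / (2 * cell)

    # Interpolate back to the particle positions (nearest grid point again).
    return np.stack([a[idx[:, 0], idx[:, 1], idx[:, 2]] for a in (ax, ay, az)], axis=1)
```

The "local corrections" in codes like HACC then add back the short-range force that the grid smooths out (a P3M / TreePM-style split), which is what lets them push to trillions of particles.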
I was fascinated with doing particle simulations in "C" language as a teenager in the late 1980s on my VGA monitor! I would do both gravity and charged particles.
Once particles accelerate they'll just 'jump by' each other of course rather than collide, if you have no repulsive forces. I realized you had to take smaller and smaller time-slices to get things to slow down and keep running when the singularities "got near". I had a theory even back then that this is what nature is doing with General Relativity making time slow down near masses. Slowing down to avoid singularities. It's a legitimate theory. Maybe this difficulty is why "God" (if there's a creator) decided to make everything actually "waves" as much as particles, because waves don't have the problem of singularities.
In Step 2 (Gravity), why does the sum have the cube of the distance between the bodies in the denominator?
Edit: To answer myself, I think this is because one factor of the distance normalizes the vector between the two bodies to length 1, and the other two factors are the standard inverse-square relationship.
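In code, the two ways of writing it give the same vector (the variable names here are just illustrative):

```python
import numpy as np

G, m2 = 6.674e-11, 5.0e24             # gravitational constant, mass of the other body
r = np.array([1.0e7, 2.0e7, -5.0e6])  # vector from body 1 to body 2
d = np.linalg.norm(r)

a_unit = G * m2 / d**2 * (r / d)      # inverse square times a unit direction vector
a_cube = G * m2 / d**3 * r            # same thing with the normalization folded into |r|^3

assert np.allclose(a_unit, a_cube)
```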
Oops, pretty big mistake on my part. I gave the formula for force and labeled it acceleration. Way too late to edit, but the correctly labeled formulas should have been:
F = G * m1 * m2 / |r|^2 * r_unit
F = G * m1 * m2 / |r|^3 * r
Or as acceleration of the first body, by dividing out its `m` using `F = ma`:

a = G * m2 / |r|^2 * r_unit

a = G * m2 / |r|^3 * r
The force itself is `G * m1 * m2 / (r^2)`. That's a pure magnitude. The direction of the force is just the unit vector going from m1 to m2. You need it to be a unit vector or else you're multiplying up to something higher than that force. However, I don't get why you'd ever cube the 'r'. Never seen that. I don't think it's right, tbh.
Back in the early 1990s I was going through books in the college library and found one on numerical results of different 3-body starting configurations.
I vividly remember the Pythagorean three-body problem example, and how it required special treatment for the close interactions.
Which made me very pleased to see that configuration used as the example here.
https://alvinng4.github.io/grav_sim/5_steps_to_n_body_simula...
> Naturally, this difference becomes big when two objects are close together, since the acceleration will induce a large change of velocity, and a lower time step is needed to keep the error under control.
But in our simulation (reality, or whatever), uncertainty increases the closer we look.
Does that suggest we don’t live in a simulation? Or is “looking closer increases uncertainty” something that would emerge from a nested simulation?
For the programmer, yes it is easy enough.
But there is a lot of complexity hidden behind that change. If you care about how your tools work, that might be a problem (I'm not judging).
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
2022 "N-Body Performance With a kD-Tree: Comparing Rust to Other Languages"
https://ieeexplore.ieee.org/document/10216574
2025 "Parallel N-Body Performance Comparison: Julia, Rust, and More"
https://link.springer.com/chapter/10.1007/978-3-031-85638-9_...