The Jane Street folks, who created this, also did an interesting episode[0] of their podcast where they discuss performance considerations when working with OCaml. What I was curious about was applying a GC language to a use case that must have extremely low latency. It seems like an important consideration, as a GC pause in the middle of high-frequency trading could be problematic.
I actually asked Ron Minsky about exactly this question on Twitter[0]:
Me: [W]hy not just use Rust for latency sensitive apps/where it may make sense? Is JS using any Rust?
Minsky: Rust is great, but we get a lot of value out of having the bulk of our code in a single language. We can share types, tools, libraries, idioms, and it makes it easier for folk to move from project to project.
And we're well on our way to getting the most important advantages that Rust brings to the table in OCaml in a cleanly integrated, pay as you go way, which seems to us like a better outcome.
There are also some things that we specifically don't love about Rust: the compile times are long, folk who know more about it than I do are pretty sad about how async/await works, the type discipline is quite complicated, etc.
But mostly, it's about wanting to have one wider-spectrum language at our disposal.
Well on their way... by having to write a ton of C for the interpreter. I think it’s really imprudent for them not to be using Rust yet for critical sections.
GC compactions were indeed a problem for a number of systems. The trading systems in general had a policy of not allocating after startup. JS has a library called "Zero" that provides a host of non-allocating ways of doing things.
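To make that concrete, here's a minimal sketch of the "no allocation after startup" discipline in plain OCaml; the Order_buffer module below is a made-up illustration of the pattern, not the actual Zero library:

    (* Allocate fixed-capacity storage once at startup, then only mutate it on
       the hot path, so the GC never sees fresh heap blocks from this code. *)
    module Order_buffer = struct
      type t = {
        prices : float array;   (* flat float array, allocated once *)
        qtys : int array;
        mutable len : int;
      }

      let create capacity =
        { prices = Array.make capacity 0.0;
          qtys = Array.make capacity 0;
          len = 0 }

      (* Hot path: plain stores into preallocated arrays, no new allocation. *)
      let push t ~price ~qty =
        t.prices.(t.len) <- price;
        t.qtys.(t.len) <- qty;
        t.len <- t.len + 1
    end

    let () =
      let buf = Order_buffer.create 1_000_000 in   (* done at startup *)
      Order_buffer.push buf ~price:101.25 ~qty:5;  (* allocation-free afterwards *)
      Printf.printf "orders buffered: %d\n" buf.Order_buffer.len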
> What I was curious about was applying a GC language to a use case that must have extremely low latency. It seems like an important consideration, as a GC pause in the middle of high-frequency trading could be problematic.
Regarding a run-time environment using garbage collection in general, not OCaml specifically, GC pauses can be minimized with parallel collection algorithms such as those found in the JVM[0]. They do not provide hard guarantees, however, so over-provisioning system RAM may also be needed in order to achieve the required system performance.
Another, more complex approach is to over-provision the servers such that each can drop out of the available pool for a short time, allowing "offline GC." This involves collaboration between request routers and other servers, so it may not be worth the effort if a deployment can financially support over-provisioning servers such that there is always an idle CPU available for parallel GC on each.
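The rotation step itself is simple; here's a sketch in OCaml terms (leave_pool and rejoin_pool are hypothetical stand-ins for whatever coordination the request router actually provides):

    (* Drop out of the pool, finish a full collection (and optionally compact),
       then advertise availability again. *)
    let offline_gc ~leave_pool ~rejoin_pool =
      leave_pool ();       (* router stops sending this server requests *)
      Gc.full_major ();    (* complete a major collection while idle *)
      Gc.compact ();       (* optionally defragment the heap too *)
      rejoin_pool ()       (* back into the available pool *)

    let () =
      offline_gc
        ~leave_pool:(fun () -> print_endline "draining...")
        ~rejoin_pool:(fun () -> print_endline "back in the pool")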
You just let the garbage accumulate and collect it whenever markets are closed. Whenever you need ultra-low latency in trading, you usually have well-defined time constraints (market open/close).
Maybe it's different for markets that are always open (crypto?) but most HFT happens during regular market hours.
Are you aware of how many allocations the average program executes in the span of a couple of minutes? Where do you propose all of that memory lives in a way that doesn’t prevent the application from running?
Haven't looked at the link, but I think for a scenario like trading, where there are market open and close times, you can just disable the GC and restart the program after market close.
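For OCaml specifically there's no literal off switch, but you can get close with the stdlib Gc module: make the collector very lazy during the session, then collect and compact manually after the close. A rough sketch (the parameter values are just illustrative):

    (* Make the runtime very lazy during trading hours... *)
    let defer_gc_during_trading () =
      let c = Gc.get () in
      Gc.set { c with
               Gc.minor_heap_size = 64 * 1024 * 1024;  (* big minor heap (in words) *)
               Gc.space_overhead = 10_000 }            (* major GC almost never triggers *)

    (* ...and pay the bill after the close. *)
    let collect_after_close () =
      Gc.full_major ();
      Gc.compact ()

    let () =
      defer_gc_during_trading ();
      (* ... trading session runs here ... *)
      collect_after_close ()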
Anonymous labeled structs and enums are some of my top wished-for features in programming languages! For instance, in Rust you can define labeled structs and unlabeled (i.e. tuple) structs, but only as named, nominal types; there's no anonymous labeled equivalent.
In Dart, we merged tuples and records into a single construct. A record can have positional fields, named fields, or both. A record type can appear anywhere a type annotation is allowed, so a record type with only positional fields and one that also has named fields are both fine.
The curly braces in the record type annotation distinguish the named fields from the positional ones. I don't love the syntax, but it's consistent with function parameter lists where the curly braces delimit the named parameters.
It's interesting that languages which start with purely nominal structs tend to acquire some form of structurally typed records in the long run. E.g. C# has always had (nominally typed) structs, then .NET added (structurally typed) tuples, and then eventually the language added (still structurally typed) tuples with named items on top of that.
I think the main dividing line here is whether you want to lean into strict typing or whether you prefer a looser typing structure. The extremes of both are awful: on one end, the length of an array is part of its type definition (sketched below); on the other, there are no contractual guarantees about the data at all. The level of type strictness you want is probably best dictated by team and project size (which, you'll note, changes over the lifetime of the product): a lack of typing makes it much easier to prototype early code, while extremely strict typing can serve as a strong code contract in a large codebase where no one person can still comprehend the entirety of it.
It's a constant push and pull of conflicting motivations.
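For anyone wondering what that first extreme looks like, here's a small plain-OCaml sketch of a length-indexed vector, where GADTs put the length into the type so that taking the head of an empty vector is rejected at compile time:

    (* Type-level naturals: the vector's length lives in its type. *)
    type zero
    type 'n succ

    type ('a, 'n) vec =
      | Nil : ('a, zero) vec
      | Cons : 'a * ('a, 'n) vec -> ('a, 'n succ) vec

    (* head only accepts non-empty vectors; head Nil does not type-check. *)
    let head : type a n. (a, n succ) vec -> a = function
      | Cons (x, _) -> x

    let () =
      let v = Cons (1, Cons (2, Nil)) in   (* type: (int, zero succ succ) vec *)
      Printf.printf "head = %d\n" (head v)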
> Because sum:int * product:int is a different type from product:int * sum:int, the use of a labeled tuple in this example prevents us from accidentally returning the pair in the wrong order, or mixing up the order of the initial values.
Hmm, I think I like F#'s anonymous records better than this. For example, {| product = 6; sum = 5 |}. The order of the fields doesn't matter, since the value is not a tuple.
One reason why they're not the same is that the memory representation is different (sort of). This will break FFIs if you allow reordering the tuple arbitrarily.
Labeled tuples are effectively order-independent. Your implementation's order has to match your interface's order, but callers can destruct the labeled tuples in any order and the compiler will do the necessary reordering (just like it does for destructing records, or calling functions with labeled arguments). I don't think this is materially different from what you're describing in F#, except that labeled tuples don't allow labeling a single value (that is, there's no 1-tuple, which is also the case for normal tuples).
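Concretely, a sketch of what that looks like (using the labeled-tuple syntax from the post; treat it as illustrative rather than a full tour of the feature):

    (* The labels become part of the type: sum:int * product:int. *)
    let sum_and_product a b = (~sum:(a + b), ~product:(a * b))

    let () =
      (* Destructure by label, in a different order than the definition;
         the fields are matched up by label rather than by position. *)
      let (~product, ~sum) = sum_and_product 2 3 in
      Printf.printf "sum = %d, product = %d\n" sum product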
I wasn't aware that this fork supported SIMD! Between this, unboxed types and the local mode with explicit stack allocation, OxCaml almost entirely replaces my past interest in F#; this could actually become usable for gamedev and similar consumer scenarios if it also supported Windows.
Yeah, this would be great! Currently only 128-bit SSE/NEON is working but AVX is coming very soon. There's also nothing blocking Windows, but it will require some work. (I added the SIMD support in OxCaml)
FWIW, the "Get OxCaml" page actually says that SIMD on ARM isn't supported yet. If it actually works it would be worth removing that from the known issues list https://oxcaml.org/get-oxcaml/
Cool to hear there aren't any technical blockers to add Windows support! You just convinced me into giving OxCaml a try for a hobby project. 128-bit SSE is likely to be enough for my use case and target specs.
Probably spoilt here, being used to the excellent vscode plugin (well, vscodium in my case) for Golang, but... any plans to integrate with the vscode ecosystem? Makes setup so straightforward!
The OCaml vscode plugin seems to have already integrated a lot of new syntaxes (dune, menhir, reason), so if OxCaml gains traction it should only be a matter of time.
(can't really speak for that myself, though; I use emacs)
If you follow the installation instructions on oxcaml.org, you’ll get a patched Merlin with LSP support etc. It’s not perfect, but does mostly work out of the box with VSCode and the OCaml Platform extension.
I had a similar thought to this, but then I thought, who is worse: programmers who keep bloating up existing languages with new features, or programmers who create yet another new language to add to the already crowded field? (I'm in that latter category.)
I guess programmers are just genetically incapable of leaving their tools the way they are.
What are the chances that they are releasing this so that LLMs can index this information for free and they can use public models in their codebase rather than finetuning public models?
Given how poor LLMs are at regular OCaml, which has so much more training data than OxCaml, probably none. An MCP for docs would have been more productive for that purpose.
Not good at all. It's not a strong enough signal. For instance, LLMs are absolute dogshit at completing Gleam, even if given files with the exact pattern they need to mimic just lines away, or given explicit instructions on the common mistakes they make.
[0] https://signalsandthreads.com/performance-engineering-on-har...
The real issue is being a GC language without support for explicit manipulation of stack and value types.
Want a GC language, with the productivity of GC languages, plus the knobs to do low-level systems coding?
Cedar, the Oberon language family, Modula-3, D, Nim, Eiffel, C#, F#, Swift, Go.
[0] https://docs.oracle.com/en/java/javase/17/gctuning/parallel-...
So if you want hard guarantees, you reach out to real-time JVM implementations like the commercial ones from PTC and Aicas.
https://github.com/ocaml/ocaml/pull/13498
https://discuss.ocaml.org/t/first-alpha-release-of-ocaml-5-4...
- https://www.youtube.com/watch?v=WM7ZVne8eQE
- https://tyconmismatch.com/papers/ml2024_labeled_tuples.pdf
https://dart.dev/language/records
I see that this is different from Rust's existing product types, in which First and Fourth are always different types.
Second, though: can you give me some examples of where I'd want this? I can't say I have ever wished I had this, but that might just be a difference in experience.
env OCAMLPARAM="alert=-unsafe_multidomain,_," opam install cohttp-lwt-unix
Because alerts are promoted to errors, they break existing package installs unnecessarily. The OCAMLPARAM environment variable just forces that alert to be disabled and allows the package installation to continue.
http://t3x.org/mlite/index.html
Can’t wait for the next level
At the very least I'll give OxCaml a try to compare. Best case I drop mine and use this, worst case I learn what works and what doesn't.