Posted by u/bedobi 3 years ago
What's special about Erlang and Elixir?
I'm intrigued by the claims about them but also don't entirely understand the marketing

"Elixir is a dynamic, functional language for building scalable and maintainable applications."

OK, there are lots of languages that are either or both dynamic and functional, and they can all be used to build scalable and maintainable applications.

"Elixir runs on the Erlang VM, known for creating low-latency, distributed, and fault-tolerant systems."

Let's skip over "low-latency", because virtually all programming languages and VMs are "low-latency".

"distributed" and "fault-tolerant" are what's most often touted as "special".

I'm not sure what "distributed" means here? Any app in most languages can be coded to be "distributed" both on the scale of within a single machine and at the scale of deploying multiple small instances of it. (eg Kubernetes, running it on multiple EC2s etc etc)

As for "fault tolerant": "The unavoidable truth about software in production is that things will go wrong. Even more when we take network, file systems, and other third-party resources into account. To react to failures, Elixir supervisors describe how to restart parts of your system when things go awry, going back to a known initial state that is guaranteed to work"

Don't entirely understand this either. In a run of the mill corporate Java app, if you experience an exception eg calling another service, unless you're doing something very strange, your app will not stop running, it will just keep error logging those exceptions until the problem is resolved. (whether by the target service coming back online or you send out a fix for the call or whatever) Really the only time an app will outright crash and completely stop is if it experiences errors attempting to boot in the first place, and the correct solution there is to not allow instances that don't respond 200 OK on /health to take traffic. You certainly don't want to attempt to restart the system as it won't do any good.

So I'm left wondering, am I missing something or is there really not much to the claims about Erlang and Elixir? Don't get me wrong, this is not me criticizing Erlang or Elixir or the hype machine around them, every language has to have marketing and that's fine. It's also not me advocating corporate Java (shudder) - I'm just using it as an example of a common technology that also seems to tick all the same boxes as Erlang and Elixir as far as these claims go.

notemaker · 3 years ago
This is a great talk that answers many of your questions, https://youtu.be/JvBT4XBdoUE

I am so far merely an enthused observer, so I can't give any insights into the experience of delivering business software with it. Here are my observations, though:

- Engineers that work with this stack love it. The two most common issues I've seen are lack of static typing and difficulty in hiring

- Great for IO bound tasks due to collaborative scheduling. For CPU bound tasks, the work is better offloaded to e.g. Rust (see Rustler)

- Exception handling is done by processes having supervisors. My mental model for this is similar to pods & deployments à la K8s, but at any abstraction level you want inside your component.

- Elixir pleases many crowds: Ruby, FP, performance, resilience

- Phoenix (an Elixir library) is seen as a next step from Ruby on Rails.

- Concurrency model is considered to be simple & powerful

bedobi · 3 years ago
Big thanks for the reply

> Great for IO bound tasks due to collaborative scheduling

Right, so this seems similar-ish to Loom in Java, coroutines in Kotlin etc., i.e. Erlang and Elixir have long had, and fully use by default, a concept that is only recently making it into many other languages?

> Exception handling is done by processes having supervisors. My mental model for this is similar to pods & deployments à la K8s, but at any abstraction level you want inside your component.

Right, based on this and what others are saying about the same thing, basically Erlang and Elixir have long had, and fully use by default, a concept that other languages don't really have natively and instead rely on external cloud infra to provide?

In any case I guess my question is if you're eg experiencing a timeout because your code is calling another service that isn't responding, you can shut down that calling process and restart it all you want, it's not going to really fix the problem, you still need to address why that other service isn't responding? (though it may prevent the VM from being eaten up exclusively by waiting for responses that will never come - is that the idea?)

cschmatzler · 3 years ago
Hiring isn’t really that bad. Definitely fewer candidates, but they tend to be higher quality.

japhib · 3 years ago
My company uses Elixir exclusively in the backend and we just hire solid backend developers, and teach them to use Elixir. That's how I was a couple of years ago. The learning curve is pretty gentle.

jacquesm · 3 years ago
satvikpendem already linked that talk (see other comment)

elcritch · 3 years ago
> Let's skip over "low-latency", because virtually all programming languages and VMs are "low-latency".

Perhaps under minimal load. However, the BEAM VM's style of micro preemption allows it to retain really low latency under load, much better than traditional VMs on average.

> "distributed" and "fault-tolerant" are what's most often touted as "special".

The distributed piece is that there's not much difference between communicating with a local actor and a remote one. Also you get a lot of built-in tools, so you don't need a message bus, a cache, a task manager, etc. It's convenient if you can stay below ~100 nodes.
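
To make the "local or remote actor, same call" idea concrete, here is a toy Python sketch. It is only an illustration: `Node`, `Cluster`, and the mailbox names are hypothetical, and the routing is an in-memory dictionary lookup where a real system such as the BEAM would go over the network.

```python
import queue


class Node:
    """Hypothetical stand-in for one VM instance that owns named mailboxes."""

    def __init__(self, name):
        self.name = name
        self.mailboxes = {}

    def spawn(self, actor_name):
        """Create a mailbox and return an address usable from any node."""
        self.mailboxes[actor_name] = queue.Queue()
        return (self.name, actor_name)


class Cluster:
    """Routes messages by address; a real system would use the network here."""

    def __init__(self, *nodes):
        self.nodes = {node.name: node for node in nodes}

    def send(self, address, message):
        # The caller never checks whether the target is local or remote.
        node_name, actor_name = address
        self.nodes[node_name].mailboxes[actor_name].put(message)

    def receive(self, address):
        node_name, actor_name = address
        return self.nodes[node_name].mailboxes[actor_name].get_nowait()


local, remote = Node("a"), Node("b")
cluster = Cluster(local, remote)
cache = local.spawn("cache")   # actor on "this" node
jobs = remote.spawn("jobs")    # actor on another node
cluster.send(cache, "hello")   # identical call shape for both targets
cluster.send(jobs, "hello")
```

The point of the sketch is only that the sending code looks the same for both addresses; everything about failure detection and real transport is left out.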

Fault tolerance is just how the supervisors encourage structuring the application. Java and C++ land tends to blur things together and it's difficult to disentangle the errors. Also, error recovery tends to focus less on just "restarting" and more on trying to recover the current state.

toast0 · 3 years ago
> It's convenient if you can stay below ~100 nodes.

There's not much of a node limit. When I was at WhatsApp (through 2019) we ran thousands of nodes in some of our dist clusters. Unmodified pg2 was challenging at that scale (and modified pg2 was still sometimes challenging), but the new pg module doesn't rely on global:set_lock (or any form of global locking), and eliminates that scaling hurdle. Mnesia would likely get unmanageable with many nodes in a shared schema, but I'm not really sure why one would want to have that many nodes in a shared schema (with persistent nodes, we'd usually run 4 nodes in a Mnesia schema cluster, with occasional additional nodes added to facilitate resplitting data across more schema clusters; with ephemeral nodes, I believe it was typically 6 nodes).

I was always confused when other groups suggested a node limit like 100 nodes. I just don't know where it comes from, and it doesn't match my experience. Hard limits come from the atom table and port limits, both of which are pretty easy to expand. Certainly if we could manage on the order of millions of connections to clients (that specific number dropped over time as the server did more work per connection and then when switching to the smaller node types that were the path of least resistance in FB hosting), a couple thousand for dist wasn't going to impinge on port limits.

lliamander · 3 years ago
If I was starting a project from scratch, there are three reasons I would go with the BEAM:

- The runtime has so much devops/SRE stuff built right in. Hot code reloading, trace debugging, a remote shell. It's very easy for developers to do their own ops. Check out the book Erlang in Anger for more info.

- The language design (for all BEAM languages) is top-notch. Subjectively I appreciate the functional style of those languages, but aside from that they are also very expressive and have relatively few sharp edges compared to most languages.

- Writing idiomatic Erlang/Elixir code means you will be able to have a system that is responsive even under load.

> In a run of the mill corporate Java app, if you experience an exception eg calling another service, unless you're doing something very strange, your app will not stop running, it will just keep error logging those exceptions until the problem is resolved. (whether by the target service coming back online or you send out a fix for the call or whatever) Really the only time an app will outright crash and completely stop is if it experiences errors attempting to boot in the first place, and the correct solution there is to not allow instances that don't respond 200 OK on /health to take traffic. You certainly don't want to attempt to restart the system as it won't do any good.

I don't have the time to describe it in detail, but I will just say that, coming from an Erlang system to a Java/K8s/microservices architecture, it was surprising to me just how much extra work it takes to replicate things that are built into Erlang.

tel · 3 years ago
A lot of what Erlang/Elixir offer has been cribbed by cloud architecture, so in terms of direct comparisons you may find that a sufficiently well-architected microservices system has similar properties and will make Erlang/Elixir less desirable.

But Erlang/Elixir can probably accomplish many of those properties at a fraction of the cost.

The core advantages of these languages are (a) a runtime which is highly optimized for running a large set of preemptable concurrent tasks with very minimal shared memory and (b) an ecosystem designed around building fault-tolerant applications out of many independent communicating processes that "supervise" one another to detect failures and gracefully heal them.

There are many details about how those things work, but I'll reiterate: together they offer a system with many robustness properties similar to that of an idealized large microservices architecture but with much less complexity and cost.

jacquesm · 3 years ago
> A lot of what Erlang/Elixir offer has been cribbed by cloud architecture, so in terms of direct comparisons you may find that a sufficiently well-architected microservices system has similar properties and will make Erlang/Elixir less desirable.

> But Erlang/Elixir can probably accomplish many of those properties at a fraction of the cost.

And far more elegantly and easily.

tel · 3 years ago
Completely agreed.

I'd go further as to say that a "sufficiently well-architected cloud architecture" is nearly unaffordable by most teams and thus, without loss of much generality, will not be obtained by you.

And a similar Erlang/Elixir system can just be made by one person.

bedobi · 3 years ago
I see, this is super helpful and interesting, big thanks

jacquesm · 3 years ago
What's different is not so much the language (though it is likely unlike anything you've ever used before) but the extreme focus on reliability, error handling, OTP, BEAM and all the other Erlang goodies. All of the parts are impressive individually, and they work together to create something that is in my opinion unparalleled in the software world. Read this paper to get a much better idea of what makes this such a unique combination of elements: https://erlang.org/download/armstrong_thesis_2003.pdf

Good luck!

erenyeager · 3 years ago
It’s an expressive functional language with a Ruby-like feel, and it's very ergonomic. Also the ecosystem with things like Phoenix, Nerves, etc. is getting better each day. The creator of Elixir (José Valim) put a lot of love into the language and helped organize a lot of the community. It’s just a joy to use for me, even outside of the typical use cases for the BEAM.

darkmarmot · 3 years ago
I’m not aware of another systems language that pulls off nine nines of reliability. Having been paid to write Java, C#, and Elixir systems, I’ve found the BEAM to provide vastly better fault tolerance. My current project has requirements for zero downtime and zero data loss, actively replicates across 4 data centers across the US, and handles the real-time medical data of millions of hospital patients.

bedobi · 3 years ago
> vastly better fault tolerance

I really want to get this but I don't

if you're eg experiencing a timeout because your code is calling another service that isn't responding, you can shut down that calling process and restart it all you want, it's not going to really fix the problem, you still need to address why that other service isn't responding? (though it may prevent the VM from being eaten up exclusively by waiting for responses that will never come - is that the idea?)

sshine · 3 years ago
An answer stolen from here [1]:

Erlang is fault tolerant with the following things in mind:

- Erlang knows that errors WILL happen, and things will break, so instead of guarding against errors, Erlang lets you have strong tools to minimize impact of errors and recover from them as they happen.

- Erlang encourages you to program for success case, and crash if anything goes wrong without trying to recover partially broken data. The idea behind this is that partially incorrect data may propagate further in your system and may get written to database, and thus presents risk to your system. Better to get rid of it early and only keep fully correct data.

- Process isolation in Erlang helps with minimizing impact of partially wrong data when it appears and then leads to process crash. System cleans up the crashed code and its memory but keeps working as a whole.

- Supervision and restart strategies help keep your system fully functional if parts of it crashed by restarting vital parts of your system and bringing them back into service. If something goes very wrong such that restarts happen too much, the system is considered broken beyond repair and thus is shut down.

[1]: https://stackoverflow.com/questions/3760881/how-is-erlang-fa...
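
As a rough illustration of the last two points (crash instead of patching up bad state, restart from a clean state, give up if restarts happen too often), here is a minimal Python sketch. It is not how OTP supervisors are implemented; `supervise`, `TooManyRestarts`, and the `max_restarts`/`window_s` parameters are hypothetical names chosen for this example.

```python
import time


class TooManyRestarts(Exception):
    """Raised when the worker keeps crashing faster than the limit allows."""


def supervise(worker, max_restarts=3, window_s=5.0):
    """Run worker(); on any crash, start it again from scratch.

    Gives up once more than max_restarts crashes fall within window_s
    seconds, treating the worker as broken beyond repair.
    """
    crash_times = []
    while True:
        try:
            # The worker is written for the success case only.
            return worker()
        except Exception as exc:
            now = time.monotonic()
            # Keep only crashes that are still inside the time window.
            crash_times = [t for t in crash_times if now - t < window_s]
            crash_times.append(now)
            if len(crash_times) > max_restarts:
                raise TooManyRestarts(
                    f"{len(crash_times)} crashes within {window_s}s"
                ) from exc
            # Otherwise loop: the next call starts from a known initial state.
```

A worker that fails transiently a couple of times is simply retried, while one that fails every time makes the supervisor itself give up, which in a supervision tree would hand the problem to the next level up.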

jacquesm · 3 years ago
You should really do a write-up if your employer allows this.

darkmarmot · 3 years ago
We’ve actually released multiple open source Elixir libraries (I wrote the HL7 one, but there is also one for MLLP; those are for medical messaging). My coworker has done several talks on the system, one linked here: https://www.erlang-solutions.com/blog/how-hca-healthcare-use...

choiway · 3 years ago
You're not missing anything if you can and don't mind building a framework of tools that run long running processes, handle the start up, failure and state of those processes and be able to introspect the status and internal state of those processes in a standard way.

The way isolated processes can be created and managed allows for the "let it crash" ideology in Erlang. For example, when you visit a Phoenix Framework site, you have your own process. If I was visiting the same site and encountered a state that crashed my process, your process would be unaffected. The "exception" would only affect me until you ran into the same state that caused the crash.

If I reported the crash to the developer, the developer could fix the bug and soft start the entire application without affecting your process. This is possible not because of the language per se but because of the entire Erlang ecosystem around fault tolerance.
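
The per-visitor isolation described above can be approximated in Python as a sketch. Note the assumptions: `handle_session`, `serve_all`, and the `/buggy` path are hypothetical, and Python threads share memory, so this only imitates the "one visitor's crash does not touch another's session" behaviour rather than the real memory isolation of BEAM processes.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_session(visitor, path):
    """Hypothetical per-visitor handler; '/buggy' stands in for the bad state."""
    if path == "/buggy":
        raise RuntimeError(f"session for {visitor} crashed")
    return f"{visitor}: served {path}"


def serve_all(requests):
    """Run each visitor's session as its own task and collect the outcomes."""
    outcomes = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            visitor: pool.submit(handle_session, visitor, path)
            for visitor, path in requests
        }
        for visitor, future in futures.items():
            try:
                outcomes[visitor] = future.result()
            except Exception as exc:
                # Only this visitor's task failed; the others are untouched.
                outcomes[visitor] = f"crashed: {exc}"
    return outcomes


results = serve_all([("me", "/buggy"), ("you", "/home")])
```

Here `results["you"]` holds a normal response even though the session for `"me"` raised.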

*edited for more stuff