It seems that when implemented correctly, running full-stack Elixir can greatly simplify complicated distributed systems. In-memory storage can replace a cache, and the architecture itself can replace a message queue. Horizontal or even vertical scaling can replace complicated scaling mechanisms. There are plenty of advantages to running data pipelines in parallel too.
There's also a huge trend now away from managed cloud services (post-ZIRP), and it seems like Elixir with a database on the same machine or network could deliver great performance and scalability.
So why don't more people use Elixir? I've read some complaints about the lack of static types or not having as many developers. But if those were the only constraints, one would think that they could be overcome, or some service could just greatly simplify Elixir deployments.
I built 3 production projects in Elixir around 2015-2018, and it was a blast to learn and work with. Those were interesting projects that were a great fit for the stack. One was a March Madness bracket game, which required huge throughput on day one, and another let the audience at a football game call each play from their phones while the team was playing live at the stadium. That one needed a lot of timing coordination and handling of poor connections. I even put out an open source fast/customizable leaderboard on top of ETS.
However, I didn’t feel the need for this stack on most projects, and if I’m totally honest, I never got good at the novel way of building applications in it.
While it was very enjoyable, everything felt a little awkward. Even 3 years in, I constantly felt that I wasn’t doing it right and that I was fighting the language to do data transformations in a purely functional style. I never got used to writing Ecto queries, and always had to look up their syntax. Plus, there didn’t seem to be a good architecture story, just isolated praise of OTP. And Phoenix further fueled the confusion, making it hard to understand whether I was supposed to reason about my app like a Rails app (just build controllers, models, and views) with no regard for processes, or orchestrate some creative supervision trees whose arrangement in a typical web app I couldn’t even picture. After a bit, going back to Ruby on Rails felt like returning to a super productive comfort zone, and I just stayed there.
I’m still looking back at Elixir with nostalgia, wondering if I’m going to have a chance to go back to it and really make it an extension of my arms/brain like Ruby had become. And whether I can do all that amazing supervision-based architecting I keep theorizing in my mind. I loved Dave Thomas’s vision (and agreed with his controversial takes) and really wished I could get as good as Sasa Juric at really deeply reasoning in this framework. Maybe one day.
You should give it another try. :)
You jumped in quite early, when we were still collectively figuring out what it meant to build Phoenix applications, and many things were in flux back then. You probably went through Ecto 1 -> 2 -> 3 and Phoenix was migrating to contexts.
But I also have to say there was a lot of FOMO in relation to OTP back then: people felt they had to build these amazing supervision designs, otherwise they were not using the language correctly. But the truth is that they are building blocks for frameworks and certain libraries, in the same way threads/mutexes are for other languages. Of course, GenServers are higher-level, more expressive, and bring more properties, but the overall idea is that they are about infrastructure, and not design. Phoenix, Ecto, Broadway, etc should be spawning 99% of the processes that you need for you.
> But I also have to say there was a lot of FOMO in relation to OTP back then: people felt they had to build these amazing supervision designs, otherwise they were not using the language correctly.
Exactly!
Contexts really threw me off, because I couldn't figure out how or why I would set a boundary up front if I wasn't even sure how the app was going to turn out.
I never felt truly proficient. I recall years ago interviewing for a role and straight up telling the interviewer I wasn’t very good with Elixir. That’s just how it made me feel. In retrospect I was fine, I built some interesting stuff. I probably could have promoted myself for that. But the language kind of left me feeling like I wasn’t getting it.
Even so, great fun to learn and apart from those aspects, I enjoyed building with it a lot.
There also is the human aspect of Elixir or arguably any language/framework comparison and inertia to change. One could argue all the technical merits of one vs. another but at the end of the day, successful business outcomes are not going to change much _because_ of the technology choice.
It's hard to get a company that has a lot of Ruby/Rails apps and knowledge to commit to training existing developers on something else when the net result is just a different language/framework. When you're building data intensive web applications, Rails will scale. The vast majority of the cost with data intensive apps is disk storage, for which it doesn't really matter what web tech stack you're running.
As someone who likes working with off-meta languages, it's clear that the default state for any new language is to die an early death. The fact that Elixir hasn't died is testament to its strengths.
The idea that we'd need to keep a snapshot of modules from a specific date in order to run certain software is ridiculous, and the idea of running it in a container like Docker is just giving in to bad practices. Too often it means things are too fragile to update when there are security issues.
I haven't seen this happen with Elixir. What do they do better / differently from Ruby that updating doesn't cause the house of cards to come crashing down?
Docker is another technology that is sort of ridiculous from this perspective: deploying an entire OS just to make dealing with dependencies easier.
The number of people who are calculating which language to write in by thinking “how can I write a fault-tolerant distributed system with less time and energy so that I can quickly release performant products” is minuscule. The lack of popularity of Elixir is evidence of this I think.
In regards to your point on ZIRP: billions of dollars have been poured into LLMs that are biased toward legacy languages like Python and JavaScript. Even the file structure of these languages is conducive to LLMs. An HTTP server can be, and frequently is, defined in a single file or function, where everything from socket creation to connection handling to request parsing to database queries to the response can be composed in literally a few lines of code. This immense expressiveness is a testament to the power of HTTP. But as I’m sure we’re all aware, there are limits to what can be accomplished with a single machine serving these stateless requests, and the limit can be reached very quickly. Still, LLMs gravitate towards producing these haiku-like incantations; it’s trivial for them.
Elixir’s power comes from having a well-defined API and explicit failure modes for each of these layers, each in its own expressive module. This makes it difficult for LLMs to write Phoenix code when they’ve been optimized to output a 3-line FastAPI decorator-definition-query endpoint. Each of these layers in Phoenix is by itself quite simple, each layer has about 2-4 required functions to be implemented, and Phoenix can generate all of these for you. But ChatGPT doesn’t seem to be able to grok it all at once the way a good engineer can after reading the docs for an afternoon.
Will Elixir survive an era of programming where RTFM is a lost art? I suppose we will find out soon enough!
I am working on an Elixir project that in no circumstance should be taxing but it's falling over miserably at like 100 - 200 events per minute. The detail is it is distributed-ish IoT. I didn't write it, the person(s) who did write it are gone, and no one else writes Elixir. I've gotten some good gains already but I'd like to squeeze the 80 - 90% of juice I think I can get before resorting to beefier hardware.
I've gotten into instrumenting and measuring it and I have some ideas but I'd love to hear others point me to other ideas. The real problem is that the hardware is miserably underpowered and it is real-time, by that I mean I can't defer, schedule for later, or de-prioritize anything.
---
To actually contribute, I really like Elixir. I am not yet sure why I would advocate for it over something more 'simple' like nodejs (My background is, accidentally, Javascript World) but it's certainly a very nice language to write in. It feels magical but not too magical where you get scared it's trapping you into its web.
Before anyone jumps too much on me for it, I gauge "simplicity" by how many people I can hire to write it. You can barely swing a cat without hitting 3 competent JavaScript developers. I tried for many years to hire another golang dev so I could write it professionally, and I only encountered a few despite having sat in on most of the interviews my employers ran. With that said, it may just be that the overlap in the Venn diagram between "writes JavaScript" and "writes golang" is small.
One possibility is you're using a single process instead of parallelizing things. For example, you may want to use one process per event, etc. Though if the hardware is very underpowered and say single core, I could see it becoming problematic.
Unfortunately.
From metrics, computing AWS signatures takes up an absurdly large amount of CPU time. The actual processing of events is quite minimal and honestly well-architected, a lot of stuff is loaded into memory rather than read from disk. There's syncing that happens fairly frequently from the internet which refreshes the cache.
The big problem is each event computes a new signature to send back to the API. I do have to wonder if the AWS signature is 99% of the problem and once I take that burden off, the entire system will roar to life. That's what makes me so confused because I had heard Erlang / Elixir could do on the scale of significantly more per minute even with pretty puny hardware.
One thing I am working on is batching. After that, I am considering dropping the AWS signatures in favor of short-lived tokens, since either way it's game over if someone gets onto the system: they could exploit the privilege regardless. The systems are air-gapped anyway, so the risk is minimal in my opinion.
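For anyone curious what the batching idea amounts to, here is a minimal sketch. It assumes the API would accept one signed payload per batch, which the thread does not confirm. The `sign` function is a plain HMAC stand-in for the expensive step (it is not AWS SigV4), and the secret, event names, and batch size of 50 are all hypothetical.

```elixir
# Stand-in for the expensive signing step. This is NOT AWS SigV4, just a
# single HMAC-SHA256 over the payload with a hypothetical secret.
sign = fn payload ->
  :crypto.mac(:hmac, :sha256, "hypothetical-secret", payload)
  |> Base.encode16(case: :lower)
end

# Hypothetical events; in the real system these would arrive over time.
events = for i <- 1..120, do: "event-#{i}"

# Unbatched, every event would need its own signature.
unbatched_signatures = length(events)

# Batched: group up to 50 events and sign each group once.
batches =
  events
  |> Enum.chunk_every(50)
  |> Enum.map(fn batch ->
    payload = Enum.join(batch, "\n")
    %{payload: payload, signature: sign.(payload)}
  end)

IO.puts("signatures: #{unbatched_signatures} unbatched vs #{length(batches)} batched")
# prints "signatures: 120 unbatched vs 3 batched"
```

The trade-off is latency: a batch has to wait until it fills or a flush interval passes, which may or may not be acceptable for a real-time system like the one described.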
> One possibility is you're using a single process instead of parallelizing things. For example, you may want to use one process per event, etc.
This is done by pushing it to a task, i.e. `Task.Supervisor.async_nolink`? That's largely where I found my gains, actually.
It took a dive into how things schedule, because a big issue was that the queue would get massively backed up, and I realized that I apparently needed to toggle on a flag telling it to pack the scheduler more (`+scl true`). I also looked into the wake-up lengths of threads. I am starting to get my head around "dirty schedulers", but I am not entirely sure how to affect those, or whether I can do anything besides letting the runtime handle it for me.
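For readers following along, a minimal sketch of the one-task-per-event approach with `Task.Supervisor.async_nolink`. The supervisor name `EventSup`, the `handle_event` function, and the event atoms are hypothetical stand-ins, not taken from the project being described.

```elixir
# Hypothetical supervisor name; in an application this would normally be
# started as a child in the supervision tree instead.
{:ok, _sup} = Task.Supervisor.start_link(name: EventSup)

# Hypothetical per-event work.
handle_event = fn event -> {:processed, event} end

events = [:door_open, :temp_reading, :door_closed]

# One task per event. With async_nolink, a crash inside a task does not
# crash the caller; it shows up as {:exit, reason} when results are collected.
tasks =
  Enum.map(events, fn event ->
    Task.Supervisor.async_nolink(EventSup, fn -> handle_event.(event) end)
  end)

# Wait up to 5 seconds for all tasks. A nil result means the task is still
# running at the deadline, so it is shut down explicitly.
results =
  tasks
  |> Task.yield_many(5_000)
  |> Enum.map(fn {task, res} -> res || Task.shutdown(task, :brutal_kill) end)

IO.inspect(results)
```

On very constrained hardware, spawning an unbounded number of tasks can itself back things up. `Task.Supervisor.async_stream_nolink/4` accepts a `max_concurrency` option that caps how many tasks run at once, which may be a better fit there.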
The other one just for posterity is that I believe events get unnecessarily queued because they don't / didn't have locks. So if event A gets queued then creates a timer to re-queue it in 5 minutes, event A (c|w)ould continue to get queued despite the fact the first event A hadn't been processed yet. So the queue would just continue to compound and starve itself.
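A rough sketch of what such a "lock" could look like: a small process that remembers which event keys are already pending and refuses duplicates until the first one is marked done. `DedupQueue` and its functions are hypothetical names for illustration, not the code from the system described above.

```elixir
defmodule DedupQueue do
  # Hypothetical module: tracks pending event keys so that a re-queue timer
  # cannot stack up copies of an event that has not been processed yet.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Returns :queued if the key was not pending, :duplicate otherwise.
  def enqueue(pid, key), do: GenServer.call(pid, {:enqueue, key})

  # Called by the worker once the event has been processed.
  def done(pid, key), do: GenServer.call(pid, {:done, key})

  def pending_count(pid), do: GenServer.call(pid, :pending_count)

  @impl true
  def init(:ok), do: {:ok, MapSet.new()}

  @impl true
  def handle_call({:enqueue, key}, _from, pending) do
    if MapSet.member?(pending, key) do
      {:reply, :duplicate, pending}
    else
      {:reply, :queued, MapSet.put(pending, key)}
    end
  end

  def handle_call({:done, key}, _from, pending) do
    {:reply, :ok, MapSet.delete(pending, key)}
  end

  def handle_call(:pending_count, _from, pending) do
    {:reply, MapSet.size(pending), pending}
  end
end

{:ok, q} = DedupQueue.start_link()
first = DedupQueue.enqueue(q, :event_a)    # :queued
second = DedupQueue.enqueue(q, :event_a)   # :duplicate, the timer fired again
:ok = DedupQueue.done(q, :event_a)
third = DedupQueue.enqueue(q, :event_a)    # :queued again after processing
IO.inspect({first, second, third})
```

Because all calls go through one process, the check-and-insert is serialized without an explicit lock. The cost is that this process is a single point of contention, which should be fine at a few hundred events per minute.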
I actually think Elixir really doesn't have great performance. TechEmpower, which is IMO the most real-world standardized test suite out there, shows that Phoenix doesn't even complete. And Elixir+Plug+Ecto performs worse than Rails, which is an entire framework.
Everyone in Elixir land tells me "Oh those benchmarks don't matter". Yet they are heavily talked about, and referred to here and other places. They only don't matter if you perform terribly on them I suppose.
And they say "Oh we didn't care to put much effort into it", yet Jose Valim himself tried to work on it and didn't fix it. He's written extensively about how this type of test doesn't really fit Elixir, etc., but ultimately it's just doing DB queries, so why does this not work?
I really think Elixir is mostly propaganda at this point. It's a huge mental paradigm shift and I have seen myself that it wasn't performant, and as you said you keep thinking "Oh I must be doing it wrong".
I just cannot fathom why anyone in a decent sized company would use it with all the negatives it has going for it. YMMV
Proof below
https://www.techempower.com/benchmarks/#section=data-r22&hw=...
This is inaccurate. I have started looking again into solutions only last week [1]. My suspicion was always the database pool size was too small but, when I tried to contribute 4+ years ago, fine tuning was hard because it took too long, so I didn't pursue it further [2].
My discontent with the benchmarks is that they are not measuring what people effectively run in production. Since you mentioned Rails, here is what a Rails application looks like:
https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...
But almost nobody runs a Rails application like that in production.
And then you look at the configuration of most stacks and they have been explicitly fine tuned to match exactly the concurrency traffic and patterns that the benchmark sends through. But, for most web applications, your web traffic is not homogeneous and you don't have a predetermined number of concurrent requests.
I still believe those benchmarks are not indicative of what you will actually see in production. Most companies who have gone from Rails to Phoenix, for example, report a 10-20x reduction in operation size and costs. But it is clear at this point that people put way too much stock in these benchmarks. The irony of it all is that, if someone copies these setups into their actual applications, they will most likely perform worse. Oh well.
[1]: https://github.com/TechEmpower/FrameworkBenchmarks/pull/9302
[2]: https://github.com/TechEmpower/FrameworkBenchmarks/pull/5432 - here you can see me increasing the database pool to 40... but most benchmarks today run with 512-1024 connections (which, once again, is most likely not what you would do in prod). In any case, we need to bump our numbers accordingly.
Why isn't it used? It's niche, and betting on such a small community is risky for the majority of companies. Why use Elixir when you could hire 10 engineers to pump out JavaScript? That's the mentality of most.
Hiring for Elixir was great; it self-selected people who wrote code as their craft. You kind of have to be curious about code to even be aware of Elixir, let alone know how to write it. These types of devs would pick it up really quickly because the language is just so damn ergonomic.
I'm using Elixir now, and I wake up so happy that I get paid to do this. I am really blessed.
I worked at a company that hired like this. On the whole it was good, but it wasn’t a panacea. A surprising number ended up being academic types that were extremely smart but could never actually finish anything. Amazing guys to talk to at lunch though!
As a startup CTO currently building with Elixir, I can say that it's absolutely possible to hire people who know how to deliver. I completely agree with the parent comment re: self-selecting people who write code as craft. I've put together an excellent team of developers who deeply care about what they do and are focused on shipping.
I’ve often thought that would be true for niche languages. I interviewed at a fintech whose backend is written in Haskell, and definitely got that vibe from the interviewers.
Personally? I like good typesystems and very sharp tooling. It has seemed to me that Elixir has neither, and doesn’t really offer me a particularly big advantage that would outweigh those disadvantages.
Also, I don't find typing to be an issue in Elixir due to the way the language works (pattern-matching, guard clauses, immutable data, etc.). Maybe I just have a talent for remembering types, though, because strong typing has always been a huge meh for me.
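To make that concrete for people who haven't written Elixir, here is a small sketch of how pattern matching and guard clauses constrain inputs at function heads. The `Shape` module is a made-up example, and note that these checks happen at runtime, not at compile time.

```elixir
defmodule Shape do
  # Hypothetical example module. Each clause states the data shape it
  # accepts in its head, with guards narrowing the allowed values.
  def area({:circle, r}) when is_number(r) and r >= 0, do: :math.pi() * r * r
  def area({:rect, w, h}) when is_number(w) and is_number(h), do: w * h
end

IO.inspect(Shape.area({:rect, 2, 3}))
# prints 6

# A value of the wrong shape or type matches no clause and raises
# FunctionClauseError when the call is made.
bad_call =
  try do
    Shape.area({:rect, "2", 3})
  rescue
    e in FunctionClauseError -> {:error, e.__struct__}
  end

IO.inspect(bad_call)
# prints {:error, FunctionClauseError}
```

This catches bad data close to where it enters a function, but only when that code path actually runs, which is the gap that people asking for static types are pointing at.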
It's worth remembering that any full rewrite without notable changes in feature scope will probably wind up performing better than the original organically developed solution, even if you never change tech stacks. That's just the nature of going in with a total picture of the problem space and knowledge of the warts already encountered.
The main factors are:
- How large is the pool of available candidates for this language? A recruiting risk.
- How mature is this language? A business continuity risk.
I worked at a company with a massive Erlang codebase. Really nasty, not really following good OTP practices, etc. HUGE hiring problem and it took 6-12 months to spin up new devs to the point that they could really have ownership over things. And this is not a system that even remotely needed this.
Elixir might seem better until you write enough of it to realize that you do basically have to learn Erlang or else you'll always be at a disadvantage when it comes to really understanding it.