> This can lead to scalability issues in large clusters, as the number of connections that each node needs to maintain grows quadratically with the number of nodes in the cluster.
No, the total number of dist connections grows quadratically with the number of nodes, but the number of dist connections each node makes grows linearly: with N nodes, each node holds N-1 connections, while the cluster as a whole holds N(N-1)/2.
> Not only that, in order to keep the cluster connected, each node periodically sends heartbeat messages to every other node in the cluster.
IIRC, heartbeats are once every 30 seconds by default.
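If you'd rather check than rely on memory, the relevant knob is the kernel parameter net_ticktime; from a live shell (the output is just whatever your node is configured with):

    1> net_kernel:get_net_ticktime().
    60
    %% ticks go out a few times per tick interval on an otherwise idle
    %% connection, and a peer that stays silent for a whole interval is
    %% considered down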
> This can lead to a lot of network traffic in large clusters, which can put a strain on the network.
Let's say I'm right about 30 seconds between heartbeats, and you've got 1000 nodes. Every 30 seconds each node sends out 999 heartbeats (which almost certainly fit in a single tcp packet each, maybe less if they're piggybacking on real data exchanges). That's 999,000 packets every second, or 33k pps across your whole cluster. For reference, GigE line rate with full 1500 mtu packets is 80k pps. If you actually have 1000 nodes worth of work, the heartbeats are not at all a big deal.
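Back-of-the-envelope, assuming the 30-second interval above:

    1> Heartbeats = 1000 * 999.   % one per peer, per node, per interval
    999000
    2> Heartbeats / 30.           % packets per second across the cluster
    33300.0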
> Historically, a "large" cluster in Erlang was considered to be around 50-100 nodes. This may have changed in recent years, but it's still something to be aware of when designing distributed Erlang systems.
I don't have recent numbers, but Rick Reed's presentation at Erlang Factory in 2014 shows a dist cluster with 400 nodes. I'm pretty sure I saw 1000+ node clusters too. I left WhatsApp in 2019, and any public presentations from WA are less about raw scale, because it's passé.
Really, 1000 dist connections is nothing when you're managing 500k client connections. Dist connections weren't even a big deal when we went to smaller nodes in FB.
It's good to have a solid backend network, and to try to bias towards fewer, larger nodes rather than more, smaller nodes. If you want to play with large-scale dist and spin up 1000 low-CPU, low-memory VMs, you might have some trouble. It makes sense to start with small nodes and whatever number makes you comfortable for availability, and then, when you run into limits, reach for bigger nodes until you get to the point where adding nodes is more cost effective: WA ran dual Xeon 2690 servers before the move to FB infra; Facebook had better economics with smaller single Xeon D nodes; I dunno what makes sense today, maybe a single-socket Epyc?
> That's 999,000 packets every second, or 33k pps across your whole cluster. For reference, GigE line rate with full 1500 mtu packets is 80k pps. If you actually have 1000 nodes worth of work, the heartbeats are not at all a big deal.
Using up almost half of your pps every 30 seconds for cluster maintenance certainly seems like it's more than "not a big deal", no?
(Oops, I meant to say 999k packets every 30 seconds. Thanks everyone for running with the pps number)
If your switching fabric can only deal with 1Gbps, yes, you've used it halfway up with heartbeats. But if your network is 1x 48-port 1G switch and 44x 24-port 1G switches, you won't bottleneck on heartbeats, because that spine switch should be able to simultaneously send and receive at line rate on all ports, which is plenty of bandwidth. You might well bottleneck on other transmissions, but the nice thing about dist heartbeats is that, on a connection, each node sends heartbeats on a timer and will close the connection if it doesn't see a heartbeat within some timeframe; it's a requirement for progress, not a requirement for a timely response, so you can end up with epic round-trip times for net_adm:ping ... I've seen on the order of an hour once, over a long-distance dist connection with an unexpected bandwidth constraint.
It would probably be a lot more comfortable if your spine switch was 10G and your node switches had a 10G uplink, and you may want to consider LACP and doubling up all the connections. You might also want to consider other topologies, but this is just an illustration.
> If you actually have 1000 nodes worth of work, the heartbeats are not at all a big deal.
I think you're missing the fact that the heartbeats will be combined with existing packets. Hence the quoted bit. If you've got 1000 nodes, they should be doing something with that network such that an extra 50 bytes (or so) every 30s would not be an issue.
If you're at the point where you decided the 1000 nodes are required, you should probably have already considered the sensible alternatives first, and concluded this was somehow better.
The network saturation is just a necessary cost of running such a massive cluster.
I really have no idea what kind of system would require 1000 nodes that couldn't be replaced by 100 nodes that are 10x larger instead. And at that point, you should probably be thinking of ways to scale the network itself as well.
That's for 1000 nodes all sharing a single 1Gbps connection and sending 1500-byte heartbeats for some reason. Make them 150 bytes (more realistic) and now it's almost half the throughput of the 100Mbps router that for some reason you are sending every single packet through, for 1000 nodes.
I doubt these are even sent if there's actual traffic going through the network. I mean, why do you need to ping nodes, if they are already talking to you?
Wouldn't you have a lot more than one gigabit connection? I'm struggling to imagine the 1000-node cluster layout that sends all intra-cluster traffic through a single cable.
ahoy!! thanks for spotting the issue! I wrote the post in a stream of consciousness after a long day. I'll make that edit and call it out!
the statement about what historically constitutes a large erlang cluster was an anecdote told to me by Francesco Cesarini during lunch a few years ago — I'm not actually sure of the time frame (or my memory!)
likewise I'll update the post to reflect that! thanks ( ◜ ‿ ◝ ) ♡
> WA ran dual xeon 2690 servers before the move to FB infra; facebook had better economics with smaller single Xeon D nodes; I dunno what makes sense today, maybe a single socket Epyc?
that CPU is 8c/16t [1] ... having two of them in a system would mean 16c/32t.
AMD recently launched a 192c/384t monster CPU [2], and having two of them would mean having 384c/768t.
So a 100-node cluster today, powered by those AMD CPUs, would give you the same core count as a 1200-node cluster at the time.
references:
[1]: https://www.intel.com/content/www/us/en/products/sku/64596/i...
[2]: https://www.youtube.com/watch?v=S-NbCPEgP1A
The primitives for sending messages across a "cluster" are built into the language, yep. And lightweight processes with per-process garbage collection are magic for minimizing global GC pauses.
But all the hard work of making a distributed system work is not "batteries included" in Erlang. Back pressure on messages, etc. you end up having to build yourself.
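For example, the built-in part really is just this (a sketch; the node and process names are made up):

    -module(dist_sketch).
    -export([ask_remote/0]).

    %% 'stats@host2' and stats_collector are made-up names for illustration
    ask_remote() ->
        %% fire-and-forget send to a registered process on another node
        {stats_collector, 'stats@host2'} ! {sample, 42},
        %% or spawn work over there and wait for an answer; note there is
        %% no back pressure anywhere in this unless you add it yourself
        Self = self(),
        spawn('stats@host2', fun() -> Self ! {result, node()} end),
        receive
            {result, RemoteNode} -> RemoteNode
        after 5000 -> timeout
        end.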
We hit a limit at around 200 physical servers in a cluster back in 2015. Maybe it's gotten better since then. shrug.
As the author calls out, it was built for the 80s, when a million-dollar switch shouldn't fail. It isn't built with https://en.wikipedia.org/wiki/STONITH in mind, i.e. any node is trash; assume it will go away and maybe new ones will come back.
Speaking as someone who spent a lot of time in front of CRTs, this is NOT what an average CRT looks like. This would be a reason to send it to maintenance back then - it looks like breaking contacts or problems with the analog board.
A good CRT would show lines, but they'd be steady and not flicker (unless you self-inflict an interlaced display on a short phosphor screen). It might also show some color convergence issues, but that, again, is adjustable (or you'd send it to maintenance).
This looks like the kind of TV you wouldn't plug a VIC-20 into.
Uses @keyframes and text-shadow to try and mimic a CRT effect but makes the text unreadable (for me at least). The browser readability mode does work on the page though.
I think so too, and it works on my old LCD monitor. If I focus on it, I can see the subtle changes, but other than that, it does not make it any less readable for me.
OP, if you're reading this, the animation pegs my CPU. Relying on your readers to engage reader mode is going to turn away a lot of people who might otherwise enjoy your content.
If a person considers putting this type of effect on text reasonable, I really don't think they have anything of value to say in the text content anyways.
fn/n where n is parity of fn was confusing for me at first, but then I started to like the idea, esp. in dynamic languages it's helpful to have at least some sense of how things work by just looking at them.
this made me think about a universal function definition syntax that captures everything except implementation details of the function. something like:
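(a hypothetical notation, just to sketch the idea; this is not any existing language)

    %% hypothetical syntax, purely illustrative
    fn(a :: integer, b :: float, c :: T)
        -> boolean | [string] | [T] | throws
        uses foo, bar, d    %% taken from fn's context, not passed in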
which shows that the function fn receives a (integer), b (float), c (generic type T) and maybe returns something that's either a boolean or a list of strings or a list of T, or it throws and doesn't return anything. the function fn also makes use of foo and bar (other functions) and variable d which are not given to it as arguments but they exist in the "context" of fn.
Well, Erlang and Elixir both have a type checker, Dialyzer, which has a type specification syntax not too far from what you proposed, excluding the variable or function captures.
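Roughly the argument/return half of the idea, as a sketch of a Dialyzer spec (there is no equivalent for the captured foo/bar/d context):

    %% sketch of a spec for the hypothetical fn above; the captured
    %% context has no way to be declared
    -spec fn(A :: integer(), B :: float(), C :: T) ->
          boolean() | [string()] | [T] | no_return() when T :: term().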
minor nitpick, I think you meant arity. If it's a typo, sorry for the nitpick; if you had only ever heard it spoken out loud before: arity is the number of parameters (in the context of functions), parity is whether a number is even or odd
Looks similar to how languages with effect systems work. (Or appear to work, I haven't actually sat down with one yet.) They don't capture the exact functions, rather capturing the capabilities of said functions, which are the important bit anyway. The distinction between `print` vs `printline` doesn't matter too much; the fact you're doing `IO` does. It seems to be largely new languages trying this, but I believe there are a few addons for languages such as Haskell as well.
> fn/n where n is parity of fn was confusing for me at first, but then I started to like the idea, esp. in dynamic languages it's helpful to have at least some sense of how things work by just looking at them.
To be clear, this syntax is not just "helpful", it's semantically necessary given Erlang's compilation model.
In Erlang, foo/1 and foo/2 are separate functions; foo/2 could be just as well named bar/1 and it wouldn't make a difference to the compiler.
But on the other hand, `foo(X) when is_integer(X)` and `foo(X) when is_binary(X)` aren't separate functions — they're two clause-heads of the same function foo/1, and they get unified into a single function body during compilation.
So there's no way in Erlang for a (runtime) variable [i.e. a BEAM VM register] to hold a handle/reference to "the `foo(X) when is_integer(X)` part of foo/1" — as that isn't a thing that exists any more at runtime.
When you interrogate an Erlang module at runtime in the erl REPL for the functions it exports (`Module:module_info(exports)`), it gives you a list of {FunctionName, Arity} tuples — because those are the real "names" of the functions in the module; you need both the FunctionName and the Arity to uniquely reference/call the function. But you don't need any kind of type information; all the type information is lost from the "structure" of the function, becoming just validation logic inside the function body.
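Concretely (a minimal sketch; the module name m is made up):

    -module(m).
    -export([foo/1, foo/2]).

    %% two clause heads of the *same* function foo/1
    foo(X) when is_integer(X) -> {int, X};
    foo(X) when is_binary(X)  -> {bin, X}.

    %% a completely separate function that happens to share the name
    foo(X, Y) -> {X, Y}.

    %% in the shell:
    %% 1> m:module_info(exports).
    %% [{foo,1},{foo,2},{module_info,0},{module_info,1}]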
---
In theory, you could have compile-time type safety for function handles, with type erasure at runtime ala Java's runtime erasure of compile-time type parameters. I feel like Erlang is not the type of language to ever bother to add support for this — as it doesn't even accept or return function handles from most system functions, instead accepting or returning {FunctionName, Arity} tuples — but in theory it could.
> But Erlang itself, as a language, doesn't even have function-reference literals; you usually just pass {FunctionName, Arity} tuples around.
I don't really understand what a literal is, but isn't fun erlang:node/0 a literal? It operates the same way as 1, which I'm quite sure is a numeric literal:
Eshell V12.3.2.17 (abort with ^G)
1> F = fun erlang:node/0.
fun erlang:node/0
2> X = 1.
1
3> F.
fun erlang:node/0
4> X.
1
I set the variable to the thing, and then when I output the variable, I see the same value I set. AFAIK, 1 and fun erlang:node/0 must both be literals, as they behave the same; but I'm happy to learn otherwise.
I recommend taking a look at the various open source Riak applications too! Might not be updated to any sort of recent versions of erlang but was a great resource to me early on.
A single node erlang application would be one that doesn't use dist at all. Although, if it includes anything gen_event or similar, and it happens to be dist connected, unless it specifically checks, it will happily reply to remote Erlang processes.
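e.g. (using gen_server as the "or similar" case; the server and node names are made up):

    %% from any dist-connected node; my_server and 'app1@host2' are made up.
    %% the callback module needs no special code for this to work, which is
    %% exactly why it will happily reply to remote callers unless it checks.
    gen_server:call({my_server, 'app1@host2'}, get_state).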
my personal reality is that the majority of projects I've consulted on have seldom actually leveraged distributed erlang for anything. the concurrency part yes, clustering for the sake of availability or spreading load yes, but actually doing anything more complex than that has been the exception! ymmv tho!
Rock on, Erlang dude, rock on.
Elixir’s syntax for declaring type-specs can be found at https://hexdocs.pm/elixir/1.12/typespecs.html
https://koka-lang.github.io/koka/doc/index.html
https://www.unison-lang.org/docs/fundamentals/abilities/
https://dev.epicgames.com/documentation/en-us/uefn/verse-lan...
And a function returning one of three types sounds like a TERRIBLE idea.
Should have been an enum :)
Is there a list of projects that shows distributed erlang in action (taking advantage of its strengths and avoiding the pitfalls)?
distributed Erlang is like saying ATM machine