https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-... talks about the case of a single failure and shows how (a) Raft without fsync() loses ACK-ed messages and (b) Kafka without fsync() handles it fine.
This post, on the other hand, talks about a case where we have (a) one node being network partitioned, (b) the leader crashing, losing data, and coming back up again, all while (c) ZooKeeper doesn't catch that the leader crashed and never elects another leader.
I definitely think the title/blurb should be updated to clarify that this only happens in the "exceptional" case of >f failures.
I mean, the following paragraph seems completely misleading:
> Even the loss of power on a single node, resulting in local data loss of unsynchronized data, can lead to silent global data loss in a replicated system that does not use fsync, regardless of the replication protocol in use.
The next section (and the Kafka example) is talking about loss of power on a single node combined with another node being isolated. That's very different from just "loss of power on a single node".
When we combine network partitioning with a single node losing a suffix of its local data, it leads either to a consistency violation or to the system being unavailable despite a majority of the nodes being up. At the moment Kafka chooses availability over consistency.
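To make "losing a suffix of its local data" concrete: if a replica acknowledges an append that has only reached the OS page cache, a power loss wipes the unsynchronized tail of its log. A minimal Java sketch of that write path (the class and method names are hypothetical, not Kafka's actual code):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Hypothetical replica append path, for illustration only.
    class ReplicaLog {
        private final FileChannel log;

        ReplicaLog(Path path) throws IOException {
            log = FileChannel.open(path, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        }

        // Without fsync: write() only hands the record to the page cache.
        // If the node acks now and then loses power, the acked suffix is gone.
        void appendWithoutSync(byte[] record) throws IOException {
            log.write(ByteBuffer.wrap(record));
        }

        // With fsync: force(false) blocks until the record is on stable storage,
        // so an ack sent afterwards survives a power loss.
        void appendWithSync(byte[] record) throws IOException {
            log.write(ByteBuffer.wrap(record));
            log.force(false);
        }
    }

The argument above is about what happens when the un-fsynced variant's lost suffix meets the rest of the cluster: either the recovered node gets fenced off (availability suffers) or its truncated log is treated as authoritative (consistency suffers).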
Also, I read the Kafka source and the role of network partitioning doesn't seem to be crucial. I suspect it's also possible to cause a similar problem with a single-node power outage https://twitter.com/rystsov/status/1641166637356417027 and unfortunate timing.
Cutting-edge? pBFT (Practical Byzantine Fault Tolerance) was published in 1999. The first Tendermint release was in 2015. With few exceptions, almost all big proof of stake blockchains are powered by variations of pBFT and have been for many years.