throwaway892238 · 3 years ago
Yes!!! I have been saying for years that lower level protocols are a bad joke at this point, but nobody in the industry wants to invest in making things better. There are so many improvements we could be making, but corporations don't see any "immediate shareholder value", so they sit around happy as pigs in shit with the status quo.

What's kind of hilarious about this paper is, these are just the network-layer problems! It completely ignores that the "port number" abstraction for service identification has completely failed due to the industry glomming onto HTTP as some sort of universal tunnel encapsulation for all application-layer protocols. And then there's all the non-backend problems!

And that's just TCP. We still lack any way to communicate up and down the stack of an entire transaction, for example for debugging purposes. We should have a way to forward every single layer of the stack across each hop, and return back each layer of the stack, so that we can programmatically determine the exact causes of network issues, automatically diagnose them, and inform the user how to solve them. But right now, you need a human being to jump onto the user's computer and fire up an assortment of random tools in mystical combinations and use human intuition to divine what's going on, like a god damn Networking Gandalf. And we've been doing it this way for 40+ years.

alexgartrell · 3 years ago
> corporations don't see any "immediate shareholder value", so they sit around happy as pigs in shit with the status quo.

This is ridiculous.

Hyperscalers see an immediate ROI from efficiency/reliability improvements and actively invest in TCP alternatives all the time. It's just really hard.

Networking companies see an ability to differentiate their products from their peers and work on this kind of thing as well. I did a 3 second google for "QUIC acceleration Mellanox" and got a hit on Nvidia's blog right away.

You just can't trivially replace something with an investment totaling 50 years of clock time and thousands of years of engineer time. It will either take a long time or a massive shift in needs/technology. FWIW, I wouldn't be surprised if the high-performance RDMA networks being put together for AI workloads were the thing that grew into the "next" thing.

oconnor663 · 3 years ago
> 50 years of clock time and thousands of years of engineer time

It's not just the size of the investment, it's that it's the protocol everyone uses to talk to other people's machines, and you can't upgrade or replace other people's machines.

samgaw · 3 years ago
> FWIW, I wouldn't be surprised if the high-performance RDMA networks being put together for AI workloads were the thing that grew into the "next" thing.

Maybe we were just early in giving (HFT) customers RDMA back in ~2007[1][2] but I don't see it entering the mainstream anytime soon. And after a relatively short 20 years of adoption, the "next" thing for hyperscalers is not going to be the next thing for everyone else.

[1] https://downloads.openfabrics.org/Media/IB_LowLatencyForum_2...

[2] https://www.thetradenews.com/wombat-and-voltaire-break-milli...

kortilla · 3 years ago
> There are so many improvements we could be making, but corporations don't see any "immediate shareholder value", so they sit around happy as pigs in shit with the status quo.

This is severe bullshit on two fronts:

- there is an immediate return on value - Google was driving this a decade+ ago for improvements in the data center (things like doubled+cancelable rpc, tcp cubic, quic, etc)

- academia constantly attempts to make these improvements as well because researchers are super incentivized to dethrone TCP for the glory. There are constant attempts to re-invent various layers (IP, TCP, the non-existent upper layers of the OSI model, etc.) that come out of academic conferences every year.

The reason we’re still here is because our current stacks have been heavily optimized and tooled for production workloads. NICs can transparently re-assemble TCP segments for the OS and they can segment before transmit. You have to have a damn good value prop to throw away everything from software and hardware to careers and curriculum. It has to be a shitload better than the security nightmare of “return back each layer of the stack”.

twawaaay · 3 years ago
I don't think you realise why this is so hard.

The basic reason is that software at every level expects TCP/IP. And you can't drop in a translation layer because it will require at least the same amount of overhead as "real" TCP/IP.

It is not a local problem, it is a global problem that affects basically every single piece of non-trivial software in existence.

Even if you construct your datacenter with the new protocol you will run into the problem that you can't run anything in it. Want Python? Sorry, have to rewrite it. And every Python library. And every Python application. Then you need to deal with the problem that people who can run their scripts on their own machines can't run them in the datacenter. And so on.

The reason nobody wants to do this is that they would be investing a huge amount of money to solve a problem for everybody else. Because the only way to make a TCP/IP replacement work is to make it completely free and available to everybody.

There are much better ways to allocate your funds and precious top-level engineers, ways that let you distance yourself from the competition, at least temporarily.

bsder · 3 years ago
> corporations don't see any "immediate shareholder value", so they sit around happy as pigs in shit with the status quo.

And yet every time hardware designers get the chance they redesign Ethernet and IPv4--poorly.

See: HDMI 2.0+, USB 3.0+, Thunderbolt 3.0+, etc.

My suspicion is that this paper works fine between pairs of peers and immediately goes straight to hell after that. It is extremely suspicious that there is zero mention of SCTP, and that it only compares to TCP and not UDP.

The problem with RPC is that multiple organizations must agree on meaning. And that's just not going to fly. It is damn near a miracle that a huge number of institutions all agree on the Ethernet/IP command "Please take this bag of bytes closer to the machine named: <string of bytes>."

dooglius · 3 years ago
Why do you say that these protocols are worse than Ethernet/IPv4? I'm not intimately familiar with any at L2/L3, but I don't think any have hacks as bad as ARP. (USB does have some weirdness at L1 though I know.)
idlehand · 3 years ago
Never thought about that before. Ethernet supports extremely high data rates. USB-C for integrated charging and data transfer makes sense, but why are there HDMI cables?
friendzis · 3 years ago
> It completely ignores that the "port number" abstraction for service identification has completely failed due to the industry glomming onto HTTP as some sort of universal tunnel encapsulation for all application-layer protocols

I think this is more of an artefact of horizontal scaling and port contention. The de-facto standard discovery mechanism, DNS, does not resolve ports, so the "well-known port" abstraction kinda fails. HTTP as a tunnel mostly avoids/sidesteps this problem.

> We should have a way to forward every single layer of the stack across each hop, and return back each layer of the stack, so that we can programmatically determine the exact causes of network issues, automatically diagnose them, and inform the user how to solve them.

This is weird take or I don't understand it. If you can communicate with an edge node in another network, but the edge node has issues communicating with some inner node (on your behalf), then, as a user, you have no hope of fixing that connectivity issue anyway, regardless of whether layered approach is used or not. This may be related to previous point about http as universal tunnel. Yes, this is a problem, but in a way that communications are effectively terminated at the edge node and monstrosity of stuff happens behind the scenes

patrec · 3 years ago
> De-facto standard discovery mechanism DNS does not work with ports

Yes, it does, see SRV records.
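A rough sketch of what SRV-based discovery gives you: each record carries a priority, a weight, a port, and a target, and clients are supposed to pick the lowest priority and break ties by weighted random choice (RFC 2782). The records below are made up for illustration:

```python
import random

# Hypothetical parsed SRV records; each tuple is
# (priority, weight, port, target) per RFC 2782.
RECORDS = [
    (0, 60, 993, "mail1.example.com"),
    (0, 40, 993, "mail2.example.com"),
    (10, 0, 993, "backup.example.com"),
]

def pick_srv(records, rng=random):
    """Pick one (port, target): lowest priority wins,
    ties broken by weighted random choice."""
    lowest = min(p for p, _, _, _ in records)
    group = [r for r in records if r[0] == lowest]
    total = sum(w for _, w, _, _ in group)
    roll = rng.uniform(0, total) if total else 0
    for _, w, port, target in group:
        roll -= w
        if roll <= 0:
            return port, target
    return group[-1][2], group[-1][3]

port, host = pick_srv(RECORDS, random.Random(1))
print(host, port)  # one of mail1/mail2; backup is only used if both are gone
```

A real resolver would fetch these records for a name like `_imap._tcp.example.com` instead of hard-coding them, which is exactly the port discovery the parent comment says DNS lacks.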

AtNightWeCode · 3 years ago
I would say ports are mainly a problem on layers below transport even though some tech overuse ports.
simplotek · 3 years ago
> Yes!!! I have been saying for years that lower level protocols are a bad joke at this point, but nobody in the industry wants to invest in making things better. There are so many improvements we could be making, but corporations don't see any "immediate shareholder value", so they sit around happy as pigs in shit with the status quo.

If this were true, then how do you explain that the likes of AWS, the same company that ended up investing in developing their own processor line, don't seem to think any of the pet peeves you mentioned are worth fixing?

emn13 · 3 years ago
It's not obvious to me that replacing TCP really is harder than designing your "own" chip. Scarequotes here because those graviton chips (that's what you're referring to, I think?) are of course ARM chips, so they're not designing something fresh; they're adapting a very mature design to their own needs. In terms of interoperability, a custom chip based on a standard design is probably a simpler, more locally addressable problem than new network protocols.

Isn't it plausible that graviton was designed yet TCP retained simply because graviton as a project is easier to complete successfully?

gjulianm · 3 years ago
> We should have a way to forward every single layer of the stack across each hop, and return back each layer of the stack, so that we can programmatically determine the exact causes of network issues, automatically diagnose them, and inform the user how to solve them. But right now, you need a human being to jump onto the user's computer and fire up an assortment of random tools in mystical combinations and use human intuition to divine what's going on, like a god damn Networking Gandalf. And we've been doing it this way for 40+ years.

I work at a company that builds network troubleshooting/observability tools, and we have some pretty experienced analysts to tell you what's wrong with the network. With that context: your idea of having a tool automatically diagnose network issues is a pipe dream.

The problem with networks is that they're very complex systems, with multiple elements along the way, made by different manufacturers, often with different owners; failures aren't always easily reproducible; and there is human configuration (and therefore error) at almost every step of the way. Even if a tool that "returns each layer of the stack" would be useful, it still would be far from enough to diagnose issues.

EricE · 3 years ago
> The problem with networks is that they're very complex systems, with multiple elements along the way, made by different manufacturers, often with different owners

Ah, how people forget the early days of networking. I remember vividly the early days of the Networld/Interop trade show - Interop was in the name because if, as a vendor, your equipment couldn't integrate with the show network, they would throw your booth off the show floor.

That's how bad interoperability in the early days was!

duped · 3 years ago
Every major corporation has multiple research organizations doing nothing but invest in things that don't have immediate shareholder value.

What you're talking about, though, isn't just coming up with new ideas or even new products. It's replacing hundreds of billions in infrastructure wholesale. The scale at which these changes need to happen to be practical is the cluster level in a single data center. If you can propose something that fits that bill, there are a few companies willing to pay you millions in salary as an engineering fellow to do it.

simplotek · 3 years ago
> What you're talking about though isn't just coming up with new ideas or even new products. It's replacing hundreds of billions in infrastructure wholesale.

I'd put it differently: it's paying up hundreds of billions in infrastructure to have some sort of gain.

And which gain is that exactly?

I see a lot of "the world is dumb but I am smart" comments in this thread, but I saw no one presenting any clear advantage or performance-improvement claim about hypothetical replacements. I see a lot of "we need to rewrite things" comments, but not a single case made with a clear tradeoff presented. Every criticism of TCP/IP in this thread sounds like change for the sake of change, without any tangible improvement or clear performance gain in mind.

Wouldn't that explain why TCP is so prevalent, and no one in their right mind thinks of replacing it?

arka2147483647 · 3 years ago
> It completely ignores that the "port number" abstraction for service identification has completely failed due to the industry glomming onto HTTP as some sort of universal tunnel encapsulation for all application-layer protocols. And then there's all the non-backend problems!

The paper argues, in section '3.1 Stream orientation', that stream orientation is a problem for TCP; it says that most apps send messages instead, and that a better protocol should handle messages natively, etc. Which is a good point, I think.

But back to TCP. What do you do, if you need to send Messages between applications in TCP? Preferably those Messages would be encrypted also.

You could make up your own protocol, but you probably would rather not! So you use something that is readily available and does messages, encryption, etc. It would be nice if there were also ready-to-use load balancers, caches, tools to debug it, etc.

Now, what would be such a protocol.

Why HTTPS, of course.

So I kind of think that the lack of a low-level message protocol has led us, as an industry, to coalesce these features bit-by-bit on top of HTTP. It's not perfect by any means, but it does the job.

pclmulqdq · 3 years ago
HTTPS adds a tremendous amount of overhead to give you messaging. It's a lot better from a hyperscaler's perspective to replace TCP and not use the byte stream abstraction. After all, networks send messages. It's silly to throw that away at one layer and try to get it back at the next layer.
ajross · 3 years ago
Surely most of your ideas are already being deployed in QUIC/HTTP3. It just happens inside a UDP datagram, for compatibility. Really you're not going to see any new IP protocol layers, there's too much quirky hardware on the network that wouldn't be able to handle it. If we can't even get IPv6 to work all the way to the client, we're never seeing new values for the protocol byte.
vlovich123 · 3 years ago
Don’t the hyperscaled cloud providers run totally segmented networks? What’s stopping them from using something proprietary internally and just exposing TCP at the end for termination of client connections?
bigDinosaur · 3 years ago
Your ideas are interesting, can you link to or explain a concrete example though? The idea of everything magically debugging itself doesn't apply to a single piece of software I've ever seen, so I'm curious what kind of design would lead to that being possible.
Areading314 · 3 years ago
Here's an example of an improvement for sending large files over long distances: the Tsunami protocol. It tries to get the best of both worlds, limiting the detrimental effect of synchronous round trips in TCP for file transfers:

https://tsunami-udp.sourceforge.net/

jstimpfle · 3 years ago
> It completely ignores that the "port number" abstraction for service identification has completely failed due to the industry glomming onto HTTP

If you have ever used multiple TCP or UDP connections in parallel on a single machine (doesn't matter if server or client) then you should realize that ports are required.

Apart from that, you can run HTTP on ports other than 80. You can also use HTTP to load balance or do service discovery by means of redirects. (Caveat: I don't work in this field and can't say how well the approach works in practice.)

robertlagrant · 3 years ago
> There are so many improvements we could be making, but corporations don't see any "immediate shareholder value", so they sit around happy as pigs in shit with the status quo.

This is just not true. Stuff needs to be funded and worth doing, and the internet, like almost everything, is built on making things worth paying for; but loads of improvements are being made everywhere.

fragmede · 3 years ago
What is QUIC in your book?

Given, say, $50 million of dev time, what would you go about fixing? And in what way?

OmarAssadi · 3 years ago
In addition to QUIC, KCP [1] is another reliable low-latency protocol that sits on top of UDP that might be interesting. And unlike RFC 9000/9001 (QUIC), encryption is optional. I haven't really seen it mentioned much outside of primarily China-focused projects, like V2Ray [2], but there is also some English information in their Git repo [3].

[1]: <https://github.com/skywind3000/kcp>

[2]: <https://www.v2fly.org/en_US/>

[3]: <https://github.com/skywind3000/kcp/blob/master/README.en.md>

Luker88 · 3 years ago
IMHO QUIC is nice, but a disappointment, since it could have been so much more.

It does not handle unreliable messages, still only does (multi)streaming, has no direct support for multicast, has 0-RTT which needs a lot of stuff to be done manually TheRightWay or you risk amplification attacks, dropped the (imho) under-researched forward error correction, and more.

I just restarted working on what I consider to be the solution to this, federated authentication and a bit more, but $50M is too far to be even a dream since I am not google.

Areading314 · 3 years ago
Doesn't QUIC still run over TCP? I thought it was a replacement for HTTP not TCP (Edit: looks like it replaces TCP and HTTP)
still_grokking · 3 years ago
> There are so many improvements we could be making, but corporations don't see any "immediate shareholder value", so they sit around happy as pigs in shit with the status quo.

I would affirm that. It's imho true for almost everything in IT tech.

How computers "work" today is just pure madness when you look at all closely.

Everything's a result of "historic accidents" back in the day, and from there the usual race to the bottom driven by market forces.

Nobody is willing to touch any of the lower layers, no matter how crazy they are from today's viewpoint. We just shovel new layers on top to paper over the mistakes of the past. Nothing gets repaired, or, what would be even more important, rethought from the ground up in light of new technological possibilities and changed requirements.

I understand from the economic standpoint how this comes about. But I'm also quite sure we haven't made any fundamental improvements in the last 50 years of computing.

It's a very bad sign when everything in a field that's not even really 100 years old has been frozen in time for 50 years because everything's so fragile and complex that fundamental changes aren't possible. This looks like a textbook example of a house of cards…

Given how vital IT tech is to modern life I fear that this will crash at some point in the worst way possible.

And even if it doesn't crash, which I really, strongly hope, we will never have nice things again, as none of the old rotten things can reasonably be changed.

KaiserPro · 3 years ago
> We should have a way to forward every single layer of the stack across each hop, and return back each layer of the stack, so that we can programmatically determine the exact causes of network issues

That's virtual networking, but it introduces latency if it's not well configured.

> But right now, you need a human being to jump onto the user's computer and fire up an assortment of random tools in mystical combinations and use human intuition to divine what's going on, like a god damn Networking Gandalf

Not really; assuming you have the right fabric, it's nowhere near as hard as that. Plus you seem to be forgetting that there is more to the network than TCP. There is a whole physical layer with lots of semantics that greatly affect how easy it is to debug the higher levels.

starfallg · 3 years ago
> We still lack any way to communicate up and down the stack of an entire transaction, for example for debugging purposes. We should have a way to forward every single layer of the stack across each hop, and return back each layer of the stack, so that we can programmatically determine the exact causes of network issues, automatically diagnose them, and inform the user how to solve them. But right now, you need a human being to jump onto the user's computer and fire up an assortment of random tools in mystical combinations and use human intuition to divine what's going on, like a god damn Networking Gandalf. And we've been doing it this way for 40+ years.

This violates the principle of encapsulation that the entire field of networking is based on, not to mention being a massive security hole.

Deleted Comment

cdogl · 3 years ago
I'll defer to experts on the network-layer problems, but I'm not sure what you see as the problem with converging on HTTP. It's awkward and inelegant, but as a backend application developer I never feel like it gets in my way.
guenthert · 3 years ago
> It completely ignores that the "port number" abstraction for service identification has completely failed due to the industry glomming onto HTTP as some sort of universal tunnel encapsulation for all application-layer protocols.

Nobody forces them to, though. It would be much easier to publish a standard port-number mapping than to develop one (or multiple) new protocols. Now you just need to motivate people to use it.

peter_retief · 3 years ago
At last hopefully there is light at the end of the tunnel. Big question for me is who is going to build it?

Dead Comment

KaiserPro · 3 years ago
I get where this is coming from, but no. We don't need to replace TCP in the datacentre.

Why?

because for things that are low latency, need rigid flow control, or have some other 99.99%-utilisation use case, one doesn't use TCP. (Storage, which is high throughput, low latency, and has rigid flow control, doesn't use TCP; well, ignoring NFS and iSCSI.)

Look, if it really was that much of a problem then everyone in datacentres would move to RDMA over InfiniBand. For shared-memory clusters, that's what's been done for years, but for general-purpose computing it's pretty rare. Most of the time it's not worth the effort. InfiniBand is cheap now, so it's not that hard to deploy RDMA[1]-type interconnects. Having a reliable layer 2 with inbuilt flow control solves a number of issues, even if you are just slamming IP over the top.

Shit, even 25/100 gig is cheap now, so most of your problems can be solved by putting extra NICs in your servers and having fancypants distributed LACP-type setups on your top-of-rack/core network.

The biggest issue is that it's not the network that's constraining throughput; it's either processing or some other non-network IO.

[1]I mean it is hard, but not as hard as implementing a brand new protocol and expecting it to be usable and debuggable.

pclmulqdq · 3 years ago
The reality of today's large datacenters is that almost all of them have almost all of their traffic on TCP unless the owners of the datacenter have made a conscious effort to not use TCP. The highest-traffic applications, usually databases and storage systems, pretty much all use TCP unless you are buying a purpose-built HPC scale-out storage system (like a Lustre cluster). Most people who build a datacenter today use databases or object stores for storage, not Lustre or dedicated fiber channel SANs. On top of that, pub/sub systems all use TCP today, logging tends to be TCP, etc.
KaiserPro · 3 years ago
Fibre channel is dead, long live fibre channel.

I agree a lot of things are on TCP, but I don't think it's a massive problem, unless you are running close to the limit of your core network. And one solution to that is to upgrade your core network....

Failing that, implement some load balancing/partitioning system to make sure data-processing affinity is best matched. This is the better solution, because it yields other advantages as well. But it's not the easiest, unless you have a good scheduler

wmf · 3 years ago
You're missing the fact that Stanford is the farm team for Google and Google is hyperscale. At scale, your "just spend more money" solutions are in fact more expensive than creating a new protocol. And like k8s, the new protocol can be sold to startups so they can "be like Google".
KaiserPro · 3 years ago
You're missing the point that maybe, just maybe, I'm part of a team that looks after >5 million servers.

You might also divine that while TCP can be a problem, a bigger problem is data affinity. Shuttling data from the rack next door costs less than from one in the next hall, and significantly less than from the datacentre over. With each internal hop, the risk of congestion increases.

You might also divine that changing everything from TCP to a new, untested protocol across all services, with all that associated engineering effort, plus translation latency, might not be worth it. Especially as now all your observability and protocol routing tools don't work.

Quick maths: a faster top-of-rack switch possibly costs about the same as 5 days of wages for a mid-level Google engineer. How many new switches do you think you could buy with the engineering effort required to port everything to the new protocol, and have it stable and observable?

As a side note, "oh but they are Google" is not a selling point. Google has Google problems, half of which are related to their performance/promotion system, which penalises incremental changes in favour of $NEW_THING. HTTP/2.0 was also largely a Google effort designed to tackle latency over lossy network connections, which it fundamentally didn't do, because a whole bunch of people didn't understand how TCP worked and were shocked to find out that mobile performance was shit.

still_grokking · 3 years ago
Google does not use K8s internally.

They never did, they won't ever do that!

K8s does not scale. Especially not to "Google scale".

First step to "be like Google" would be to ditch all that (Docker-like) "container" madness and just compile static binaries. Then use something like Mesos to distribute workloads. Build literally everything as custom-made, purpose-built solutions, and avoid almost anything off the shelf.

"Being like Google" means not using any third party cloud stuff, but build your own in-house.

But this advice wouldn't sell GCP accounts, so Google does not tell you that. Instead they tell you some marketing balderdash about "how to be like Google".

ksec · 3 years ago
AWS is true hyperscale, even more so than Google. And yet their spend-more-money-on-hardware solution seems to work fine.
TheRealDunkirk · 3 years ago
God, I love it when the talk turns hyper-technical around here, and the Jedi masters turn up.

Dead Comment

amluto · 3 years ago
The paper explicitly addresses Infiniband.
KaiserPro · 3 years ago
Not really. They conflate InfiniBand with RoCE, which, given that they have different congestion-control semantics, I'd say is a bit of a whoopsey.

If they are using RoCE, are they using DCB to avoid loss (well, make it "lossless")? The paper implies otherwise.

bayindirh · 3 years ago
IB does not run TCP/IP by default. You can either run TCP over IB, which has a performance penalty, or you can run it directly in Ethernet mode, which is something completely different.
colinmhayes · 3 years ago
> The biggest issue is that it's not the network that's constraining throughput; it's either processing

To be fair the paper talks a bit about how TCP makes multithreading slower compared to a message based system.

counttheforks · 3 years ago
> Storage, which is high throughput, low latency and has rigid flow control, doesn't [well ignore NFS and iscisi] use TCP)

So storage doesn't use TCP, except for the protocols that are actually used, which do use TCP?

KaiserPro · 3 years ago
Depends on what you are using. For connecting block stores, you'll use some sort of fabric: Fibre Channel, SAS, NVMe over something or other.

If you are using GPFS, then you can do stuff over IB, but I don't know how that works. Lustre I imagine does lustre things over RDMA.

For everything else, NFS all the things. pNFS means that you can just throw servers at the problem and let the network figure it out.

But again, if IO speed is critical, you move IO over to a dedicated fabric of some sort. For most things NFS is good enough. (Except databases: it's possible but not great. But then, depending on your Docker setup, you might be kneecapping your performance because overlayfs is causing IO amplification.)

josephg · 3 years ago
> The data model for TCP is a stream of bytes. However, this is not the right data model for most datacenter applications. Datacenter applications typically exchange discrete messages to implement remote procedure calls

This isn't just a datacenter problem. Every single network protocol I've ever created or implemented is message based, not stream based. Every messaging system. Every video game. Every RPC transport.

But, because we can't have nice things, message framing has to be re-implemented on top of TCP in a different, custom way by every single protocol. I've basically got message framing-over-TCP in muscle memory at this point, in each of the variants you commonly see.
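For illustration, here is one common variant of that muscle memory: a minimal length-prefix framing sketch (4-byte big-endian length before each message; nothing standard implied, just one of the usual home-grown schemes):

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a message with its 4-byte big-endian length."""
    return struct.pack("!I", len(payload)) + payload

def unframe(buf: bytes):
    """Extract (messages, leftover) from a byte stream that may hold
    zero or more complete frames plus the start of a partial one."""
    msgs = []
    while len(buf) >= 4:
        (n,) = struct.unpack("!I", buf[:4])
        if len(buf) < 4 + n:
            break  # partial frame: keep buffering until more bytes arrive
        msgs.append(buf[4:4 + n])
        buf = buf[4 + n:]
    return msgs, buf

# Simulate a read that caught the second frame mid-flight.
stream = frame(b"hello") + frame(b"world")[:6]
msgs, rest = unframe(stream)
print(msgs)  # [b'hello'] -- the partial frame stays in the buffer
```

Every protocol picks its own variant (varint lengths, delimiters, fixed headers), which is exactly the duplicated work being complained about.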

The only kinda-sorta exceptions I know about are HTTP/1.1 and telnet. But HTTP/1.1 is still a message oriented protocol; just with file-sized messages. (And even this stops being true with http2 anyway).

In my opinion, the real problem is the idea that "everything is a file". Byte streams aren't a very useful abstraction. "Everything is a stream of messages" would be a far better base metaphor for computing.

yaantc · 3 years ago
SCTP [1] is there to provide a reliable message based protocol. And it does work inside a datacenter. The issue is outside the datacenter: it doesn't work reliably across the Internet due to middle boxes dumping anything not TCP or UDP...

But inside a controlled environment like a datacenter, it works. It's been used in the telecommunication world to carry control messages in the radio access and core networks for example. So it's been tested at scale for critical applications.

[1] https://en.wikipedia.org/wiki/Stream_Control_Transmission_Pr...

cryptonector · 3 years ago
> The only kinda-sorta exceptions I know about are HTTP/1.1 and telnet. But HTTP/1.1 is still a message oriented protocol; just with file-sized messages. (And even this stops being true with http2 anyway).

No, HTTP/2 and QUIC do not change the semantics of HTTP.

Also, you can have endless streams with HTTP/1.1: just use chunked encoding to POST/PUT and use Range: bytes=0- and chunked encoding for GET and chunked encoding for POST response bodies. In HTTP/2 there's only the equivalent of chunked encoding -- there's no definite content length in HTTP/2.
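A toy sketch of the chunked encoding in question: each chunk is a hex length, CRLF, the data, CRLF, and a zero-length chunk terminates the body. This is a simplified illustration that ignores chunk extensions and trailers:

```python
def chunk(data: bytes) -> bytes:
    """Encode one HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"

def dechunk(body: bytes) -> bytes:
    """Decode a complete chunked body back into the original bytes."""
    out, i = b"", 0
    while True:
        j = body.index(b"\r\n", i)
        n = int(body[i:j], 16)
        if n == 0:
            return out  # zero-length chunk ends the body
        out += body[j + 2:j + 2 + n]
        i = j + 2 + n + 2  # skip the data and its trailing CRLF

wire = chunk(b"hello, ") + chunk(b"world") + chunk(b"")  # b"0\r\n\r\n" terminator
print(dechunk(wire))  # b'hello, world'
```

The sender can emit chunks forever without ever declaring a total length, which is what makes endless streams possible over HTTP/1.1.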

josephg · 3 years ago
Correct me if I’m wrong, but doesn’t h2 still break up requests and responses into smaller message frames in order to do multiplexing?

Those message frames are what I’m talking about - as I understand it, they are, yet again, a message oriented protocol layered on top of tcp.

amluto · 3 years ago
Once chunked encoding is in the picture, even HTTP/1.1 sends messages, not streams, under the hood.
MisterTea · 3 years ago
> In my opinion, the real problem is the idea that "everything is a file".

Files are just an indexed list of bytes that can represent anything. Think of them as objects.

> Byte streams aren't a very useful abstraction. "Everything is a stream of messages" would be a far better base metaphor for computing.

I don't understand this. A stream of messages is a stream of bytes. Its bytes all the way down.

Aissen · 3 years ago
Because most protocols need to handle message loss, retransmits, and proper ordering? And we haven't even started to talk about congestion yet… TCP is useful, and while I'd like to see a message-based protocol drop one (or more) of those constraints in favor of something custom, I feel like it would end up re-implementing the same features, because these are very useful properties to have…

Edit: the proposal in the article is actually quite sensible, but requires redesigning your apps… And I'd like to see how it performs: TCP is a hugely optimized beast (when it works well), with hardware offloads, kernel optimizations, etc.

Dead Comment

ithkuil · 3 years ago
Even TCP itself uses discrete messages under the hood :-)
josephg · 3 years ago
Hah yes; although TCP segments can be arbitrarily refragmented and rejoined as they travel through the network before reaching your destination.

If you ever play an indie game which seems unusually janky over wifi, it's probably because the code isn't correctly rejoining fragmented network packets at the application level. Ethernet is remarkably well behaved in this regard. Wifi is much better at shaking out buggy code.
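The fix for that class of bug is to buffer received bytes and only emit complete messages. A minimal illustrative sketch, assuming 4-byte big-endian length-prefix framing:

```python
# The buggy pattern assumes each recv() returns exactly one game message,
# which happens to work on a quiet LAN and breaks when TCP refragments.
# The fix: accumulate bytes and only pop off complete framed messages.
import struct

def extract_messages(buffer: bytearray):
    """Pop every complete length-prefixed message off the front of
    `buffer`, leaving any trailing partial message in place."""
    messages = []
    while len(buffer) >= 4:
        (length,) = struct.unpack(">I", bytes(buffer[:4]))
        if len(buffer) < 4 + length:
            break  # the rest of this message hasn't arrived yet
        messages.append(bytes(buffer[4 : 4 + length]))
        del buffer[: 4 + length]
    return messages
```

Feed it whatever recv() happens to return; it yields zero or more whole messages per call.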

pgorczak · 3 years ago
Finding the right abstraction isn't easy in a network stack. TCP's is less useful for application logic, but it reflects the way the protocol works internally: a bunch of bytes goes in on one side, and bytes stream out on the other side in chunks whose size depends on congestion, the physical layer, and other factors. A message-based API hides these facts or leaks them, depending on how you look at it.
bruce343434 · 3 years ago
What’s the issue with fitting messages in streams?
bheadmaster · 3 years ago
One of the issues I can think of is head-of-line blocking [0].

If you're sending messages of different priorities over the same channel and an error occurs while sending a low-priority message, high-priority messages will have to wait until the low-priority message is properly re-transmitted.

[0] https://en.wikipedia.org/wiki/Head-of-line_blocking
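A toy model (purely illustrative) of that effect: when a single ordered channel forces in-order delivery, prompt arrivals wait behind a retransmitted message:

```python
# Head-of-line blocking sketch: message n cannot be delivered before
# every message with a lower sequence number, so one slow retransmit
# delays everything queued behind it.

def deliver_in_order(arrivals):
    """arrivals: list of (seq, arrival_time). Returns {seq: delivery_time},
    where each message is delivered no earlier than all lower seqs."""
    delivered = {}
    ready = 0.0
    for seq, t in sorted(arrivals):
        ready = max(ready, t)  # can't deliver before predecessors
        delivered[seq] = ready
    return delivered

# seq 0 is lost and retransmitted (arrives at t=300); seqs 1 and 2
# arrived promptly but are stuck behind it until then.
times = deliver_in_order([(0, 300.0), (1, 10.0), (2, 11.0)])
```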

imtringued · 3 years ago
I don't bother implementing that crap anymore. I just use websockets, which are message-oriented out of the box.
langsoul-com · 3 years ago
What do you build that requires implementing network protocols?
josephg · 3 years ago
Multiplayer video games, realtime updating webpages, database bindings, sync protocols (I play around with CRDTs a lot), p2p distributed systems, whatever really!

Implementing a wire protocol seems to come up about once every couple of years. And for context, I've been programming now for about 30 years.

Luker88 · 3 years ago
A few years ago (when QUIC was coming out) I was developing the theory of a new transport/encryption/authentication protocol. The focus was as much on transport as on the built-in federated authentication.

There was not much interest in the field and I had a lot of the theory and formal proofs, but no implementation.

This month I found a cofounder and we are reordering a lot of the information and presentation, we should start asking for funds in more or less a month.

I still believe my solution to be much more complete than anything in use today (again: on paper), but since there seems to be some interest today, I'll ask here: can anyone suggest some seed funds to check out for a startup? We will be based half US, half EU.

For more details, fenrirproject.org (again: old stuff there, ignore the broken code)

mikepurvis · 3 years ago
Very interesting stuff, but not going to lie, I have trouble imagining how you'd build a viable company around this kind of thing.

Infrastructure/protocol companies are always going to be a very tough sell (Sandstorm) unless there's a compelling freemium model like with GitLab, Cloudbees, Sentry, etc.

Luker88 · 3 years ago
The infrastructure/protocol will need to remain open, since this kind of thing works better the bigger the user base is. It will probably spin off into its own foundation as soon as it is viable.

The income will come from another project built directly on top of this, managing the domain and its users/devices, plus other stuff, mainly for businesses.

I don't see much need to go into details right now, but we have a clear distinction in mind between what is infrastructure and what will be the product.

Again, still in the housekeeping phase, just looking for potential future funds once we finish this phase

kanwisher · 3 years ago
Why don’t you actually build gasp a prototype before asking for money
Luker88 · 3 years ago
Yeah, thank you for the kind comments implying I don't need money (aka: my time has no value), and for the ironic suggestion to build the prototype.

As I said, the project was started a few years back, and since I did not have the time to work on it then, maybe that means my life does not give me the time and money to build this on the side.

But I'll always find it funny how half of the people go "you need to have solid theory proofs before" and the other half goes "where is the working code".

As I said, we just started some housekeeping and are not ready to launch, and as many point out, it's hard to make money on infrastructure. I know; I did not ask how to make money on this. The idea is to keep the base as open as possible and make money on other services built on top. I was only asking for pointers to funds interested in tech loosely connected to this. And if they don't like our current state or something else, fine; there's no need for you to do their job for them, and in a witty way, too.

throwaway41597 · 3 years ago
Depending on the scope, it's not always possible to self-fund while starting a project.

Deleted Comment

coder543 · 3 years ago
I'm just going to mention that NATS can be used as a general purpose transport, with encryption and a surprisingly capable authentication and authorization system. It also supports federating into clusters and superclusters. NATS has also come a long way in the last several years, in case anyone is thinking of some experiences they had years ago when it didn't have all these features.

The question would have to be "what does your idea/project offer that NATS doesn't already offer?"

I have no affiliation with NATS, but I wish that people were paying more attention to it. It solves a lot of problems people have.

hknmtt · 3 years ago
the thing you want to make requires ZERO money.
tptacek · 3 years ago
"We hypothesize that flow-consistent routing is responsible for virtually all of the congestion that occurs in the core of datacenter networks".

Flow-consistent routing is the constraint that packets for a given TCP 4-tuple get routed through the same network path, rather than balanced across all viable paths; locking a flow to a particular path makes it unlikely that segments will be received out of order on the destination, which TCP handles poorly.
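A toy sketch of that idea (illustrative only, not how any particular switch implements it): ECMP-style hashing maps a 4-tuple deterministically to one of N links, so a flow never reorders, but two elephant flows can share one link for their entire lifetime:

```python
# Flow-consistent (ECMP-style) path selection: every packet of a given
# TCP 4-tuple hashes to the same link index. Deterministic per flow,
# oblivious to load, hence the persistent hot spots the paper describes.
import zlib

def pick_link(src_ip: str, src_port: int,
              dst_ip: str, dst_port: int, n_links: int) -> int:
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_links
```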

hinkley · 3 years ago
Or, by sending the traffic over all routes, there is no way to keep one server from monopolizing all traffic, because each route is oblivious to the stress currently being experienced by all its peers. It has to set a policy using local data, not global data.

The usual failure mode for clever people thinking about software is taking their third-person omniscient view of the system status and thinking they can write software that replicates what a human would do in that situation. We are still so very far from human-level intuition and reasoning.

wmf · 3 years ago
Ultimately one server cannot inject more than one link's worth of traffic (e.g. 100 Gbps) into the network, which is a tiny fraction of total capacity. Researchers have gotten really good results with "spray and pray" for sub-RTT flows combined with latency and queue-depth feedback for multi-RTT flows.
throwaway892238 · 3 years ago
And we could totally construct systems that take some approximation of a global internet state into local routing decisions. But that might devalue some incumbent player's position in the market (or create a new privileged set of players) so even if we made a POC, it wouldn't get adopted.
ghshephard · 3 years ago
This is true, and the congestion mentioned here is subtle and not called out: typically flows are handled in a stateless manner by load balancers that hash on some set of MAC/IP/port features of the packet. This is where congestion occurs, and the paper mentions it here:

    All that is needed for congestion is for two large flows
    to hash to the same intermediate link; this hot spot will persist 
    for the life of the flows and cause delays for any other
    messages that also pass over the affected link.
It makes logical sense, but I'd love to see the evidence for this.

topranks · 3 years ago
“Elephant” flows are definitely a thing.

It all depends on the application and the overall use of the network.

With sufficient flows and a mix of sizes it'll still tend to even out. But if you've got significant high-throughput, long-lived flows, this is definitely something you might hit.

mlerner · 3 years ago
I wrote a summary of one of the approaches for replacing TCP mentioned in the paper (called Homa) here: https://www.micahlerner.com/2021/08/15/a-linux-kernel-implem...
defrost · 3 years ago
Jumping to the end:

> TCP is the wrong protocol for datacenter computing.

> Every aspect of TCP’s design is wrong: there is no part worth keeping.

I cannot disagree and Ousterhout argues well.

> Homa offers an alternative that appears to solve all of TCP’s problems.

I'm well behind the curve on protocols and now I have something to learn more about.

> The best way to bring Homa into widespread usage is to integrate it with the RPC frameworks that underlie most large-scale datacenter applications.

More or less the case for whatever replaces TCP in a tight computing warehouse setup.

colechristensen · 3 years ago
> Every aspect of TCP’s design is wrong

The driver of most of a global network of computers, one which has been wildly successful beyond dreams before it was real… probably deserves a better verdict than “every aspect is wrong”. It has worked fantastically well, and chasing the long tail of performance improvements isn't the same as declaring that what got us here is wrong.

teraflop · 3 years ago
You're cherry-picking an interpretation of a single sentence, when it should be read in the context of the preceding one: Ousterhout says every aspect of TCP's design is wrong for (modern) datacenter computing. He's not saying bad decisions were made at the time it was designed, nor even that it's badly designed for other use cases today.

The first few paragraphs of the article give even more context:

> The TCP transport protocol has proven to be phenomenally successful and adaptable. [...] It is an extraordinary engineering achievement to have designed a mechanism that could survive such radical changes in underlying technology.

> However, datacenter computing creates unprecedented challenges for TCP. [...] The datacenter environment, with millions of cores in close proximity and individual applications harnessing thousands of machines that interact on microsecond timescales, could not have been envisioned by the designers of TCP, and TCP does not perform well in this environment

hinkley · 3 years ago
My Distributed Computing professor said, “now we are going to discuss why Ethernet is a terrible protocol but we use it anyway.”

Like democracy, everything else we’ve tried is even worse.

petesergeant · 3 years ago
"Specifically, Homa aims to replace TCP, which was designed in the era before modern data center environments existed. Consequently, TCP doesn’t take into account the unique properties of data center networks (like high-speed, high-reliability, and low-latency). Furthermore, the nature of RPC traffic is different - RPC communication in a data center often involve enormous amounts of small messages and communication between many different machines."[0]

0: https://www.micahlerner.com/2021/08/15/a-linux-kernel-implem...

Spivak · 3 years ago
Everything about the protocol being wrong for the specific case of machines directly wired to one another over a high-speed, reliable network is not an admonishment of the protocol in general. And the protocol, being an abstract concept, doesn't have feelings to hurt.
curious_cat_163 · 3 years ago
"Although Homa is not API-compatible with TCP, it should be possible to bring it into widespread usage by integrating it with RPC frameworks."

I was about to rant that Prof. Ousterhout should just deploy some of his students and get that transport/RPC integration done to prove his point. But then I tried to look for it first and found this:

https://www.usenix.org/system/files/atc21-ousterhout.pdf

Has anybody tried it in an actual data-center?