Why TCP needs 3 handshakes

TL;DR: "Because if TCP handshake were 2-way only, the receiver couldn't confirm if it could send packets successfully or sender could receive them".

And that sounds bogus to me. Connection initiation isn't about testing if packets can reach or not. It's about acknowledgement, building a two peer consensus about how to handle upcoming packets from a peer, not reliability checks. In that sense, I don't even think the third step is necessary, but apparently, it's needed to handle the case of both endpoints going into a timeout loop, this article explains it perfectly: https://www.baeldung.com/cs/handshakes

f1shy · a year ago

Thanks for the link with the correct explanation:

Particularly, the two-way handshake presents potential problems when the ACK message from the server delays too much. Thus, if a connection timeout occurs, the client sends another SYN message with a new sequence number (Z, for example) to the server. However, if the server previously sent an ACK (which is delayed), it’ll discard this new SYN message. The client, in turn, receives the delayed ACK and assumes that it refers to the last sent SYN message. Here’s where the error happens: the client will send messages with the sequence number Z, while the server expects messages following the sequence number X.

tsimionescu · a year ago

This is a bogus explanation, one that has nothing to do with TCP.

In TCP every segment has a SEQ number and an ACK number, regardless of it being a handshake segment or a data segment. This completely negates the described problem: the server's first response to a SYN includes "SEQ=Y, ACK=X". If the client which just sent "SYN seq=Z" receives the ACK for SEQ=X, it will drop this ACK and wait for a new one. Also, a server which receives a new SYN has no reason to drop it, it will send a new ACK instead.
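To make that concrete, here is a minimal Python sketch of the check. It is a toy model, not a real TCP stack: the `Segment` and `Client` names are hypothetical, and a stale ACK is simply dropped here, whereas as far as I recall the spec has a client in SYN-SENT answer an unacceptable ACK with a reset.

```python
from dataclasses import dataclass
from typing import Optional

MOD = 2**32  # TCP sequence numbers wrap at 32 bits


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: Optional[int] = None


class Client:
    """Hypothetical client sitting in SYN-SENT; illustration only."""

    def __init__(self, isn: int):
        self.isn = isn
        self.established = False

    def resend_syn(self, new_isn: int) -> Segment:
        # After a timeout the client retries with a fresh sequence number.
        self.isn = new_isn
        return Segment(syn=True, seq=new_isn)

    def on_segment(self, seg: Segment) -> Optional[Segment]:
        # A SYN consumes one sequence number, so a valid SYN+ACK must
        # acknowledge exactly ISN + 1 of the SYN currently outstanding.
        if seg.syn and seg.ack and seg.ack_num == (self.isn + 1) % MOD:
            self.established = True
            return Segment(ack=True, seq=seg.ack_num,
                           ack_num=(seg.seq + 1) % MOD)
        return None  # stale or unrelated: ignored in this sketch
```

With X=100 and Z=500, a delayed SYN+ACK carrying ACK=101 is rejected once the client has moved on to Z, and only ACK=501 completes the handshake.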

zokier · a year ago

That is possibly an even more wrong explanation. Hosts do not blindly assume what is being acknowledged by ACKs.

dcuthbertson · a year ago

Thanks for explaining the why. I'm glad it falls somewhere between the X and the Z.

phicoh · a year ago

The thing to consider is who is the first to send data over the connection.

If it is the client, then a two-way handshake is enough: the client sends a SYN, the server sends a SYN+ACK, then the client's data segment ACKs the server's SYN. These days that is the most common model. HTTP and TLS work like that.

However, what if the server sends first? For example, the banner of SMTP. In that case the client sends a SYN, the server sends a SYN+ACK, the client sends an ACK, and only then does the server start sending data.

In general, an operating system doesn't know what is going to happen. So the client's kernel will just send the ACK immediately even if it will be followed by a data segment shortly after.

toast0 · a year ago

You can only reach consensus if the peers can each receive packets that the other sent.

So reaching consensus and testing for reachability in both directions are the same.

In a client/server scenario, the client knows the connection is good when it receives the SYN+ACK, but the server doesn't know until it receives the resulting ACK. So the third packet is necessary to communicate consensus to the server; it doesn't need to be a pure ACK though, it can have data, if the client's stack makes it possible to queue outgoing data before the SYN+ACK is received.
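A rough Python sketch of who knows what after each packet. This is only a toy model of the argument above (the `handshake` function and its `lost` parameter are made up for illustration), with no retransmission or timers:

```python
def handshake(lost=frozenset()):
    """Return (client_knows, server_knows) after a handshake in which the
    packets named in `lost` never arrive. Purely illustrative."""
    client_knows = False
    server_knows = False

    if "SYN" in lost:
        return client_knows, server_knows

    # The server has seen the SYN and replies with SYN+ACK.
    if "SYN+ACK" in lost:
        return client_knows, server_knows
    client_knows = True  # the SYN+ACK proves both directions to the client

    # The client replies with an ACK, which may or may not carry data.
    if "ACK" in lost:
        return client_knows, server_knows
    server_knows = True  # only the third packet tells the server

    return client_knows, server_knows
```

`handshake()` gives `(True, True)`, while `handshake({"ACK"})` gives `(True, False)`: the client believes in the connection and the server has no evidence yet.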

sedatk · a year ago

> You can only reach consensus if the peers can each receive packets that the other sent. So reaching consensus and testing for reachability in both directions are the same.

No, that's not the intent of a handshake. Assume a hypothetical Internet where every node has guaranteed connectivity to every other node that never fails. Do we suddenly lose the need to do a 3-way handshake? No. It's not about testing connectivity, that's semantically wrong. And what's the meaning of testing connectivity for a connection of an arbitrary length and with a quality of unknown degree throughout? It doesn't make any sense.

There is no "knowing the connection is good", there is a process of building up consensus. The peers are only interested in an answer to this question: "Does the other party assume a connection?".

If we knew the answer to that question beforehand, we wouldn't need a handshake at all. Reachability, transmissibility are all irrelevant. And, UDP actually works like that. Both endpoints assume willingness to connect. That's why you don't need a handshake with UDP.

TCP-FO worked like that too, removing the need for a TCP handshake completely, because it could persist the consensus information.

m463 · a year ago

without a 3-way handshake, wouldn't it be easier to do spoofing? or a man-in-the-middle attack?

I think now there are ways to do the 3-way handshake in hardware at hardware speeds, and only involve software if the connection has been vetted. This can protect against Denial-Of-Service attacks.

tsimionescu · a year ago

There is no significant difference between a two-way handshake and a three-way handshake, given the other parts of the TCP protocol, IF the client is the one that sends the first piece of data after the handshake. It is in fact very common for optimized hosts to send data with the "third part of the handshake", which makes it perfectly equivalent to using a two-way handshake. This happens because the first segment the client sends after the handshake must ACK the server's sequence number, regardless of whether this is "the third part of the handshake" or "the first data packet after a two-way handshake".

The problem appears if the server would like to be the first to send data on a new connection. If the server included its data with the SYN-ACK, that would work perfectly well with benign clients, but it would be a vector for DoS attacks. An attacker could send a small packet with a forged source IP, and cause the server to send a large response to the victim's IP. So, the server can't safely send data until it receives an ACK with its secret SEQ number from the client.
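A back-of-the-envelope sketch of that amplification, in Python. The byte counts are assumptions picked for illustration (bare IPv4 and TCP headers, a made-up 1400-byte banner), and it ignores SYN-ACK retransmissions:

```python
# Hypothetical sizes, in bytes, for illustration only.
SYN_BYTES = 40       # 20-byte IPv4 header + 20-byte TCP header, no options
SYNACK_BYTES = 40    # same headers, no payload
BANNER_BYTES = 1400  # assumed payload an eager server would attach


def amplification(sent_by_attacker: int, delivered_to_victim: int) -> float:
    """Bytes landing on the victim per byte the attacker spends."""
    return delivered_to_victim / sent_by_attacker


# Server waits for the ACK before sending data: nothing gained by spoofing.
plain = amplification(SYN_BYTES, SYNACK_BYTES)

# Server piggybacks its banner on the SYN-ACK: every spoofed SYN is multiplied.
eager = amplification(SYN_BYTES, SYNACK_BYTES + BANNER_BYTES)
```

Under these assumed sizes `plain` is 1.0 and `eager` is 36.0, which is the whole reason the server holds its data back until the ACK arrives.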

toast0 · a year ago

> I think now there are ways to do the 3-way handshake in hardware at hardware speeds, and only involve software if the connection has been vetted. This can protect against Denial-Of-Service attacks.

Do we need hardware for this? Syncookies have provided a software method to handle large volumes of inbound syn without memory restrictions since the late 90s, and it's been in all major platforms except Mac since the late 2000s; Apple forked FreeBSD's tcp months before FreeBSD added syncookies, and last I checked, Apple never pulled them in. My testing is a bit old, but I had more trouble generating line rate syns at 2x10g than handling them several years ago. Are syncookies in software enough at 100g? I'm not sure, but I'd assume so. There's plenty of things to hardware accelerate on a NIC, but syn handling doesn't seem worthwhile IMHO.
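For anyone who hasn't looked at how syncookies avoid per-connection memory, here is a simplified Python sketch of the idea. It is not the layout any real kernel uses (real cookies also encode things like the MSS), and `SECRET`, `make_cookie` and `ack_is_valid` are names I made up:

```python
import hashlib
import hmac
import struct

SECRET = b"hypothetical-server-secret"  # assumption: a real stack rotates this
MOD = 2**32


def make_cookie(four_tuple, client_isn, counter):
    """Derive the server's initial sequence number from the connection
    identity instead of storing any state for the half-open connection."""
    msg = repr((four_tuple, client_isn, counter)).encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]  # truncate to 32 bits


def ack_is_valid(four_tuple, ack_seq, ack_num, counter):
    """Check the client's final ACK by recomputing the cookie.
    Accepts the current and the previous time counter."""
    client_isn = (ack_seq - 1) % MOD  # the client's SYN consumed one number
    cookie = (ack_num - 1) % MOD      # so did the server's SYN
    return any(cookie == make_cookie(four_tuple, client_isn, c)
               for c in (counter, counter - 1))
```

The server only allocates connection state once `ack_is_valid` passes, so a flood of spoofed SYNs costs it a hash per packet and no memory.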

sedatk · a year ago

No and no. IP spoofing was already trivial with 3-way handshake. That's why random TCP sequence numbers were introduced. It's now harder, but for some scenarios, SYN cookies might also be needed.

yencabulator · a year ago

Nobody's going to bother with new stuff for better spoof protection of TCP handshakes when TLS removes the actual attack and HTTP/3 obsoletes the whole mechanism.

zokier · a year ago

Yeah, as a proof it is dubious as it doesn't really define what "established" means in this context.

The baeldung article is just wrong

> The client, in turn, receives the delayed ACK and assumes that it refers to the last sent SYN message

ACKs have acknowledgement numbers, so this sort of confusion can not happen.

There's this interesting comment by "John Day" on that page, does anyone have more context/detail?

whycombagator · a year ago

No, but I found the comment more interesting after learning what his background is (based on the name/email left in the comment):

> John Day has been involved in research and development of computer networks since 1970, when his group at the University of Illinois was the 12th node on ARPANet (precursor to the Internet) and has developed and designed protocols for everything from the data link layer to the application layer. Also making fundamental contributions to research on distributed databases. He managed the development of the OSI reference model, naming and addressing, and a major contributor to the upper-layer architecture. He was a major contributor to the development of network management architecture, working in the area since 1984 and building and deploying LAN products and a network management system, a decade ahead of comparable systems. Mr. Day has published Patterns in Network Architecture: A Return to Fundamentals (Prentice Hall, 2008), which has been characterized (embarrassingly) as “the most important book on network protocols in general and the Internet in particular ever written.” The book analyzes the fundamental flaws in the Internet and proposes what appears to be the only path forward. Today Mr. Day splits his time between making this new path a reality and teaching at Boston University. Mr. Day is also a recognized scholar in the history of cartography focusing on 17thC China, and is past President of the Boston Map Society.

despair3435 · a year ago

You can find the original paper by Watson online which explains it in more detail. The 3 way handshake is in fact not necessary. I believe the delta-t protocol was one of the available protocols in OSI as well. TCP/IP being the standard now is not due to the fact that it was technically the best. In fact, there are multiple shortcomings.

The delta-t protocol is also used in RINA, which was invented by John Day. It is also used in Ouroboros (https://arxiv.org/pdf/2001.09707), and I can confirm it works. ;)

tonyg · a year ago

The actual delta-t protocol spec has historically been quite hard to find, but is freely available from here: https://www.osti.gov/biblio/5542785

Also related and of interest in this connexion: CurveCP and its handshake, https://curvecp.org/packets.html

Joel_Mckay · a year ago

There were several competing standards on the early networks, and almost every ambitious commercial entity wanted to embed their licensed IP into the web's core transport layer or lower on the OSI stack.

We take for granted the inter-connectivity of most modern equipment, but to this day companies still try to create synthetic technology monopolies to cash-in. i.e. to sustain a tenuous service commodity out of something that has essentially been free since the mid 1990s.

https://xkcd.com/927/

Philosophically it doesn't matter that TCP is imperfect, but rather that the inter-connectivity is compatible with the inertia of the installed infrastructure.

One can indeed optimistically ignore the TCP connection drop and syn part of standards to tunnel/reverse-proxy through certain censorship firewalls... but it still does not make it safe for the people that live under such regimes.

Does this make it more or less clear? =3

Bluecobra · a year ago

> We take for granted the inter-connectivity of most modern equipment, but to this day companies still try to create synthetic technology monopolies to cash-in. i.e. to sustain a tenuous service commodity out of something that has essentially been free since the mid 1990s.

This is why as a network engineer I always advocate for open standards everywhere I can to avoid vendor lock-in. The classic one is using OSPF instead of EIGRP on Cisco routers (or their other proprietary protocols). Nowadays this is much trickier with cloud computing and black box stuff like SDN/SDWAN.

veblen · a year ago

Imagine two blind people who want to have a conversation. Before they start, each person needs to ensure that the other can both speak and hear. Typically, one person begins by asking, 'Can you hear me?' to check if the other can hear them. The second person responds with 'Yes,' confirming that they can hear. Then, the second person asks, 'Can you hear me?' and the first person replies, 'Yes,' completing the process.

In total, there are four exchanges (two questions and two answers). However, if you look closely, the second person's reply of 'Yes' already confirms that they can both hear and speak. Therefore, the second 'Can you hear me?' is unnecessary. With just three exchanges (one question and two answers), both people know that they can send and receive messages.

blamarvt · a year ago

What if the first blind person was also deaf and just trolling the other blind person so wouldn't the second "Can you hear me?" be needed?

colejohnson66 · a year ago

Then you set the “evil” bit

kbmr · a year ago

Troll Control Protocol

Actually, the second answer is also unnecessary. The conversation can go like this:

A: Can you hear me?

B: Yes

A: What time is it?

B: 5 o'clock

A: Thank you, goodbye!

B: Goodbye!

Nothing is lost compared to:

A: Yes

[...]

arder · a year ago

The problem is the other way around.

A: Can you hear me?

B: Yes

B: What time is it?

A: ...

At the point that B has replied Yes, B knows that it can hear A and that it can send to A, but it doesn't know that A can hear B. As long as A makes the first move in the rest of the conversation that's fine - the next message from A confirms that B's "Yes" was received, but if A has nothing to say then B has to send its next query and hope that A received the Yes successfully. If A didn't, then B thinks the connection is established but it actually hasn't been.

rishav_sharan · a year ago

Isn't the whole point here that handshakes are cheaper compared to the actual content? If you use the content as the handshake itself, you can end up with huge content only to find out that the conversation didn't work.

That conversation can also go like:

A: What is the full unabridged story of War and Peace?

B: <...>

A: I am going to take a nap for now, goodbye!

B: ... Goodbye!

lnenad · a year ago

TCP is two way and in your example B has no idea that A can receive the messages it is sending. Example: What if B needs to ask A about the date in the same conversation, it doesn't know for sure it would get a reply (it can try but that's not TCP then).

dullcrisp · a year ago

Well there the “what time is it?” serves as the “great, I can hear you too!” but I take it that the point is that B needs some reply from A to know that they can hear them.

GoatOfAplomb · a year ago

You could also just start with "What time is it?" and see what you get back, right?

stiglitz · a year ago

TCP Fast Open does you one better:

A: If you can hear me, what time is it?

B: Yeah I can hear you; it’s 5.

Pikamander2 · a year ago

> Theoretically, even more than three handshakes would not guarantee a "completely reliable" TCP connection. However, through three handshakes, it can at least be confirmed that the connection is "basically usable." Increasing the number of handshakes would merely increase the confidence level in the "connection availability."

This sounds like a variation of the Two Generals' Problem: https://en.wikipedia.org/wiki/Two_Generals%27_Problem

stavros · a year ago

Kind of, but not exactly. The article treats the channels as immutable, i.e. you can tell whether a channel is working or not by sending one packet. Under this assumption, you'd have to send three packets for both sides to discover if all four ways work (server send, server receive, client send, client receive), but after that you'd need no more assurance.

In the two generals problem, the channel can fail at any time (which is what happens in real life), so no amount of handshakes can assure you. Because of the above, I don't agree with their conclusion that more handshakes is better. Either you assume immutable channels, so you only need three, or mutable channels that can fail any time, so you need infinite.

cortesoft · a year ago

> Either you assume immutable channels, so you only need three, or mutable channels that can fail any time, so you need infinite.

This is true if you only care about having 100% confidence. Sending more handshakes allows you to do statistical analysis to give you more confidence that the connection is reliable.

im3w1l · a year ago

Well it's quite common to assume a channel has failed if it's inactive too long, with periodic keep-alive messages to ensure it doesn't go inactive in the case where there is nothing to say.
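TCP's own keepalive is one instance of this. A minimal Python sketch, assuming a platform that exposes the usual socket options; the `enable_keepalive` helper and its default timings are hypothetical, and the per-socket tuning constants are guarded because not every OS provides them:

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for `sock`. The tuning values are
    arbitrary example numbers, not recommendations."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The constants below exist on Linux; other platforms may lack them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of silence before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

Many application protocols send their own ping messages instead, since kernel keepalive defaults tend to be very long.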

userbinator · a year ago

I've always found the bitfields in the TCP header to be either too loosely defined or the standard insufficiently lax, since there are many combinations that make theoretical sense but don't work well or at all in practice. E.g. SYN+ACK+FIN+DATA would be perfect for a very short-lived connection that transfers a single segment. IMHO they should've either just used a "state" field that enumerates all possible states a packet is intended to have, or usefully defined all the possible combinations of the flags.
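For reference, a small Python sketch of the classic six flag bits as a bitfield. The bit values are the standard header positions; the `TcpFlag` enum and `describe` helper are just illustrative names, and the later ECN bits are left out:

```python
from enum import IntFlag


class TcpFlag(IntFlag):
    FIN = 0x01
    SYN = 0x02
    RST = 0x04
    PSH = 0x08
    ACK = 0x10
    URG = 0x20


def describe(bits: int) -> str:
    """Name the flags set in a raw flags byte, lowest bit first."""
    flags = TcpFlag(bits)
    return "+".join(f.name for f in TcpFlag if f in flags)


# The single-segment connection imagined above: SYN, ACK and FIN together.
one_shot = TcpFlag.SYN | TcpFlag.ACK | TcpFlag.FIN
```

Six independent bits give 64 encodable combinations, and only a handful of them have well-defined behavior in practice, which is the complaint.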

krackers · a year ago

gorfian_robot · a year ago

back in the day we had some chats with Vint Cerf during the development of Delay-tolerant networking (DTN), primarily for use in space scenarios (though there are other scenarios). no way a SYN, SYN-ACK, ACK type exchange was gonna cut it. I found a light overview here: https://www.quantamagazine.org/vint-cerfs-plan-for-building-...

stevecalifornia · a year ago

Pilots also do a 3 handshake exchange when handing over control to the other pilot.

Pilot: "You have the plane." Co-Pilot: "I have the plane." Pilot: "You have the plane."

The third statement is the pilot acknowledging that the transfer of control has completed. Without it, the co-pilot doesn't positively know that the pilot knows that control has been handed over.

As an aside, Air France 447 was a crash where the co-pilot was pulling back on the stick without the captain knowing. The captain couldn't understand why his controls weren't having the intended effect. Both pilots were making inputs, unaware of each other.