I've just spent the last month learning exactly why I definitely do want a TCP over TCP VPN. The short answer is that almost every cloud vendor assumes you're doing TCP, and they've taken the "unreliable" part of UDP to heart. It is practically impossible to run any modern VPN on most cloud providers anymore.
Over the last month, I've been attempting to set up a fast Wireguard VPN tunnel between AWS and OVH. AWS killed all internet access on the instance with zero warning and sent us an email indicating that they suspected the instance was compromised and being used as part of a DDOS attack. OVH randomly performs "DDOS mitigation" anytime the tunnel is under any load. In both cases we were able to talk to someone and have the issue addressed, but I wanna stress: this is one stream between two IPs -- there's nothing that makes this anything close to looking like a DDOS. Even after getting everything properly blessed, OVH drops all UDP traffic over 1 Gbps. It took them a month of back-and-forth troubleshooting to tell us this.
The really terrible part is that "TCP over TCP is bad" is now so prevalent there are basically no good VPN options if you need it. Wireguard won't do it directly, but there are hacks involving udp2raw. I tried it, and wasn't able to achieve more than 100 Mbps. OpenVPN can do it, but is single-threaded and won't reasonably do more than 1 Gbps without hardware acceleration, which didn't appear to work on EC2 instances. strongSwan cannot be configured to do unencapsulated ESP anymore -- they removed the option -- so it's UDP-encapsulated only. Their reasoning is that UDP is necessary for NAT traversal, and of course everybody needs that. It's also thread-per-SA, so it's not fast either. The only solution I've found that can do something other than UDP is Libreswan, which can still do unencapsulated ESP (IP Protocol 50) if you ask nicely. It's also thread-per-SA, but I've managed to wring 2-3 Gbps out of a single core after tinkering with the configuration.
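For anyone curious, "asking nicely" looks roughly like the sketch below -- illustrative only, with placeholder addresses and subnets, and the option names are worth double-checking against your Libreswan version:

    # /etc/ipsec.conf (Libreswan) -- rough sketch, not a drop-in config
    conn aws-to-ovh
        type=tunnel
        authby=secret              # the PSK itself goes in ipsec.secrets
        left=203.0.113.10          # placeholder public IP, side A
        leftsubnet=10.0.0.0/16     # placeholder subnet behind side A
        right=198.51.100.20        # placeholder public IP, side B
        rightsubnet=10.1.0.0/16    # placeholder subnet behind side B
        ikev2=insist
        encapsulation=no           # keep plain ESP (IP protocol 50), no UDP encapsulation
        auto=start
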
For the love of all that's good in the world, just add performant TCP support to Wireguard. I do not care about what happens in non-optimal conditions.
/rant
The whole point of this article is that performant Wireguard-over-TCP simply does not work. You're not fighting the prevalence of an idea; you're fighting an inherent behavior of the system as currently constituted.
In more detail, let's imagine we make a Wireguard-over-TCP tunnel. The "outer" TCP connection carrying the Wireguard tunnel is, well, a TCP connection. So Wireguard can't stop the connection from retransmitting. Likewise, any "inner" TCP connections routed through the Wireguard tunnel are plain-vanilla TCP connections; Wireguard cannot stop them from retransmitting, either. The retransmit-in-retransmit behavior is precisely the issue.
So, what could we possibly do about this? Well, Wireguard certainly cannot modify the inner TCP connections (because then it wouldn't be providing a tunnel).
Could it work with a modified outer TCP connection? Maybe---perhaps Wireguard could implement a user-space "TCP" stack that sends syntactically valid TCP segments but never retransmits, then run that on both ends of the connection. In essence, UDP masquerading as TCP. But there's no guarantee that this faux-TCP connection wouldn't break in weird ways because the network (especially, as you've discovered, any cloud provider's network!) isn't just a dumb pipe: middleboxes, for example, expect TCP to behave like TCP.
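To make that concrete, here's a toy sketch of the sending half of such a faux-TCP stack, using scapy to emit syntactically valid segments exactly once -- no handshake, no retransmission, no congestion control, which is also exactly why middleboxes might reject it. Everything here is a hypothetical illustration, not a working tunnel:

    # Toy "send once, never retransmit" faux-TCP sender (needs root for raw sockets).
    from scapy.all import IP, TCP, Raw, send

    MSS = 1400  # arbitrary chunk size for this sketch

    def send_stream(data: bytes, dst_ip: str, sport: int = 51820, dport: int = 443) -> None:
        seq = 1000  # fixed initial sequence number; a real stack would negotiate one
        for off in range(0, len(data), MSS):
            chunk = data[off:off + MSS]
            seg = (IP(dst=dst_ip)
                   / TCP(sport=sport, dport=dport, flags="PA", seq=seq, ack=1)
                   / Raw(load=chunk))
            send(seg, verbose=False)  # fire and forget: no timers, no retransmit queue
            seq += len(chunk)
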
Good news (and oops), it looks like I've just accidentally described phantun (and maybe other solutions): https://github.com/dndx/phantun
I'd be curious if this manages to sidestep the issues you're seeing with AWS and OVH.
> The retransmit-in-retransmit behavior is precisely the issue.
But you're concerned about an issue I do not have. In practice retransmits are rare between my endpoints, and if they did occur, poor performance would be acceptable for some period of time. I just need it to be fast most of the time. To reiterate: I do not care about what happens in non-optimal conditions.
But IP over TCP is in principle non-performant. There's no (non-evil) magic Wireguard could perform to get around that.
Adding TCP support to Wireguard would add a whole bunch of complexity that it doesn't need – for a very niche use case (i.e. where you absolutely have to get an IP VPN to work over a restrictive firewall).
> Wireguard won't do it directly, but there's hacks involving udp2raw.
Which, notably, does not do UDP over TCP in the problematic sense (it just masquerades UDP as TCP, without providing a second set of TCP control loops on top of the first one).
> AWS killed all internet access on the instance with zero warning and sent us an email indicating that they suspected the instance was compromised and being used as part of a DDOS attack.
It makes no sense for that to be due to Wireguard usage, though (not saying I don't believe you that it happened, just their explanation or your assumption of their motivation seems strange). Things like Tailscale use Wireguard and should be common enough for AWS to know about them by now, I'd assume?
Worst case, can't you run a minimal TURN server and have TCP over Wireguard/UDP over TURN/TCP?
For a site to site VPN, something where you use transparent proxying at the routers to turn TCP into TCP over SOCKS (over TLS) might work. TCP proxying with 1:1 sockets avoids most of the issues with TCP over TCP, at the expense of needing to keep socket buffers at the proxy hosts.
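As a rough illustration of the 1:1-socket idea -- this is a toy relay, not actual SOCKS or TLS, with hypothetical host names -- each client connection gets its own fresh upstream connection, so neither side's TCP control loop is nested inside the other's:

    # Toy TCP relay: one upstream socket per client socket, bytes shuttled between them.
    import socket
    import threading

    def pipe(src: socket.socket, dst: socket.socket) -> None:
        # Copy one direction until EOF, then half-close the destination.
        try:
            while True:
                chunk = src.recv(65536)
                if not chunk:
                    break
                dst.sendall(chunk)
        except OSError:
            pass
        finally:
            try:
                dst.shutdown(socket.SHUT_WR)
            except OSError:
                pass

    def serve(listen_port: int, upstream_host: str, upstream_port: int) -> None:
        srv = socket.create_server(("", listen_port))
        while True:
            client, _ = srv.accept()
            upstream = socket.create_connection((upstream_host, upstream_port))
            threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

    # e.g. serve(8443, "internal.example", 443)

The cost, as noted, is that the relay host ends up holding the socket buffers for both sides.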
We've run Wireguard tunnels that max out at 1 Gbps in AWS for years with no issues (on the AWS side, anyways). It seems like things get hairy once you want to do more than that.
I did not. I'm not terribly familiar with it, but it doesn't look like I can do general routing with it, right? My end goal is to route between two subnets.
> strongSwan cannot be configured to do unencapsulated ESP anymore -- they removed the option
wait, what? Pretty sure I still used unencapsulated ESP a few months ago… though I wouldn't necessarily notice if it negotiates UDP after some update, I guess… *starts looking at things*
Edit: the strongSwan 6.0 beta documentation still lists "<conn>.encap (default: no)" as a config option — this wouldn't make any sense if UDP encapsulation were always on now. Are you sure about this?
Sorry, I misremembered the issue. Looking at my notes the issue is they don't allow disabling their NAT-T implementation, which detects NAT scenarios and automatically forces encapsulation on port 4500/udp. The issue is that every public IP on an EC2 instance is a 1:1 NAT IP. Every packet sent to the public IP is forwarded to the private IP -- including ESP -- but it is technically NAT and looks like NAT to strongSwan.
There's an issue open for years; it will probably never be fixed: https://wiki.strongswan.org/issues/1265
If you are in a situation where you have to anyway, you can use multiple TCP sockets and round-robin them (with Nagle off) such that you are always sending just one packet over each. You'll get overhead and some unneeded ACKs, but no head-of-line blocking from a second layer of TCP mechanics going on.
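A minimal sketch of the sending side of that trick -- hypothetical framing, and the receiver would need to read the same length-prefixed records off each socket:

    # Send each tunnelled packet over the next socket in a small pool, Nagle disabled,
    # so one stalled connection doesn't hold up every packet behind it.
    import itertools
    import socket
    import struct

    def open_channels(host: str, port: int, n: int = 4) -> list:
        socks = []
        for _ in range(n):
            s = socket.create_connection((host, port))
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # Nagle off
            socks.append(s)
        return socks

    def make_sender(socks):
        rr = itertools.cycle(socks)
        def send_packet(payload: bytes) -> None:
            s = next(rr)
            # Length-prefix each packet so the receiver can re-frame the byte stream.
            s.sendall(struct.pack("!H", len(payload)) + payload)
        return send_packet

    # e.g.: send = make_sender(open_channels("peer.example", 5555)); send(b"one tunnelled packet")
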
I notice that the earliest version of this post[0] is dated 1999, whilst the latest version was modified in 2001 (see the main link). Which year would be appropriate to mark it on HN? 1999? 2001?
[0] https://web.archive.org/web/20000310230940/http://sites.inka...
Port forwarding is TCP "next to" TCP, so that's fine, yes!
It can even be beneficial in some cases: If a host has an old/bad TCP stack not able to deal well with some network situation (latency, packet loss, you name it), port forwarding from a closer/less affected host can resolve the issue.
Happened to me once with the terrible old eBook delivery server from my public library when I was a continent away: it handled the long latency so poorly, a 30 MB download would have taken two hours. SSH forwarding brought that down to seconds.
> it looks like I've just accidentally described phantun (and maybe other solutions): https://github.com/dndx/phantun
I'll definitely look into that. They specifically mention being more performant than udp2raw, so that's nice.
> But IP over TCP is in principle non-performant.
No, it's not. In principle it risks meltdown, which is different. A link that occasionally breaks can be performant while it's working.
This comes with a TCP implementation: https://github.com/TunSafe/TunSafe/blob/master/docs/WireGuar...
Bad news is that it runs only between TunSafe instances.
I was quite hoping that the advent of QUIC would let us all use UDP again, albeit on one port.
Why TCP over TCP is a bad idea (2001) - https://news.ycombinator.com/item?id=25080693 - Nov 2020 (68 comments)
Why TCP Over TCP Is a Bad Idea (2001) - https://news.ycombinator.com/item?id=9281954 - March 2015 (43 comments)
Why TCP Over TCP Is A Bad Idea - https://news.ycombinator.com/item?id=2409090 - April 2011 (26 comments)
https://xeiaso.net/blog/anything-message-queue/
"TCP over TCP" specifically means a TCP stream whose payload represents a sequence of TCP packets.