While WireGuard makes perfect sense for an FPGA because of its minimal design, I wonder why there isn't more interest in QUIC as a modern tunneling protocol, especially for corporate use cases. QUIC's datagrams already provide an almost complete WireGuard alternative: combine them with a TUN device and a custom authentication scheme (mTLS, bearer tokens obtained via OAuth2/OIDC, etc.) and you can build your own VPN. I'm not sure about performance, at least compared to kernel-mode WireGuard, since QUIC is a much more complex state machine running in userspace, and throughput depends on the implementation and on OS-level optimizations (e.g. GRO/GSO). But QUIC isn't just yet another tunneling protocol; it offers real benefits: it works well with dynamic endpoints resolved via DNS instead of static IP addresses; it uses modern TLS 1.3 and can therefore satisfy FIPS requirements, for example; it uses AES, which the underlying hardware can accelerate (e.g. AES-NI); it has implementations in almost every major programming language; it composes well with proxies and load balancers; you can bring your own, more fine-grained authentication scheme (bearer tokens, mTLS, etc.); it blends in as just more QUIC/HTTP3 traffic of the kind almost all major websites now carry, so it's less likely to be dropped by nodes in between; and it brings less obvious wins such as congestion control and PMTUD.
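To illustrate how thin the glue layer between a TUN device and QUIC datagrams can be, here's a toy sketch; the frame layout and all names here are made up for illustration, and a real implementation would ride on a QUIC library's unreliable-datagram support (RFC 9221) plus real authentication:

```python
import struct

# Hypothetical 4-byte tunnel frame header: 1-byte type + 3-byte session id.
# A real design could instead lean on QUIC's own connection IDs.
FRAME_HDR = struct.Struct("!B3s")
TYPE_IPV4 = 0x04

def encapsulate(ip_packet: bytes, session_id: bytes) -> bytes:
    """Wrap a raw IP packet (as read from a TUN device) into one datagram."""
    return FRAME_HDR.pack(TYPE_IPV4, session_id) + ip_packet

def decapsulate(frame: bytes) -> tuple:
    """Split a received datagram back into (type, session_id, packet)."""
    ptype, sid = FRAME_HDR.unpack_from(frame)
    return ptype, sid, frame[FRAME_HDR.size:]

pkt = bytes.fromhex("4500001c0000")  # truncated IPv4 header, demo only
frame = encapsulate(pkt, b"abc")
assert decapsulate(frame) == (TYPE_IPV4, b"abc", pkt)
```

Everything else (handshake, token validation, routing) lives in the QUIC and auth layers, which is exactly why the approach is attractive.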
Why would anyone want to use a complex kludge like QUIC and be at the mercy of broken TLS libraries, when WireGuard implementations are ~5k LOC and easily auditable?
Have all the bugs in OpenSSL over the years taught us nothing?
FWIW, QUIC enforces TLS 1.3 and modern crypto: a much smaller surface area and far fewer foot-guns. Combined with memory-safe TLS implementations in Go and Rust, I think it's fair to say things have changed since the Heartbleed days.
QUIC allows identities to be signing keys, which are used to build public key infrastructure. You need to be able to sign things to do web-of-trust, or make arbitrary attestations.
WireGuard has a concept of identity as long-term key pairs, but since the protocol is based on Diffie-Hellman and on arriving at an ephemeral shared secret, that identity is only useful for establishing active connections. The post-quantum version of WireGuard would use KEMs, which also don't work for general-purpose PKI.
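A toy finite-field Diffie-Hellman (insecure demo numbers; WireGuard actually uses Curve25519) makes the limitation concrete: the exchange yields a secret that only the two live parties share, and nothing a third party could ever verify.

```python
# Toy Diffie-Hellman over integers mod p. Demo numbers only: real DH
# uses vetted groups/curves, and WireGuard uses Curve25519.
p = 2**127 - 1   # a Mersenne prime
g = 5

a, b = 123456789, 987654321        # Alice's and Bob's private keys
A, B = pow(g, a, p), pow(g, b, p)  # their public values

# Each side combines its own private key with the peer's public value.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob

# The shared secret authenticates the live channel, but it is not
# transferable: there is no artifact a third party can check, which is
# why web-of-trust and attestations need signing keys instead.
```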
What we really need is a signature-based handshake and a simple VPN solution (like what WireGuard does with the Noise Protocol Framework) that a stream-multiplexing protocol can be layered on top of. QUIC gets the layers right, in the right order (first encrypt, then deal with transport features), but annoyingly none of the QUIC implementations make it easy to take one layer without the other.
I've recently spent a bunch of time working on a mesh networking project that employs CONNECT-IP over QUIC [1].
There are a lot of benefits for sure, mTLS being a huge one (particularly when combined with ACME). For general-purpose, hub-and-spoke VPNs, tunneling over QUIC is a no-brainer, and it's trivial to combine with JWT bearer tokens and the like. It's a neat solution that should be used more widely.
However, there are downsides, and they are primarily performance-related: poorly optimized library code, relatively high message parsing/framing/coalescing/fragmenting costs, and userspace UDP overheads. On fat pipes today you'll struggle to get more than a few Gbit/s of throughput at 1500 MTU (which is plenty for internet browsing, to be sure).
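Some back-of-the-envelope numbers for the 1500-MTU case (the QUIC header and frame sizes below are rough assumptions, since the short header varies with connection ID and packet number length):

```python
MTU = 1500
ipv4, udp = 20, 8                              # outer headers
quic_hdr, aead_tag, dgram_frame = 12, 16, 2    # assumed QUIC costs

inner = MTU - (ipv4 + udp + quic_hdr + aead_tag + dgram_frame)
efficiency = inner / MTU
print(f"inner payload ~ {inner} B per packet ({efficiency:.1%} efficient)")

# Header bytes are not the problem; the per-packet work rate is:
pps = 5e9 / 8 / MTU
print(f"~{pps/1e6:.2f} M packets/s just to move 5 Gbit/s at 1500-byte MTU")
```

So the bottleneck is hundreds of thousands of userspace receive-parse-send operations per second, which is where GSO/GRO batching earns its keep.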
For fat pipes and hardware/FPGA acceleration use cases, Google probably has the most mature approach with its datacenter transport PSP [2], basically a stripped-down, per-flow IPsec. In-kernel IPsec has also gotten a lot faster and more scalable in recent years with multicore/multiqueue support [3]; internal benchmarking still shows IPsec on Linux absolutely dominating performance benchmarks (throughput and latency).
For the mesh project we ended up pivoting to a custom, offload-friendly, kernel-bypass (AF_XDP) dataplane inspired by IPsec/PSP/Geneve.
I'm available for hire btw, if you've got an interesting networking project and need a remote Go/Rust developer (contract/freelance) feel free to reach out!
The purpose of Wireguard is to be simple. The purpose of QUIC is to be compatible with legacy web junk. You don't use the second one unless you need the second one.
WireGuard-over-QUIC does not make any sense to me: it lowers performance and shrinks the inner WireGuard MTU. If all you want is obfuscation, you can just replace WireGuard with QUIC altogether.
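The double-encapsulation cost is easy to estimate (per-layer sizes are approximations and assume an IPv4 outer header; WireGuard's default 1420 MTU actually budgets for IPv6):

```python
outer_mtu = 1500
quic_layer = 20 + 8 + 12 + 16  # IPv4 + UDP + assumed QUIC hdr + AEAD tag
wg_layer = 20 + 8 + 16 + 16    # IPv4 + UDP + WG data header + Poly1305 tag

wg_only = outer_mtu - wg_layer                    # plain WireGuard
wg_over_quic = outer_mtu - quic_layer - wg_layer  # WireGuard inside QUIC
print(wg_only, wg_over_quic)
```

So nesting costs roughly another 56 bytes of every packet, plus a second round of encryption, for no security gain over running QUIC alone.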
That "be flexible" mentality is exactly what WireGuard was created to fight in the first place; otherwise, why bother? IPsec is already standardized, flexible, and has widespread hardware implementations (both FPGA and ASIC).
I think standards operate according to punctuated equilibrium, so the market will only accept one new standard every ten years or so. I could imagine something like PQC driving a shift to QUIC in the future.
Why are you taking from people their will to experiment and design new stuff? Are they using your money or time? Is this just out of grumpiness, envy, condescension or what?
QUIC is a corporate-supported black hole, and corporations are anti-human. It's a wonder that there is still some freedom to make useful protocols on the internet and that people are nice enough to do so.
Very cool project - hoping to see follow-up designs that can do more than 1Gbps per port!
I recently built a fully Layer-2-transparent, 25 Gbps+ capable WireGuard-based solution for LR fiber links at work, based on Debian with COTS Zen4 machines and a purpose-tailored Linux kernel build. I'd be curious to know what an optimized FPGA can do compared to that.
Yes, jumbo frames unlock a LOT of additional performance, which is exactly what we have and need on those links. Using a vanilla wg-bench [0] loopback-esque setup (really veths across network namespaces) on the machine, I get slightly more than 15 Gbps sustained throughput.
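To elaborate on why jumbo frames help so much: the win is mostly the per-packet work, not the header bytes (the 60-byte overhead figure below assumes an IPv4 outer header):

```python
def wg_goodput_share(mtu: int, overhead: int = 60) -> float:
    # 60 B ~ outer IPv4 (20) + UDP (8) + WG data header (16) + tag (16)
    return (mtu - overhead) / mtu

for mtu in (1500, 9000):
    pps = 15e9 / 8 / mtu  # packet rate needed for 15 Gbit/s
    print(f"MTU {mtu}: goodput {wg_goodput_share(mtu):.1%}, "
          f"~{pps/1e6:.2f} M packets/s at 15 Gbit/s")
```

Going from 1500 to 9000 MTU only buys a few percent of header efficiency, but it cuts the per-packet crypto and routing decisions by a factor of six.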
Just to elaborate for others, MACSec is a standard (802.1ae) and runs at line rate. Something like a Juniper PTX10008 can run it at 400Gbps, and it’s just a feature you turn on for the port you’d be using for the link you want to protect anyway (PTXs are routers/switches, not security devices).
If I need to provide encryption on a DCI, I’m at least somewhat likely to have gear that can just do this with vendor support instead of needing to slap together some Linux based solution.
Unless, I suppose, there’s various layer 2 domains you’re stitching together with multiple L2 hops and you don’t control the ones in the middle. In which case I’d just get a different link where that isn’t true.
SpinalHDL is so cool. There's been so much consolidation in the semiconductor market, and that's scary. But it feels like there's such an amazing base of new open design systems to work from now that getting new things started should be entirely possible! There's just a little too much of a gap in actually getting the silicon-foundry model back up; things are all a bit too encumbered still. Fingers crossed that chip making has its next day.
> However, the Blackwire hardware platform is expensive and priced out of reach of most educational institutions. Its gateware is written in SpinalHDL, a nice and powerful but niche HDL, which has not taken root in the industry. While Blackwire is now released as open source, that decision came from financial hardship; it was originally meant for sale.
1. None of the commercial tools support them. All other HDLs compile to SV (or plain Verilog) and then you're wasting hours and hours debugging generated code. Not fun. Ask me how I know...
2. SV has an absolute mountain of features and other HDLs rarely come close. Especially when it comes to multi-clock designs (which are annoying and awkward but very common), and especially verification.
The only glimmer of hope I see on the horizon is Veryl, which hews close enough to SV that interop is going to be easy and the generated code is going to be very readable. Plus, it's made by very experienced people. It's kind of the TypeScript of SystemVerilog.
What are the benefits of SV for multi-clock design? I found migen (and Amaranth) much nicer for multi-clock designs: they provide a stdlib of CDC primitives and async FIFOs, and they track clock domains separately from ordinary signals.
My issue with SystemVerilog is the multitude of implementations with widely varying degrees of support, and little open source. Xsim poorly supports the more advanced constructs and crashes on them, leaving you to figure out which part causes the issue. Vivado only supports a subset. Toolchains for smaller FPGAs (Lattice, Chinese parts, ...) are much worse. The older ModelSim versions I used were also not great. You really have to figure out the basic common subset of all the tools, and for synthesis that basically leaves interfaces and the 'logic' type. Interfaces are better than Verilog's, but much worse than the equivalents in these neo-HDLs(?).
While tracing back compiled Verilog is annoying, you are also only using one implementation of the HDL, without needing to battle multiple buggy, poorly documented implementations. There is only the one, usually less buggy, poorly documented implementation.
SpinalHDL's multiple clock domain support via lexical scoping is excellent.
Save for things like SV interfaces (which are equivalently implemented in a far better way using Scala's type system), SpinalHDL can emit pretty much any Verilog you can imagine.
I can't think of a scenario where this is useful. They claim "Full-throttle, wire-speed hardware implementation of Wireguard VPN" but then implement it on a board with a puny set of four 1 Gbps ports... The standard software implementation of WireGuard (the Linux kernel one) can already saturate Gbps links (wire speed: check) and can even approach 10 Gbps on a mid-range CPU: https://news.ycombinator.com/item?id=42172082
If they had produced a platform with four 10 Gbps ports, then it would become interesting. But the whole hardware design and bitstream would have to be redeveloped almost from scratch.
It's an educational project. No need to put it on blast over that. CE/EE students can buy a board for a couple hundred bucks and play around with this to learn.
A hypothetical ASIC implementation would beat a CPU rather soundly on a per watt and per dollar basis, which is why we have hardware acceleration for other protocols on high end network adaptors.
Personally, if I could buy a Wireguard appliance that was decent for the cost, I'd be interested in that. I ran a FreeBSD server in my closet to do similar things back in the day and don't feel the need to futz around with that again.
I agree that if the goal is to be educational, it's an excellent, interesting project. But there's no need to make dishonest claims on the web page, like "the software performance is far below the speed of wire".
There’s a strong air of grantware to it. The notion that it could be end-to-end auditable from the RTL up is interesting, though. And WireGuard performance generally tanks with a large routing table and small MTUs, as you might see on a VPN endpoint server, while this project seems to target line speed even in the absolute worst-case combination of routing table size and packet rate.
Amusingly, a lot of people have always been convinced that 10 Gbps over a VPN is impossible. I recall a two-year-old post on /r/mikrotik where everyone was telling the OP it was impossible, complete with citations and sources explaining why, and then it worked.
Why would you even need dedicated hardware for just 40 Gb/s? That is within single-core decryption performance, which should be the bottleneck for any halfway decent transport protocol. Are we talking 40 Gb/s at minimum packet size, so you need to handle ~120 M packets/s?
Because the entire stack is auditable here. There's no Cisco backdoor, no Intel ME, no hidden malware from a zombie NPM package. It's all your hardware.
bps are easy; packets per second is the crunch. Say you've got 64 bytes per packet, a worst-case scenario: you're down to roughly 78 Mpackets/sec, before even counting Ethernet framing overhead. Sending one byte after another is the easy bit; the decisions are made per-packet.
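Running the worst-case numbers (the extra 20 bytes are the standard Ethernet preamble and inter-frame gap):

```python
line_rate = 40e9      # bit/s
frame = 64            # minimum Ethernet frame size in bytes
on_wire = frame + 20  # + preamble (8) + inter-frame gap (12)

pps_payload = line_rate / 8 / frame
pps_wire = line_rate / 8 / on_wire
print(f"~{pps_payload/1e6:.1f} Mpps ignoring framing, "
      f"~{pps_wire/1e6:.1f} Mpps counting preamble and IFG")
```

Either way it's tens of millions of per-packet decisions per second, which is the part that hurts a CPU and favors a hardware pipeline.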
My dude: As far as I know, it's the first implementation of Wireguard in an FPGA.
It does not have to be all things for all people today. It can be improved. (And it appears to be open-source under a BSD license; anyone can begin making improvements immediately if they wish.)
Concepts like "This proof-of-concept wasn't explored with multiple 10Gbps ports! It is therefore imperfect and thus disinteresting!" are... dismaying, to say the least.
It would be an interesting effort if it only worked with two 10Mbps ports, just because of the new way in which it accomplishes the task.
I don't want to live in a world where the worth of all ideas is reduced to a binary, where all things are either perfect or useless.
(Fortunately for me, I do not live in such a world that is as binary as that.)
This is conceptually interesting but seems quite a ways from a real end-to-end implementation; there's a bit of a smell of academic grantware that I hope can reach completion.
Fully available source from RTL up (although the license seems proprietary?) is very interesting from an audit standpoint, and 1G line speed performance, although easily achieved by any recent desktop hardware, is quite respectable in worst case scenarios (large routing table and small frames). The architecture makes sense (software managed handshakes configure a hardware packet pipeline). WireGuard really lacks acceleration in most contexts (newer Intel QAT supposedly can accelerate ChaCha20 but trying to figure out how one might actually make it work is truly mind bending), so it’s a pretty interesting place to do a hardware implementation.
The safe assumption to make when met with a contradiction in licensing would be to assume that the more restrictive license holds, no? Especially when the permissive license is a general repo-wide license and the restrictive license is specifically applied to certain files.
So for all intents and purposes, in my opinion, large parts of this Wireguard FPGA project are under this weird proprietary Chili Chips license. In fact, the license is so proprietary that the people who made this wireguard FPGA repository and made it visible to the public are seemingly in violation of it.
It puts us in a weird spot as well: I'm now the "holder of" a file and am obligated to keep all information within it confidential and to protect the file from disclosure. So I guess I can't share a link to the repo, since that would violate my obligation to protect the files within it from disclosure.
I would link to the files in question, but, well, that wouldn't protect them from disclosure now would it.
"With traditional solutions (such as OpenVPN / IPSec) starting to run out of steam" -- and then zero explanation or evidence of how that is true.
I can see an argument for IPSec. I haven't used that for many years. However, I see zero evidence that OpenVPN is "running out of steam" in any way shape or form.
I would be interested to know the reasoning behind this. Hopefully the sentiment isn't "this is over five years old so something newer must automatically be better". Pardon me if I am being too cynical, but I've just seen way too much of that recently.
Seems like you just haven’t been paying attention. Even commercial VPNs like PIA and others now use WireGuard instead of traditional VPN stacks. Tailscale and other companies in that space are starting to replace VPN stacks with WireGuard solutions.
The reasons are abundant, the main ones being that performance is drastically better, security is easier to guarantee because the stack itself is smaller and simpler, and it’s significantly more configurable and easier to get the behavior you want.
I use and advocate for WireGuard, but I don't see its adoption in bigger orgs, at least the ones I've worked in. I appreciate this situation will change over time, but it'll be a long tail.
OpenVPN makes SNAT relatively trivial, from what I can tell. So I can VPN into a network, use a node on the network as my exit node, and access other devices on that network, with source-based NAT set up on the exit node to make it appear as if my traffic is coming from the exit node.
WireGuard seems to make this much more difficult from what I can tell, though I don't know enough about networking to know whether that's fundamental to WireGuard or just a result of less mature tooling.
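FWIW, on a Linux exit node the SNAT part looks the same for either VPN; nothing below is WireGuard-specific (the interface names and subnet are assumptions for illustration):

```shell
# Let the exit node forward the tunneled traffic
sysctl -w net.ipv4.ip_forward=1

# Rewrite the source address of packets arriving from the WireGuard
# subnet (10.8.0.0/24) as they leave via the uplink interface (eth0),
# so replies come back through the exit node
iptables -t nat -A POSTROUTING -s 10.8.0.0/24 -o eth0 -j MASQUERADE
```

The harder part with WireGuard is usually the routing/AllowedIPs bookkeeping on each peer, which hooks like wg-quick's PostUp exist to automate.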
WireGuard is slowly eating the space alive, and that's a good thing.
Here's a very educational comparison between Wireguard, OpenVPN and IPSec. It shows how easy wireguard is to manage compared to the other solutions and measures and explains the noticeable differences in speed: https://www.youtube.com/watch?v=LmaPT7_T87g
I wouldn't say they're running out of steam (they never had any) but OpenVPN was always poorly designed and engineered and IPSec has poor interop because there are so many options.
Unfortunately (luckily?) I don’t know enough about IPsec, but usually things make a lot more sense once you actually know the exact architecture and rationale behind them.
IPSec isn’t running out of steam anytime soon. Every commercial firewall vendor uses it, and it’s mandatory in any federal government installation.
WireGuard isn’t certified for any federal installation that I’m aware of and I haven’t heard of any vendors willing to take on the work of getting it certified when its “superiority” is of limited relevance in an enterprise situation.
OpenVPN has both terrible configuration and performance compared to just about anything else. I've seen it really drop off to next to no usage both in companies and for personal use over the past few years as wireguard based solutions have replaced it.
TweetNaCL to the rescue.
[0]https://datatracker.ietf.org/wg/masque/about/
1. https://www.rfc-editor.org/rfc/rfc9484.html
2. https://cloud.google.com/blog/products/identity-security/ann...
3. https://netdevconf.info/0x17/docs/netdev-0x17-paper54-talk-s...
25G is a lot for WireGuard [1].
1. https://www.youtube.com/watch?v=oXhNVj80Z8A
[0]: https://github.com/cyyself/wg-bench
'Testing MACsec' paragraph, especially around
>> Last but not least, ... so that half of the traffic ... was being sent fully unencrypted ...
When you say "exists"... is there a high-quality open-source implementation?
Here's a link to the old Blackwire 100GbE WireGuard project mentioned: https://github.com/FPGA-House-AG/BlackwireSpinal
https://old.reddit.com/r/mikrotik/comments/112mo4v/is_there_...
Hm, the "BSD 3-Clause License" seems really proprietary to you?
But you are right: does the per-file license in many (most?) Verilog files [1] overrule the LICENSE file [2] of the repo?
[1] https://github.com/chili-chips-ba/wireguard-fpga/blob/main/1...
[2] https://github.com/chili-chips-ba/wireguard-fpga/blob/main/L...
Here's a very educational comparison between Wireguard, OpenVPN and IPSec. It shows how easy wireguard is to manage compared to the other solutions and measures and explains the noticeable differences in speed: https://www.youtube.com/watch?v=LmaPT7_T87g
Highly recommended!
With WireGuard I instead max out the internet bandwidth (400 megabits/s) with like 20% cpu usage if that.
I really don’t understand why. We have AES acceleration, and AES-NI can easily do more bits per second... so why is OpenVPN so slow?