Recently, I learned about XDP and AF_XDP, which give user-space programs a fast path through the kernel that skips a large chunk of the kernel's networking stack. This lets us interact directly with the network interface's TX queues and send a lot of traffic very fast.
I originally started this because I was curious whether it would work at all. It turned out to work really well, so I polished it a bit and released it as open source.
Happy to answer any questions.