sendfile effectively turns your user space file server into a control plane, and moves the data plane to where the data is eliminating copies between address spaces. This can be made congruent with I/O completions (i.e. Ethernet+IP and block) and made asynchronous so the entire thing is pumping data between completion events. Watch the Netflix video the author links in the post.
There is an inverted approach where you move all this into a single user address space, i.e. DPDK, but it's the same overall concept just a different who.
Evented I/O works out pretty well in practice for the I and D cache, especially if you can affine and allocate things as the article states, and do similar natural alignments inside the kernel (i.e. RSS/consistent hashing).