An order-of-magnitude performance improvement isn't going to be possible. A 10x speedup would require that over 90% of your cycles are currently being wasted by the OS somehow, since only eliminating that much overhead shrinks the runtime to a tenth. Maybe this project could get a 20% improvement.
You're talking about the OP's project and I was not – at least not when I brought up orders of magnitude. The confusion is my fault. I implicitly changed the subject to my own fantasy tangent.
My point is that if one is going to build a narrow vertical stack up from specialized hardware, there had better be a 10x advantage over running the application the ordinary way, or the experiment becomes a why-bother. Also, the application had better be valuable enough to justify the effort.
This vision of systems design has been alive in the Forth community for a long time – maybe not the "iterating on hardware as part of application development" part, but certainly the specialized vertical stack idea, just in a very austere form. They make the tradeoff of dramatically reducing what the software will do in order to make it feasible to develop that way. That's a tradeoff most of us aren't willing to make. But I have a feeling there are more options if one is talking strictly about servers.
> Order of magnitude performance improvement isn't going to be possible.
I think the term 'order of magnitude' has started to take on the connotation of simply meaning 'a lot'. It's a fair observation, but I hear it bandied about so often that I rarely think the parties are actually using it literally.
It depends on how efficient or inefficient the OS's network stack and its data transfer into user space are. For managed runtimes in a VM, taking advantage of zero-copy APIs is a challenge. I don't think an 'order of magnitude' is possible, but there are clearly a lot of cases where, implemented correctly, this idea could dramatically improve performance.
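To make the zero-copy point concrete, here's a minimal JVM sketch (not from the original discussion; the host, port, and file path are placeholders): the copying path drags every byte through a heap buffer, while FileChannel.transferTo() can delegate to sendfile(2) on Linux so the payload never enters the JVM heap at all.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {

    // Copying path: kernel -> JVM heap buffer -> kernel again.
    // Each pass costs extra copies and keeps the data in the managed heap.
    static void copyingSend(Path file, SocketChannel sock) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
            while (in.read(buf) != -1) {
                buf.flip();
                while (buf.hasRemaining()) {
                    sock.write(buf);
                }
                buf.clear();
            }
        }
    }

    // Zero-copy path: transferTo() can map to sendfile(2), so the bytes
    // move file -> socket inside the kernel without touching user space.
    static void zeroCopySend(Path file, SocketChannel sock) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0;
            long size = in.size();
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, sock);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical destination; swap in a real host/port to try it.
        try (SocketChannel sock =
                 SocketChannel.open(new InetSocketAddress("example.com", 9000))) {
            zeroCopySend(Path.of(args[0]), sock);
        }
    }
}
```

The catch, and part of why the gains are hard to bank on from a managed runtime, is that transferTo() is allowed to fall back to an ordinary copy when the platform or filesystem doesn't support the fast path, so whether you actually skip the user-space copies depends on where the code runs.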