Sounds like you're experiencing the vagaries of somebody (maybe Cloudflare, maybe some other ISP Cloudflare is peering with) doing traffic engineering, probably to reduce congestion on particular paths. The recommendation to go with the Pro plan is likely just the first step; the next step is to open a ticket and get them to fix it -- that's what you're paying them for.
Dropping Cloudflare is, of course, an option as most of the security stuff they do can be handled by competent security folks, but you (may?) need to find someone similar if your site is at risk of DDoS.
So it's not like I'm delivering features in one day that would have taken two weeks. But I am delivering features in two weeks that have a bunch of extra niceties attached to them. Reality being what it is, we often release things before they are perfect. Now things are a bit closer to perfect when they are released.
I hope some of that extra work that's done reduces future bug-finding sessions.
Of course I'd rather not maintain my own fork of something that always should have been part of poi, but this was better than maintaining an impossible mix of dependencies.
I do feel we're heading in a direction where building in-house will become more common than defaulting to 3rd party dependencies—strictly because the opportunity costs have decreased so much. I also wonder how code sharing and open source libraries will change in the future. I can see a world where instead of uploading packages for others to plug into their projects, maintainers will instead upload detailed guides on how to build and customize the library yourself. This approach feels very LLM friendly to me. I think a great example of this is with `lucia-auth`[0] where the maintainer deprecated their library in favour of creating a guide. Their decision didn't have anything to do with LLMs, but I would personally much rather use a guide like this alongside AI (and I have!) rather than relying on a 3rd party dependency whose future is uncertain.
I think `ts-rest` is a great library, but the lack of maintenance didn't give me the confidence to invest, even if I wasn't using express. Have you ever considered building your own in-house solution? I wouldn't necessarily recommend this if you already have `ts-rest` set up and are happy with it, but rebuilding custom versions of 3rd party dependencies actually feels more feasible nowadays thanks to LLMs. I ended up building a stripped-down version of `ts-rest` and am quite happy with it. Having full control over (and understanding of) the internals feels very good, and it surprisingly only took a few days. Claude helped immensely and filled a lot of knowledge gaps, namely with complicated TypeScript types. I would also watch out for tree-shaking and accidentally pulling zod into the client bundle if you decide to go down this route.
I'm still a bit in shock that I was even able to do this, but yeah building something in-house is definitely a viable option in 2025.
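To make "stripped down" a bit more concrete, here's a rough sketch of the core idea: a shared contract object (zod schemas plus methods/paths) that a thin typed client keys off of. The names and shapes below (`contract`, `createClient`, the example route) are purely illustrative and not ts-rest's actual API; the sketch assumes zod and Node 18+ for the built-in `fetch`.

import { z } from 'zod'

// The shared contract: one source of truth for method, path, and response schema.
const contract = {
  getUser: {
    method: 'GET' as const,
    path: (id: string) => `/users/${id}`,
    response: z.object({ id: z.string(), name: z.string() }),
  },
}

// A tiny typed client: the return type is inferred straight from the zod schema.
function createClient(baseUrl: string) {
  return {
    async getUser(id: string): Promise<z.infer<typeof contract.getUser.response>> {
      const res = await fetch(baseUrl + contract.getUser.path(id))
      if (!res.ok) throw new Error(`GET failed with status ${res.status}`)
      // Runtime validation keeps the inferred type honest.
      return contract.getUser.response.parse(await res.json())
    },
  }
}

// Usage: `user` is typed as { id: string; name: string } with no manual annotations.
// const api = createClient('https://api.example.com')
// const user = await api.getUser('123')

In a setup like this, the tree-shaking caveat above mostly comes down to keeping the contract file schema-only, so that importing it from the client never drags server handlers along with it.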
I would say this oversight was a blessing in disguise though, I really do appreciate minimizing dependencies. If I could go back in time knowing what I know now, I still would've gone down the same path.
Curious what other folks think and if there are any other options? I feel like I've searched pretty exhaustively, and it's the only one I found that was both lightweight and had robust enough type safety.
Anyway please follow up or blog when you solve it. Sounds interesting.
import { performance } from 'node:perf_hooks'
performance.eventLoopUtilization()
See the docs for how it works and how to derive some value from it. We had a similar situation where our application was heavily IO-bound (very little CPU), which caused some initial confusion about the slowdown. We ended up adding better metrics around IO and the event loop, which led us to batch-dequeue our jobs in a more sensible way and made the entire application much more effective.
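For anyone who wants a starting point, here's a minimal sketch of sampling the ELU delta over an interval (the 5-second interval and 0.9 threshold are arbitrary values for illustration, not recommendations):

import { performance } from 'node:perf_hooks'

// Passing the previous sample gives utilization over the interval
// instead of utilization since process start.
let last = performance.eventLoopUtilization()

setInterval(() => {
  const delta = performance.eventLoopUtilization(last)
  last = performance.eventLoopUtilization()

  // delta.utilization is 0..1; values near 1 mean the loop is saturated.
  if (delta.utilization > 0.9) {
    console.warn(`event loop busy: ${(delta.utilization * 100).toFixed(1)}%`)
  }
}, 5_000)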
If you crack the nut on this issue, I'd love to see an update comment detailing what the issue and solution was!
And yeah, I've been using the Prometheus client's `collectDefaultMetrics()` function so far to see event loop metrics, but it looks like `node:perf_hooks` might provide more detailed output... thanks for sharing
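For context for anyone else reading this later: `collectDefaultMetrics()` comes from the prom-client package, and the defaults it registers include event loop lag metrics alongside heap and GC stats. A minimal setup (the port and endpoint here are arbitrary choices) looks roughly like this, with the `node:perf_hooks` ELU approach above complementing it rather than replacing it:

import http from 'node:http'
import { collectDefaultMetrics, register } from 'prom-client'

// Register the default Node metrics (event loop lag, heap, GC, etc.) on the default registry.
collectDefaultMetrics()

// Expose them on /metrics for Prometheus to scrape.
http
  .createServer(async (req, res) => {
    if (req.url === '/metrics') {
      res.setHeader('Content-Type', register.contentType)
      res.end(await register.metrics())
    } else {
      res.writeHead(404).end()
    }
  })
  .listen(9100)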
I ended up figuring out a fix but it's a little embarrassing... Optimizing certain parts of socket.io helped a little (e.g. installing `bufferutil`: https://www.npmjs.com/package/bufferutil), but the biggest performance gain I found was actually going from 2 node.js containers on a single server to just 1! To be exact, I was able to go from ~500 concurrent players on a single server to ~3000+. I feel silly because had I been load-testing with 1 container from the start, I would've clearly seen the performance loss when scaling up to 2 containers. Instead I went on a wild goose chase trying to fix things that had nothing to do with the real issue[0].
In the end it seems like the bottleneck was indeed at the NIC/OS layer rather than the application layer. Apparently the NIC/OS prefers dealing with a single process screaming `n` packets at it rather than `x` processes each screaming `n/x` packets. In fact, it seems the bigger `x` gets, the worse performance degrades. Perhaps it's something to do with context switching, but I'm not 100% sure. Unfortunately, given my limited infra/networking knowledge, this wasn't intuitive to me at all - it didn't occur to me that scaling down could actually improve performance!
Overall a frustrating but educational experience. Again, thanks to everyone who helped along the way!
TLDR: premature optimization is the root of all evil
[0] Admittedly, AI let me down pretty badly here. So far I've found AI to be an incredible learning and scaffolding tool, but most of my LLM experiences have been in domains I feel comfortable in. This time around, though, it was pretty sobering to realize that I had been effectively punked by AI multiple times over. The hallucination trap is very real when working in domains outside your comfort zone, and I think I would've been able to debug more effectively had I relied more on hard metrics.