Readit News
korkybuchek commented on We cut our Mongo DB costs by 90% by moving to Hetzner   prosopo.io/blog/we-cut-ou... · Posted by u/arbol
KaiserPro · a month ago
> MongoDB Atlas is deployed to VMs on 2-3 AZs

I've not actually seen an AZ go down in isolation, so whilst I agree it's technically a less "robust" deployment, in practice it's not that much of a difference.

> these "we cut bare costs by moving away from the cloud" posts are catnip for HN. But they usually don't make sense.

We moved away from Atlas because they couldn't cope with the data growth that we had (4 TB is the max per DB). Turns out it's a fuckload cheaper even hosting on Amazon (as in 50%). We haven't moved to Hetzner because that would be more effort than we really want to expend, but it's totally doable, with not that much extra work.

> more maintenance tasks (installing, upgrading, patching, troubleshooting, getting on-call, etc) with lower reliability and fewer services, isn't an advantage.

Depends, right? Firstly, it's not that much of an overhead, and if it saves you significant cash, it extends your runway.

korkybuchek · a month ago
> I've not actually seen an AZ go down in isolation

Counterpoint: I have. Maybe not completely down, but degraded, or out of capacity on an instance type, or some other silly issue that caused an AZ drain. It happens.

korkybuchek commented on Ban me at the IP level if you don't like me   boston.conman.org/2025/08... · Posted by u/classichasclass
sidewndr46 · 4 months ago
I don't think you have any idea how serious the issue is. I was, loosely speaking, in charge of application-level performance at one job for a web app. I was asked to make the backend as fast as possible at dumping the last byte of HTML back to the user.

The problem I ran into was that performance was bimodal. We had this one group of users that was lightning fast and the rest were far slower. I chased down a few obvious outliers (that one forum thread with 11,000 replies that some guy leaves up on a browser tab all the time, etc.), but it was still bimodal. Eventually I just changed the application-level code to display known bots as one performance trace and everything else as another trace.
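
That kind of split is only a few lines of middleware. A minimal sketch, assuming an Express-style Node app; the bot regex and log format here are illustrative, not the original code:

  import express from "express";

  const app = express();
  const KNOWN_BOTS = /googlebot|bingbot|ahrefsbot|semrushbot/i;

  // Time every request and tag it as "bot" or "human" by User-Agent,
  // so the two populations show up as separate performance traces.
  app.use((req, res, next) => {
    const start = process.hrtime.bigint();
    res.on("finish", () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      const trace = KNOWN_BOTS.test(req.get("user-agent") ?? "") ? "bot" : "human";
      console.log(`trace=${trace} path=${req.path} ms=${ms.toFixed(1)}`); // stand-in for a real metrics client
    });
    next();
  });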

60% of all requests were known bots. This doesn't even count the random-ass bot that some guy started up at an ISP. Yes, this really happened. We were a paying customer of a company that decided to just conduct a DoS attack on us at 2 PM one afternoon. It took down the website.

Not only that, the bots effectively always got a cached response since they all seemed to love to hammer the same pages. Users never got a cached response, since LRU cache eviction meant the actual discussions with real users were always evicted. There were bots that would just rescrape every page they had ever seen every few minutes. There were bots that would just increase their throughput until the backend app would start to slow down.
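
A toy demonstration of that eviction dynamic (not the original system; a tiny Map-based LRU standing in for the real cache):

  // A 3-entry LRU: bots rescraping the same hot pages push out the
  // pages real users are actually reading.
  class LruCache<K, V> {
    private map = new Map<K, V>();
    constructor(private capacity: number) {}

    get(key: K): V | undefined {
      const value = this.map.get(key);
      if (value !== undefined) {
        this.map.delete(key);     // re-insert to refresh recency
        this.map.set(key, value); // (Map preserves insertion order)
      }
      return value;
    }

    set(key: K, value: V): void {
      if (this.map.has(key)) this.map.delete(key);
      else if (this.map.size >= this.capacity) {
        this.map.delete(this.map.keys().next().value!); // evict the least recently used entry
      }
      this.map.set(key, value);
    }
  }

  const cache = new LruCache<string, string>(3);
  cache.set("/active-thread", "html"); // a page a real user is reading
  for (const p of ["/top-1", "/top-2", "/top-3"]) cache.set(p, "html"); // bots hammer the same popular pages
  console.log(cache.get("/active-thread")); // undefined: the user's page was evicted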

There were bots that would run the javascript for whatever insane reason and start emulating users submitting forms, etc.

You're probably thinking "but you get to appear in a search index, so it's worth it". Not really. Google's bot was one of the few well-behaved ones and would even slow scraping if it saw a spike in response times. Also, we had an employee who was responsible for categorizing our organic search performance. While we had a huge amount of traffic from organic search, something like 40% of it went to just one URL.

In retrospect, I'm now aware that a bunch of this was early-stage AI companies scraping the internet for data.

korkybuchek · 4 months ago
> Google's bot was one of the few well-behaved ones and would even slow scraping if it saw a spike in response times.

Google has invested decades of core research with an army of PhDs into its crawler, particularly around figuring out when to recrawl a page. For example (a bit dated, but you can follow the refs if you're interested):

https://www.niss.org/sites/default/files/Tassone_interface6....
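
The politeness half can be surprisingly simple, though. A hedged sketch of an AIMD-style rate controller, emphatically not Google's actual algorithm (see the paper above for that line of work); the thresholds and delays are made up:

  // Crawl a list of URLs, backing off multiplicatively when response
  // times spike and creeping back up when the server looks healthy.
  async function politeCrawl(urls: string[]): Promise<void> {
    let delayMs = 1000;      // current per-request delay
    let baseline = Infinity; // fastest response observed so far

    for (const url of urls) {
      const start = Date.now();
      await fetch(url);
      const elapsed = Date.now() - start;
      baseline = Math.min(baseline, elapsed);

      if (elapsed > 2 * baseline) delayMs *= 2;    // server is straining: back off hard
      else delayMs = Math.max(250, delayMs - 100); // recover additively
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }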

korkybuchek commented on Show HN: Socket-call – Call socket.io events like normal JavaScript functions   github.com/bperel/socket-... · Posted by u/bperel
sourcemap · 6 months ago
This is true. Just a few days ago I had Claude one-shot some WebSocket utilities for reconnect and message queueing. It took 2 minutes.

I've written countless WebSocket wrappers in the past (similar aversion to socket.io as others in this thread). The one-shot output was perfect. Certainly better than my patience would've allowed.

Maybe socket.io is doing something fancy on the server side, but for clients, it's absolutely overkill.
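
The queueing half of such a wrapper really is small. A rough sketch, assuming a browser-style WebSocket (illustrative names, not the commenter's generated code): buffer sends while disconnected and flush on reconnect:

  class QueueingSocket {
    private ws!: WebSocket;
    private queue: string[] = [];

    constructor(private url: string) {
      this.connect();
    }

    private connect(): void {
      this.ws = new WebSocket(this.url);
      this.ws.onopen = () => {
        for (const msg of this.queue.splice(0)) this.ws.send(msg); // flush the backlog
      };
      this.ws.onclose = () => setTimeout(() => this.connect(), 1000); // naive fixed-delay reconnect
    }

    send(msg: string): void {
      if (this.ws.readyState === WebSocket.OPEN) this.ws.send(msg);
      else this.queue.push(msg); // hold it until we're connected again
    }
  }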

korkybuchek · 6 months ago
Maybe you could save that one-shotted code into a library of some sort...?
korkybuchek commented on Show HN: Socket-call – Call socket.io events like normal JavaScript functions   github.com/bperel/socket-... · Posted by u/bperel
imtringued · 6 months ago
socket.io is probably one of the most unnecessary libraries on this planet. WebSockets are already as simple as possible.

In fact, WebSockets work so well I use them as a generic TCP replacement, because the message-oriented transport model gives me 99% of what I need, with the exception of custom message types. Leaving that out was a massive letdown to me, because you now need to carry a way to identify the message type inside the body, rather than just throwing the message itself into the appropriate protocol parser (e.g. a schema-based binary format).
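
The usual workaround is a one-byte tag at the front of each binary frame. A minimal sketch; the tag values and the stand-in parsers are illustrative:

  const MSG_CHAT = 0x01;
  const MSG_TELEMETRY = 0x02;

  // Frame: one tag byte, then the payload in whatever binary format you like.
  function encode(type: number, payload: Uint8Array): Uint8Array {
    const framed = new Uint8Array(payload.length + 1);
    framed[0] = type;
    framed.set(payload, 1);
    return framed;
  }

  // On receive, peel the tag off and hand the rest to the right parser.
  function dispatch(data: ArrayBuffer): void {
    const bytes = new Uint8Array(data);
    const payload = bytes.subarray(1);
    switch (bytes[0]) {
      case MSG_CHAT: console.log("chat", payload); break;           // stand-in parser
      case MSG_TELEMETRY: console.log("telemetry", payload); break; // stand-in parser
      default: throw new Error(`unknown message type ${bytes[0]}`);
    }
  }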

korkybuchek · 6 months ago
> socket.io is probably one of the most unnecessary libraries on this planet. WebSockets are already as simple as possible.

Eh... While I agree that socket.io is one of those libraries you could probably "write" in an afternoon, and WebSockets are simple, there are a couple of things that are kinda painful to rewrite time after time (a sketch of the first two follows the list):

  - keepalives to detect dead sockets
  - reconnection logic with backoff
  - ability to switch to long-polling for weird environments
  - basic multiplexing/namespacing
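
A minimal sketch of the first two, assuming a browser-style WebSocket and a server that answers "ping" with "pong"; the intervals and message names are illustrative:

  function connect(url: string, attempt = 0): void {
    const ws = new WebSocket(url);
    let heartbeat: ReturnType<typeof setInterval> | undefined;
    let alive = true;

    ws.onopen = () => {
      attempt = 0; // a successful connection resets the backoff
      heartbeat = setInterval(() => {
        if (!alive) return ws.close(); // no pong since last ping: the socket is dead
        alive = false;
        ws.send("ping");
      }, 30_000);
    };
    ws.onmessage = (ev) => {
      if (ev.data === "pong") alive = true;
      // ...dispatch application messages here...
    };
    ws.onclose = () => {
      clearInterval(heartbeat);
      const delay = Math.min(30_000, 1000 * 2 ** attempt); // exponential backoff, capped
      setTimeout(() => connect(url, attempt + 1), delay);
    };
  }

The cap on the backoff matters: without it, a long outage turns every client into a synchronized reconnect stampede when the server comes back.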

korkybuchek commented on PDF to Text, a challenging problem   marginalia.nu/log/a_119_p... · Posted by u/ingve
90s_dev · 7 months ago
Have any of you ever thought to yourself, this is new and interesting, and then vaguely remembered that you spent months or years becoming an expert at it earlier in life but entirely forgot it? And in fact large chunks of the very interesting things you've done just completely flew out of your mind long ago, to the point where you feel absolutely new at life, like you've accomplished relatively nothing, until something like this jars you out of that forgetfulness?

I definitely vaguely remember doing some incredibly cool things with PDFs and OCR about 6 or 7 years ago. Some project comes to mind... Google tells me it was "tesseract", and that sounds familiar.

korkybuchek · 7 months ago
Not that I'm privy to your mind, but it probably was tesseract (and this is my exact experience too...although for me it was about 12 years ago).
korkybuchek commented on Colossus for Rapid Storage   cloud.google.com/blog/pro... · Posted by u/alobrah
akshayshah · 8 months ago
Sure, but AFAIK S3’s multi-region capabilities are quite far behind GCS’s.

S3 offers some multi-region replication facilities, but as far as I’ve seen they all come at the cost of inconsistent reads - which greatly complicates application code. GCS dual-region buckets offer strongly consistent metadata reads across multiple regions, transparently fetch data from the source region where necessary, and offer clear SLAs for replication. I don’t think the S3 offerings are comparable. But maybe I’m wrong - I’d love more competition here!

https://cloud.google.com/blog/products/storage-data-transfer...
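
Concretely, a dual-region bucket is just a location choice at creation time. A hedged sketch with the @google-cloud/storage Node client (bucket names are placeholders; NAM4 is one of GCS's predefined dual-region pairings):

  import { Storage } from "@google-cloud/storage";

  const storage = new Storage();

  // A regional bucket and a dual-region bucket are created identically;
  // only the location string changes, and reads/writes use the same API.
  await storage.createBucket("example-regional-bucket", { location: "US-CENTRAL1" });
  await storage.createBucket("example-dual-region-bucket", { location: "NAM4" });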

korkybuchek · 8 months ago
> Sure, but AFAIK S3’s multi-region capabilities are quite far behind GCS’s.

Entirely different claim.

korkybuchek commented on Colossus for Rapid Storage   cloud.google.com/blog/pro... · Posted by u/alobrah
akshayshah · 8 months ago
Very cool! This makes Google the only major cloud that has low-latency single-zone object storage, standard regional object storage, and transparently-replicated dual-region object storage - all with the same API.

For infra systems, this is great: code against the GCS API, and let the user choose the cost/latency/durability tradeoffs that make sense for their use case.

korkybuchek · 8 months ago
> This makes Google the only major cloud that has low-latency single-zone object storage, standard regional object storage,

Absurd claim. S3 Express One Zone launched last year.

korkybuchek commented on Colossus for Rapid Storage   cloud.google.com/blog/pro... · Posted by u/alobrah
jeffbee · 8 months ago
What on this page gives you that impression? Do I have to watch the 2-hour video to learn this?
korkybuchek · 8 months ago
Of course not. Gemini can summarize it for you.
korkybuchek commented on John Cage recital set to last 639 years recently witnessed a chord change   spectator.co.uk/article/w... · Posted by u/pseudolus
_petronius · 8 months ago
Some art-haters in the comments, so to defend this piece of contemporary art for a moment: one thing I love about it is a commitment to the long future of art, creativity, and civilization. What does it take to keep an instrument playing for six hundred years? To commit to that idea -- like the century-long projects of cathedral building in the Middle Ages, or the idea of planting trees you won't live to see mature -- is (to me) the awesome thing about the Halberstadt performance. All rendered in a medium (the church organ) that has existed for an even longer time.

It's a pretty hopeful, optimistic view of the future in a time of high uncertainty, but also represents a positive argument: it's worth doing these things because they are interesting, weird, and fun, and because they represent a continuity with past and future people we will never meet.

Plus, you can already buy a ticket to the finale, so your distant descendants can go see it :)

korkybuchek · 8 months ago
I assume you already know about this given your interests, but just in case: https://longnow.org/
