Peaker · 14 years ago
9000 visitors a day averages about 0.1 visitor per second.

Let's say that the peak is 100 times more frequent than the average, so that would be about 10 visitors/second.
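Spelled out, in case it helps (the 100x peak factor is just the assumption above):

    # Back-of-envelope numbers from above (the 100x peak factor is an assumption)
    visitors_per_day = 9000
    average_per_second = visitors_per_day / 86400   # ~0.10 visitors/second
    peak_per_second = average_per_second * 100      # ~10 visitors/second at peak
    print(f"average {average_per_second:.2f}/s, assumed peak {peak_per_second:.1f}/s")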

Being a systems developer, and not a web programmer, I work on software that handles tens of thousands of requests/sec. I don't understand why around 10 visitors/second would be a difficulty with semi-modern hardware. How many requests does that translate to?

This is a genuine question: can anyone explain what it is that takes so much work in handling a web request?

unconed · 14 years ago
Web servers used to serve just static pages. So when dynamic web applications were invented (e.g. CGI), they tried to slot in as transparently as possible. The web server would invoke a process, pass in the HTTP request, and get back the appropriate HTML to send to the client.
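A minimal sketch of that model (the path handling and HTML here are just placeholders): the server launches the process, hands it the request through environment variables and stdin, and sends whatever it prints back to the browser.

    #!/usr/bin/env python3
    # Minimal CGI-style handler: the web server spawns this process per request,
    # passes request details via environment variables (and the body on stdin),
    # and relays whatever is printed to stdout back to the client.
    import os

    path = os.environ.get("PATH_INFO", "/")   # requested path, as set by the server
    print("Content-Type: text/html")          # response headers first
    print()                                   # blank line ends the headers
    print(f"<html><body><p>You requested {path}</p></body></html>")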

Languages like PHP follow the same model, and as a result, every single page request is processed independently. All the raw data is retrieved from the database, is processed appropriately for output (e.g. turning content into HTML), is run through a templating engine, and assembled with the right CSS and JS so it can be served. This is attractive from a rapid development point of view, because you can deploy changes instantly and can scale it out horizontally just by adding more servers, without any additional work.
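Roughly what each PHP-style request does, sketched in Python (the table, query, and template are made up for illustration): hit the database, turn the rows into HTML, and run the result through a template, from scratch on every single hit.

    # Sketch of the per-request work described above (schema and template are
    # hypothetical): every request re-queries the database and re-renders the page.
    import sqlite3
    from string import Template

    PAGE = Template("<html><body><h1>$title</h1>$items</body></html>")

    def handle_request(db_path):
        conn = sqlite3.connect(db_path)                    # connect on every request
        rows = conn.execute("SELECT title FROM posts").fetchall()
        conn.close()
        items = "".join(f"<li>{t}</li>" for (t,) in rows)  # raw data -> HTML
        return PAGE.substitute(title="Latest posts", items=f"<ul>{items}</ul>")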

However from an efficiency standpoint this sucks, and this is why the most common fix is to place a static HTML cache in front of it (e.g. Varnish) as well as opcode caches, object caches, etc. This only works if all your visitors see the exact same thing (e.g. a HackerNews discussion thread). If you use 'write through caching', then you can control the rate of updates independently of the amount of traffic you receive, and you can handle traffic pretty well.
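A toy illustration of that write-through idea (an in-memory dict stands in for Varnish or a page cache): the rendered page is refreshed whenever the content changes, so serving a read never touches the database no matter how much traffic arrives.

    # Toy write-through cache: regenerate the rendered HTML at write time,
    # so a read is just a dictionary lookup, independent of traffic volume.
    page_cache = {}

    def render(comments):
        return "<ul>" + "".join(f"<li>{c}</li>" for c in comments) + "</ul>"

    def add_comment(thread_id, comments, new_comment):
        comments.append(new_comment)
        page_cache[thread_id] = render(comments)     # cache updated on write

    def serve(thread_id):
        return page_cache.get(thread_id, "<p>Not found</p>")  # reads skip the backend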

If your pages are dynamic, you need a different approach. You'll want to cache all the static chunks of each page, and assemble them together with the dynamic parts on-demand. The extreme example is Facebook: everyone sees something different. The only way to scale this out is to parallelize everything, with your first tier of web machines making many simultaneous requests to a farm of servers behind them, delivering all the pieces within a relatively constant time.
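A rough sketch of that fan-out (backend calls are faked with a sleep; the fragment names are invented): the front tier requests the per-user pieces in parallel and stitches them into a cached page shell, so total latency is roughly the slowest piece rather than the sum of all of them.

    # Rough sketch of parallel page assembly: a cached static shell plus
    # per-user fragments fetched concurrently from backend services (faked here).
    import time
    from concurrent.futures import ThreadPoolExecutor

    CACHED_SHELL = "<html><body>{newsfeed}{sidebar}{ads}</body></html>"

    def fetch_fragment(name, user_id):
        time.sleep(0.05)                 # stands in for a call to a backend service
        return f"<div>{name} for user {user_id}</div>"

    def assemble_page(user_id):
        parts = ["newsfeed", "sidebar", "ads"]
        with ThreadPoolExecutor() as pool:   # fetch all fragments at the same time
            results = list(pool.map(lambda p: fetch_fragment(p, user_id), parts))
        return CACHED_SHELL.format(**dict(zip(parts, results)))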

The problem is that such a parallel architecture is both unnecessary for a small web app and a leap in complexity and know-how that is undesirable for small teams. Hence there is an increasing technological gap between what hobbyists/start-ups do and what the giants are doing.

Edit: it's also important to realize that the web loves 'inefficient' dynamic languages not because they're dumb, but rather because development is very rapid, very experimental, involves designers, UX experts and marketers, and you don't want to be forced to make long-lasting decisions early in your development process.

InclinedPlane · 14 years ago
If you're hosting something chunky like stock WordPress on stock Apache/PHP with stock MySQL on the same server, and your hardware is nothing more than a $20/mo. VPS from Linode with a mere 512MB of RAM, then getting that sort of traffic spike can quite easily shut down your site. More so if you have an app that's even more resource intensive than WP.

If you're willing to throw either time or money at the problem then it goes away easily. But both resources are typically limited.

beck5 · 14 years ago
The problem with ShareLaTeX is the compiling of the LaTeX documents: compiling a PDF may take a few seconds, and if 10 people do that at the same time then things start to slow down...
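Not ShareLaTeX's actual setup, just one common way to stop simultaneous compiles from overwhelming a small server: put them behind a bounded worker pool so extra requests queue instead of all running at once.

    # Hypothetical sketch (not the real ShareLaTeX code): cap concurrent pdflatex
    # runs with a small worker pool so a burst of users queues instead of thrashing.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    compile_pool = ThreadPoolExecutor(max_workers=2)   # at most 2 compiles at once

    def compile_latex(tex_path):
        result = subprocess.run(["pdflatex", "-interaction=nonstopmode", tex_path])
        return result.returncode

    def submit_compile(tex_path):
        return compile_pool.submit(compile_latex, tex_path)   # extra jobs wait in line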
___Calv_Dee___ · 14 years ago
I feel like this is a common mistake when you get caught up in the moment of finishing a dev project. You're so anxious to launch the site/app that load-balancing factors get overlooked. This can be really detrimental, as you're pretty much ruining your first impression when your backend crashes. Anyone up for sharing their horror stories of launching before their backend could handle the traffic? It could provide some good insight into the patience and thought that go into launching.
verelo · 14 years ago
I love this traffic graph; it looks very similar to past HN-type analytics graphs I've seen.

I would say a big lesson people should learn from you and others is: use load balancers. Then at least if traffic spikes, you can add another server (provided you have an image standing by and your code doesn't mind being load balanced).

beck5 · 14 years ago
Makes sense when you get a bit bigger; for me a load balancer would just about double my hosting costs at the moment. That's probably the difference between a weekend project like ShareLaTeX and a startup of 4 people.
shaka881 · 14 years ago
Collaborative typesetting is indeed cool, but the domain name promised far more excitement than it delivered.