Posted by u/pg 13 years ago
How to get your IP unbanned on HN
HN has a hair trigger about banning IPs that request too fast (sorry about that; we don't have a lot of spare performance), so I wrote something people can use to get their IP unbanned once if it gets banned by accident.

http://news.ycombinator.com/unban?ip=<ip address>

Obviously you have to use it from another IP address, like your phone.
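
As a rough illustration of filling in the `<ip address>` placeholder, the sketch below builds the unban URL for a given address. Only the endpoint and the `ip` parameter come from the post; the function name `unban_url` and the example address are assumptions, and the code only builds the URL without requesting it.

```python
import ipaddress
from urllib.parse import urlencode

# Endpoint as given in the post above.
UNBAN_ENDPOINT = "http://news.ycombinator.com/unban"


def unban_url(banned_ip: str) -> str:
    """Return the unban URL for banned_ip (hypothetical helper).

    ipaddress.ip_address raises ValueError if the input is not a valid
    IPv4 or IPv6 address, so typos are caught before the URL is built.
    """
    ip = ipaddress.ip_address(banned_ip)
    return f"{UNBAN_ENDPOINT}?{urlencode({'ip': str(ip)})}"
```

The resulting URL would then be opened from a different connection, such as a phone, as the post says.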

ck2 · 13 years ago
Use IPTABLES xt_connlimit to regulate overly aggressive client requests.

Since there are so few images on HN, there is no reason to have more than a couple connections per IP on port 80.

It will radically reduce your server load and there will be no blacklists/whitelists to maintain.

pandemicsyn · 13 years ago
...except people also browse HN from behind corporate gateways/firewalls.
ck2 · 13 years ago
It won't ban them, it just throttles simultaneous connections from the same IP.

Unless there are dozens of people from the exact same IP and not an IP pool, it won't be a problem.

In the worst case, they will see the initial page load in their browser delayed by half a second. But it helps the server tremendously, especially since HN seems to use Apache.
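
To make the throttle-versus-ban distinction concrete, here is a minimal sketch of a per-IP cap on simultaneous connections. It is not xt_connlimit, which does this in the kernel's packet filter; it only mirrors the idea in application code. The class name `PerIPConnectionLimiter` and the default of 2 (standing in for "a couple connections per IP") are assumptions.

```python
import threading
from collections import defaultdict


class PerIPConnectionLimiter:
    """Caps simultaneous in-flight connections per client IP (hypothetical)."""

    def __init__(self, max_per_ip: int = 2):
        self.max_per_ip = max_per_ip
        self._lock = threading.Lock()
        self._active = defaultdict(int)  # ip -> connections currently open

    def try_acquire(self, ip: str) -> bool:
        """Admit a new connection unless ip is already at its cap."""
        with self._lock:
            if self._active[ip] >= self.max_per_ip:
                return False  # refused for now; nothing is remembered or banned
            self._active[ip] += 1
            return True

    def release(self, ip: str) -> None:
        """Call when a connection from ip closes."""
        with self._lock:
            if self._active[ip] > 0:
                self._active[ip] -= 1
            if self._active[ip] == 0:
                del self._active[ip]  # no per-IP state left to maintain
```

A refused client simply retries once one of its earlier connections has closed, which is where the extra delay comes from; no blacklist entry is ever created.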

bretthoerner · 13 years ago
I'm 99% sure HN runs on FreeBSD.
ck2 · 13 years ago
Good thing you left that 1% ;-)

   Server: Apache/2.2.22 (Ubuntu)
xt_connlimit does wonders for Apache especially.

Actually looks like I am wrong.

Static objects are coming from Amazon while dynamic are coming from another server @theplanet.com

   Apache/2.2.19 (FreeBSD) 
So you are right, it's FreeBSD, but it's still Apache, which really needs connection throttling. But there might be a reverse proxy in place. You can also throttle by IP with a module in nginx.

smanek · 13 years ago
pg: I have a fair bit of Lisp dev experience. If, as a weekend project, I modified the HN src to use postgres and memcache, would you consider using it in production? Obviously, I don't expect carte blanche prior agreement, but I wouldn't want to invest the time unless I thought it was plausible the work could actually help.

I would expect it to solve most of your performance problems for the foreseeable future (at the very least, by letting you scale horizontally and move the DB, frontends, and memcaches to separate boxes - plus ending memory leaks/etc by moving most of the data off the MzScheme heap).

The obvious downside is that it would use your (or someone at YC's) time. First to merge the changes I make to http://ycombinator.com/arc/arc3.tar into the production code, then to buy/set up some extra boxes and do the migration. We're probably talking, roughly, a day. It also has the unfortunate side effect of costing HN's src some of its pedagogical value, since it adds external dependencies and loses 'purity'.

Been looking for an excuse to learn arc for a while now ...

marcusmacinnes · 13 years ago
I suspect there's good reason why HN is still using this old codebase. YC after all is not short of the cash needed for a complete revamp.

The site is very much hacked together, but works... In a lot of ways, this reflects the hacker ethos of getting something up and running quickly at low cost while still producing value.

A revamp might have negative impact too by attracting a wider, more mainstream audience which could possibly dilute the purity of the community here.

Osmium · 13 years ago
> dilute the purity of the community here

Careful now :) It's not like there's anything stopping HN attracting a wider audience anyway; there's no restriction on who can register. Anyone can come and join in, which (in my opinion) is as it should be.

veemjeem · 13 years ago
There's also the usual engineer estimation: "Oh, it will probably take a day to rewrite the code. We'll deploy it and it will probably work just fine in production."

Any engineer that has live code has made this mistake before.

smanek · 13 years ago
Just to clarify, it will definitely take me more than a day to write/profile/test the changes (especially since I'll be learning arc in the process).

My hope is that it will only take a day or so to deploy it, once it's ready.

JoeCortopassi · 13 years ago
Very generous offer, but I would argue that HN's slow performance is a feature, not a bug. The average drive-by visitor, attracted to sensationalist articles and titles, simply doesn't have the patience for slow load times on every page. The user who is seeking intelligent conversation, however, is more than willing to wait 5+ seconds if they know they will be getting valuable content. Couple that with page loads being consistently slow, rather than fast with occasional surges, and I wouldn't put it past PG to build a delay into page loads to act as a sort of filter. Even if it's unintentional, I would still argue that it is useful in driving out some riff-raff.
xvolter · 13 years ago
I also believe that Hacker News runs on a small stack of services developed by some past companies from Y Combinator.

I would agree that there is also little to no desire to make Hacker News "the news place", where it supports thousands of posts a second and is extremely popular. In general Hacker News is used by startups and people interested in startups (and the hope is that it stays that way). It's slowly growing to include more types of people (marketers, companies, blogs that just want a lot of hits, etc.), and not many people want to purposely support that.

eps · 13 years ago
This probably belongs in a private email.
wglb · 13 years ago
The downside of this is that now there are many more moving parts to carry forward.
dylanpyle · 13 years ago
doesn't sanitize HTML fyi - may leave you open for XSS
pg · 13 years ago
Ack, what was I thinking? Fixed. Thanks!
tptacek · 13 years ago
The same thing every smart developer who ever committed or deployed a line of vulnerable code thought: "I'm just trying to get this feature done, not write a formal proof". You're in good company.
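
For readers wondering what the fix for this kind of bug looks like in general, here is a minimal sketch. The thread does not say which value was left unescaped or how it was fixed (the real code is Arc); this assumes the `ip` query parameter was echoed back into the page, and the function name `render_unban_message` and the message wording are made up for illustration.

```python
import html
import ipaddress


def render_unban_message(raw_ip: str) -> str:
    """Build the HTML reply for an unban request (hypothetical handler).

    Validate first, then escape anything user-supplied before it is
    placed into markup.
    """
    try:
        ip = ipaddress.ip_address(raw_ip.strip())
    except ValueError:
        # Escape the raw input so it is displayed as text, not parsed as HTML.
        return f"<p>Not a valid IP address: {html.escape(raw_ip)}</p>"
    return f"<p>Unban requested for {html.escape(str(ip))}</p>"
```

With this in place, a value like `<script>alert(1)</script>` fails validation and is shown with its angle brackets escaped instead of being executed by the browser.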
someone13 · 13 years ago
Do you have a rough set of guidelines for how fast we should request from HN? For a side project, I was thinking of writing something that scraped the HN frontpage and all the associated comment threads every 10 minutes or so, and I'd rather not cause performance issues or get banned. I'd be happy to rate-limit requests to whatever is convenient.
unreal37 · 13 years ago
May be better to use the official API.

http://www.hnsearch.com/api

laumars · 13 years ago
That's not an official API: http://www.hnsearch.com/about

Quote: "HNSearch was built by the team at ThriftDB to give back to the community and to test the capabilities of the ThriftDB flexible datastore with search built-in."

Interesting API all the same though.

tallanvor · 13 years ago
If it were an official API, wouldn't it be associated with HN or Y Combinator rather than an external website?
mvanveen · 13 years ago
The robots.txt file for HN suggests a Crawl-Delay value of 30 seconds.
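
A scraper could honor that value along these lines. This is only a sketch: the `ROBOTS_TXT` body is a stand-in (only the 30-second figure comes from this thread), and `crawl_delay_from` and `RequestPacer` are hypothetical names. The clock and sleep functions are injectable so the pacing can be tried without actually waiting.

```python
import time
from urllib import robotparser

# Stand-in robots.txt body; only the 30-second value is from the thread.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
"""


def crawl_delay_from(robots_txt: str, agent: str = "*", default: float = 30.0) -> float:
    """Read the Crawl-delay for agent, falling back to default if absent."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


class RequestPacer:
    """Spaces successive requests at least `delay` seconds apart."""

    def __init__(self, delay: float, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request made yet

    def wait_turn(self) -> float:
        """Block until the next request is allowed; return seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            waited = self._next_allowed - now
            self._sleep(waited)
            now = self._clock()
        self._next_allowed = now + self.delay
        return waited
```

Calling `wait_turn()` before every fetch keeps a front-page-plus-comments crawl to one request per 30 seconds, however many threads are queued.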
citricsquid · 13 years ago
This might be helpful: http://api.ihackernews.com/

edit: oh, official API is above. Disregard this one :-)

freditup · 13 years ago
I'm curious as to why HN would be walking such a performance tightrope. I could speculate, but it would be uninformed rambling, so I'd love it if someone more knowledgeable than I could explain.
grinich · 13 years ago
It's a side project by a couple of guys with full-time jobs, written in an experimental Lisp dialect and running on a single machine.
akkartik · 13 years ago
The last bit is key. HN is served off flat files, and caches state in-memory in global variables. That -- and not cost -- makes it hard to add a second machine.
TallboyOne · 13 years ago
I'd also like to know this

malandrew · 13 years ago
Awesome. I've gotten my IP banned several times after the browser crashed and I reopened the tabs (I had too many HN threads open prior to crash, enough to trigger the ban)
saurik · 13 years ago
Yeah... if I open Chrome I am pretty much guaranteed to be banned for days. :( The mechanism should really be changed to account for this: a ton of requests per second for only a few seconds should not trigger a ban; it should take a spike in requests per second combined with some sustained usage per minute. I actually made modifications to Chrome to change how it loads tabs mainly because of Hacker News' weird IP ban system, but I still got burned recently as I accidentally hit "undo close tab" one too many times, which reopened an entire window.
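
One common way to get that behavior is a token bucket, sketched below. This is not how HN's limiter works (the thread does not describe it); it is a swapped-in technique that tolerates a short burst while still capping the sustained rate. The class name and the numbers used with it are assumptions.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, then `refill_per_sec` requests/second."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is within limits."""
        if now > self.last:
            # Refill in proportion to the time elapsed, up to the capacity.
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
            self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With, say, `TokenBucket(capacity=50, refill_per_sec=1.0)` per IP, a browser restoring 50 tabs at once passes, while a client that keeps requesting faster than one page per second drains the bucket and starts being refused.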
ars · 13 years ago
On firefox turn on the option "Don't load tabs until selected". I don't see this option in chrome.

It speeds up browser startup dramatically. Especially when you leave lots of tabs open as your "to read" list.

tjoff · 13 years ago
My solution is to use a firewall with per-application rules and just turn off network access for Chrome before I launch it. On my laptop I just disconnect the wired/wireless network during the launch. This was mainly because of HN, but it also has the added benefit of using fewer system resources, since a blank page is typically less resource hungry than a real page.

Firefox has a better solution for this but then again, I don't use firefox.

Revisor · 13 years ago
Happened to me as well when starting Opera. I felt as if my most loved uncle had slammed the door in my face.

I only loaded so many pages because I love HN. :)

The ban was lifted a few days later, not sure if automatically or thanks to my (unresponded) email request.

evx · 13 years ago
In my experience the banning is too strict.

It is triggered very quickly and it seems to last forever (maybe 15min would be better?).

I ask pg to kindly consider making it a bit more lenient.

I doubt HN comes under deliberate/malicious attacks, etc...

I'm making a HN extension that preloads some data such as the comments and the links on the next page (it's still with reasonable delays).

But at the moment it's impossible for it to function without risking the user getting banned.

EwanToo · 13 years ago
I've no doubt that HN is under pretty much constant deliberate, malicious attacks.

Pretty much any site with decent traffic is under constant attack, and the high profile of HN means it'll be under far more scrutiny than others.

nkurz · 13 years ago
Repost from "Show dead" that relates to this issue:

sunstone1 10 hours ago | link [dead]

Well I never had my IP banned but I did have my account hell banned after about a dozen posts as you can see. Oh, actually no, you can't see, because it's banned. No, I never bothered to get another account, now I'm just a taker not a giver.

Most of the time it's clear why a user was banned, but looking at sunstone's history I don't really see a reason. While the algorithm will never be perfect, it would be nice if there was a clearer solution for misfires.