Readit News
Posted by u/nervechannel 15 years ago
Dear Google: please let me ban sites from results
Given the current high-ranking thread about spammy sites in Google results, it strikes me that a very simple solution would be to let logged-in users blacklist sites.

Bam, no more wareseeker or efreedom.

This would solve a lot of people's complaints in one fell swoop.

There are greasemonkey etc. scripts to do this, but they're tied to a single browser on a single machine. A global filter (like in gmail) would be so much more useful.

Would this be particularly hard to do?
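
No, not particularly. The server-side mechanics of a per-user blacklist are simple; here is a minimal sketch (all names and the result-dict shape are hypothetical, not Google's actual API) of filtering a result list against a logged-in user's banned domains:

```python
# Hypothetical sketch: drop results whose host (or parent domain) is
# on the user's personal blacklist.
from urllib.parse import urlparse

def filter_results(results, blacklist):
    """Keep only results whose hostname is not blacklisted."""
    def banned(url):
        host = urlparse(url).hostname or ""
        # "efreedom.com" should also match "www.efreedom.com"
        return any(host == d or host.endswith("." + d) for d in blacklist)
    return [r for r in results if not banned(r["url"])]

results = [
    {"url": "http://stackoverflow.com/questions/123", "title": "Real answer"},
    {"url": "http://www.efreedom.com/Question/123", "title": "Scraped copy"},
]
print(filter_results(results, {"efreedom.com", "wareseeker.com"}))
```

Stored server-side against the account (like Gmail filters), the same list would follow the user across browsers and machines.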

AndrewO · 15 years ago
I see a lot of people asking what happens when a group of people downvote a site just to ruin its ranking. Sure that's a problem, but there's an easy solution on Google's end: your blacklist only affects you. Yes, that means all of us have to hide efreedom ourselves. Doesn't seem like a problem to me...

Plus, we are talking about a company whose core business demands that it can identify groups of bad-faith voters. Given time, they may find a way to incorporate this data safely into the ranking data (if anyone could, it would be Google).

And I know there are extensions to do this (mine mysteriously stopped working recently), but doing this on the client-side in a way that's bound to a single browser install just seems wrong to me, especially for Google.

daviding · 15 years ago
I think a personal blacklist would be ideal initially, as the most motivated users would be helped most; the majority who don't care about the status quo wouldn't be impacted at all.

As mentioned above, introducing shared ranking via the social graph would be the next logical step. It could be opt-in to ease adoption.

Then, ideally (and this is my personal 'white whale' problem), it would be great to have something where the user whitelists through no action of their own, rather than doing any work to block; i.e. use clicks on results as personal ranking upvotes.

There are some interesting engineering issues with per-user indexing, though. But hey, you wanted to work at Google, right?

prawn · 15 years ago
"Yes, that means all of us have to hide efreedom ourselves. Doesn't seem like a problem to me..."

efreedom is monetised by Google ads. Might seem like a problem to Google.

Let's say it starts with personal blacklists. Then trusted lists that you can subscribe to (AdBlock-style). Then word spreads and enough people are using it such that AdSense revenue drops 20-30% or more?

(IME, CTR on ads is much higher on these content-light sites than it is on more reputable sites.)

danudey · 15 years ago
To be honest, I think this is the reason Google doesn't have this feature. The sites everyone wants to blacklist are the spammers that game Google search and show Google ads. If they don't get traffic, they don't show ads. If they don't show ads, Google doesn't get that money either.

It's to Google's benefit that people end up on these pages, see a ton of ads, and then click on one out of confusion or desperation.

peterhoffmann · 15 years ago
The blacklist does not only have to affect me, just throw in the blacklists of my social graph too. These are the people I trust.
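
Merging in the social graph is a small extension of the personal list. A hedged sketch (the threshold and all names are illustrative assumptions): a site is hidden if I banned it myself, or if enough people I trust banned it.

```python
# Hypothetical sketch: extend a personal blacklist with blacklists
# from trusted contacts. A site is hidden if I banned it, or if at
# least `threshold` trusted contacts banned it.
from collections import Counter

def effective_blacklist(own, friends_lists, threshold=2):
    counts = Counter(site for bl in friends_lists for site in set(bl))
    return set(own) | {site for site, n in counts.items() if n >= threshold}

mine = {"efreedom.com"}
friends = [{"wareseeker.com", "spam.example"},
           {"wareseeker.com"},
           {"spam.example"}]
print(effective_blacklist(mine, friends))
```

Requiring agreement from multiple contacts (rather than any single one) limits the damage one compromised or careless friend can do.
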
AndrewO · 15 years ago
Which brings up another point: I think that if anything is going to threaten Google in the coming years, it'll be the quality of their social graph. Gmail gives them a lot of data, but if my inbox is any indication a lot of it is somewhat ambiguous. Aside from that their social apps haven't done too well (in most places).

Facebook, on the other hand, has developed a system where nearly every user activity creates a new easily processed and meaningful connection between users or out to the web itself. And those connections are probably closer to representations of some kind of trust than "I email that person a lot".

Anyway, I'm not saying the sky is falling for Google, just that search appears to be changing for the first time in a while.

coderdude · 15 years ago
I don't know about you, but my social graph is pretty diverse. I wouldn't trust all those very different people to make an important decision like "what sites should be visible to me in search engines."
nervechannel · 15 years ago
This is exactly what I meant. Lots of people have added the idea about crowd-sourced re-ranking based on blacklisting, then said "it'll never work..."
AndrewO · 15 years ago
It seems to be a particular affliction among our group (by that I mean anyone who spends a lot of time here) that once we learn to apply one revolutionary/disruptive idea, we can't stop even when we probably should. I'd say it's because we're used to trying to think of how to scale-up every idea, but I've been accused of armchair-psychology before... :)
Travis · 15 years ago
We can also look to the AdBlock extension. They have prefilled lists that you can customize, so it's a "shared ecosystem" that you subscribe to, then customize yourself.
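
Such a subscription model is easy to sketch. Assuming an AdBlock-style list format (one domain per line, "!" starting a comment; the format here is an assumption, not AdBlock's full filter syntax), subscribing and then customizing looks like:

```python
# Hypothetical sketch: parse a shared blocklist, then overlay
# personal tweaks on top of the subscription.
def parse_blocklist(text):
    domains = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("!"):  # "!" marks a comment
            domains.add(line.lower())
    return domains

shared = parse_blocklist("""\
! Community-maintained spam-site list (format is an assumption)
efreedom.com
wareseeker.com
""")
# Subscribe, then customize: re-allow one site, ban another.
personal = (shared - {"wareseeker.com"}) | {"spam.example"}
print(sorted(personal))  # ['efreedom.com', 'spam.example']
```
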
kgc · 15 years ago
I think this could be modeled after Gmail's social spam filter.
rorrr · 15 years ago
It should be pretty easy to set up public block lists. The ones that are honest about their methodology (of which sites are spammy) would win.
SimonPStevens · 15 years ago
No, it's not particularly hard, but it would make the problem worse.

Why?

99% of users are non-tech oriented.

Those users will not really be aware of the specific problems with the search results, they won't understand the concept of a good vs bad result and they certainly won't bother to tweak/ban/filter their results.

The 1% that do care and are currently being vocal about it will start filtering their results and they will perceive that the problem is solved. They will stop making a fuss.

So now, the complaints have gone away, but 99% of users are still using the broken system, so the good sites that create good original content are still ranking below the scrapers and spam results for 99% of the users.

The problem must be solved for all (or at least the majority) of users.

(And you can't take the 1%s filtering and apply it to all users in some kind of social search because the spammers will just join the 1% and game the system)

Hoff · 15 years ago
The problem must be solved for all (or at least the majority) of users.

Perfection being the enemy of good enough, and a common, valued, and traditional mechanism for delaying product shipment.

And Google might well be able to use the information from that 1% of users who have sorted this out (1% of a Really Big Number of searches, after accounting for the folks looking to game the results, downward in this case) and feed it back into their search rankings.

sudont · 15 years ago
This goes back to the traditional engineering parable: is it better to create a million dollar car that gets 900mpg (gasoline), or to make a 5 dollar widget that adds another 5mpg in every car?
3pt14159 · 15 years ago
>99% of users are non-tech oriented.

I disagree. Let's call it 95%.

>Those users will not really be aware of the specific problems with the search results, they won't understand the concept of a good vs bad result and they certainly won't bother to tweak/ban/filter their results.

So have only people that have enabled the advanced features of Google search ban sites. All of a sudden only people that "get it" are the ones that can ban.

>So now, the complaints have gone away, but 99% of users are still using the broken system, so the good sites that create good original content are still ranking below the scrapers and spam results for 99% of the users.

So we need to use the votes to stop the spammers.

>(And you can't take the 1%s filtering and apply it to all users in some kind of social search because the spammers will just join the 1% and game the system)

Sure you can. If you couldn't, Reddit would be a wasteland of ads, but it isn't. If Reddit's 4 or 5 engineers can write code that stops vote rings, Google certainly can.

Stopping vote rings is actually a fairly simple exercise, unless the anti-vote-ring code is open-sourced, and even then it should be possible.
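
One classic signal in that exercise is vote overlap: sockpuppet accounts tend to ban near-identical target lists. A minimal sketch (the cutoff and the choice of signal are illustrative; real systems combine many signals such as IP, timing, and account age):

```python
# Hypothetical sketch: flag pairs of accounts whose ban lists overlap
# far more than chance, via Jaccard similarity.
from itertools import combinations

def jaccard(a, b):
    return len(a & b) / len(a | b)

def suspicious_pairs(votes, cutoff=0.8):
    """votes: {account: set of banned sites}. Flag near-identical lists."""
    return [(u, v) for u, v in combinations(votes, 2)
            if jaccard(votes[u], votes[v]) >= cutoff]

votes = {
    "alice": {"efreedom.com", "wareseeker.com"},
    "bob":   {"efreedom.com", "ehow.com"},
    "sock1": {"victim.example", "rival.example"},
    "sock2": {"victim.example", "rival.example"},
}
print(suspicious_pairs(votes))  # only the sockpuppet pair is flagged
```
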

nervechannel · 15 years ago
Do you think 99% of users are too stupid to click 'report spam' when they get a spam email?
SimonPStevens · 15 years ago
Not "too stupid" no.

I think 99% of email users have not been adequately trained in why or how they should report spam, and even if they were I think most of them would still not care enough to actually do it with any regularity.

When pushed many may acknowledge that they know it exists, they will probably even be able to find the button when asked if given a chance. But they won't remember to do it when they see spam, they'll just ignore it and move on to the messages from people they know.

matthiaswh · 15 years ago
Not all spam is so obvious, both in email and search results. These are all assumptions, but I think it's safe to say the average person identifies the email 'Grow your Pe Nis like a Woman' (true story) as spam. However, the less obvious 'Enter to win a trip to Hawaii' still fools many people. Think about how effective the 'I am a Nigerian Prince that needs help out of my country and then I will give you 1.2 million dollars' scam has been over the years.

With search results, the spam is more often than not Made for AdSense sites that the average user doesn't realize are pure garbage. Then there are the mass-produced content sites like eHow that most technical people realize are worthless, but the average user loves. It isn't often you see Viagra sites popping up in searches for woodworking. It does happen occasionally though.

So no, I am pretty confident a majority of users would not use a feature like that effectively.

kingsidharth · 15 years ago
You report spam when you know it's spam.

And when it comes to search results, people are browsing and clicking through AdSense-filled "landing page" sites. Most of them think it's their own fault that they couldn't find what they were searching for.

kingcub · 15 years ago
I think you are ignoring a very obvious fact: those 1% who are no longer 'complaining' are really still doing so, in an automated fashion, via their blacklists. Further, I think the feedback will be even better, because people are more inclined to fix a problem when it's easy to do so (adding a site to their blacklist) than to send an email saying "please remove this from my search results" (and hope it gets removed in the future), or to use an ad-hoc in-browser solution that gives no feedback to Google.

What I think really needs to be exploited is a ring-of-trust aspect. I'd like to have a Hacker News ring where all of us here work together to remove the spam from our results and let Google see what we're taking out; maybe that would help them improve their algorithms.

AJ007 · 15 years ago
Google already has reading-level data per domain (append &tbs=rl:1 to any site search).

Why not apply that reading-level algorithm to users' Gmail data and public social-network profiles, estimate each user's IQ, and then give those at the top of the pile "result-burying" moderator privileges?

Confirmed user accounts (cell-phone verification), combined with other signals such as profile age and activity, could make spamming sufficiently complex to disincentivize all but the most illicit spammers.

Users at the bottom of the IQ pile (non-logged-in users, based on past search data and geo-located socio-economic status) wouldn't even get the option to bury results. Which, by the way, I think applies to more like 20% of US internet users than 99%.

slaker · 15 years ago
That's not true. Google already has a Spam button in Gmail. The same idea could be applied here.

al_james · 15 years ago
Yes that would be good. They could then look at the number of people blocking certain domains and de-weight them in the global results.

Traditionally Google seems to be against human-powered editing (which this would be), but with black-hat SEOs running rings around them, it's badly needed.

eli · 15 years ago
An extremely easy way for a bunch of people to get together and destroy someone's ranking? That doesn't sound like such a good idea.
docgnome · 15 years ago
4chan would go bananas.
robryan · 15 years ago
If this wasn't at all preventable the idea of adwords would never work, people would just be running up competitors costs.
csomar · 15 years ago
Given that Google gets hundreds of millions of searches and visitors, you'd need hundreds of thousands of down-votes to get a site black-listed. No black-hatter can really pull that off (creating 100K accounts/IPs that stay under Google's radar and down-vote the site).
jobu · 15 years ago
Perhaps, but I'm sure any legitimate site could contact Google and resolve the issue...
al_james · 15 years ago
True, but as it stands, someone's ranking can be destroyed by not playing the black hat SEO game when your competitors are and allowing their crappy spam filled site to outrank you. Swings and roundabouts.
thushan · 15 years ago
Anyone have any ideas why the clone sites like efreedom are ranking above Stack Overflow when SO's inbound links and reputation values are likely far better than efreedom's in Google's algorithm? I'm surprised that search engine optimization could do THAT much to a site's ranking. Also it's not like SO isn't doing the same kind of SEO themselves.

What I'm trying to get at is, with all things equal, let's say Stack Overflow and efreedom's SEO is on par with each other, shouldn't SO's reputation/inbound link ranks automatically trump things?

btilly · 15 years ago
My understanding is that the clones take the material and modify it to have exact matches for phrases that people are likely to search for. The exact match causes it to rank higher for those searches.

SO is not editing the material for SEO, they just have whatever content the users generated.

hessenwolf · 15 years ago
Efreedom is deliberately manufacturing links, whereas SO may be just hoping for the best.
stcredzero · 15 years ago
This would only work if 1) it was sufficiently painful to put in a block for your searches and 2) this had no effect on global results.

1) - It doesn't have to be extremely painful, just painful enough, such that true loathing is needed as motivation. This way, we filter out frivolous decisions. A few seconds pause would be enough.

2) - We need to let the reduced ad revenue do the job for us through the market. Anything else will be gamed much to everyone's detriment. Just empower people to remove the annoyance, and let the money do its thing.

nervechannel · 15 years ago
Re 2, yes. A lot of people have reacted as if I suggested letting people affect everybody else's results. I'm not convinced that's possible to do safely, I'd be happy just to see it for my own results.

Re 1, painful? WTF? The whole point is to make it quick and usable. I can already blacklist sites the painful way, by adding them to a Google Custom Search page. The whole point is I'd like a quick add-to-killfile button, like email clients have had for decades.

radley · 15 years ago
Google does provide this service: it's called Google Custom Search. You can prioritize or blacklist sites and it's pretty easy to add it to your browser searchbar. I don't always use it, but I'll switch to it when I encounter a spammy topic, usually dev-related searches.

http://radleymarx.com/blog/better-search-results/

yason · 15 years ago
Not so fast.

CSE wants you to list sites that you want to search from. Of course, you can't default to '*' or '*.*'; they even state that '*.com', '*.org', etc. won't return any results. That's unacceptable. Secondly, even if you could configure it meaningfully, it seems pretty hard to configure your browser's search bar to use the CSE instead.

And that's what I think most people use for searching. At least I do.

Facebook got it right this time: with each post, there's an option to hide that post, that person, that application, or that site which posted the post. One click that means "don't show stuff from them anymore": that's what Google needs, too.

radley · 15 years ago
I made my custom search secondary.

By this I mean I added it to my browsers, but I still use regular Google search daily. If the results are laden with bogus sites, I switch over and start again, weeding if necessary.

Initially I thought I'd use GCS all the time, but it lacks the Google menu (Images, Maps, etc) which comes in handy more often than I expected. I use GCS most for code/development related searches.

nervechannel · 15 years ago
If you leave the included sites empty, and only supply excluded sites, then it searches everything and excludes those particular ones.
dejb · 15 years ago
I can't believe this isn't the most popular comment. It kinda makes HN look like a knowledge vacuum, with all these recent discussions of how to ban results when a Google service that's existed for nearly 5 years can do the job. The format of the results isn't quite as nice as normal Google search, though.
nervechannel · 15 years ago
Err, I mentioned this well before the comment above:

http://news.ycombinator.com/item?id=2075437

It's not so much that it's a knowledge vacuum, just that someone didn't read the whole thread before replying.

radley · 15 years ago
I'm on the "wrong" side of the Flash/HTML5 debate, so my average post value is really low. It wasn't a big deal until the HN algorithm got tweaked recently.

jimmyswimmy · 15 years ago
Agree with the sibling poster - this is the best comment in the thread. Do you have any interest in sharing your hard work with us?

If you go to the CSE website and select 'Advanced' and then download annotations, you can export the list of sites you've excluded.

Further you can make the exclusion list ("annotation list") into a feed - so it is entirely possible to implement the kind of user-generated blacklist of sites which has been discussed here.
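
Generating such a shared exclusion list programmatically is trivial. A hedged sketch: the <Annotations>/<Annotation about="..."> shape follows CSE's annotation export, but the exact exclusion-label name is engine-specific, so the one below is a placeholder.

```python
# Hypothetical sketch: emit a CSE-style annotations file excluding a
# list of domains. The label name is a placeholder, not a real value.
def annotations_xml(domains, label="_cse_exclude_EXAMPLE"):
    rows = "\n".join(
        f'  <Annotation about="{d}/*">\n'
        f'    <Label name="{label}"/>\n'
        f'  </Annotation>'
        for d in domains)
    return f"<Annotations>\n{rows}\n</Annotations>"

print(annotations_xml(["efreedom.com", "wareseeker.com"]))
```

Publish the output at a stable URL and point a CSE at it as a linked annotation feed, and you have the community blacklist discussed above.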

radley · 15 years ago
GCS is rather personalized, which makes sense. For example, I don't want experts-exchange showing up, but some people have paid for their service and want it. I'm also a Flash developer, so my list probably won't be useful for most HN readers.

GCS is really easy to set up; it takes only a few minutes. I spent the most time hunting down rogue sites, which was actually kinda fun and cathartic.

Big tip: keep an easy-to-get-to link to the GCS Sites Control panel, so it's easy to add new sites. I've added ~40 more in the past two months.

Pewpewarrows · 15 years ago
Gmail already does this: the global system uses an algorithm that looks at reported spam and automatically moves future emails from that sender to the spam folder, not just for the person who reported it, but for everyone.

If they're not looking into integrating that cleanly into the existing search results page (not a separate form the average user will never find or use), especially after all the recent internet chatter about it, then they should make it a top priority for 2011. I don't want them to rush it, though; I don't want competitors reporting each other as spam in search results to game the system even further. I assume they already have anti-gaming measures in place for Gmail, so they wouldn't be starting from scratch...
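
That Gmail-style aggregation fits in a few lines. A sketch under stated assumptions (the threshold, class names, and trust model are all illustrative, not Google's actual pipeline): reports only affect the reporter until enough distinct accounts agree, then the domain is demoted for everyone.

```python
# Hypothetical sketch: aggregate per-user spam reports into a global
# demotion signal once enough distinct accounts agree.
from collections import defaultdict

class SpamSignal:
    def __init__(self, global_threshold=3):
        self.reports = defaultdict(set)  # domain -> set of reporting accounts
        self.threshold = global_threshold

    def report(self, account, domain):
        self.reports[domain].add(account)  # dedupes repeat reports

    def globally_demoted(self, domain):
        return len(self.reports[domain]) >= self.threshold

sig = SpamSignal()
for user in ("a", "b", "a", "c"):  # three distinct reporters
    sig.report(user, "efreedom.com")
print(sig.globally_demoted("efreedom.com"))
```

Counting distinct accounts (rather than raw reports) is the first line of defense against a single user, or a single bot, mashing the button.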

mtkd · 15 years ago
I don't see how you could anti-game this, the SEOs would just use mechanical turk to hire 100s of people (with valid Google Accounts) to do the reporting.

At best G could use the information as a list of potential spammers and filter domains manually, but I really can't see this being automated without giving the SEOs another weapon.

moe · 15 years ago
I don't think anyone wants to filter sites that could be gamed with "100s of votes". We want to filter sites that will require tens of thousands of votes to get rid of.
pixelbeat · 15 years ago
Google were experimenting with voting on results: http://techcrunch.com/2007/11/28/straight-out-of-left-field-...

Also there is this form for reporting spam sites: https://www.google.com/webmasters/tools/spamreport

Integrating the above into standard search results would be difficult unless it were restricted to users with good "karma". That might be possible in our increasingly socially-networked world.

cosmicray · 15 years ago
You could also restrict it to people who buy, and sell, on Google Checkout. Putting real money/goods on the table tends to weed out the fake accounts.
CWuestefeld · 15 years ago
The thing is, the SO scrapers like efreedom aren't spam, strictly speaking. It's just that they clone existing content without adding value, and as such are just noise in the results.

Perhaps we need to frame the discussion differently, considering what the searcher wants, rather than "spam-free hits".

nervechannel · 15 years ago
That was my point really. I don't want to see eFreedom hits, I consider them spammy, so I'd like to be able to click-ban them from my results.

If Google use that information to gradually adjust their ranking overall, then fair enough -- won't affect me, I can't see them anyway.

EDIT: Even if they don't let that affect everyone else's results (because of gaming), then I still don't care, I still don't see the crap in my results ever again.

Luc · 15 years ago
Also, I would like '[any widget] review' to take me to an actual review, not pages upon pages of spam. I usually end up looking at comments on a few trusted sites (e.g. Amazon). This seems broken...
CWuestefeld · 15 years ago
Yes, most of the results for this query wind up pointing to pages saying "be the first to review [any widget]".

As a workaround, try searching for "[any widget] sucks" and "[any widget] good".

EDIT: tying this to other discussions on the topic, it's a symptom of Patio11's observation that natural language search doesn't work very well. If you want to find something, you need to paint a picture of what it looks like, rather than asking a question about it.

tokenadult · 15 years ago
I've noticed for years that "[product] hosed" brings up good results on how to work around bugs in various software products.
tesseract · 15 years ago
automated version of this: http://www.sucks-rocks.com/
djhworld · 15 years ago
I think the worst culprits are the ones that skim StackOverflow questions and rehash them into their own supposed original "question and answer" site
bartl · 15 years ago
Jeff Atwood discusses this in his most recent Coding Horror blog post (http://www.codinghorror.com/blog/2011/01/trouble-in-the-hous...), in which he states that he doesn't really mind people copying StackOverflow questions (and answers), but that he does mind that the copies get a higher Google ranking than the original.
retube · 15 years ago
How does that happen, though? efreedom must be doing something special with SEO to rank higher than SO, given that SO must have a huge PageRank score.

ergo98 · 15 years ago
What do you think most StackOverflow answers are? It's a karma-paid labor pool where you can post questions and a lot of under-employed people will rush out and do the necessary Google searches, collating and slightly rewriting the results to yield the most votes.

Everyone is ripping off someone's content.

And just to be accurate here, SO content is creative commons (created by the community). Are those just cheap words?

nervechannel · 15 years ago
What do you think most StackOverflow answers are? It's a karma-paid labor pool ... said the comment on Hacker News, earning the poster 3 points so far.
bigfudge · 15 years ago
Except that there are some genuinely useful compilations on SO that you don't get elsewhere, e.g.: http://stackoverflow.com/questions/72394/what-should-a-devel...

These add significant value to the original content, IMO.

robryan · 15 years ago
StackOverflow consistently provides high-quality answers, though. If someone were to come along, take the creative-commons material, and present it in a way that actually improved my experience of finding the answer I was after, I'd be all for it.

In my experience though the sites that are taking the content are ad ridden messes which remove value rather than add anything.

jonhendry · 15 years ago
" It's a karma-paid labor pool where you can post questions and a lot of under-employed people will rush out and do the necessary Google searches"

Maybe it's like that in your field, but in Mac dev questions you're fairly likely to get answers from established OS X developers, and even Apple employees.

djhworld · 15 years ago
It's not the answers I'm bothered about, it's how they spam my search results. I whack some queries into Google and the first 10 results will just be the same SO question spread across multiple different sites.