Ask HN: Let's build an HN uBlacklist to improve our Google search results?

>become worse lately thanks to all those StackOverflow and Github clones

A google search showing some of these leech type sites:

https://www.google.com/search?q=%22code+that+protects+users+...

For me, "farath.com" is outranking stackoverflow.

Siira · 4 years ago

> farath.com was first indexed by Google more than 10 years ago

This seems pretty suspicious? Is it reporting the first time Google crawled the main domain farath.com? How is that relevant information?

judge2020 · 4 years ago

This is the first time it crawled the domain at all. It's been a website since at least 2008[0], but was recently re-registered in 2020[1].

0: https://web.archive.org/web/20080607010730/http://www.farath...

1: https://who.is/whois/farath.com

nebula8804 · 4 years ago

Thats weird. I noticed no ads on this farath.com site. Are they going to monetize the email subscriptions somehow? How are they making money off of this?

endisneigh · 4 years ago

This is a great example of why "Google sucks!!11" is mainly FUD. Let's say you're looking for the SO link, which is #2 for Google. Let's compare:

Google ("code that protects users from accidentally invoking the script when they didn't intend to")

Link: https://www.google.com/search?q=%22code+that+protects+users+...

SO - #2

Bing ("code that protects users from accidentally invoking the script when they didn't intend to")

Link: https://www.bing.com/search?q=%22code+that+protects+users+fr...

SO - #2

Brave Search

Link: https://search.brave.com/search?q=%22code+that+protects+user...

SO - Not on page

You.com

Link: https://you.com/search?q=%22code%20that%20protects%20users%2...

SO - Doesn't load

DuckDuckGo:

Link: https://duckduckgo.com/?q=%22code+that+protects+users+from+a...

SO - #2 (seems to depend on refresh)

Basically they're all the same. Google is faster, but the order of the results is identical.

If you did a large scale analysis in this manner I doubt Google would lose.

tyingq · 4 years ago

I'm not sure it's a good example, really. It's an "exact phrase search" with quotes, which doesn't happen much in real life.

It was helpful solely to show what some of these leech sites are.

Searching for (without quotes): What does if __name__ == "__main__": do?

Is probably a better test of which search engine has better results for the real-life query. Google might still win, but it should do a better job of screening out the spammy sites. It used to be better at this.

Terry_Roll · 4 years ago

I have noticed that Google and Bing seem to present results which link to sites like stackoverflow.com where the questions and solutions are absolute FUD.

I think someone or an entity has been engaged in a consertive effort to manipulate the results if its not something more nefarious in Google and Bing's domain.

Very few entities have the resources to do this either, its not something a ragtag band of goat herders could, thats for sure!

code2life · 4 years ago

In my experience, the you.com apps and overall search results aren't affected by SEO the same way that some of the other engines are, which is why I think their results work for me

Deleted Comment

tut-urut-utut · 4 years ago

Just tried your search in both Google and Duck Duck Go. On Google first page spam copies are ~80% of the links, on DDG maybe 40%. Not good, but much better than Google.

ahurmazda · 4 years ago

I tried you.com[1]. The first few results seem quite relevant. Best part is that you can actually personalize the weights to assign to your search (your very own bubble)

https://you.com/search?q=code%20that%20protects%20users%20fr...

tobyjsullivan · 4 years ago

This isn't the same search. The parent post had quotes around the phrase. You.com returns identical copy-cat results if you do the same search.

To be fair, not sure what other results we'd expect if we're going to search for a specific, plagiarized phrase.

Edit: actually, upon review, you.com does indeed give one extra useful result within the top three. So one point to gryffindor.

ffhhj · 4 years ago

I saw you.com displays some Code Complete snippets but the lines are too short and doesn't get the language highlighting, which make it harder to read. Nice try anyway.

Isn't this Google's job? Are developers a small but lucrative target and so the suits at Google don't see the benefit of improving that experience by cleaning up the spam?

Can we just nudge them to do so under the threat of an influential minority leaving due to their use case being affected?

asdfasgasdgasdg · 4 years ago

Is there a word for this tendency to say, "it's someone else's job" as a justification for doing nothing at all to help or improve one's own circumstances? I see more and more of it in the public discourse over the last years and it kind of bothers me. I see it a lot in conversations related to poverty or climate change, but it is as we see here by no means exclusive to those topics.

To the original replyer: you could wait for Google to do something, but if they were going to fix the listicle issue, and it were fixable on their end, they'd probably have done it by now. I'm disappointed in the situation too but if there is a workable solution on our end it would be silly to ignore it because fixing the problem is someone else's job.

To the OP: I worry that the number of domains pumping out crap might be far greater than we know, and that might hamper the effectiveness of this. If the collaborative block list ever got big enough you might also have to deal with spam. But I think it would be a great thing to try. This is one of those issues that annoys me, but it's just below my action potential threshold. My biggest objection right now is the spammy recipe websites.

jazzyjackson · 4 years ago

https://en.wikipedia.org/wiki/Bystander_effect

sanketpatrikar · 4 years ago

I suggest this because there can only be so many websites that use SEO to game their way to the top and bury the good results beneath them.

If we manage to block them, we might be able to get a results page with good sites upfront and the other meaningless content below it. I assume Google will also surface good content along with the bad, so our blacklist might enable the good stuff to reach the top.

The spam problem, I'm sure of yet, but we might either be able to block enough of it to be satisfied or it won't pose a problem for most searches that are currently giving bad results.

ZeroGravitas · 4 years ago

I like to think of it as solving the problem in the right place.

It’s often possible to work around issues in lower layers, but it's usually at least worth raising it upstream to get it fixed 'properly'.

It'll help me when I dont have a blocklist active, and it'll help new programmers who arent familiar. It'll reward good sites with extra traffic and discourage new spammers entering the market.

In the worst case, if Google really can't or won't address tge issue, understanding the upstream problem more fully can help make a better workaround.

andyjohnson0 · 4 years ago

> I worry that the number of domains pumping out crap might be far greater than we know, and that might hamper the effectiveness of this.

I'm sure you're right about the number of spam domains, but Pareto suggests that blocking even a small percentage of them might provide a large gain.

https://en.wikipedia.org/wiki/Pareto_principle

sanketpatrikar · 4 years ago

I ended up creating a repo with blacklist.txt myself and will add to it for my own usage. I don't see anyone else who'd maintain this. Feel free to use it / contribute to it.

https://github.com/sanketpatrikar/hn-search-blacklist

renewiltord · 4 years ago

Internal v External Loci of Control perhaps?

germandiago · 4 years ago

> Is there a word for this tendency to say, "it's someone else's job"

At the risk of sounding pretentious, I call it "socialist", since they spend their lives telling others what to do or what is good or not for the rest of us but they rarely do anything about it. Surprisingly, this is the group that is really worried about poverty and climate change and do as much as I do for it, with the difference that I do it by myself, the few times I do it, not requiring the rest to do it.

It is always someone else who will do it. Though the other day I had a conversation with a non-socialist person that had that same attitude ("other should do it") towards what OTHERS should do. I really dislike that attitude, no matter where it comes from.

Point at hand: when I want or promote something, I am the first one to do it no matter others do it or not. The rest, no matter the ideology, all b*llshit.

As imperfect as I am, I try to do what I think is good (and sometimes my imperfection prevents me from doing it) but I do not spend my life telling other people why they are worse than me and telling them what they should do or not. The most I have for someone is good suggestions, never requirements.

nottorp · 4 years ago

> Isn't this Google's job?

Have you searched anything on Google lately? The answer is "no". Their new job seems to be to stuff your results with anything even remotely related (and sometimes related in a way that only machine learning can see) so you have things to click on.

Edit: with the lone exception of "find me this bussiness nearby".

mikevin · 4 years ago

It's very obvious Google is no longer the equivalent of grepping the web. There's some ML/NLP interpretation that's rewarded for returning the substring/interpretation that returns the most/highest ranking results.

It's very noticeable if your search contains a short keyword that has to interpreted in the context of the other keywords. As an example, if I search for 'ARM assembly' plus another keyword (macro, syntax etc) it will see 'ARM assembly' without the extra keyword has way more high ranking results and happily show me how much it knows about armchairs that don't require assembly. Ignoring the fact that the extra keywords are there specifically to limit the search results.

It's tiring, a lot of time I previously spent browsing the limited but valuable results it returned I now have to spend mangling the keywords enough to outsmart their ML/NLP interpretation and get it to admit I am actually asking for the thing I am asking for so I can finally get to the part where I have to solve the modern captcha: click all the results that are:

1. Not stolen/rehosted 2. Not a "Hello World" level Medium blog 3. Written by an actual human

goodlinks · 4 years ago

If I search for a business name its normally not the first result any more. Usually an advert for another company in the same space I dont want to use (basically an offensive / scamming result from user perspective) and then also the standard "buy search term on amazon".

ziggus · 4 years ago

I think you're wildly overestimating the influence of the minority within HN (or other similar communities) that actually care enough to switch to another search engine.

This reminds me of the Linux gamers who claim that they can influence game development companies by purchasing games with Linux ports, but wind up being less than 0.5% of sales of most games with Linux ports, which leads manufacturers to ignore that customer base almost completely.

fileeditview · 4 years ago

Not disagreeing with you.. big companies truly mostly ignore Linux but there are more than a few indie devs who support Linux as a platform. And I tend to play only indie games these days anyways because all the big commercial games have been reduced to some kind of click-and-succeed or free-to-play-and-milk-some-whales crap.

I personally am kinda happy where Linux gaming has come to be. Sure it could always be better but I remember times where there were only like 3 games for Linux and you had to compile them yourself..

yellowsir · 4 years ago

game companies might have ignored us, but in the end it created a space for valve and codeweavers to fill.

alangibson · 4 years ago

> cleaning up the spam

I'll be happy to be proven wrong, but I think Google is now fully in the 'optimize for engagement' camp. If that's what they're doing, it's by definition not spam (from their point of view) if people are clicking on it more than the non-spam results.

Again, only my guess as to what's going on. I don't see another good explanation for them only serving cloned Stackoverflow and top X lists for basically everything now.

onionisafruit · 4 years ago

From a user point of view, a search engine’s job is to link you away from the search engine, so how does a search engine measure engagement? Is it time on page or maybe number of searches performed? When you don’t have viable competitors both of those are improved by worse search results. Even number of ads clicked would be improved with worse search results because ads don’t have as much competition for your attention when there are no relevant results on the page.

sanketpatrikar · 4 years ago

It is Google's job, but they either aren't doing it or are failing at it. We could do something about it at least until a better alternative or a solution appears.

> Can we just nudge them to do so under the threat of an influential minority leaving due to their use case being affected?

Many influential people have tried and nothing seems to have transpired from it.

Google.com is the most popular website. I don't think the leaving of any minority group we manage to create would even matter to Google, let alone force them to fix the issue. Not that I discourage using alternatives.

anigbrowl · 4 years ago

Can we just nudge them to do so under the threat of an influential minority leaving

No. This is a classic mistake of intellectual types, who are impressed by each others' cogent arguments. But there is a much wider pool of people who are not, and among whom the intellectual types actual have very little influence, due to being boring and hard to understand (plus, it has to be said, kind of snobbish about how smart they are).

Now, you might reason that Google is full of smart people who should care about cogent arguments. But that assumes as an unspoken premise that Google's internal goal is to maximize the quality of the service and profit from being The Best. They passed that goal years ago are now so awash in money that it's cheaper to just squash competition than to innovate. They can be moved by threats to advertising revenue got up by angry crowds on social media (a market when they have little direct power), but Google would probably be delighted if grumpy nerds wandered off somewhere else. If they need talent or access to some compelling technology they can just throw a pile of cash at the problem.

BuyMyBitcoins · 4 years ago

>” Can we just nudge them to do so under the threat of an influential minority leaving due to their use case being affected?”

I sense Google is too big to cater to us like this. Despite a steady decline in quality, Google is still the dominant search engine and the competition isn’t even close to its market share. Not only would they not notice many of “us” leaving, the amount of change they would have to implement in order to satisfy our desires would end up changing the product for the rest of the market. On some level, the product managers must be satisfied with the metrics as they stand since Google is continuing with their current course.

tjpnz · 4 years ago

>Isn't this Google's job?

Or more fundamentally perhaps this is just the system working as Google intended?

ineedasername · 4 years ago

Google's goal isn't to create the best possible search engine, it's to have a search engine that is good enough that people won't actively seek an alternative at the same time that they put as much ad content as possible in there, again before it's so much that people seek an alternative.

I doubt many advertisers like the status quo very much either. They basically have to pay for ad placement to ensure the first results for their product aren't ads for competing products. On mobile when I search for Boox the first result linking to them is an ad. Same for Kobo. In other instances I'll search for company or product and a competitor ad is the first to show. So vendors get stuck paying for ads when their own site should probably be the first organic result, above the ads.

reaperducer · 4 years ago

Are developers a small but lucrative target and so the suits at Google don't see the benefit of improving that experience by cleaning up the spam?

Google doesn't make money from people finding what they're searching for. Google makes money by keeping people searching.

tut-urut-utut · 4 years ago

Instead of spending energy to change Google, why not just leave them for good?

Start with changing default search engine to DuckDuckGo or something else, install uBlockOrigin and Privacy Badger to disable tracking, and gradualy reduce using every Google or application, starting with Chrome.

Be the change you want to see.

sanketpatrikar · 4 years ago

I relate to this opinion. There are two reasons why my suggestion might still be useful:

1. DuckDuckGo too is affected by these SEO-gaming sites, so maintaining a blacklist will help us make that experience better too.

2. There are times when only Google can find us what we're looking for, so this will prove useful when we go back to it.

moneywoes · 4 years ago

No, google wants more clicks so they would prefer poor results that keep users searching

PragmaticPulp · 4 years ago

I think the disconnect comes from people expecting perfect search results as curated by humans, whereas Google necessarily must optimize for automated results. Automated results will never be perfect.

omnicognate · 4 years ago

Are you paying them to do it?

MarcelOlsz · 4 years ago

Worst case scenario if Google drops the ball I just go back to the library.

jstx1 · 4 years ago

I'm sure they have great books on stackoverflow answers, reddit reviews of products and opening times of local stores.

beepbooptheory · 4 years ago

Google has a fiduciary responsibility to shareholders, which is so much work as it is! Why are you trying to ask them to do more?