The low quality of results has been a problem for a while now and has become worse lately thanks to all those StackOverflow and Github clones. So I was wondering if we could come together and contribute to a single blacklist hosted somewhere and then import it into each of our browsers. Who knows? We might end up improving the quality of the results we all get.
Lists to get rid of the StackOverflow and Github clones already exist. [1]
I would love to contribute to a project like this, but won't be able to be a maintainer due to time constraints. Would greatly appreciate it if someone could host this. A simple txt file on github would do.
What do you say, HN?
[0]: https://github.com/iorate/ublacklist [1]: https://github.com/rjaus/awesome-ublacklist
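To make the proposal concrete: uBlacklist [0] can subscribe to a hosted text file with one match pattern per line, so the shared list could literally be a blacklist.txt in a repo. Below is a rough Python sketch of merging a few such community lists into one deduplicated file; the source URLs are placeholders, not real lists.

    # Merge several community blocklist text files into one deduplicated list.
    # The SOURCES URLs are placeholders; point them at the raw files you trust.
    import urllib.request

    SOURCES = [
        "https://example.com/github-clones.txt",  # placeholder
        "https://example.com/so-clones.txt",      # placeholder
    ]

    def fetch_lines(url):
        """Return non-empty, non-comment lines from a hosted blocklist file."""
        with urllib.request.urlopen(url) as resp:
            text = resp.read().decode("utf-8", errors="replace")
        return [ln.strip() for ln in text.splitlines()
                if ln.strip() and not ln.strip().startswith("#")]

    merged = set()
    for url in SOURCES:
        merged.update(fetch_lines(url))

    # uBlacklist expects one match pattern per line, e.g. *://*.example-clone.com/*
    with open("blacklist.txt", "w") as f:
        f.write("\n".join(sorted(merged)) + "\n")

Hosting the merged output as a raw file on GitHub would let everyone subscribe to the same URL.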
A Google search showing some of these leech-type sites:
https://www.google.com/search?q=%22code+that+protects+users+...
For me, "farath.com" is outranking stackoverflow.
This seems pretty suspicious? Is it reporting the first time Google crawled the main domain farath.com? How is that relevant information?
0: https://web.archive.org/web/20080607010730/http://www.farath...
1: https://who.is/whois/farath.com
Google ("code that protects users from accidentally invoking the script when they didn't intend to")
Link: https://www.google.com/search?q=%22code+that+protects+users+...
SO - #2
Bing ("code that protects users from accidentally invoking the script when they didn't intend to")
Link: https://www.bing.com/search?q=%22code+that+protects+users+fr...
SO - #2
Brave Search
Link: https://search.brave.com/search?q=%22code+that+protects+user...
SO - Not on page
You.com
Link: https://you.com/search?q=%22code%20that%20protects%20users%2...
SO - Doesn't load
DuckDuckGo:
Link: https://duckduckgo.com/?q=%22code+that+protects+users+from+a...
SO - #2 (seems to depend on refresh)
Basically they're all the same. Google is faster, but the order of the results is identical.
If you did a large scale analysis in this manner I doubt Google would lose.
It was helpful solely to show what some of these leech sites are.
Searching for (without quotes) 'What does if __name__ == "__main__": do?' is probably a better test of which search engine has better results for a real-life query. Google might still win, but it should do a better job of screening out the spammy sites. It used to be better at this.
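For anyone outside the Python world, the idiom in that query is the standard guard that keeps module-level code from running on import; a minimal example:

    # demo.py
    def greet(name):
        return f"Hello, {name}!"

    # This block runs only when the file is executed directly (python demo.py),
    # not when demo is imported from another module.
    if __name__ == "__main__":
        print(greet("world"))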
I think someone, or some entity, has been engaged in a concerted effort to manipulate the results, if it's not something more nefarious in Google and Bing's domain.
Very few entities have the resources to do this either; it's not something a ragtag band of goat herders could pull off, that's for sure!
https://you.com/search?q=code%20that%20protects%20users%20fr...
To be fair, not sure what other results we'd expect if we're going to search for a specific, plagiarized phrase.
Edit: actually, upon review, you.com does indeed give one extra useful result within the top three. So one point to Gryffindor.
[1] https://github.com/darekkay/config-files/blob/master/adblock...
On a much smaller scale, if anyone is interested, I maintain a blacklist focused on those code-snippet content farms that get in the way when you're searching for some error message or particular function: https://github.com/jhchabran/code-search-blacklist.
[1] https://www.cyberciti.biz/tips/about-us [2] https://twitter.com/nixcraft
I'll be honest, I don't remember how I came to that conclusion but I suspect I encountered an unsatisfactory answer to a question I was looking to answer, saw the .biz and drew my conclusions.
The noise to signal ratio for most of my queries is so high that I have to start judging a book by its title, not even its cover.
The .biz definitely does not help, since it hints to me that it's just another one of those worthless reposting sites, as someone else commented below.
When I see your site pop up in my search results I know the content is going to be more reliable than most of the others. Thanks for the effort you've put into it.
Glad you said something though, I wouldn't have looked at it twice without a human attestation.
The other comment made me remember there was a captcha too, right? I had been using my own rented server as a VPN for all my internet access. But I'd never have blocked it for a public list - I've read the 'about me' page.
That's really frustrating. I'm building a faster search engine for programming queries and just added your site cyberciti.biz as a recommended and curated source of Unix/Linux material. Hope more devs become aware of your work and that you (and your collaborators) receive the credit you deserve. Thanks for your many years of work.
Can we just nudge them to do so under the threat of an influential minority leaving due to their use case being affected?
To the original replier: you could wait for Google to do something, but if they were going to fix the listicle issue, and it were fixable on their end, they'd probably have done it by now. I'm disappointed in the situation too, but if there is a workable solution on our end it would be silly to ignore it because fixing the problem is someone else's job.
To the OP: I worry that the number of domains pumping out crap might be far greater than we know, and that might hamper the effectiveness of this. If the collaborative block list ever got big enough you might also have to deal with spam. But I think it would be a great thing to try. This is one of those issues that annoys me, but it's just below my action potential threshold. My biggest objection right now is the spammy recipe websites.
If we manage to block them, we might be able to get a results page with good sites upfront and the other meaningless content below it. I assume Google will also surface good content along with the bad, so our blacklist might enable the good stuff to reach the top.
The spam problem I'm not sure of yet, but we might either be able to block enough of it to be satisfied, or it won't pose a problem for most of the searches that are currently giving bad results.
It’s often possible to work around issues in lower layers, but it's usually at least worth raising them upstream to get them fixed 'properly'.
It'll help me when I don't have a blocklist active, and it'll help new programmers who aren't familiar. It'll reward good sites with extra traffic and discourage new spammers from entering the market.
In the worst case, if Google really can't or won't address the issue, understanding the upstream problem more fully can help make a better workaround.
I'm sure you're right about the number of spam domains, but Pareto suggests that blocking even a small percentage of them might provide a large gain.
https://en.wikipedia.org/wiki/Pareto_principle
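A toy illustration of that point, assuming spam hits follow a skewed (Zipf-like) distribution across domains; the numbers are synthetic, not measurements:

    # If spam hits are Zipf-distributed across domains, blocking only the most
    # frequent offenders removes a disproportionate share of the junk.
    def zipf_hits(num_domains, total_hits=100_000):
        weights = [1 / rank for rank in range(1, num_domains + 1)]
        scale = total_hits / sum(weights)
        return [w * scale for w in weights]

    hits = zipf_hits(num_domains=1000)
    blocked = hits[:50]  # block just the 50 most frequent spam domains (5%)
    print(f"Blocking 5% of domains removes {sum(blocked) / sum(hits):.0%} of spam hits")
    # prints roughly 60% under this synthetic distribution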
https://github.com/sanketpatrikar/hn-search-blacklist
At the risk of sounding pretentious, I call it "socialist", since they spend their lives telling others what to do, or what is or isn't good for the rest of us, but they rarely do anything about it themselves. Surprisingly, this is the group that is really worried about poverty and climate change yet does about as much as I do for them, with the difference that I act on my own, the few times I do, without requiring everyone else to do it.
It is always someone else who will do it. Though the other day I had a conversation with a non-socialist person who had that same attitude ("others should do it") towards what OTHERS should do. I really dislike that attitude, no matter where it comes from.
Case in point: when I want or promote something, I am the first one to do it, whether others do it or not. The rest, no matter the ideology, is all b*llshit.
As imperfect as I am, I try to do what I think is good (and sometimes my imperfection prevents me from doing it) but I do not spend my life telling other people why they are worse than me and telling them what they should do or not. The most I have for someone is good suggestions, never requirements.
Have you searched anything on Google lately? The answer is "no". Their new job seems to be to stuff your results with anything even remotely related (and sometimes related in a way that only machine learning can see) so you have things to click on.
Edit: with the lone exception of "find me this business nearby".
It's very noticeable if your search contains a short keyword that has to be interpreted in the context of the other keywords. As an example, if I search for 'ARM assembly' plus another keyword (macro, syntax, etc.), it will see that 'ARM assembly' without the extra keyword has way more high-ranking results and happily show me how much it knows about armchairs that don't require assembly, ignoring the fact that the extra keywords are there specifically to limit the search results.
It's tiring: a lot of the time I previously spent browsing the limited but valuable results it returned, I now have to spend mangling the keywords enough to outsmart their ML/NLP interpretation and get it to admit I am actually asking for the thing I am asking for, so I can finally get to the part where I have to solve the modern captcha: click all the results that are:
1. Not stolen/rehosted
2. Not a "Hello World" level Medium blog
3. Written by an actual human
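Half-joking, but the first two checks can be roughly automated; a toy sketch, where the clone domains and "low effort" keywords are invented purely for illustration:

    # Toy result filter: drop results from known clone/content-farm domains and
    # beginner-level Medium posts. Domain and keyword lists are illustrative only.
    from urllib.parse import urlparse

    CLONE_DOMAINS = {"example-so-mirror.com", "example-github-clone.dev"}  # invented
    LOW_EFFORT_HINTS = ("hello world", "top 10", "in 5 minutes")

    def keep(result):
        title, host = result["title"].lower(), urlparse(result["url"]).hostname or ""
        if any(host == d or host.endswith("." + d) for d in CLONE_DOMAINS):
            return False  # 1. stolen/rehosted
        if host.endswith("medium.com") and any(h in title for h in LOW_EFFORT_HINTS):
            return False  # 2. "Hello World" level blog
        return True       # 3. (written by an actual human) has no cheap heuristic

    results = [
        {"title": "Hello World in Python in 5 minutes", "url": "https://medium.com/@x/y"},
        {"title": "subprocess docs", "url": "https://docs.python.org/3/library/subprocess.html"},
    ]
    print([r["url"] for r in results if keep(r)])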
This reminds me of the Linux gamers who claim that they can influence game development companies by purchasing games with Linux ports, but wind up being less than 0.5% of sales of most games with Linux ports, which leads manufacturers to ignore that customer base almost completely.
I personally am kinda happy where Linux gaming has come to be. Sure it could always be better but I remember times where there were only like 3 games for Linux and you had to compile them yourself..
I'll be happy to be proven wrong, but I think Google is now fully in the 'optimize for engagement' camp. If that's what they're doing, it's by definition not spam (from their point of view) if people are clicking on it more than the non-spam results.
Again, only my guess as to what's going on. I don't see another good explanation for them only serving cloned Stackoverflow and top X lists for basically everything now.
> Can we just nudge them to do so under the threat of an influential minority leaving due to their use case being affected?
Many influential people have tried and nothing seems to have transpired from it.
Google.com is the most popular website. I don't think the departure of any minority we manage to rally would even matter to Google, let alone force them to fix the issue. Not that I discourage using alternatives.
No. This is a classic mistake of intellectual types, who are impressed by each other's cogent arguments. But there is a much wider pool of people who are not, and among whom the intellectual types actually have very little influence, due to being boring and hard to understand (plus, it has to be said, kind of snobbish about how smart they are).
Now, you might reason that Google is full of smart people who should care about cogent arguments. But that assumes as an unspoken premise that Google's internal goal is to maximize the quality of the service and profit from being The Best. They passed that goal years ago and are now so awash in money that it's cheaper to just squash competition than to innovate. They can be moved by threats to advertising revenue got up by angry crowds on social media (a market where they have little direct power), but Google would probably be delighted if grumpy nerds wandered off somewhere else. If they need talent or access to some compelling technology they can just throw a pile of cash at the problem.
I sense Google is too big to cater to us like this. Despite a steady decline in quality, Google is still the dominant search engine and the competition isn’t even close to its market share. Not only would they not notice many of “us” leaving, the amount of change they would have to implement in order to satisfy our desires would end up changing the product for the rest of the market. On some level, the product managers must be satisfied with the metrics as they stand since Google is continuing with their current course.
Or more fundamentally perhaps this is just the system working as Google intended?
I doubt many advertisers like the status quo very much either. They basically have to pay for ad placement to ensure the first results for their product aren't ads for competing products. On mobile when I search for Boox the first result linking to them is an ad. Same for Kobo. In other instances I'll search for company or product and a competitor ad is the first to show. So vendors get stuck paying for ads when their own site should probably be the first organic result, above the ads.
Google doesn't make money from people finding what they're searching for. Google makes money by keeping people searching.
Start by changing your default search engine to DuckDuckGo or something else, install uBlock Origin and Privacy Badger to disable tracking, and gradually reduce your use of every Google service and application, starting with Chrome.
Be the change you want to see.
1. DuckDuckGo too is affected by these SEO-gaming sites, so maintaining a blacklist will help us make that experience better too.
2. There are times when only Google can find us what we're looking for, so this will prove useful when we go back to it.
Specifically blocking github clones seems doable. Adding anything else needs equally specific criteria or it will quickly become subjective and unfair.
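For what "specific criteria" could look like in the GitHub-clone case, a heuristic sketch (not a proven detector): flag pages whose URL path mimics GitHub's /owner/repo layout while living on a non-GitHub host.

    # Heuristic: a non-GitHub host serving paths shaped like /owner/repo(/issues/N)
    # is a strong clone/mirror signal. The mirror domain below is made up.
    import re
    from urllib.parse import urlparse

    GITHUB_HOSTS = {"github.com", "gist.github.com"}
    REPO_PATH = re.compile(r"^/[\w.-]+/[\w.-]+(/(issues|pull|blob|tree)(/|$))?")

    def looks_like_github_clone(url):
        parts = urlparse(url)
        host = (parts.hostname or "").lower()
        return host not in GITHUB_HOSTS and bool(REPO_PATH.match(parts.path))

    print(looks_like_github_clone("https://github.com/iorate/ublacklist"))                   # False
    print(looks_like_github_clone("https://example-mirror.dev/iorate/ublacklist/issues/1"))  # True

It will produce false positives on legitimate sites with two-segment paths, which is exactly why a shared list still needs human review.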
As another commenter here said, "Google does not make money by helping you find what you are searching for; it makes money by keeping you searching." That only works when there is no competition. But if Apple were in the game, people would use whichever presents them with the better results. Right now, I don't feel there is real competition.
It’s limited to popular queries, so for many searches you may get 'no results, search the web (Google)'.
I made a somewhat buggy web front end for Siri search so I could better play around with the results: https://luke.lol/search/
On the other hand you can see how Google is using its dominance in Search to push its browser and mobile OS - once you login to Google in Chrome on your phone, suddenly they can track you when you use their mobile Apps etc. And Apple is trying hard to grow in the "Services" field, i.e. through Apple Music and Apple TV - both available to Windows and Android users too. Just as they made a buttload of money with iTunes and the iPod because they also targeted Windows users.
I don't think I'd want to download a list of the most blocked sites and plug it into one of my tools though, for some of the reasons you mentioned.
On the other hand, it is absolutely ridiculous to conflate the difficulty of occasionally adding a domain to a local filter with assisting in building a random, unproven search engine. People volunteer their development effort for projects they personally find interesting or challenging. If you want more developers, advocate for the project; don't try to scold people for wanting to spend a small amount of their time refining a solution that works for them.
Plus there’s no private Google API here, just an extension that removes search results from the page. I suppose you could say the extension APIs are from Google (Chromium) but they’re certainly not private and are commonly used.
> I’m not sure why you think that a domain blocklist would be harder than custom search engine development.
I didn't say this. The custom search is already created. Helping its development is much easier now. AFAIK its main problem is the lack of hosted servers.