Some really good thoughts here. I'll summarize the ones that hit me:
- "Why are people searching Reddit specifically? The short answer is that Google search results are clearly dying. The long answer is that most of the web has become too inauthentic to trust."
This is it for me exactly. I search for the following kinds of things on Reddit exactly because results on other sites aren't trustworthy: Reviews are secretly paid ads. The "best" recipe for pancakes is only what's trending on instagram right now. The latest conditions on mountain bike and hiking trails are being shared inside communities like Reddit but not on the web. The same for trending programmer tools.
- "It is obvious that serving ads creates misaligned incentives for search engines..."
What I'm shocked by is that Google somehow maintained a balance on this for so long. Well, at least a good enough balance that people still use it primarily.
- "Google increasingly does not give you the results for what you typed in. It tries to be “smart” and figure out what you “really meant" ..."
This is the most annoying behavior because I really mean what I write.
- "There’s a fun conspiracy theory that popped up recently called the Dead Internet Theory..."
I hadn't heard of this. Now that's some sci-fi level of conspiracy but in today's world it seems totally plausible.
There's a reason why it seems shocking that Google has been able to balance the ads well enough that people still use it. They haven't! Google has orchestrated a monopoly over search engine distribution that allows them to get away with search results that are dominated by ads and spam, without losing most consumers.
Let's be blunt here - almost no consumer consciously chooses to use Google search anymore. Google has a distribution monopoly through Android, its deal with Apple on iOS and macOS, and on desktop through Chrome.
I'm working on a search engine startup. It is in all practical senses impossible for an iPhone or Mac user to change their search engine to a new search engine on Safari or at the iOS level. And despite being technically possible on desktop with Chrome, it is for all practical purposes beyond what any typical consumer can easily do.
Their monopoly over distribution - not search result quality - is what keeps consumers searching Google and clicking ads.
> It is in all practical senses impossible for an iPhone or Mac user to change their search engine to a new search engine on Safari or at the iOS level.
On my iOS device, under Settings -> Safari -> Search Engine, I have a drop-down with options, including Bing and DuckDuckGo, but defaulted to Google.
On macOS, with Safari running, under Safari -> Preferences… -> Search -> Search Engine, I have a drop-down, defaulted to Google, with Bing and DuckDuckGo amongst other choices.
Agreed on Google's effort to get their search engine set as the default. However, I just don't understand how changing the search engine is impossible given what I'm seeing on my devices? Nor does it seem over-the-top onerous to my eyes.
"Let's be blunt here - almost no consumer consciously chooses to use Google search anymore."
This bluntness does not go far enough. People do not change defaults, no matter how "easy" it may be to do so.
A default is a pre-made choice by someone other than the consumer. There is no set-up process where the consumer makes a choice. The choice has already been made. Consumers do not make this choice. Even if they could, in practice they don't. That fact may seem insignificant but it is worth billions of dollars.
If I am not mistaken, the current CEO of Google spent most of his time working on "default search engine" (or "default web browser") deals before taking the CEO job. In probably the most important one, Google pays Apple a hefty sum to be the default search engine. It was estimated at $10 billion in 2020 and $15 billion in 2021.[1]
Defaults are effectively permanent settings. It does not matter how easy it is to change a default setting if practically no one ever does it. $15 billion is too much to pay for something that may or may not change. It does not change. It is money in the bank.
Just tried https://andisearch.com/ and I like it. Felt like a fresh take on results instead of the same old SEO ones. For example, I searched for a few Java queries and found very informative websites/results that weren't dominated by Baeldung. Searched for "soccer scores", "chelsea FC", "prince andrew", "WP export" and found things that would never have been on Google's first page, but were excellent results. Nice work.
> There's a reason why it seems shocking that Google has been able to balance the ads well enough that people still use it. They haven't! Google has orchestrated a monopoly over search engine distribution that allows them to get away with search results that are dominated by ads and spam, without losing most consumers.
I disagree. Two to three years ago I could get more of what I wanted in a complex search once I tuned it properly. So Google had a twenty-year run of good and useful searches. Google also worked to strong-arm their monopoly, yes. But I claim they still served some quality after that. It's not that unusual for a monopoly built on quality to maintain its quality for a period of time after it achieves that monopoly status - institutional standards do die, but they die over time.
> almost no consumer consciously chooses to use Google search anymore...Their monopoly over distribution - not search result quality - is what keeps consumers searching Google
I don't disagree with this as a fact, but I think there are a lot of things that work this way that aren't actually monopolies in the competition-preventing sense. If I wanted to launch a new breakfast cereal, getting my product into grocery stores would be one of the major challenges of starting that business. Competition for shelf space is a core concern of a lot of consumables. This definitely creates a lot of stickiness and barriers, and that comes with its share of downsides, but there are also good reasons that distribution systems work the way they do. Transaction costs are important.
“…almost no consumer consciously chooses to use Google search anymore”? 6-8 quality leads in the last 14 days (avg. sale at $3,200) on less than $220 spent on ads beg to differ with you. We’ve only started advertising in the last two weeks. We’ve had calls and form submissions _all_ from Google, and we only launched our site roughly 45 days ago. I’m not a Google fanboy and I think Search does need an overhaul, but people are most definitely using Google Search. Another client of mine gets 8-12 new customers per month, all from Google searches, and she doesn’t spend a dime on Google.
Is there a reason why you've chosen the chat style interface vs the standard search box at the top and results at the bottom layout?
This is not a comment on the search results themselves - I always appreciate efforts to break out of the standard Google results and surface other sources, but I found the interface confusing and the previews were also taking up a lot of space. A compact view would be better - or an option to turn the previews on/off.
"almost no consumer consciously chooses to use Google search anymore"
May I ask how you arrived at this observation? This is the first time I am hearing this. I know of NO ONE who uses any other search engine. The term "Googled" is not yet a proxy for other search sites.
I would think that there'd be an online opportunity for a search engine that only searches human-curated sites. Those sites would be ones that have quality information rather than spam. Some obvious examples - wikipedia, reddit, hackernews, public domain books, etc.
It's easy to game an algorithm, but hard to game a human - humans know garbage when they see it.
As an aside, whenever I get a prescription, included with it is a dense two page sheet of detailed information about the drug. I see nothing like that online with a search. Why is this sort of thing not online?
Just tried andisearch and am extremely impressed. It has so far handled all queries I have thrown at it better than brave search and DDG. Will continue to experiment, best of luck and awesome work!
Firefox on Linux Mint was pointing at something else for a while (DDG I think? Bing? I don't recall).
I gave up after a few weeks and had to switch it back to Google. Google's not perfect - it's never been perfect, it was just better than the alternatives - but it's still less bad than others.
Distribution will come if the product is better, but it is a hard problem. I try every new search engine I can and they are always worse/slower than google.
I have tried using DuckDuckGo as my default search engine, but Firefox changed it back to Google with every update, so eventually I just gave up on that endeavor.
>Let's be blunt here - almost no consumer consciously chooses to use Google search anymore
Do you have anything substantive to support this? I highly doubt it is true given the fact that the verb "to google" literally means "to search the internet".
> It is in all practical senses impossible for an iPhone or Mac user to change their search engine to a new search engine on Safari or at the iOS level.
There are five (very simply accessible) different choices for Safari on iOS.
But if you switch to iCabMobile on iOS there are TWENTY-FIVE search engines to choose from.
> This is the most annoying behavior because I really mean what I write.
Tons of people don't, though. They type whatever unprocessed half-second thought they have into Google and expect Google to lead them to the water, even if they're tugging and trying to go in the completely wrong direction. Google has optimized for working 'most of the time' for 'the most people', and that means striving to fix the complete word soup of search queries people type in.
A single mediocre experience optimized to work ‘most of the time’ for ‘most people’ is quite contrary to the narrative that has made Google such tremendous amounts of money (“let us surveil you so that you can have a more personalized experience”) though, isn’t it?
Given all of the data collected about Google users, ought not one of the applications of that data be some way to give users specifically what they are searching for if their past behavior suggests that they mean what they type? Couldn’t the “search only for <exact query>” option be a very good data point on making that determination automatically, or enabling a user setting for “give me exact results based on what I actually typed by default”?
It seems possible to me that this behavior has more to do with the value of ads for “big” keywords than with (poorly) inferring user intent.
This used to be solved by allowing queries like `Class Inheritance +ruby` to require results to include "ruby". They killed the + operator for Google+'s sake, changing it to quotes, so `Class Inheritance "ruby"`, but now they interpret even those. When I use Google, which is less and less, I am not looking for a fight with a computer to express my intent, I'm looking for the answer to a question. That never seemed to be an issue until recently.
This is very helpful if I search for a name I didn't quite pick up or don't know how to spell, or if I only remember fragments of a quote or topic; then I just blurt out my stream of consciousness and Google will mostly point me in the right direction. That being said, I wish I could explicitly tell Google to treat my query more literally. Ideally you would be able to specify the search query in some kind of grammar. They have these kinds of prompt mechanics for GPT-3, so it doesn't seem too unrealistic, even if it's all ML nowadays.
That's like speaking to little children who are learning to talk by reproducing their errors. Some adults believe it's cute, but it's idiotic; it confuses the babies and makes their progress more difficult and slow.
I honestly find it pretty helpful. You can type "russian murder painting" into Google and it will come up with Ivan the Terrible and His Son. All that hinting may be annoying if you know exactly what you wanted, but I'm not a specialist in everything I ever search for.
Right, although piping junk into the search box and expecting it to bring back something useful is trained behavior.
I've been using DuckDuckGo a lot more recently and the thing that surprises me isn't the kind and quality of the results, it's that I actually need to use my brain to search.
It's not about whether this is a good or a bad thing—I kind of like the precision in a way, it's just jarring how different it is as an experience.
Do they? I see this stated all the time, with no references.
> They type whatever unprocessed half-second thought they have into Google and expect Google to lead them to the water
Perhaps if Google didn't try to fix things for people, they would be more thoughtful with their searches.
Take away the junk food, and people will resort to real food. The same way some cities limit parking at big events so that people have to take mass transit. It's for their own good, but they have to be shown the way.
> Google has optimized for working ‘most of the time’ for ‘the most people’
This may be Google's goal, but it hasn't happened yet.
I don't have very many friends or acquaintances in the tech bubble, so I base my observations around real people in the real world. More and more they're giving up on Google entirely.
Their primary search engines these days seem to be Instagram, Pinterest, Etsy, Amazon, and other non-Google sources.
When I ask someone why they're searching Amazon reviews for tech support information, they tell me because it's not on the web. That's Google's failure.
I see what you are saying but it seems to me that it used to do a much better job at that. These days I feel like I'm fighting the search engine constantly and it is certainly not magically finding what I want anymore. It feels like some crusty unmaintained tool that I have to know how to use.
It's funny to observe my stepson learning his way through Google. It's happening mostly through the assistants on TVs and locked cellphones. But he's learning to do exactly what you said: half-second thoughts and brute forcing many queries for the same subject. He's 7.
A less charitable interpretation --- and unfortunately one that could be true --- is that Google does not want you to think. It wants to keep you stupid because it's easier to deceive those who can't think and bend their thoughts in the direction that gives G more $$$. I'd say it's not merely optimising for the stupid; it's actively encouraging it. It wants to be your brain, control your thoughts and life.
Google has optimized to whatever sequence of behaviors achieves the most profit. The search results are not chosen for utility to the user but as nudges in a cycle of influence intended to drive you to attend to an ad, purchase something, or consume particular content.
They should not be engaged in non-consensual manipulation of social or political behaviors, and the ethics of market manipulation at scale through advertisement are far from clear.
Another factor that isn't being fully accounted for is a new SEO/marketing technique where people ask scripted questions publicly on sites like Reddit and then stealthily provide answers that market a product or service. This leads to Reddit results not being exactly authentic either. Pretty much no online reviews can be trusted anymore: we are begged to leave positive reviews for companies, and companies outright purchasing positive reviews is also very rampant.
Though Google is at fault for letting their service succumb to the "payola" race, many other factors are in play all across the Internet, since data quality has faltered almost everywhere. For major-cost and non-refundable purchases I need to trust, I go to brick-and-mortar stores and inspect what I am buying. I am thankful not everything has shifted to an online-only model. It's going to be a very bumpy ride on the Internet until Congress and consumer protection laws wake TF up and do their job.
Is Google unique in that, though? Amazon reviews are worthless now because companies pay customers to leave positive reviews or pay review farms to leave positive reviews. Even if someone reports them to Amazon, though, the companies just close their accounts and open new ones with different names and sell the same product. It's so trivial for them to pivot when they get caught that I'm not sure there is a solution to this problem.
Reddit does have astroturfing, but a lot of communities are aggressive about identifying and banning shills, so it's not as widespread as in google search results.
IMHO it's not so much a problem with Google search as with the internet as a whole.
Most genuine discussions have moved from open, publicly accessible web to places inaccessible to search engines and general public. Smaller niche forums, blogs and personal websites with no financial incentive have died out. People have moved to Facebook, Discord, Whatsapp, Instagram, Slack, Twitter and other places behind logins. Online newspapers and portals are increasingly using paywalls. Most of the genuine human interactions and quality content is not indexable anymore. Instead we have a million affiliate marketers fighting for the top positions in search results with every possible seo trick.
Reddit is one of the last places with huge amounts of publicly accessible online discussions.
It's because private communities are the only ones free from mass abuse. Public forums moved to private Discord groups with hard-to-find invite links, etc., because public forums take an army of anti-abuse workers to keep alive, while a Discord group just needs a few people to kick the troublemakers and maybe revoke the invite link for a while.
But your overall point is valid, I think. My pointing to a single active forum doesn't change the fact that many other enthusiast groups have moved to Facebook groups and the like.
Feature, not bug. If there's no public search, it can't be gamed for money. The problem TFA identifies is one of discerning that the person you're getting your info from is an actual person who cares. The best way of doing that, until we find some way of creating institutional trust in these matters, is talking to the sort of person that spends all day talking in a chat room about whatever it is you're asking about.
I too have found myself searching more in Reddit. Not to throw shade on Reddit, but even if I find exactly what I’m looking for in there, it’s depressing that it’s all bound up inside of another walled garden who will eventually have the same incentive as Google: squeeze every last advertising dollar out of the produc… I mean users. Like Google, it’s just a matter of time before they too lose their balance.
A question worth posing to this community: how can we build an internet that’s hostile to advertisers? Secondarily, how can said internet also be much more accessible to content authors, so they won’t have to learn CSS, HTML, and JS to publish some stuff? Finally, how can that content be discovered from within this network?
I "search" reddit a lot, but all my searches are always through Google. Reddit search is notoriously bad, even after multiple attempts by them to fix it. Suffixing my Google searches with "reddit" though gives all the results I'm looking for.
A factor in there has got to be 'who pays?' If it's hostile to advertisers, then there's got to be money to pay for the infrastructure from somewhere.
Maybe a tax on ISPs? I think I'd happily pay $10 extra per month for access to an ad-free internet. Maybe $20. But how many of the people that are already happy with the ads and poor Google results would do so? Would it be sustainable?
> how can we build an internet that’s hostile to advertisers?
You have to reify "trust" into concrete, computer-representable data. Maybe borrow the "web of trust" concept from PGP, but do some sort of multiplicative thing where the amount you trust someone's recommendation online is the product of the trust relationships between you and the recommender. That's really the best you can do - even legislation against online advertising will be subverted by companies that go through layers of proxies to buy influence.
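A minimal sketch of that multiplicative web-of-trust idea, in case it helps make it concrete (the graph, names, and weights below are all made up; nothing here is an existing API):

  // Sketch only: trust in a recommender is the product of edge weights along a
  // path from "me" to them, taking the best-scoring path within a small hop limit.
  type TrustGraph = Map<string, Map<string, number>>; // person -> (person -> trust in [0, 1])

  function trustIn(graph: TrustGraph, from: string, to: string, maxHops = 3): number {
    let best = 0;
    const walk = (node: string, score: number, hops: number, seen: Set<string>): void => {
      if (node === to) {
        best = Math.max(best, score);
        return;
      }
      if (hops === 0) return;
      const neighbors = graph.get(node);
      if (!neighbors) return;
      for (const [next, weight] of neighbors) {
        if (!seen.has(next)) {
          walk(next, score * weight, hops - 1, new Set(seen).add(next));
        }
      }
    };
    walk(from, 1, maxHops, new Set([from]));
    return best;
  }

  // I trust alice at 0.9, alice trusts bob at 0.5, so bob's recommendations reach me at 0.45.
  const graph: TrustGraph = new Map([
    ["me", new Map([["alice", 0.9]])],
    ["alice", new Map([["bob", 0.5]])],
  ]);
  console.log(trustIn(graph, "me", "bob")); // 0.45

The hop limit matters: without it, long chains of weak trust would let strangers (or shills) leak back in, which is the problem being solved in the first place.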
One small thought — having the search engine be configurable, so that the user can specify which sources to give priority to (e.g., Reddit, NYT Wirecutter, Wikipedia, etc.), would be an incremental improvement.
I, too, search "<search term> + reddit" often for product reviews and such. Thing is, the results on that front have started to slide as the paid review side of the internet catches on. I'm finding that it's getting harder and harder to trust the reddit search results - lots of shill accounts and obvious junk. That's not a google problem, specifically, but it's another degradation of a workaround for declining search result quality :(
Yeah, a lot of subreddits are clogged with the same bad info that's gotten all over Google's front page. The stickied "list of recommendations" on an enthusiast sub is just the same as you'd get from clicking the top result of "Best X 2022" on Google, complete with affiliate links
About the "dead internet conspiracy" - I've worked in writing how-to articles for a fairly large "help" website. They paid very little attention to the quality of the articles. I was paid for each piece and thus had about 30 minutes to write an article and later integrate feedback from internal review. Otherwise the payment became too low.
The most important factor was cramming SEO terms and links to keep people on the website into the articles.
The result is trashy articles that could well have been written by a bot but aren't. This could possibly be done with the help of curated bot-content, but I think we're far away from the point where this is really more profitable than getting students to do the work.
>> This could possibly be done with the help of curated bot-content, but I think we're far away from the point where this is really more profitable than getting students to do the work.
It may be becoming borderline. I expect that sentence/paragraph completion is already becoming useful to people who churn out quick content for a living. In any case, the important part isn't whether or not it's bots. The important part is whether or not it's authentic. The precise meaning of authenticity gets squishy, but it exists nonetheless.
IMO the sentiments are correct, whatever the details. Part of why Google sucks is that the internet is worse, for a bunch of the things we use Google to search for. The internet becoming a larger, more profitable industry changed it. Instagramming for influencer perks, SEOing, or selling targeted ads like FB does... it does not lead to the same places that earlier iterations of the WWW produced. Times change.
My friend briefly had a copywriting job writing weed strain descriptions for dispensaries. He was never provided the product he was describing, just told to make it up.
The other day I looked up the wordle answer (I know I know). The first result was a site where I had to scroll through about 19 paragraphs of SEO vomit to get to the answer. The page could literally have one word on it and serve its purpose. If that isn't a sign that the internet, or at least Google search, is dead, I don't know what is.
Speaking of bots, I'd be interested to know what percentage of articles on major-traffic content sites are authored or co-authored by AI.
My suspicion is that this is rife, given how many articles read poorly and are almost entirely fluff. If this is true, it would appear we are doomed to algorithms shaping our online experiences, which is worrying given the already shrinking diversity of opinion and content. It's like an entropic gene pool in nature, but with information.
Lately, half the results I get are pages that I can't view without paying for some service or signing up for a free trial... Google literally serves up results that are unreachable.
Classic example of this kind of content... Try searching "How to use X to get stains out of Y".
You will find a page for almost any X and Y combination. And they will all have wording like "Put some X on the stained Y... wait a bit... rub it in... and then put it through the washing machine. Hope it works!".
I would expect an automaton to be able to spell; so perhaps the presence of spelling errors is a mark of an authentic page. Maybe one could force Goo to spit out authentic results by including a strategically-misspelled word in the search terms.
Totally agree on the reddit point, I've also noticed the same occurring to me. The girlfriend recently got Pokemon Arceus and sometimes asks me to Google something she wants to know.
It's completely pointless; you just get a bunch of articles from news sites (??) that transcribe the quest but don't tell you anything more. I miss a nice community wiki like I'm used to from playing Dark Souls etc.
I've just started appending site:reddit.com to everything, works a lot better.
Yeah, Fextralife saved my ass multiple times while working through the dark souls series. It's a shame that type of community resource isn't more popular.
Google could fix this by making the algorithm take into account searches that often end with "reddit", thus applying more weight to Reddit results for similar searches where the user didn't include "reddit". Clearly it's an indicator that those are the better results.
Take StackOverflow for example. Almost any programmer will find a SO result as the top result and it's usually exactly what you're looking for. Since there's no money to be made by companies writing blog posts on debugging a compiler error, Google's algorithm works as intended.
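A hedged sketch of that reddit-suffix boost, purely to illustrate the idea above (this is not a claim about how Google's ranker actually works; the query log, URLs, and scores are invented):

  // Hypothetical reranker: boost reddit.com results in proportion to how often
  // users reformulate this topic's queries with a trailing "reddit".
  interface Result {
    url: string;
    score: number;
  }

  function redditAffinity(topic: string, queryLog: string[]): number {
    const related = queryLog.filter((q) => q.startsWith(topic));
    if (related.length === 0) return 0;
    const withSuffix = related.filter((q) => q.trim().endsWith(" reddit")).length;
    return withSuffix / related.length; // share of "... reddit" reformulations, 0..1
  }

  function rerank(topic: string, results: Result[], queryLog: string[]): Result[] {
    const boost = 1 + redditAffinity(topic, queryLog); // at most a 2x multiplier
    return results
      .map((r) => (r.url.includes("reddit.com") ? { ...r, score: r.score * boost } : r))
      .sort((a, b) => b.score - a.score);
  }

  // Made-up log: 2 of 3 "best road bike" searches end in "reddit", so the Reddit
  // thread (0.6 * 1.67 = 1.0) now outranks the listicle (0.9).
  const log = ["best road bike", "best road bike reddit", "best road bike 2022 reddit"];
  console.log(rerank("best road bike", [
    { url: "https://examplelisticle.com/best-bikes", score: 0.9 },
    { url: "https://www.reddit.com/r/bicycling/comments/example", score: 0.6 },
  ], log));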
Question is: Why hasn't Google done anything about this?
It's the organic results that are terrible, so they're not losing ad revenue by placing these garbage sites at the top. Perhaps it's to intentionally make better websites pay for ads to get better placement? But those won't be the ones to ever pay to begin with...
I think it would be foolish to assume Google hasn't spent hundreds of hours in meetings talking about what they can do about everyone having to type "reddit". The problem is they are facing an army of SEO experts who are one step ahead of Google. There are legal issues as well. Imagine if it were found that Google was artificially boosting Reddit in an unfair way.
I did software dev at a marketing firm for about a year, and it was pretty soul sucking, so I know what you mean. I won't work at one again unless it's literally my only option.
I agree Google is bad, but I think Reddit is rapidly becoming just as inauthentic. I'm sure every major player at this point understands the gains that can be had by astroturfing Reddit. The real problem seems to be that the internet is inherently untrustworthy, and going back to finding people you trust in the real world is the only fix I can see.
We've also got to address the people and corporations that are gaming the system that Google has created. Google is by no means off the hook, but marketing practices have also taken a very bad turn toward deception and toward reinforcing a payola system recently that we may never be able to recover from, trust-wise.
> - "Why are people searching Reddit specifically? The short answer is that Google search results are clearly dying. The long answer is that most of the web has become too inauthentic to trust."
Haha, the noobs. I use HN instead 😎
A bit more seriously: I fully agree with this. And if HN doesn't have what I'm looking for then I use Reddit as well. But if HN has some info on the topic with a few highly upvoted threads, damn, it always impresses me.
When I'm looking into some project or piece of software I'm unfamiliar with, I really do search for HN posts on it. Fastest way to cut through (enough) of the biased material and get something genuine. Even hyped stuff usually has enough contrarian posts to give you an idea of where to look for the skeletons.
Me too. I even have a bookmarklet in Firefox so that I can use a prefix (hn) and the search is rewritten with site:news.ycombinator.com, to make sure all results are limited to HN.
I also have the same kind of bookmarklet for Reddit and Google Scholar.
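For anyone curious, such a bookmarklet might look roughly like this (a guess, not the commenter's actual code); you'd save it as a bookmark whose location is `javascript:` followed by the minified body:

  // Hypothetical HN-search bookmarklet body: prompt for a query, then open a
  // Google search restricted to news.ycombinator.com. Swap the site: value
  // for reddit.com or scholar.google.com to get the other variants.
  (function () {
    const q = prompt("Search HN for:");
    if (q) {
      window.location.href =
        "https://www.google.com/search?q=" +
        encodeURIComponent("site:news.ycombinator.com " + q);
    }
  })();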
The reason that the quality on Reddit is higher is that there are people moderating those quality subreddits. Without those moderators it would all turn to crap and be just as useless as Google.
(This isn't an argument against your point, just a bit of additional context that increasingly is odd to me as Reddit gains more and more social weight)
> - "There’s a fun conspiracy theory that popped up recently called the Dead Internet Theory..."
> I hadn't heard of this. Now that's some sci-fi level of conspiracy but in today's world it seems totally plausible.
I never believed in conspiracy theories, and after I read "Media Control" by Noam Chomsky I understood there is no need for conspiracy theories once you understand how individual incentives are aligned and how individuals always act to maximise profits.
Someone on HN phrased this, and I am not taking credit for it, but it explains beautifully what's going on: "Google is not making money by showing you the best search result they can, they make money by keeping you searching."
That does not make sense. If searches did not result in satisfactory results, then people would stop searching.
Which they are, as evidenced by people restricting searches to HN or Reddit.
This is a problem for Google; maybe not right this minute, as growth might offset dissuaded users, but nevertheless, it does not behoove them in the long run to provide garbage search results to people.
I view the Dead Internet Theory as Black Mirror style satire. All it would take is liberal application of GPT-3 style transformer AI to content generation and much of the Internet could be fake. You could have fake political trolls arguing with other fake political trolls from the other side, fake blogs, fake review sites, etc. and it would take me a while to notice. Most of the modern Internet is just that bad.
Advertising always creates perverse incentives. It works in traditional media too. Look at what happened to things like Discovery and The Learning Channel when they became subject to advertising based pressure for ratings. They went from having actual educational content to being full of tabloid trash.
The death of the authentic web really chimes with me. I have an almost physical reaction when I occasionally come across a page that isn’t trying to sell me something, that is a labour of love.
A month or so ago, I was trying to help someone retrieve some very old Wordpress for Mac files. I found http://www.columbia.edu/~em36/wpdos and was so touched, I sent the author a few dollars for a coffee
Google's problem is that they've done virtually nothing (given their resources) toward "commoditizing their complement". https://www.gwern.net/Complement
Google's complement is web sites. What have they done to make a web site easier to make?
They even killed their RSS reader. They have released a bit of web tech, but their offerings are generally a bit sad or only solve Google problems (e.g. Go).
If you want to distribute an .exe or .app, MS and Apple have released some pretty good tools to help. If you want to write a blog or make a simple web app, it's unlikely you're going to think "Google has some great stuff to help, and has awesome tools". Mozilla's web resources are better. Microsoft's web resources are better.
>> "Google increasingly does not give you the results for what you typed in. It tries to be “smart” and figure out what you “really meant" ..."
> This is the most annoying behavior because I really mean what I write.
Yeah I remember this being mentioned in a local presentation at university. As a great thing. Google doesn't search for what you write, but what you want.
The problem is that very often Google doesn't know what I want. Before they introduced this, I was able to refine my query so that I got exactly what I wanted.
Dead Internet Theory is totally believable. I remember back in 2006-7 or so, being slightly curious about putting up a food review site because I was really angry at my local X food establishment. I found places where you could buy a complete restaurant database ready to be scripted onto the web for maybe 90 a pop. The data was actually pretty good, but I was shocked by the huge community around flipping these DBs into internet spam. The vast majority were hungry business types who could barely open a code editor without asking for help. I only expect the problem has gotten exponentially worse since then, now that AI-generated content has improved in quality.
I think the OP is weak because it conflates ads and seo spam. Yes, Google went all in with ads, and yes, this hurt its credibility and the quality of its products.
But there is no conceivable universe where seo spam isn't the arch enemy of Google. Google needs to fight spam to survive, it knows it and it does. But that's hard. So hard in fact, that nobody else has cracked the problem, and for all the anecdotal evidence of someone switching to Bing, Google's marketshare is still utterly dominant.
The Dead Internet Theory may have some weight: Google hasn't dropped the ball, but it is slowly drowning in a sea of "content-free content".
That said, the reason the Reddit trick works is that it uses information Google explicitly excludes when ranking content (engagement signals).
Google has a bunch of “objective standards” that it uses to paternalistically shape what the web looks like. Many of these are divorced from what users actually want for pieces of content (https, AMP, a life story in front of recipes to demonstrate authorship, etc).
I think there are actually two dimensions to this problem, and the article and your comment only address one of them.
The other dimension is that, in the past, if you searched for stuff your results were likely to be a blog or a forum thread. Today the bloggers have evolved into instagrammers, TikTokkers, YouTubers, or Podcasters. The Forums and community pages have moved to Facebook, Twitter, Slack, Discord, etc.
So it’s not just that SEO and botspam have eaten the Google results page; it’s that this is all that’s left of the open internet that needed search to navigate. Much of it truly is a wasteland. Google owns part of the blame for crippling RSS and privileging recent pages and specific domains or AMP pages over evergreen, self-hosted content in search results. But also users have given up on the open internet in droves. Instead of starting a fansite, they start fan subreddits or Discords.
That inauthenticity comment really hit home for me, too. I realize that I do not trust the internet at large, and haven't for a long time. That's been the real trigger for my retreat from mass social media into smaller, tighter online communities.
Even HN is starting to feel like it wants to sell me something.
The latter feeling may be because HN is run by a startup accelerator. They run literal native ads on the front page when a startup they sponsor goes live.
This fits well with my own worldview. I've been griping about Google results for years, and jumped for DuckDuckGo when it became usable. I'm sure that fifteen years from now, DuckDuckGo will be ad-infested crap and someone new will come along to replace it, just as Google replaced AltaVista.
Even DDG knows that it can't handle everything, and so it has its bang shortcuts. I've used the !reddit one, and I'd use !w (Wikipedia) except I do those from the Firefox search bar.
I've heard the "everything's a bot" theory before, but never saw a name put to it before. I'd have to guess that 99% of all SMTP traffic is spam at this point.
>I'm sure that fifteen years from now, DuckDuckGo will be ad-infested crap
In terms of direct ads, perhaps. But for SEO spam, in many cases, DDG already seems to be there. For example, things as simple as "python datetime", "python json", or "python datetime.now", where it would seem obvious that the top result would be the documentation for the module/function, have spam sites above the actual Python documentation. Meanwhile, search for "matplotlib", and your screen will fill up with ads.
> This is it for me exactly. I search for the following kinds of things on Reddit exactly because results on other sites aren't trustworthy: Reviews are secretly paid ads.
So are Reddit comments and posts. There it's even worse, because most people don't connect the content with manipulation.
> The "best" recipe for pancakes is only what's trending on instagram right now.
So, like Reddit? I mean, every platform has its hive mind, and Reddit is even worse, because the hive mind can be manipulated with paid upvotes, not just reposting and comments.
> The same for trending programmer tools.
Aren't most of them yet again commercial products from companies?
"It is obvious that serving ads creates misaligned incentives for search engines..."
I don’t think this is the problem. The problem is the need for public companies to grow exponentially. If you take away the need to constantly grow exponentially, then ads on search can be both balanced well and make an insane amount of money.
Based on some of the April Fools' Day experiments that Reddit has done in the past, I'm not sure why you wouldn't have the same hesitation and mistrust of Reddit posts and comments. So much of the content, even on Reddit, is made by bots or copied by bots from older, legitimate user-generated content.
There’s something sad and ironic about using Google to search Reddit. One, I mostly dislike using Reddit - I only want to see specific discussions very occasionally. Two, what is the state of the internet if I have to use the best search engine to find content on a website I mostly dislike? Haha.
Reddit's search engine is kind of crap (not terrible, but also not great). That's why I use Google for Reddit, to have a better Reddit search experience.
I figured that's why it's so high: Reddit's UX keeps slowly getting worse, so the best way to find stuff on Reddit is by searching outside of it.
Searching Reddit helps but the quality of comments has gotten lower since 2015 or so. It seems to coincide with the wave of subreddit bans and the nakedly politically-driven moderation on subreddits. And with the reflexive attitude—against anything countering the Reddit consensus—that developed during the Trump years. High quality posters seem to have withdrawn from the site (at least in how much they comment) and what's left is mostly ignorant teenagers and bitter millennials with shitty jobs. In turn, that crowd is much less likely to upvote high-quality thoughtful content, so the cycle continues. The decline in quality has trickled even into the less popular subs. Don't get me wrong, the site has always had problems, but the more recent decline in thoughtfulness is dramatic.
The worst part is that despite Reddit getting so much worse, there is no other place that's grown to fill the void. This place is great, and I do search HN when it makes sense, but it's small and narrow in scope. Reddit basically crowds out any competing websites by sucking up all the low-level chatter required to sustain a community, but has also pushed away high-quality posters, who now have no place to go. Very tragic but maybe a good case study in shitty network effects.
Quality on any non-niche popular subreddit was already abysmal long before 2015. Reddit is useless for anything which is not highly specific, but there are some diamonds in the rough: great subreddits exist about fashion, knives, gardening, coffee, shaving and plenty of other weird interests.
I dunno, looking at the growth curve in Paul Graham’s tweet I expect most of the drop in average comment quality can be attributed to the size of the user base. It’s hard to keep high-quality content the norm even in much smaller communities.
/r/nfl had a reputation for high-quality content and wasn’t a particular battleground in the Trump Wars. It’s still a good breaking news feed, and the live game threads are fun, but every post is dominated by joke comments and memes.
I find the idea that banning Trump supporters killed reddit to be pretty far-fetched.
To the extent there is a change in quality, it probably comes from other factors, including having a bigger, broader, and different user base now than in the past (and only a small portion of that change likely came from Trump-related bans).
>most of the web has become too inauthentic to trust
I think that specialised search engines are gaining ground. For example, I am using GitHub search for searching code samples, which works better than Google.
You might want to check out my side project that tries to explore the subject. I have a search tool / catalog of DuckDuckGo !bang operators; I am hoping that it allows for better discoverability of specialized search engines.
The latest addition is a description for each search engine: just hover over the name, and you get a description derived from the site's meta and title tags.
Specialised search engines are gaining ground because it has become easier to set one up, thanks to Elasticsearch/Lucene. They can be quite good for a limited domain, and they don't have to invade your privacy in order to find out what you are looking for. I think that what is missing are tools like this that would aid the discovery and use of these search engines. I hope that this will allow them to eat into the market from the 'low end'.
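A rough sketch of how a description can be derived from a site's title and meta tags, as described above (naive regexes instead of a real HTML parser, and the function name is mine, not the project's):

  // Fetch a page and build a short description from its <title> and
  // <meta name="description"> tags. The regexes assume name comes before
  // content; a real crawler would use a proper HTML parser.
  async function describeSite(url: string): Promise<string> {
    const html = await (await fetch(url)).text();
    const title = html.match(/<title[^>]*>([^<]*)<\/title>/i)?.[1]?.trim() ?? "";
    const meta =
      html.match(/<meta[^>]+name=["']description["'][^>]+content=["']([^"']*)["']/i)?.[1]?.trim() ?? "";
    return [title, meta].filter(Boolean).join(" - ");
  }

  // Example usage:
  // describeSite("https://duckduckgo.com/").then(console.log);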
> The latest conditions on mountain bike and hiking trails are being shared inside communities like Reddit but not on the web
This point especially rings true for me, but it also concerns me a bit. Reddit has killed a lot of other forums over the years. If something happens to Reddit, we run the risk of losing a large corpus of information.
>> "Google increasingly does not give you the results for what you typed in. It tries to be “smart” and figure out what you “really meant" ..."
> This is the most annoying behavior because I really mean what I write.
I hate this too. I do get typos corrected by Google. But I don't need that - if I put a typo in my query and get bad results, I can correct the typo myself. But if Google decides I must not have meant what I actually said, there's no way for me to correct that. It's a ridiculously bad tradeoff - we eliminate errors that are trivially fixed by introducing errors that can't be fixed at all.
I have similar feelings about phone input autocorrect, which automatically converts typos that are very easy to read into (mostly) correctly spelled words, plus (sometimes) completely unintelligible nonsense.
I agree. Often, what I am after online is to see what other real people are saying about something. Typing 'reddit' into Google is basically a proxy for "please Google, for the love of god, can you start indexing actual human discussions again?".
I have a similar habit, I often search "[topic] forum" - I'm not fond of Reddit specifically due to accessibility issues although in some cases I still go there because it's the only good source.
>because results on other sites aren't trustworthy
Reddit is gamed way more than Google. Paid posters, moderation of anything against a narrative. Google search may be dying, but Reddit ain't doing much better.
> The long answer is that most of the web has become too inauthentic to trust.
It's only a matter of time before reddit too becomes too inauthentic to trust. Not only is it directly funded by advertising, its audience is mainstream enough for advertisers to invest time and money posting fake opinions in order to make it look like it's coming from real people.
I seriously hope I never see comments or news about people appending hacker news to searches. I don't want advertisers to kill this site when they catch wind of it.
They filed for an IPO with the SEC last month, so they should go public very soon.
The last valuation was at $10B, which is ridiculous for a website that can literally get its most popular subreddits shut down arbitrarily whenever a small group of extremely-online volunteer mods decides to "go on strike" by locking the subs because they don't like something/someone else on the website.
It happened before and the admins yielded to them, so I don't see why it wouldn't happen again, especially since it's not like they can run the website without that weird cabal of (mostly delusional/psychotic) power mods doing their work for free.
It's perhaps a little bit early in its creation to be sharing this, but I am working on a new search engine that should help to fix the problems mentioned in the article: https://namusearch.com/. It allows you to build (and share with others) a curated list of websites that you want to use for searches.
Yup, when I want to search anything I use a combination of reddit, HN, and Discord. My main use of Google these days is to find a website I forget the name of but roughly know what it's called. In the olden days, I used bookmark aggregation sites like del.icio.us to search for relevant content, which was generally more fruitful than a Google search.
I didn’t really even think about this properly until just now... these days I am looking at Reddit, Facebook groups and, if need be, YouTube (videos not by ‘creators’ as far as possible) to find information I used to google. Ads and referral links have totally ruined the usefulness of so much information.
I agree that Reddit remains a good source for info, but I’ve found that Google usually does a good job surfacing Reddit results — usually in the top half. Though this is most notable when I’m googling for esoteric info about TV shows and video games (e.g. “best build mass effect 3”)
"The latest conditions on mountain bike and hiking trails are being shared inside communities like Reddit but not on the web."
I just wanted to mention that a friend of mine made an app for user-reported trail conditions that might be worth taking a look at:
https://trekko.app/
> I search for the following kinds of things on Reddit exactly because results on other sites aren't trustworthy: Reviews are secretly paid ads.
There is so much shilling on Reddit that if you knew about it, it would blow your mind. I wish more people realized this. Reddit is the best place to shill because not only is it ridiculously simple, people also automatically assume you’re not shilling, and then once you seed the idea, everyone else will do the shilling for you indirectly.
The healthiest way to use Reddit is like Wikipedia: assume the information you’re reading is highly compromised and biased in one way or another, but use it as a starting point in your further research and it’s a great tool.
Reddit posts are not your friends. Upvotes do not mean the contents of the posts are legitimate or not shilling.
Reddit is the best place to shill and the sooner the non-shillers figure that out, the better off the entire internet will be.
Reddit also - in my opinion - actively enables shilling and botposting. Why do they have an API?
A forum that's meant to be 100% about humans talking to humans doesn't need an API, so why does it expose one?
Also the model of user-created and user-moderated subreddits actively enables the creation of shill accounts. It's trivial to create a subreddit and use it to farm karma with a ton of bots. If you can keep real users from ever entering your walled garden of a subreddit (of which there are many) your bots will never be detected until you wipe their comment history and set them loose on the rest of the site.
It's not about Reddit as a search engine; it's about using Reddit to validate your search, because the alternative would likely yield poor results. You could trust the 10 listicles that came up as the first results and all look oddly similar, or you can try to filter through Reddit by including it in your search terms.
An important thing to realize, too, is that this is a problem that keeps getting worse. The article talks about product reviews and recipes, but it's been spreading a lot further than that. Recently I was trying to look up a technical error, and found a lot of web pages that seemed to be auto-generated with "How to solve [error_scraped_from_the_web]", complete with a list of generic things unrelated to the error (IE, "Step one: try turning your computer off and turning it back on again. This is usually a good first step, and you'll be surprised at how often...").
Likewise, I wonder how long appending "Reddit" will work. As others have pointed out, Reddit shills are already relatively common, and it's becoming increasingly common for bot accounts to create lots of random comments to appear to be human (such as finding a thread with thousands of comments, then copying and pasting the comment to another place in the thread or to another thread, or auto-generating a simple sentence based on other comments in the thread).
Sometimes the advertising hordes move so fast they kill something before it even takes off, like what happened with Clubhouse.
Free project idea for someone with more time on their hands than me:
Classical search engines determine trust automatically, based on various factors including "link neighborhoods" where trustworthy sites link to other trustworthy sites. These automated strategies are clearly breaking down; the spammers are winning the arms-race.
So maybe we need to go back to human-based trust.
People used to curate lists of websites, which partly solved this problem but didn't necessarily scale. I wonder if that idea could be supercharged.
Consider a browser extension that people install, which:
a) gives users a button to mark a site as trusted/favorited
b) tracks domains visited (and frequency)
Then, separately, you can manually add people you know personally to your "network". You trust them, so anything they trust is also something you might be able to trust. Manual favorites could be weighted higher than frequently-visited sites, and both could be displayed inline next to links on all pages you visit. You could also see which people the trust in a given link comes from, in case some of them consistently have bad judgement about these things and you want to remove them from your list. Then, finally, you could create a personalized search-engine that only indexes the sites determined to be trusted by your personal network.
Of course this would require placing a great amount of trust in the extension and service themselves, so maybe they would have to be open-sourced or self-hostable or something (a profit motive might create a huge amount of temptation to abuse the data). That's a stickier problem.
Edit: There was a little ambiguity left here about transitive trust; “friends of friends” type stuff. I think if this went on for unlimited hops, we’d be back at square one. So maybe it only uses direct contacts, or maybe some small N of hops (where longer ones are weighted lower?). Maybe this would be configurable, not sure.
Also re: privacy, maybe you could come up with a clever way to E2E encrypt the site visit data, even though it’s shared with many parties?
This is very much in the spirit of what we were trying to do with trove.to [1] — give people an easy way to curate & annotate lists of websites, and layer a social graph and endorsement system on top of those lists.
The problem we encountered is that the vast majority of people are not hyper-organized list makers — the 1% rule of the internet [2]. To create a "human curated search engine" with any utility, you need a massive amount of manually-categorized data — data which most people are simply not interested in generating. This is why no social bookmarking site (e.g. delicious, pinboard, etc.) has ever taken off to hundreds of millions of users.
I still think there's something exciting to be built here, but it will likely need to take a more "automated" approach as you suggested.
The biggest problem with doing that is that you increase the pressure from bad actors on websites that are trusted by lots of people.
So, for instance, the more weight you give to sites that are quoted by Wikipedia in your search rankings, the more incentive content farms will have to sneak in edits that link to their sites.
There are ways to counter that (e.g. moderation), but in general, defense is more expensive than offense.
I like this idea of incorporating trust and reputation. As for the curation of websites not scaling, some time ago I thought about the possibility of a search engine where the user supplies a list of trusted websites (for example, university websites, blogs of people they admire), and the search engine ranks pages based on link distance to these websites.
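That ranking could be as simple as a breadth-first search from the user's trusted seed list over a crawled link graph, scoring pages by hop distance. A toy sketch, with an invented link graph and URLs:

  // Toy version of "rank by link distance to trusted seeds": BFS outward from
  // the seed URLs; a smaller distance means a higher rank.
  type LinkGraph = Map<string, string[]>; // page -> pages it links to

  function distancesFromSeeds(graph: LinkGraph, seeds: string[]): Map<string, number> {
    const dist = new Map<string, number>();
    const queue: string[] = [];
    for (const s of seeds) {
      dist.set(s, 0);
      queue.push(s);
    }
    while (queue.length > 0) {
      const page = queue.shift()!;
      for (const target of graph.get(page) ?? []) {
        if (!dist.has(target)) {
          dist.set(target, dist.get(page)! + 1);
          queue.push(target);
        }
      }
    }
    return dist; // pages unreachable from any seed simply never appear
  }

  // Example: a trusted university page links to a blog, which links to a vendor page.
  const linkGraph: LinkGraph = new Map([
    ["https://cs.example.edu/", ["https://blog.example.org/post"]],
    ["https://blog.example.org/post", ["https://vendor.example.com/"]],
  ]);
  console.log(distancesFromSeeds(linkGraph, ["https://cs.example.edu/"]));
  // distances: 0 for the seed, 1 for the blog post, 2 for the vendor page

Unreachable pages dropping out entirely is the interesting property: content farms with no link path from your trusted seeds never show up at all.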
This reminds me of Cory Doctorow's "whuffie" from Down and Out in the Magic Kingdom (man, that title takes me back!).
Whuffie is, roughly, money determined by your social interactions. More importantly, others also have a queryable score that's weighted according to who you esteem highly - this sounds like what you're proposing!
I was thinking something similar recently, and I also believe it's an idea worth exploring. Something else to add to this conversation. There's an obvious difference between two cases in which a trusted person trusts a url:
1) Single-contributor website (blog, personal page...): it seems that we could spread the trust across the whole website in the algorithm (at least more than for the next case)
2) Multi-contributor website (forum, newspaper): it seems the trust should be given at the URL level
Something worth delving into if we are designing this trust based search engine in real-time here at HN ;)
In that post, I don't address the reputation management aspect as much, but it's central to making the whole thing work, and I think crowd-sourcing and a well-conceived reputation management system that can influence results are good next areas for exploration.
I think the Keybase project would have been great for providing the "authentication" part of this solution. Too bad it died on the vine after the Zoom purchase.
You could also incorporate the "reputation" of the author. Basically, have a real person, the author, stake their real identity on the quality of the blog post they wrote.
This feels like the underlying issue. Google may have stayed the same, or even slightly improved.
But the web, in the sense of quality:crap ratio, has gotten substantially worse.
This flood seems like the ultimate manifestation of turnkey hosting solutions.
Imho, we could do worse than reviving an idea from email's early battles with spam: negligible per-use charging. The idea was to tax emails at $0.0001 (or somesuch). Insignificant for actual users, but financially decimating for high-volume, low-value spammers.
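Back-of-the-envelope with that $0.0001 rate (the volumes below are my assumptions, not figures from the comment):

  // $0.0001 per email: negligible for a person, ruinous at spam volumes.
  const ratePerEmail = 0.0001; // dollars

  const personalPerYear = 50 * 365 * ratePerEmail;        // ~50 emails/day  -> ~$1.83/year
  const spammerPerYear = 10_000_000 * 365 * ratePerEmail; // 10M emails/day  -> ~$365,000/year

  console.log(personalPerYear.toFixed(2), spammerPerYear.toFixed(0));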
This has been happening a lot with StackOverflow and GitHub pages lately. A lot of the times, the actual GitHub or SO link won't even be on the first page.
I'm surprised they haven't done some kind of manual pruning of junk like that, or maybe they have and it's not working... but on the surface it totally seems like they could implement something that says "GitHub has content X, and these other 10 sites are 99% the same, but we've flagged GitHub as an authoritative source so they'll always outrank the clones".
Maybe it's a fear of appearing unfair. Or maybe they secretly want to hurt Microsoft by turning a blind eye. Or maybe this is actually a much harder problem. If I had to guess it's probably #3. But as a user of search it's frustrating to find the clones ranked above the real stuff.
Yup, just found this morning that an article my wife wrote on a very obscure legal topic was stolen, reformatted, and posted on some "life hacks" sort of site. It shows up #3 in the DDG results. At least her originals are still #1 and #2.
Meanwhile, I have in my inbox from the last 24h at least a half-dozen emails looking to do SEO work for my company website.
Web = untrustworthy? YUP
I'd happily pay for a serious version of 1999 Google, but updated to filter out anything advert based, and search for exactly what I want.
Search is such a fundamental function, and we've done the experiment and the advert model fails - it needs to be just another utility.
Just as bad as the auto-generated pages are company blog pages whose SEO rigged post pretends to give "help" for a problem where the main solution is of course, using their product.
Yeah, looking up technical stuff now for anything outside of very major tech is an absolute nightmare. It's nothing but auto-generated pages made of random parts of random forums posts smashed together under some weird url like tech-helb-4-yuodbajdasdasd99234029242.co.xyz.com.org.
I've clicked on a few out of curiosity, immediately recognizing they were garbage from the description text, and it's just endless SEO links and completely random text.
You'd think one of the richest companies on earth could make a freshman intro to CS-level spam filter. If they can't, then they truly do hire the most incompetent people on earth. If they won't, then they hire the evilest people on earth.
Yeah, those pages are definitely auto-generated. Static site generation makes it possible for those types of pages (I call them "shims") to jump to the top of the results list. I wrote about it here: https://zestyrx.com/blog/nextjs-ssg
I don't see what static site generation has to do with it. You can spin up a huge number of shims even more easily with a dynamic site and a DB with a list of all the messages you want shims for.
I concur on the looking up technical errors bit directing you to auto-generated sites!
I was recently trying to troubleshoot a very basic error message for Linux and was getting results and webpages that would list the error message in the title in some way, but then give instructions on "First, open up device manager", "Click Win+R to open windows command prompt", etc. Lots of untrustworthy ads. Different URLs, almost line-for-line identical webpages.
This was something like the top four search results (that weren't sponsored ads).
1) if this is from the project's own site, it's good.
2) if it looks like an archive of the project's mailing list, it's good.
3) if it looks like an internet forum, it might be good, or it might be just another poor soul asking the same question.
4) if it's on StackExchange, it's like on the forum, except your chances are slightly better. Karma must flow.
5) if it's on Reddit, it's like on the forum, except your chances of getting an answer are worse.
6) if it's a blog of some geek, sometimes it can be better than 1) and 2), or you might just get a straight answer.
7) in any other case it's most likely an SEO farm. Run.
If I have a linux problem these days, google usually gives me the relevant piece of source code on Github from which the error message originates. Like, you're a big boy now, go figure it out yourself.
It seems we're back to the early 2000s, when search engines not so much specialized in topics, but leaned heavily towards one type of content or the other. Holy hell, maybe one day Reddit's own search will be good enough so google can be ditched for good!
It still exists, the last app update was uploaded yesterday, but it seems most of the users didn't stick around after the initial hype died down. I guess the early focus on monetization and trying to turn it into a payment app didn't help.
I want Google to allow me to specifically include/exclude mirrors from my search results. "Only show me the original source of this content", or "only show me mirrors of this content".
I don't want to see the same result repeated 5 times across different stack overflow mirrors.
Almost immediately after the real adoption curve hit (this seems like October-November 2020, anecdotally, to me), it just absolutely filled to the brim with NFT shills, cryptocurrency pumpers, and "self actualization with this one weird trick" MLM promoters. By early 2021 it was impossible to find any rooms with anyone discussing anything but these things.
I have noticed that a lot of tech and startup clubs on Clubhouse were created by users from India and Iran. Today the clubs are still dominated by users from these countries.
Nothing wrong with that but that also indicates the usual clickfarm spammers from developing countries had unfiltered access to Clubhouse from day 1.
This might sound elitist but it's probably a good idea that C2C apps get their first batch of users and community leaders from high income countries before branching out to the rest of the world.
Google used to be better at filtering out garbage content like this. They have resources for detecting low quality content (e.g. all pages on this domain follow the same content-free pattern).
I suspect that doing that wouldn't drive ad revenue up, so they don't bother.
I had to search crates.io. Let me tell you, it's not the pinnacle of search.
What I searched was `fast bitset implementation`. My results consisted of drill bits, a Stack Overflow question, and a Baeldung article on HashSet vs long[].
> "Step one: try turning your computer off and turning it back on again. This is usually a good first step, and you'll be surprised at how often..."
This seems like a natural result of optimizing each search for revenue. Think of a search to solve an error message on your computer. There's a very small number of vulnerable people who are going to spend money as a result of that search, so optimizing for ads would mean tailoring the results specifically for those people, pushing them to sleazy sites where they might spend money on some kind of antivirus scam. The results are worthless to you, but who cares? You're worthless to Google when you're doing that kind of search. Try searching for something that people in your demographic spend money on, and the results will likely look better to you.
Might the inevitable arms race between bot writers and bot detectors be the missing accelerator for a general AI that has a predilection for top 10 white label brands of generic consumer products?
Post-truth society has arrived; Trump was a symptom, not the cause.
Combined with AI imitating speech, deepfakes, and technology for implanting false memories, we will have the Matrix, just not the way we expected ^ ^
Google used to be really, really good at finding exactly what I told it to find. Nowadays, it's turned into the yellow pages; sponsored content from businesses trying to sell me goods and services.
Can people suggest good alternatives or search patterns for certain categories of information or search types?
Some of the search patterns I currently use:
* Youtube for product reviews and demos, entertainment, music and educational material.
* Google with site:reddit.com at the start for questions best answered by other humans; crowd-sourced answers, authentic replies from mostly real people.
* Google with site:news.ycombinator.com if I want to find "forum-like" discussion on topics I'm interested in.
* Google Image search with site:amazon.co.uk when looking for niche products I need to buy, because Amazon's search is so incredibly broken and game-ified.
What I'm having a heck of a time finding is technical content; long-form programming tutorials, deep dives into academic concepts (I do a lot of signal/audio processing and search for blog posts related to these topics), circuit schematics, electronic engineering content. These used to exist on enthusiast forums 10-15 years ago, but Google often no longer surfaces hits from these forums, both because the content is old and the forum model is dying. Reddit is the "replacement" but it is plagued with low-effort "look at my thing" posts that help nobody.
In my experience, the forum experience is far from dead, but it's effectively impossible to surface in a search engine - any search engine - unless you know the name of the forum.
Oh, and the content must also be "fresh". If the content isn't "fresh" (which most of the best forum/blog posts are not), nobody shows it anymore. I can search for a specific blog post using a verbatim quote, but the result (if it exists) is buried under 10+ pages of "fresher" content, no matter how disconnected it may be from the search.
The forum experience is dying. I spent about 4 years of my time in-between Google stints working on a searchable feed for forum sites. Finally gave it up when I realized the extent to which the forum scene had died and moved to Reddit & Facebook while I was working on the project.
The root problem is that attention has gone from abundant to scarce, and people already have their habits. That makes it really hard to build a new forum site and attract an audience that's willing to type your URL in every day (and if they don't visit daily, forget about building a viable community). Forum hosts like Facebook and Reddit don't have this problem - you can view your Buy Nothing Group and Moms of Springfield posts interspersed with your feed of friends, or your r/factorio content interspersed with a steady stream of r/AskReddit.
There are also emerging technological barriers. If you don't sign up for CloudFlare, as a new website, you're going to get hosed - but at the same time, CloudFlare makes it basically impossible for any new search engine other than Google to spider the site. Ditto security patches and keeping software up to date. Most people don't want to deal with sysadmin stuff at all, particularly if they're trying to build a community as a hobby. So that pushes people further toward hosted solutions with a turn-key secure software stack, which is Facebook and Reddit.
If you find a forum for a given subject, it is almost always an authoritative source filled with experts. This is especially true in engineering disciplines.
It's unfortunate that Reddit and social media took over and led to their decline, because it's a suboptimal setup in so many ways.
- Reddit in the large is a high-noise, low-signal monetization chamber. Some subreddits have good moderation, but that doesn't stop the spillover and drama.
- You can't assume much about any given Redditor, and you won't typically form relationships or associations with them. It's pretty much pseudonymous.
- Reddit doesn't focus on authorship. It doesn't allow inclusion of images, media, or carefully formatted responses in threads.
- Reddit corporate is the authority and owner of all content. They can change the rules at any time, and that's a fragile and authoritarian setup for human discourse.
- Reddit corporate is constantly changing the UI and engaging in dark patterns to earn more money. This flies in the face of usability.
Forums should make a comeback. It would be better if each community had real owners and stakeholders that had skin in the game rather than a generic social media overlord that is optimizing for higher order criteria that sometimes conflict with that of the community.
But forums have problems too. They should be easier to host, frictionless to join, easy to discover, and longer lived.
Another way to think of this: every major subreddit is a community (or startup) of its own and could potentially be peeled off and grown. You'd have to overcome the lack of built-in community membership and discovery, but if you can meet needs better (better tools for organizing recipes, community events, engineering photoblogs, etc.), then you might be able to beat them. Reddit can't build everything, just like Facebook couldn't.
This is depressing. Good information is useful for far longer than a carton of milk in your fridge! And a lot of that new "milk" is apparently made of chalk and bilge-water.
The entire information ecosystem has internalized a bias toward "freshness." It's even really strong in software. Evidently code is more valid and correct if it has recent GitHub commits.
Anyone know the origin of the freshness rule, and its purpose? It makes sense in some niches, but in others it is so obviously bad that I wonder why Google added it.
the forum experience has effectively been totally replaced by either discord or subreddits, or any other kind of self-moderated social media group you can think of.
It's a plus and a minus in a lot of ways, but the biggest con is that it's just straight up impossible to search a Discord log effectively.
I've been on the Kagi beta test for a few weeks now and, for the kind of searches I mostly do, it seems to be a massive improvement on Google. Strongly recommended.
I've also been using Kagi for ~1 month and god can I testify for how fantastic it's been. You have to TRY to find blogspam and the allowance of blacklisting domains plus some other handy search customization features make it an absolute joy to use.
It may lack "instant answer" widgets or other fancy search engine features but it gets the actual "search" part of the equation so right that I find it astonishing how I ever used DDG/Google in the past.
Same. Over the years I've trialed most search engines out there, but I always find my way back to Google after at most 2 days of trying them, because I end up adding "@google" before every query anyway since the results are bad.
With Kagi most of the results are what I'm looking for. If they are not, I'll still try "@google", but so far with very few queries Google's results were actually better. The biggest drawback is worse "smart cards" results, but I hope they keep those optional/unobtrusive anyway.
The strange thing is that the feeling Kagi gives me isn't even unknown. It just feels like Google circa 2010.
I've signed up for the beta, but it's hard to shake the feeling that signing in to a search engine is a mistake. "We respect your privacy", "we'll never sell your data", I've heard these claims before and they've almost always been lies. They can tell me that they don't maintain an eternal history of all my queries, but how can I ever verify that?
I love this search engine, it gives me the same feeling that Google did when it became a thing. Their business model after beta will be that users pay to use it, and it has no ads. This is a very encouraging sign, and personally I'll be willing to pay for quality search without ads. I hope enough other people feel the same to make Kagi profitable and functioning for years to come.
Looks interesting, but am I crazy for thinking that $10/month is an insane price to pay for a general purpose search engine? Surely Google isn't making anywhere near $10/month off of me even if I disabled adblock.
As a bit of a weird hobby, I like to read up on right wing conspiracy theories. That means I do a fair number of searches for specific terms and people mentioned in fake-news facebook/forum posts.
Google seems to slowly oscillate between thinking that I am a right wing loon, and thinking I am Joe Public who must not be shown misinformation. That is, sometimes google is perfectly willing to vomit forth results from the propaganda mills, even when I'm not specifically looking for it, and other times I can't get conspiratorial-minded results even when I am making an effort to find them.
This most frequently manifests itself when I am looking for sources for claims that I know exist. Like if I remember reading an earlier conspiracy that has just been invalidated, or someone posts a video of someone reading a blog post. If google has decided I am an innocent bystander not to be shown conspiracies, it can be nearly impossible to track down the original blog or posts about the conspiracy.
Recency bias is another huge problem with google results. Older content gets heavily de-prioritized, even when it is clearly what you want. Google is willing to give up on terms in your search before it is willing to show you old stuff. For example, if you tried to research early Ukrainian political corruption during Trump's impeachment, your results would be nearly entirely Trump-related content even if you tried to use google's date-filters and exclude terms like -Trump.
I noticed this recently when trying to find primary sources for flat earth claims. They don’t exist on Google, for me at least. You can still find them on duck duck go if you search for something like “flat earth ice wall” but Google just returns generic debunk articles.
This sounds like filter-bubbling. From what I can tell, Google doesn't have user specific filter bubble but user-category filter bubbles, and it's constantly updating the category of users it thinks you're in.
`site:reddit.com` has been working poorly for me recently, although I've used it many times in the past. Here's my most recent search (I was traveling and trying to watch Netflix, but geo-blocking was preventing some shows from appearing):
The entire first page is for NetflixViaVPN subreddit (not linking to avoid SEO). They have a stickied post that seems to shill two VPN providers I haven't heard of. This is plausible, as maybe Netflix hasn't either... The stickied post has comments disabled, so it's hard to tell. Then if you click other posts, a bot auto-links the stickied post, but everyone is making different suggestions that may imply the stickied post is wrong.
Interestingly, the same search on DuckDuckGo only has three posts from that subreddit. This better matches what I wanted! The first one I'm seeing is:
This seems much more plausible. All those comments suggest a provider I've heard of and that I've heard other people mention IRL. Google seems to rely too much on the URL or the page header, so it's stuck in a single subreddit.
imo, google is still king, but you have to be a bit of a power user. You're already using `site:` which is good if you know exactly where you're looking. If not you can use `related:` in the same way. I find using `-something` to remove terms the most useful. I'll search for something (usually an error message) then add `-react` (and mumble "ffs not everything is react"). Then if I still see things I DON'T want add more `-` to the string.
It's not GREAT that you have to do that, but it's pretty functional and certainly better than going past page 1 of search results.
The major problem with this that I've experienced is that even if I use operators like + and - to specify or remove terms (more often remove), Google ends up using a synonym in place of that word anyway.
So if for instance I'm looking up info about ADHD meds as an adult, I might get tons of articles about childhood ADHD since that's where all the research is. I search adult ADHD meds, I still get articles about childhood ADHD. So then I add minus terms to exclude the child-related results,
and I still get crap blog spam that's probably related to teaching or raising children or some other bullshit like warning about the dangers of addiction or something, and never information about my ADHD or the meds for it.
It's not GREAT is the understatement of the decade.
I've had some luck inserting "forum" into search terms to find real human content. Mostly when trying to find technical info about cars, but may apply to other fields.
>What I'm having a heck of a time finding is technical content; long-form programming tutorials, deep dives into academic concepts
github search is good for that. Search for 'list of awesome anytopic' / 'curated list of anytopic' / 'list of anytopic' and you might find a repository with a curated list of links on anytopic. (search box on the main page of github)
You might also want to check my side project: I have a search tool / catalog of duckduckgo !bang operators. I am hoping that it allows for better discoverability of specialized search engines.
Since there's only a handful of sites you target your searches at, it would be nice if you could just have your own search engine that focuses on those few sites, and perhaps crawls a little deeper.
I've sometimes thought the death of Google will be the self hosted search engine.
I've found my jobs' internal (social) message boards/mailing lists/Slack channels/etc to be great resources as the only contributors are those who work/worked at the company. Your (ex)coworkers presumably met/meet a certain competency bar and are less likely to spam. At larger companies there are message boards/mailing lists/Slack channels/etc for nearly every topic.
For local information, I've found forums for local sport teams to be great resources during the off season. Posters are often happy to engage in any sort of chat during the off season. Even if you haven't gotten to know the frequent posters during the sport's season you can use the (usually highly visible w/o any additional clicks) account age/# of posts/"karma" as a proxy of posters' trustworthiness. note: If you don't normally contribute on-topic (i.e., about the team and sport) posts, I would only search the forums for your questions and not post off-topic questions as that'll get you quickly banned.
It's particularly awful on mobile where you get Google's "smart" cards which can be ads, followed by ads then the actual results which are mostly SEO trash. Trying to find support for Google Fiber routers was nearly impossible because Google just tried to interpret what I wanted as signing up for Google Fiber and just overwhelmingly suggests that. It gets even worse on Youtube where after like 10 results for what you typed in they just show you "things you might like".
For technical content I had great results from using safari books online in the past. Having most tech literature an easy web search away was super convenient, because typically the best treatment of any subject is in book form. The downside is that it is expensive, so when I switched employers I lost access and I wasn’t willing to pay for it myself.
Change your search strategy. Most forums require a membership to view them, and most long form posts are on personal websites. Google can't or won't serve those. You have to navigate like it's the old web. Find a good place to make landfall, read old posts, ask around, and follow all your leads.
> There’s a fun conspiracy theory that popped up recently called the Dead Internet Theory
I think we're well on the way ...
Was recently pretty shocked, searched for "gas heating repair" and got back at the top some sites with my suburb name in the title. Naturally I thought, wow, if there is a local place I should go there. Clicking into it, it has everything about my suburb - a picture of the local park, and whole paragraphs of random text containing bits and pieces about the local area interspersed with odd sentences about gas heating ("Cold mornings in XXX can be confronting without effective heating" etc). The text kind of makes sense but also reads like it was generated by GPT3.
Of course, then I realise, this is all SEO. They have generated a page like this for every suburb in my city. There are tens of thousands of such pages they are hosting. The most shocking thing is this is a small time gas repair dealer. They clearly don't know how to do this, they've gone with a low budget to an SEO firm who has effectively generated a giant plume of toxic content into the web atmosphere, all to create a marginal benefit for this one small company.
If a small time low budget unsophisticated company can do this, then I have to assume it's happening everywhere. On a mass scale we have giant smoke stacks all over the internet spewing toxic plumes into the atmosphere. And the humans are gasping trying to find the small bits of remaining breathable air.
Yes you are right, almost every business with an online presence is generating vast amounts of garbage which exactly targets a huge range of specific keywords. From the search engine perspective, the page is exactly what you are looking for.
I am curious, did this small business actually provide the service you were searching for or not? If they do, then it sounds like it was ultimately useful.
Ah, young whipper snappers, everything old is new again, and clearly the world is always getting worse.
Well, some things are (reverse image search, ease of accessing 'Cached' pages -- now I have to go to archive.org Wayback, etc), but forum search has always been bad.
Long before Reddit was big, USENET/DejaNews and forum software like PHPbb/UBB ruled supreme (and before Markdown there was UBB Code). Google, despite owning DejaNews, did not often surface links into USENET content, and a lot of forums, for whatever reason, were not indexed by Google. For example, I used to spend a lot of time reading the latest on PC/3D Hardware stuff on Beyond3D, Overclockers, Rage3D, etc and I almost always had to either use site-specific search (dejanews.com or, say, PHPbb's built-in local search), or I had to add site:beyond3d.com for example.
And there is a large amount of confirmation bias going on in these Google threads. Some people assume that their search patterns are representative of the billions of searchers ("argh, I searched for pytorch k-means and a GitHub wrapper site appeared!") and that their experience is a representative sample, while others focus only on what has gotten worse, and not what has gotten better.
What's clearly gotten worse is webspam. But while it has degraded the Google experience, it's not clear any of the other search engines are any better at filtering it out, except by luck, because perhaps they don't crawl as many sites as often.
I think this is very important to note and I agree.
Whether or not google search right now is as good as it could be should not be the main point of discussion.
We have to remember to acknowledge that the web google is indexing now differs drastically from the web it was indexing 20 years ago. Web pages are now less likely than ever to be freely accessible plain text put forth in good faith for public consumption. Google (in addition to dealing with big walled gardens designed explicitly to hide content from google) is trying to sift through basic spam, industrial scale SEO exploitation, and nation-state cyber warfare.
Bitching about google search being bad almost feels like yelling at the canary in the coal mine when it passes out.
The issue (at least for me) is that google is no longer actually searching for the thing I ask for, and it's being blatantly disrespectful of users who cared enough to learn how to actually use the search features.
Quick example from today? I did a literal two word search - gulp admzip - and while the results are ok-ish, an increasing amount of space is taken up by results with this handy little blob at the bottom:
"Missing: gulp | Must include: gulp"
"Missing: admzip | Must include: admzip"
WTF are they smoking? I asked for two fucking words, and the top result doesn't include one of them. Then the second result doesn't include the other.
So then I add quotes around the phrase I want "gulp admzip" because I'd really only like to actually see results that include that EXACT phrase, and... drumroll... IT DOES IT FUCKING AGAIN: "Missing: gulp | Must include: gulp"
And that literally has nothing to do with the quality of the items it's searching, and everything to do with Google deciding what I meant - Clearly I meant the npmjs.com package adm-zip, because that item gets vastly more views than any of the real search results.
I couldn't have possibly meant to restrict the search to the actual fucking phrase I told it to search for, because there aren't that many results, and they don't get many views.
> What's clearly gotten worse is webspam. But while it has degraded the Google experience, it's not clear any of the other search engines are any better at filtering it out
The problem as I see it is Google has created a bunch of perverse incentives to make your page rank higher. One big problem is Google gives higher rank to "comprehensive" articles. On the one hand that would seem like a good thing right? But what you end up getting is endless affiliate articles that don't seem to be written for humans. And they are really easy to spot if you know what to look for.
A great example is webhosting reviews. Search "best web hosting" and click any of the 1st page results and you will almost always get an article that just rambles on and on and on with headings like: best web hosting for email, best web hosting for blogs, best web hosting for email marketing. To a human, it's an incredibly disorganized mess, but to Google's bots, it's "highly comprehensive and authoritative".
While that may be true, it’s also true regardless of the ranking algorithm or which web search is the winner.
Webspam will seek to game whichever search company has dominant market share and they will structure their spam to overcome the filter and ranking specifics of that engine.
Considering tools like GPT-3, one could easily imagine, in the limit, a spammer running a large number of searches through a search engine, finding out what ranks high, and then training a generative model on that dataset to produce similar articles. Auxiliary signals like inbound links and DNS records they can also usually work around by purchasing domains or buying inbound links.
It will always be a war and there is never going to be a victory over webspam. Even with something like web3, where posting content costs money, I can imagine ways to spam.
I think the biggest sentiment shared with everyone against Google search at the moment is the eagerness to see better competition.
While it may be true (and certainly easy) to paint everything with this large brush of 'nothing has changed, we've seen this before' - it misses the point just a tad, which is (imo) the disdain for constantly being tracked, and the desire for a much more hacker-friendly web experience (like the old days).
Things such as https://serx.cf/ (https://searx.space) or https://github.com/benbusy/whoogle-search (https://whoogle.dcs0.hu) should prevail on a site such as HN - and furthermore, we should be setting the path with these tools for the future use of the internet by non-power users.
The other day I was searching for a specific kind of jewelry and realized I don't know of a search engine that can do what I needed, which is to just find good results for my search. Searches for jewelry-related keywords triggered Google to go 90+% ads, and their results (and other search engines' results) were so junked up with spam and the same couple sites over and over that they were useless.
We're back to the Web needing a search engine.
[EDIT] I should add that the ads Google was showing me didn't even do a very good job of showing me the very specific kind of thing I was looking for, even though there must be thousands of stores around the world selling pieces that fit the keywords. The ads were for jewelry, but most of them weren't anything like what I was trying to find. In this case an entire page of ads but all from different sites and mostly the thing I was looking for would have been better than nothing, but it couldn't even do that.
I've found Brave Search[1] and Kagi Search[2] to be great alternatives to Google. I know exactly the sort of thing you're describing and both of them are a breath of fresh air in the space.
Am I the only one who just skips the search engines and go straight to the source? If I want factual information, I just go to Wikipedia and use their search. If I want to shop I'll go to respected online stores and again use their inbuilt search feature.
Obviously I've just built up a list of good sites in my head which I trust... Google search is good for discoverability if you're new to the web I guess? Although in the old days that's what web directories were good for.
Wikipedia is the perfect contrast to google search results.
- It contains 100% signal (no noise) and provides helpful related links if you need more information.
- Pages are organized and brutalist.
- Every page has a steward who (thanklessly) keeps the information accurate, up-to-date, and ad-free.
Contrast this to Google:
- 50-100% noise (depending on the query; more information requires more queries and therefore less signal)
- SERP pages are disorganized and absolutely riddled with UX dark patterns (modals, banners, autoplaying video, etc). Many pages with good info are over-styled/over-javascripted/over-languaged, and finding the one or two sentences you're looking for is a chore.
- One-off SEO spam plagues everything; ads and affiliate links are pervasive. Stewardship is a waste of time.
>If I want factual information, I just go to Wikipedia and use their search.
Wikipedia is a very good resource for a lot of things, and a good jumping-off point, but you shouldn't assume that you are getting "factual information", especially when it comes to hot button social or geopolitical issues.
It's much easier to type "searchterm" or "searchterm wiki" in your web browser address bar and then click through to the Wikipedia result than it is to first navigate to every individual site and use their non-standard search bar.
> Google search is good for discoverability if you're new to the web I guess?
This is almost certainly not true. I don't think kids (who make up the vast majority of 'new to the web') care about using Google for discoverability. They'll use YouTube, Twitch and Instagram to find things they care about. Google is for answering questions, not finding new things.
And honestly, it's not generally that good at answering questions.
I feel like this has not changed in the last 20 years. Yes - google was at some point like a miracle that seemed to solve lots of problems around searching the www for information.
While google "refined" its search and monetized it, the web still evolved and is evolving into something.. different. Many of the websites most people already know are competing for google's top rankings and ad revenue; there are even people paid lots of money to "make $website more visible to the web (what they really mean is google)" while the real internet goes on in the background.
We need more ways to search the web. We need lots of different search engines that are competing and working together also. The web is still young and no one really knows what it will be in the future. (I fear it has to do with ads. Lots. Of. Ads)
>The web is still young and no one really knows what it will be in the future.
My fear is that walled gardens might win in the future, because who guarantees that websites won't move to Facebook Pages, Facebook Groups, Slack and Discord channels, etc.? The open web is weaker than ever; just look at LinkedIn: a walled garden that throws a Register form in your face when you try to access it and won't let anybody crawl or scrape its content except Google, who drives more traffic to their walled garden.
I know normal people would never use it but I sorely wish there was a way for me to just grep the web instead of using "search" as offered by Google et al.
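For what it's worth, a toy version of "grep the web" over a hand-picked set of pages is only a few lines of Python. This is a sketch under obvious assumptions: the seed URLs are placeholders, and a serious version would grep a crawl you host yourself rather than fetching live pages.

```python
import re
import urllib.request

# A toy "grep the web" over a hand-picked set of pages.
SEED_URLS = [
    "https://example.com/",
    "https://example.org/",
]

def grep_web(pattern: str, urls=SEED_URLS, context: int = 60):
    """Print every match of `pattern` in each page, with some surrounding context."""
    regex = re.compile(pattern, re.IGNORECASE)
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                text = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip unreachable pages
        for match in regex.finditer(text):
            start = max(match.start() - context, 0)
            snippet = text[start:match.end() + context].replace("\n", " ")
            print(f"{url}: ...{snippet}...")

if __name__ == "__main__":
    grep_web(r"illustrative\s+domain")
```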
They have the buyer ready to buy and still missed the sale because they wanted to make a profit on ads. How ironic. And this kind of experience probably turns a lot of buyers off from using Google search.
We don't need your unrelated ads, we already know what to buy, don't patronize us. We need help getting to the product page from keywords. We need real reviews. We need a shopping experience we can trust.
That was the craziest part to me. I was interested in both products and information, and despite deciding it was a good idea to show me almost nothing but ads, they didn't manage to show me anything worth clicking for either purpose. I was practically their ideal target for getting someone to click an ad on purpose, and they still dropped the ball.
Could you clarify what you mean by 90% ads? I would assume there are always organic results right after the top 2-3 ads, has this changed somehow?
Also you say the results were "all from different sites". Is that a good or a bad thing? I imagine having too many results from the same site would be less informative, no?
I'm very curious to try the same search query myself, but of course I understand that it may not be something you'd want to share.
Google's ideal would be that every single search result would be both relevant and an ad. And it's going there, somewhat at least. Because the people who have the most time to write articles are employees writing/researching 8h a day. Someone doing it on their free time has no chance of competing. The problem is that obviously the people paid to write are biased. In some cases maybe it's a problem, in others maybe not.
It's already dead. Google mined all the links that were curated by the initial internet communities for all it was worth and turned them into profits for Google's earliest employees and shareholders. Now that no one is curating useful links anymore their search quality, unsurprisingly, is deteriorating. Without human curation there is no signal for Google to use anymore and whatever signal is there is just SEO spam that is optimized for serving ads. It's like an ouroboros eating its own tail.
It’s not just the links. After the links, google mined facts, like “how much does a german shepherd weigh,” so no one gets those clicks, and the incentive is gone there too. They’re even mining the snippets of the content, lowering the incentives for creating that too.
It's essentially a machine for printing money and people don't really understand what they're giving up in exchange for "free" search results. Google is beholden to market forces, it's no longer in the business of indexing useful information because the market doesn't value useful information, it values ad revenue.
This is a structural problem and anything that gets large enough will succumb to the same forces. If the incentives are for optimizing ad revenue then that's what all corporate machines will do at scale, regardless of their initial motives and incentive structure. It doesn't help that Google is also an ad network, hence the ouroboros aspect.
This sounds almost like Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
Google made links on the web the measure of how good a page was. That became the target of everyone trying to do SEO. As a result, it stopped being a good measure of how good a page was.
But in the long run, nothing will work in that environment, because every measure will be gamed as soon as people figure out that Google is using it. Google's only choice is to try to stay ahead of the SEO crowd, and I'm not sure they can do that (well) for too much longer. In fact, if the article is to be believed, they're already starting to fail.
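To make the Goodhart point concrete, here is a toy power-iteration PageRank over an invented link graph: once ranking hinges on a score like this, adding pages whose only job is to link to you (the "d" node below) becomes the obvious way to game it. This is only an illustration of the idea, not Google's actual ranking.

```python
# Toy PageRank by power iteration: "links as a measure of quality",
# which SEO then turned into a target. The link graph is invented.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],  # an SEO page that exists mostly to link to (and boost) "c"
}

def pagerank(graph, damping=0.85, iters=50):
    pages = list(graph)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in graph.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += share  # each outlink passes on an equal share
        rank = new
    return rank

print(pagerank(links))  # "c" ends up with the highest score
```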
Yes, it's very similar with the added caveat that Google has an interest in serving results that have ads from their own network. This is why Google's metrics can be hacked. Anything that is barely above being classified as spam but serves ads from Google's ad network will be prioritized over other results simply because they have to hit their quarterly revenue targets. SEO hacking is not possible if a search engine is just a search engine but Google is also an ad network so they will always be susceptible to being gamed.
This is also the case for social media platforms. They're incentivized to surface content that generates engagement and ad revenue. Basically ads are at the root of all problems when it comes to the internet and the content on it.
I honestly don't think this is the problem, like at all. There are human websites made by humans, still. There's more crap, sure, but the good stuff is largely still out there.
The problem begins and ends with the conflict of interest that Google both sells ads and selects search results. If they didn't have a vested interest in people visiting sites with their ads on them, they could decimate the number of spam results.
Which websites are shown in the search results is not influenced by whether Google has ads on them.
The only thing that influences Google search results is Google's desire to keep as many people using Search as often as possible, since nearly all of Google's money comes from showing those text ads at the top of the Search results. This is all public information, you can read it in the 10K etc.
So if Search sucks, it's not because Google has the wrong incentives but because they can't solve the problems Search faces.
Hard to prove this, but anecdotally a lot of the energy that used to go into people's hobby or passion websites now goes into digital platforms like reddit, pinterest or Twitter, or if lucky substack or Medium. Some of those are walled gardens, and Reddit, as discussed here, is where people now search. Much less is out on the actual net.
Wonder how we could set up an alt-web without the incentives that cause this problem. Delist any for-profit site? How would the sites keep the lights on without ads?
To me it's more a sociological problem than a technological one. Also, networks have changed.. somehow the decentralization idea is spreading fast, for ideological, technical, cost, or other reasons. Some people start neighborhood wireless networks, etc.
It also seems to me that the internet has somehow become a middleman and is not providing deep enough human interactions, especially outside chat-like websites (basically any exchange or business)..
I could envision a WhatsApp-like system with quality control for producers and transparent transaction/tracking/accounting management offered by the network, so people spend less time on overhead and just focus on helping each other and doing what they need to.
Federation is the only reasonable solution at this time but the technical overhead of federated search is high enough that most people won't use it so it won't benefit from network effects like Google did in the beginning. There might be a combination of blockchain juju that could make federation viable but all the thought leaders in that ecosystem are too high on their own supply to realize they could use blockchains for anything other than gambling.
There’s no better solution - you either have ads or a paywall, which will result in few users.
If there were a better solution we’d all already be using it. Certainly you can rely on savvy people to produce free stuff, but the total amount of content will be drastically lower, and there will therefore be fewer consumers.
The best thing is to just use bookmarks and your favorite sites’ own search.
It's interesting, and quite concerning, that Reddit has cemented its position as a key repository of useful information on just about everything just as its drive towards monetisation really kicks into gear. It's concerning because, as part of that monetisation strategy, Reddit is becoming increasingly walled off and anti-user. I am sure I am only one of many long-time Redditors who have vowed to stop using the site completely once old.reddit.com goes, and it's only a matter of time.
It's probable that a huge amount of useful information will soon become much more difficult to access, and/or diluted by stealth advertising, as Reddit looks to aggressively monetise its position. I'm interested to see if a credible alternative emerges and if there is any effort to move some of the existing useful data off the platform.
It's very interesting to me as somebody who's been making and burning Reddit accounts since before the Digg implosion. 15 years ago, I trusted what I saw on Reddit when it came to things like products: If I came across a post on the best can openers, I had some certainty that people were just sharing their opinions on can openers.
Now I don't trust a single damn thing I see on that site.
Yeah, I have a 6 month timer on Reddit accounts because I tend to post on topics in my home town and my close hobbies and I’m afraid of being doxxed. At least on HN I just talk about programming languages and stuff, so I keep my account.
I also use Reddit's account name generator. I think it also helps that my name there wasn’t thought up by me. At least a little level of abstraction.
Has anyone here actually tried the "<search query> reddit" search lately? Click on a reddit link and it takes you to a page that forces you to open it in an app. This has made me stop using reddit completely.
- "Why are people searching Reddit specifically? The short answer is that Google search results are clearly dying. The long answer is that most of the web has become too inauthentic to trust."
This is it for me exactly. I search for the following kinds of things on Reddit exactly because results on other sites aren't trustworthy: Reviews are secretly paid ads. The "best" recipe for pancakes is only what's trending on instagram right now. The latest conditions on mountain bike and hiking trails are being shared inside communities like Reddit but not on the web. The same for trending programmer tools.
- "It is obvious that serving ads creates misaligned incentives for search engines..."
What I'm shocked by is that Google somehow maintained a balance on this for so long. Well, at least a good enough balance that people still use it primarily.
- "Google increasingly does not give you the results for what you typed in. It tries to be “smart” and figure out what you “really meant" ..."
This is the most annoying behavior because I really mean what I write.
- "There’s a fun conspiracy theory that popped up recently called the Dead Internet Theory..."
I hadn't heard of this. Now that's some sci-fi level of conspiracy but in today's world it seems totally plausible.
Let's be blunt here - almost no consumer consciously chooses to use Google search anymore. Google has a distribution monopoly through Android, its deal with Apple on iOS and MacOS, and on desktop through Chrome.
I'm working on a search engine startup. It is in all practical senses impossible for an iPhone or Mac user to change their search engine to a new search engine on Safari or at the iOS level. And despite being technically possible on desktop with Chrome, it is for all practical purposes beyond what any typical consumer can easily do.
Their monopoly over distribution - not search result quality - is what keeps consumers searching Google and clicking ads.
On my IOS device, under Settings -> Safari -> Search Engine, I have a drop down with options, including Bing and DuckDuckgo, but defaulted to google.
On Macos, with Safari running, Safari -> Preferences… -> Search, Search Engine I have a drop down, defaulted to google, with Bing and DuckDuckgo amongst other choices.
Agreed on google”s effort to get their search engine as the default. However I just don’t understand how changing search engine is impossible given what I’m seeing on my devices? Nor does it seem over the top onerous to my eyes.
This bluntness does not go far enough. People do not change defaults, no matter how "easy" it may be to do so.
A default is a pre-made choice by someone other than the consumer. There is no set-up process where the consumer makes a choice. The choice has already been made. Consumers do not make this choice. Even if they could, in practice they don't. That fact may seem insignificant but it is worth billions of dollars.
If I am not mistaken, the current CEO of Google spent most of his time working on "default search engine" (or "default web browser") deals before taking the CEO job. In probably the most important one, Google pays Apple a hefty sum to be the default search engine. It was estimated at $10 billion in 2020 and $15 billion in 2021.[1]
Defaults are effectively permanent settings. It does not matter how easy it is to change a default setting if practically no one ever does it. $15 billion is too much to pay for something that may or may not change. It does not change. It is money in the bank.
1. https://9to5mac.com/2021/08/25/analysts-google-to-pay-apple-...
I disagree. Two to three years ago I could get more of what I wanted in a complex search once I tuned it properly. So Google had a twenty year run of good and useful searches. Google also worked to strong-arm their monopoly, yes. But I claim they still served some quality after that. It's not that unusual for a monopoly built on quality to maintain that quality for a period of time after it achieves monopoly status - institutional standards die, but they die over time.
I don't disagree with this as a fact, but I think there are a lot of things that work this way that aren't actually monopolies in the competition-preventing sense. If I wanted to launch a new breakfast cereal, getting my product into grocery stores would be one of the major challenges of starting that business. Competition for shelf space is a core concern of a lot of consumables. This definitely creates a lot of stickiness and barriers, and that comes with its share of downsides, but there are also good reasons that distribution systems work the way they do. Transaction costs are important.
This is not a comment on the search results themselves - I always appreciate efforts to break out of the standard google results and surface other sources, but I found the interface confusing and the previews were also taking up a lot of space. A compact view would be better - or giving the option to turn the previews on / off.
May I ask how you arrived at this observation? This is the first time I am hearing this. I know of NO ONE who uses any other search engine. The term "Googled" is not yet a proxy for other search sites.
It's easy to game an algorithm, but hard to game a human - humans know garbage when they see it.
As an aside, whenever I get a prescription, included with it is a dense two page sheet of detailed information about the drug. I see nothing like that online with a search. Why is this sort of thing not online?
I gave up after a few weeks and had to switch it back to Google. Google's not perfect - it's never been perfect, it was just better than the alternatives - but it's still less bad than others.
I mean, maybe that was true once and now it isn't... But yeah, good luck!
Do you have anything substantive to support this? I highly doubt it is true given the fact that the verb "to google" literally means "to search the internet".
There are five (very simply accessible) different choices for Safari on iOS.
But if you switch to iCabMobile on iOS there are TWENTY-FIVE search engines to choose from.
Tons of people don't, though. They type whatever unprocessed half-second thought they have into Google and expect Google to lead them to the water, even if they're tugging and trying to go in the completely wrong direction. Google has optimized for working 'most of the time' for 'the most people', and that means striving to fix the complete word soup of search queries people type in.
Given all of the data collected about Google users, ought not one of the applications of that data be some way to give users specifically what they are searching for if their past behavior suggests that they mean what they type? Couldn’t the “search only for <exact query>“ option be a very good data point on making that determination automatically, or enabling a user setting for “give me exact results based on what I actually typed by default”?
It seems possible to me that this behavior has more to do with the value of ads for “big” keywords than with (poorly) inferring user intent.
https://archive.md/wwMY3
https://www.google.com/search?q=how+old+is+linux&oq=how+old+...
I've been using DuckDuckGo a lot more recently and the thing that surprises me isn't the kind and quality of the results, it's that I actually need to use my brain to search.
It's not about whether this is a good or a bad thing—I kind of like the precision in a way, it's just jarring how different it is as an experience.
Do they? I see this stated all the time, with no references.
> They type whatever unprocessed half-second thought they have into Google and expect Google to lead them to the water
Perhaps if Google didn't try to fix things for people, they would be more thoughtful with their searches.
Take away the junk food, and people will resort to real food. The same way some cities limit parking at big events so that people have to take mass transit. It's for their own good, but they have to be shown the way.
> Google has optimized for working 'most of the time' for 'the most people'
This may be Google's goal, but it hasn't happened yet.
I don't have very many friends or acquaintances in the tech bubble, so I base my observations around real people in the real world. More and more they're giving up on Google entirely.
Their primary search engines these days seem to be Instagram, Pinterest, Etsy, Amazon, and other non-Google sources.
When I ask someone why they're searching Amazon reviews for tech support information, they tell me because it's not on the web. That's Google's failure.
They should not be engaged in non-consensual manipulation of social or political behaviors, and the ethics of market manipulation at scale through advertisement are far from clear.
Google is optimizing for that.
Though Google is at fault for letting their service falter to the "payola" race, many other factors are in play all across the Internet since data quality has faltered almost totally. For major-cost and non-refundable purchases I need to trust, I go to brick and mortar stores and inspect what I am buying. I am thankful not everything has shifted to an online-only model. It's going to be a very bumpy ride on the Internet until Congress and consumer protection laws wake TF up and do their job.
Most genuine discussions have moved from open, publicly accessible web to places inaccessible to search engines and general public. Smaller niche forums, blogs and personal websites with no financial incentive have died out. People have moved to Facebook, Discord, Whatsapp, Instagram, Slack, Twitter and other places behind logins. Online newspapers and portals are increasingly using paywalls. Most of the genuine human interactions and quality content is not indexable anymore. Instead we have a million affiliate marketers fighting for the top positions in search results with every possible seo trick.
Reddit is one of the last places with huge amounts of publicly accessible online discussions.
There are still some pretty active enthusiast message boards, e.g., https://www.tacomaworld.com
But your overall point is valid, I think. My pointing to a single active forum doesn't change the fact that many other enthusiast groups have moved to facebook groups and the like.
A question worth posing to this community: how can we build an internet that’s hostile to advertisers? Secondarily, how can said internet also be much more accessible to content authors so they won’t have to learn a css, html, and JS to publish some stuff? Finally, how can that content be discovered from within this network?
Maybe a tax on ISPs? I think I'd happily pay $10 extra per month for access to an ad-free internet. Maybe $20. But how many of the people that are already happy with the ads and poor google results would do so? Would it be sustainable?
You have to reify "trust" into concrete, computer-representable data. Maybe borrow the "web of trust" concept from PGP, but do some sort of multiplicative thing where the amount you trust someone's recommendation online is the product of the trust relationships between you and the recommender. That's really the best you can do - even legislation against online advertising will be subverted by companies that go through layers of proxies to buy influence.
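A minimal sketch of that multiplicative idea, assuming a directed graph of direct trust weights in (0, 1] (all names and numbers are made up): your trust in a stranger's recommendation is the best product of direct-trust edges along any path from you to them.

```python
# Hypothetical web-of-trust graph: trust["a"]["b"] is how much
# "a" directly trusts "b"'s recommendations, on a (0, 1] scale.
trust = {
    "me":    {"alice": 0.9, "bob": 0.5},
    "alice": {"carol": 0.8},
    "bob":   {"carol": 0.3},
}

def best_trust(graph, source, target):
    """Trust in target = best product of direct-trust edges along any path."""
    best = {source: 1.0}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for neighbor, weight in graph.get(node, {}).items():
            score = best[node] * weight
            if score > best.get(neighbor, 0.0):
                best[neighbor] = score
                frontier.append(neighbor)
            # we only revisit on strict improvement, and products never grow,
            # so this terminates even if the graph has cycles
    return best.get(target, 0.0)

print(best_trust(trust, "me", "carol"))  # 0.9 * 0.8 = 0.72, via alice
```

A search engine could then weight results by how much the searcher's web of trust vouches for each page's author, rather than by global link counts.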
The most important factor was cramming the articles with SEO terms and links designed to keep people on the website.
The result is trashy articles that could well have been written by a bot but aren't. This could possibly be done with the help of curated bot-content, but I think we're far away from the point where this is really more profitable than getting students to do the work.
It's people but they work like bots.
It may be becoming borderline. I expect that sentence/paragraph completion is already becoming useful to people who churn out quick content for a living. In any case, the important part isn't whether or not it's bots. The important part is whether or not it's authentic. The precise meaning of authenticity gets squishy, but it exists nonetheless.
IMO the sentiments are correct, whatever the details. Part of why google sucks is that the internet is worse, for a bunch of the things we use google to search for. The internet becoming a larger, more profitable industry changed it. Instagramming for influencer perks, SEOing, or selling targeted ads like FB do... it does not lead to the same places that earlier iterations of the WWW produced. Times change.
My suspicion is this is rife, given how many articles read poorly and are almost entirely fluff. If this is true it would appear we are doomed to algorithms shaping our online experiences, which is worrying given the existing shrinking diversity of opinion and content. It's like an entropic gene pool in nature, but with information.
You will find a page for almost any X and Y combination. And they will all have wording like "Put some X on the stained Y... wait a bit... rub it in... and then put it through the washing machine. Hope it works!".
Right now, I don't think you can tell it to use some specific terms in the text it generates, but it doesn't sound like a difficult extension.
> Whether they’re a bot or human, they are decidedly fake.
Fake plastic trees.
https://archive.fo/1EjSu
It's completely pointless, you just get a bunch of articles from news sites (??) that transcribe the quest but don't tell you anything more. I miss a nice community wiki like I'm used to from playing Dark Souls etc.
I've just started appending site:reddit.com to everything, works a lot better.
Take StackOverflow for example. Almost any programmer will find a SO result as the top result and it's usually exactly what you're looking for. Since there's no money to be made by companies writing blog posts on debugging a compiler error, Google's algorithm works as intended.
Question is: Why hasn't Google done anything about this? It's the organic results that are terrible, so they're not losing ad revenue by placing these garbage sites at the top. Perhaps it's intended to make better websites pay for ads to get better placement? But those won't be the ones to ever pay to begin with...
Haha, the noobs. I use HN instead 😎
A bit more seriously: I fully agree with this. And if HN doesn't have what I'm looking for then I use Reddit as well. But if HN has some info on the topic with a few highly upvoted threads, damn, it always impresses me.
I also have the same kind of bookmarklet for Reddit and Google Scholar.
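For anyone who wants to replicate this, I'm guessing at what such a bookmarklet looks like, but something along these lines (collapsed to one line when saved as a bookmark) does the site-restricted trick; swap the site: value for reddit.com, or point it at scholar.google.com for Scholar:

    javascript:(function () {
      // Prompt for a query, then run a Google search restricted to HN.
      var q = prompt('Search news.ycombinator.com for:');
      if (q) {
        location.href = 'https://www.google.com/search?q=' +
          encodeURIComponent('site:news.ycombinator.com ' + q);
      }
    })();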
(This isn't an argument against your point, just a bit of additional context that increasingly is odd to me as Reddit gains more and more social weight)
I never believed in conspiracy theories, and after I read "Media Control" by Noam Chomsky I understood there is no need for conspiracy theories once you understand how individual incentives are aligned and how individuals always act to maximise profits.
Someone on HN phrased this (I am not taking credit for it), and it explains beautifully what's going on: "Google is not making money by showing you the best search result they can, they make money by keeping you searching."
Which they are, as evidenced by people restricting their searches to HN or Reddit.
This is a problem for Google. Maybe not right this minute, as growth might offset dissuaded users, but nevertheless, it does not behoove them in the long run to provide garbage search results to people.
Advertising always creates perverse incentives. It works in traditional media too. Look at what happened to things like Discovery and The Learning Channel when they became subject to advertising based pressure for ratings. They went from having actual educational content to being full of tabloid trash.
A month or so ago, I was trying to help someone retrieve some very old WordPerfect for DOS files. I found http://www.columbia.edu/~em36/wpdos and was so touched, I sent the author a few dollars for a coffee.
Google's complement is web sites. What have they done to make a web site easier to make?
They even killed their RSS reader. They have released a bit of web tech, but their offerings are generally a bit sad or only solve Google problems (e.g. Go).
If you want to distribute an .exe or .app, MS and Apple have released some pretty good tools to help. If you want to write a blog or make a simple web app, it's unlikely you're going to think "Google has some great stuff to help, and has awesome tools". Mozilla's web resources are better. Microsoft's web resources are better.
You mean having to pay them for the privilege of not being flagged as "dangerous" by shitty machine learning algorithms?
> This is the most annoying behavior because I really mean what I write.
Yeah, I remember this being mentioned in a presentation at university, as a great thing: Google doesn't search for what you write, but for what you want.
The problem is that very often Google doesn't know what I want. Before they introduced this, I was able to define my query so that I got exactly what I wanted.
But there is no conceivable universe where SEO spam isn't the arch-enemy of Google. Google needs to fight spam to survive; it knows it and it does. But that's hard. So hard, in fact, that nobody else has cracked the problem, and for all the anecdotal evidence of someone switching to Bing, Google's marketshare is still utterly dominant.
The Dead Internet Theory may have some weight: Google hasn't dropped the ball, but it is slowly drowning in a sea of "content-free content".
But if that's the case, so is everyone else.
That said, the reason the Reddit trick works is that it uses information Google explicitly excludes when ranking content (engagement signals).
Google has a bunch of “objective standards” that it uses to paternalistically shape what the web looks like. Many of these are divorced from what users actually want from a piece of content (HTTPS, AMP, a life story in front of recipes to demonstrate authorship, etc).
The other dimension is that, in the past, if you searched for stuff your results were likely to be a blog or a forum thread. Today the bloggers have evolved into instagrammers, TikTokkers, YouTubers, or Podcasters. The Forums and community pages have moved to Facebook, Twitter, Slack, Discord, etc.
So it’s not just that SEO and botspam have eaten the Google results page, it’s that this is all that’s left of the open internet that needed search to navigate. Much of it truly is a wasteland. Google owns part of the blame for crippling RSS and for privileging recent pages, specific domains, and AMP pages over evergreen, self-hosted content in search results. But users have also given up on the open internet in droves. Instead of starting a fansite they start fan subreddits or discords instead.
Even HN is starting to feel like it wants to sell me something.
Even DDG knows that it can't handle everything, and so it has its bang shortcuts. I've used the !reddit one, and I'd use !w (Wikipedia) except I do those from the Firefox search bar.
I've heard the "everything's a bot" theory before, but never saw a name put to it before. I'd have to guess that 99% of all SMTP traffic is spam at this point.
In terms of direct ads, perhaps. But for SEO spam, in many cases, DDG already seems to be there. For example, things as simple as "python datetime", "python json", or "python datetime.now", where it would seem obvious that the top result would be the documentation for the module/function, have spam sites above the actual Python documentation. Meanwhile, search for "matplotlib", and your screen will fill up with ads.
So are Reddit comments and posts. There it's even worse, because most people don't connect the content with manipulation.
> The "best" recipe for pancakes is only what's trending on instagram right now.
So, like reddit? I mean every platform has their hive mind, and reddit is even worse, because the hive mind can be manipulated with paid upvotes, not just reposting and comments.
> The same for trending programmer tools.
Aren't most of them yet again commercial products from companies?
Dead Comment
This is the least of the issues with Google.
I don’t think this is the problem. The problem is the need for public companies to grow exponentially. If you take away the need to constantly grow exponentially, then ads on search can be both balanced well and make an insane amount of money.
Deleted Comment
I figured that's why it's so high: Reddit's UX keeps slowly getting worse, so the best way to find stuff on Reddit is by searching outside of it.
So are a lot of Reddit comments.
This is exactly it for me too. Search "best XXX for YYY", and I get back two pages of dubious websites that smell like paid ads from a mile away.
The worst part is that despite Reddit getting so much worse, there is no other place that's grown to fill the void. This place is great, and I do search HN when it makes sense, but it's small and narrow in scope. Reddit basically crowds out any competing websites by sucking up all the low-level chatter required to sustain a community, but has also pushed away high-quality posters, who now have no place to go. Very tragic but maybe a good case study in shitty network effects.
/r/nfl had a reputation for high-quality content and wasn’t a particular battleground in the Trump Wars. It’s still a good breaking news feed, and the live game threads are fun, but every post is dominated by joke comments and memes.
To the extent there is a change in quality, it probably comes from other factors, including having a bigger, broader, and different user base now than in the past (and only a small portion of that change likely came from Trump-related bans).
https://en.wikipedia.org/wiki/Phantom_time_hypothesis
I think that specialised search engines are gaining ground. For example, I am using GitHub search for code samples, and that works better than Google.
You might want to check out my side project that tries to explore the subject. I have a search tool / catalog of DuckDuckGo !bang operators; I am hoping that it allows for better discoverability of specialized search engines.
https://mosermichael.github.io/duckduckbang/html/main.html - (best viewed on a PC).
The latest addition is a description for each search engine: just hover over the name, and you get a description derived from the site's meta and title tags.
I think that specialised search engines are gaining ground; it has become easier to set one up, thanks to Elasticsearch/Lucene. They can be quite good for a limited domain, and they don't have to invade your privacy in order to find out what you are looking for. I think that what is missing are tools like this, to aid the discovery and use of these search engines. I hope that this will allow them to eat into the market from the 'low end'.
The projects source is here: https://github.com/mosermichael/duckduckbang
Unfortunately they don't invest too much into !bang operators at DuckDuckGo; however, that's my input data...
Deleted Comment
Reddit especially has horrible search.
This point especially rings true for me, but it also concerns me a bit. Reddit has killed a lot of other forums over the years. If something happens to Reddit, we run the risk of losing a large corpus of information.
> This is the most annoying behavior because I really mean what I write.
I hate this too. I do get typos corrected by Google. But I don't need that - if I put a typo in my query and get bad results, I can correct the typo myself. But if Google decides I must not have meant what I actually said, there's no way for me to correct that. It's a ridiculously bad tradeoff - we eliminate errors that are trivially fixed by introducing errors that can't be fixed at all.
I have similar feelings about phone input autocorrect, which automatically converts typos that are very easy to read into (mostly) correctly spelled words, plus (sometimes) completely unintelligible nonsense.
So let the mainstream use fuzzy search and keep the power-user features. The big issue is that power-user support is gone for people with Google-fu.
Ads or no ads isn't really an issue here, because it's such a small percentage of users who know Google-fu.
>This is the most annoying behavior because I really mean what I write.
It should do both. And it used to do both.
Reddit is gamed way more than google. Paid posters, moderation of anything against a narrative. Google search may be dying, but reddit ain't doing much better.
It's only a matter of time before reddit too becomes too inauthentic to trust. Not only is it directly funded by advertising, its audience is mainstream enough for advertisers to invest time and money posting fake opinions in order to make it look like it's coming from real people.
I seriously hope I never see comments or news about people appending hacker news to searches. I don't want advertisers to kill this site when they catch wind of it.
You can still get exact searches via Google dorks, but "normal" people might find Google "trying to be smart" actually useful.
Nice try. That's totally what a bot would say.
I have long been surprised they haven't been acquired... I assume for sure they have had plenty of offers in the past.
Their last valuation was at $10B, which is ridiculous for a website that can literally get its most popular subreddits shut down arbitrarily whenever a small group of extremely online volunteer mods decides to "go on strike" by locking the subs because they don't like something/someone else on the website.
It happened before and the admins yielded to them so I don't see why it wouldn't happen again, especially since it's not like they can run the website without that weird cabal of (mostly delusional/psychotic) power mods doing their work for free.
https://forum.agoraroad.com/index.php?threads/dead-internet-...
Archive: https://archive.ph/VoaxV
Reminds me of the "birds aren't real" theory. It's almost more social commentary than a serious theory.
Deleted Comment
Are you a fan of r/mtb or r/mountainbiking?
I just wanted to mention that a friend of mine made an app for user-reported trail conditions that might be worth taking a look at: https://trekko.app/
Oh my sweet summer child. Reddit is absolutely infested with paid shills.
Deleted Comment
There is so much shilling on Reddit that if you knew, it would blow your mind. I wish more people realized this. Reddit is the best place to shill because not only is it ridiculously simple, people also automatically assume you’re not shilling, and then once you seed the idea, everyone else will do the shilling for you indirectly.
The healthiest way to use Reddit is like Wikipedia: assume the information you’re reading is highly compromised and biased in one way or another, but use it as a starting point in your further research and it’s a great tool.
Reddit posts are not your friends. Upvotes do not mean the contents of the posts are legitimate or not shilling.
Reddit is the best place to shill and the sooner the non-shillers figure that out, the better off the entire internet will be.
I increasingly think that upvote/downvote culture is the worst thing to happen to the internet and the world at large.
The problem is I don't have an alternative solution to propose.
Your comment is spot-on in my opinion though - I usually start with Reddit results, but try to check against other sources before relying on it.
A forum that's meant to be 100% about humans talking to humans doesn't need an API, so why does it expose one?
Also the model of user-created and user-moderated subreddits actively enables the creation of shill accounts. It's trivial to create a subreddit and use it to farm karma with a ton of bots. If you can keep real users from ever entering your walled garden of a subreddit (of which there are many) your bots will never be detected until you wipe their comment history and set them loose on the rest of the site.
Dead Comment
Likewise, I wonder how long appending "Reddit" will work. As others have pointed out, Reddit shills are already relatively common, and it's becoming increasingly common for bot accounts to create lots of random comments to appear to be human (such as finding a thread with thousands of comments, then copying and pasting the comment to another place in the thread or to another thread, or auto-generating a simple sentence based on other comments in the thread).
Sometimes the advertising hordes move so fast they kill something before it even takes off, like what happened with Clubhouse.
Classical search engines determine trust automatically, based on various factors including "link neighborhoods" where trustworthy sites link to other trustworthy sites. These automated strategies are clearly breaking down; the spammers are winning the arms-race.
So maybe we need to go back to human-based trust.
People used to curate lists of websites, which partly solved this problem but didn't necessarily scale. I wonder if that idea could be supercharged.
Consider a browser extension that people install, which:
a) gives users a button to mark a site as trusted/favorited
b) tracks domains visited (and frequency)
Then, separately, you can manually add people you know personally to your "network". You trust them, so anything they trust is also something you might be able to trust. Manual favorites could be weighted higher than frequently-visited sites, and both could be displayed inline next to links on all pages you visit. You could also see which people the trust in a given link comes from, in case some of them consistently have bad judgement about these things and you want to remove them from your list. Then, finally, you could create a personalized search-engine that only indexes the sites determined to be trusted by your personal network.
Of course this would require placing a great amount of trust in the extension and service themselves, so maybe they would have to be open-sourced or self-hostable or something (a profit motive might create a huge amount of temptation to abuse the data). That's a stickier problem.
Edit: There was a little ambiguity left here about transitive trust; “friends of friends” type stuff. I think if this went on for unlimited hops, we’d be back at square one. So maybe it only uses direct contacts, or maybe some small N of hops (where longer ones are weighted lower?). Maybe this would be configurable, not sure.
Also re: privacy, maybe you could come up with a clever way to E2E encrypt the site visit data, even though it’s shared with many parties?
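To make the weighting concrete, here's a rough sketch. The weights, the hop-decay factor, and the data shapes are assumptions I'm inventing for illustration, not a spec for any real extension:

    // Illustrative only: constants and types are my own assumptions.
    interface Person {
      favorites: Set<string>;       // domains explicitly marked as trusted
      visits: Map<string, number>;  // domain -> visit count
      contacts: Person[];           // people added manually to your network
    }

    const FAVORITE_WEIGHT = 1.0;  // deliberate endorsements count most
    const VISIT_WEIGHT = 0.2;     // frequent visits are a weaker signal
    const HOP_DECAY = 0.5;        // each hop away halves the contribution
    const MAX_HOPS = 2;           // small N, so we don't end up back at square one

    // How much `me` (plus contacts up to MAX_HOPS away) trusts a domain.
    function domainScore(me: Person, domain: string,
                         hops = 0, seen = new Set<Person>()): number {
      if (hops > MAX_HOPS || seen.has(me)) return 0;
      seen.add(me); // avoid cycles and double-counting the same person

      let score = 0;
      if (me.favorites.has(domain)) score += FAVORITE_WEIGHT;
      score += VISIT_WEIGHT * Math.min(1, (me.visits.get(domain) ?? 0) / 10);

      for (const c of me.contacts) {
        score += HOP_DECAY * domainScore(c, domain, hops + 1, seen);
      }
      return score;
    }

The personalized search index would then just be the set of domains whose score clears whatever threshold you pick.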
Throw in some microformats2 and/or schema.org structured data and you're good to go.
Certain search engines specialize in this type of manually-curated content; I listed some in the "non-generalist search" section of my collection of indexing search engines: https://seirdy.one/2021/03/10/search-engines-with-own-indexe...
The problem we encountered is that the vast majority of people are not hyper-organized list makers — the 1% rule of the internet [2]. To create a "human curated search engine" with any utility, you need a massive amount of manually-categorized data — data which most people are simply not interested in generating. This is why no social bookmarking site (e.g. delicious, pinboard, etc.) has ever taken off to hundreds of millions of users.
I still think there's something exciting to be built here, but it will likely need to take a more "automated" approach as you suggested.
[1] https://trove.to/
[2] https://en.wikipedia.org/wiki/1%25_rule_(Internet_culture)
So, for instance, the more weight you give to sites that are quoted by Wikipedia in your search rankings, the more content farms will have incentives to sneak edits that link to their sites.
There are ways to counter that (eg moderation), but in general, defense is more expensive than offense.
Whuffie is, roughly, money determined by your social interactions. More importantly, others also have a queryable score that's weighted according to who you esteem highly - this sounds like what you're proposing!
1) Single-contributor website (blog, personal page...): it seems that we could spread the trust across the whole website in the algorithm (at least more than for the next case).
2) Multi-contributor website (forum, newspaper): it seems the trust should be given at the URL level.
Something worth delving into if we are designing this trust-based search engine in real time here at HN ;)
https://blog.digraph.app/2020-06-13-democratization-of-searc...
In that post, I don't address the reputation management aspect as much, but it's central to making the whole thing work, and I think crowd-sourcing and a well-conceived reputation management system that can influence results are good next areas for exploration.
(At some points I feel it is thrown out as haphazardly as "correlation does not imply causation".)
But I think you might be onto something. It won't necessarily be easy but I think it deserves more than a quick dismissal.
Another "sticky" problem is how to make a living out of this I guess...
Different website. Different title. Exact same content. 4 or 5 in the first page of search results.
I'm assuming they're all run by the same person, throwing as much ** at the wall knowing some will stick.
Many of my searches now include "reddit" or "forum" at the end to filter out all the spam/crap.
But the web, in the sense of quality:crap ratio, has gotten substantially worse.
This flood seems like the ultimate manifestation of turnkey hosting solutions.
IMHO, we could do worse than reviving an idea from email's early days vs. spam: negligible per-use charging. The idea was to tax emails at $0.0001 (or somesuch): insignificant for actual users, but financially devastating for high-volume, low-value spammers.
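Back of the envelope, taking that $0.0001 figure at face value (the volumes are made up, just to show the asymmetry):

    const RATE = 0.0001; // dollars per email sent

    const normalUser = 30 * RATE;       // ~30 emails/day  -> $0.003/day
    const spammer = 10_000_000 * RATE;  // 10M emails/day  -> $1,000/day

    console.log(normalUser.toFixed(4)); // "0.0030"
    console.log(spammer.toFixed(2));    // "1000.00"

The charge is noise for a normal sender and ruinous at spam volumes.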
I'm surprised they haven't done some kind of manual pruning of junk like that, or maybe they have and it's not working... but on the surface it totally seems like they could implement something that says "GitHub has content X, and these other 10 sites are 99% the same, but we've flagged GitHub as an authoritative source so they'll always outrank the clones".
Maybe it's a fear of appearing unfair. Or maybe they secretly want to hurt Microsoft by turning a blind eye. Or maybe this is actually a much harder problem. If I had to guess it's probably #3. But as a user of search it's frustrating to find the clones ranked above the real stuff.
Meanwhile I have in my inbox in the last 24h at least a half-dozen emails looking to do SEO work for my company website.
Web = untrustworthy? YUP
I'd happily pay for a serious version of 1999 Google, but updated to filter out anything advert based, and search for exactly what I want.
Search is such a fundamental function, and we've done the experiment and the advert model fails - it needs to be just another utility.
If only Google were smart enough to figure this out.
I've clicked on a few out of curiosity, immediately recognizing they were garbage from the description text, and it's just endless SEO links and completely random text.
You'd think one of the richest companies on earth could make a freshman intro to CS-level spam filter. If they can't, then they truly do hire the most incompetent people on earth. If they won't, then they hire the evilest people on earth.
I was recently trying to troubleshoot a very basic error message for Linux and was getting results and webpages that would list the error message in the title in some way, but then give instructions on "First, open up device manager", "Click Win+R to open windows command prompt", etc. Lots of untrustworthy ads. Different URLs, almost line-for-line identical webpages.
This was something like the top four search results (that weren't sponsored ads).
1) if this is from the project's own site, it's good.
2) if it looks like an archive of the project's mailing list, it's good.
3) if it looks like an internet forum, it might be good, or it might be just another poor soul asking the same question.
4) if it's on StackExchange, it's like on the forum, except your chances are slightly better. Karma must flow.
5) if it's on Reddit, it's like on the forum, except your chances of getting an answer are worse.
6) if it's a blog of some geek, sometimes it can be better than 1) and 2), or you might just get a straight answer.
7) in any other case it's most likely an SEO farm. Run.
If I have a linux problem these days, google usually gives me the relevant piece of source code on Github from which the error message originates. Like, you're a big boy now, go figure it out yourself.
It seems we're back to the early 2000s, when search engines not so much specialized in topics, but leaned heavily towards one type of content or the other. Holy hell, maybe one day Reddit's own search will be good enough so google can be ditched for good!
Deleted Comment
I don't want to see the same result repeated 5 times across different stack overflow mirrors.
Nothing wrong with that but that also indicates the usual clickfarm spammers from developing countries had unfiltered access to Clubhouse from day 1.
This might sound elitist but it's probably a good idea that C2C apps get their first batch of users and community leaders from high income countries before branching out to the rest of the world.
Because doing a search in a place that’s not moderated by humans would generate too much noise.
I think this kinda takes us back to the old times with Yahoo (and humans sorting the information) etc…
A giant step backwards, if you ask me.
I had to search crates.io. Let me tell you, it's not the pinnacle of search.
What I searched for was `fast bitset implementation`. My results consisted of drill bits, a Stack Overflow question, and a Baeldung article on HashSet vs long[].
This seems like a natural result of optimizing each search for revenue. Think of a search to solve an error message on your computer. There's a very small number of vulnerable people who are going to spend money as a result of that search, so optimizing for ads would mean tailoring the results specifically for those people, pushing them to sleazy sites where they might spend money on some kind of antivirus scam. The results are worthless to you, but who cares? You're worthless to Google when you're doing that kind of search. Try searching for something that people in your demographic spend money on, and the results will likely look better to you.
I do this to search for items on craigslist across the country.
This will force a reddit search.
Combined with AI imitating speech, deepfakes, and technology for implanting false memories, we will have the Matrix, just not the way we expected ^ ^
Can people suggest good alternatives or search patterns for certain categories of information or search types?
Some of the search patterns I currently use:
* Youtube for product reviews and demos, entertainment, music and educational material.
* Google with site:reddit.com at the start for questions best answered by other humans; crowd-sourced answers, authentic replies from mostly real people.
* Google with site:news.ycombinator.com if I want to find "forum-like" discussion on topics I'm interested in.
* Google Image search with site:amazon.co.uk when looking for niche products I need to buy, because Amazon's search is so incredibly broken and game-ified.
What I'm having a heck of a time finding is technical content: long-form programming tutorials, deep dives into academic concepts (I do a lot of signal/audio processing and search for blog posts related to these topics), circuit schematics, electronic engineering content. These used to exist on enthusiast forums 10-15 years ago, but Google often no longer surfaces hits from these forums, both because the content is old and because the forum model is dying. Reddit is the "replacement" but it is plagued with low-effort "look at my thing" posts that help nobody.
Oh, and the content must also be "fresh". If the content isn't "fresh" (which most of the best forum/blog posts are not), nobody shows it anymore. I can search for a specific blog post using a verbatim quote, but the result (if it exists) is buried under 10+ pages of "fresher" content, no matter how disconnected it may be from the search.
The root problem is that attention has gone from abundant to scarce, and people already have their habits. That makes it really hard to build a new forum site and attract an audience that's willing to type your URL in every day (and if they don't visit daily, forget about building a viable community). Forum hosts like Facebook and Reddit don't have this problem - you can view your Buy Nothing Group and Moms of Springfield posts interspersed with your feed of friends, or your r/factorio content interspersed with a steady stream of r/AskReddit.
There's also emerging technological barriers. If you don't sign up for CloudFlare, as a new website, you're going to get hosed - but at the same time, CloudFlare makes it basically impossible for any new search engine other than Google to spider the site. Ditto security patches, and keeping software up-to-date. Most people don't want to deal with sysadmin stuff at all, particularly if they're trying to build a community as a hobby. So that pushes people further toward hosted solutions with a turn-key secure software stack, which is Facebook and Reddit.
If you find a forum for a given subject, it is almost always an authoritative source filled with experts. This is especially true in engineering disciplines.
It's unfortunate that Reddit and social media took over and led to their decline, because it's suboptimal setup in so many ways.
- Reddit in the large is a high noise, low signal monetization chamber. Some subreddits have good moderation, but that doesn't stop the spill over and drama.
- You can't assume much about any given Redditor, and you won't typically form relationships or associations with them. It's pretty much pseudonymous.
- Reddit doesn't focus on authorship. It doesn't allow inclusion of images, media, or carefully formatted responses in threads.
- Reddit corporate is the authority and owner of all content. They can change the rules at any time, and that's a fragile and authoritarian setup for human discourse.
- Reddit corporate is constantly changing the UI and engaging in dark patterns to earn more money. This flies in the face of usability.
Forums should make a comeback. It would be better if each community had real owners and stakeholders that had skin in the game rather than a generic social media overlord that is optimizing for higher order criteria that sometimes conflict with that of the community.
But forums have problems too. They should be easier to host, frictionless to join, easy to discover, and longer lived.
Another way to think of this: every major subreddit is a community (or startup) of its own and could potentially be peeled off and grown. You'd have to overcome the lack of built-in community membership and discovery, but if you can meet needs better (better tools for organizing recipes, community events, engineering photoblogs, etc.), then you might be able to beat them. Reddit can't build everything, just like Facebook couldn't.
It has pluses and minuses in a lot of ways, but the biggest con is that it's just straight-up impossible to search a Discord log effectively.
I've been on the Kagi beta test for a few weeks now and, for the kind of searches I mostly do, it seems to be a massive improvement on Google. Strongly recommended.
https://kagi.com/
It may lack "instant answer" widgets or other fancy search engine features but it gets the actual "search" part of the equation so right that I find it astonishing how I ever used DDG/Google in the past.
With Kagi most of the results are what I'm looking for. If they are not, I'll still try "@google", but so far with very few queries Google's results were actually better. The biggest drawback is worse "smart cards" results, but I hope they keep those optional/unobtrusive anyway.
The strange thing is that the feeling Kagi gives me, isn't even unknown. It just feels like Google circa 2010.
In the FAQ they mention potentially charging around $10/month.
Not sure if I'm being entitled or anything, but I was expecting something more like the original WhatsApp model of a few dollars a year.
Perhaps I'm under-estimating how computationally heavy search is.
https://kagi.com/privacy
Google seems to slowly oscillate between thinking that I am a right wing loon, and thinking I am Joe Public who must not be shown misinformation. That is, sometimes google is perfectly willing to vomit forth results from the propaganda mills, even when I'm not specifically looking for it, and other times I can't get conspiratorial-minded results even when I am making an effort to find them.
This most frequently manifests itself when I am looking for sources for claims that I know exist. Like if I remember reading an earlier conspiracy that has just been invalidated, or someone posts a video of someone reading a blog post. If Google has decided I am an innocent bystander not to be shown conspiracies, it can be nearly impossible to track down the original blog or posts about the conspiracy.
Recency bias is another huge problem with google results. Older content gets heavily de-prioritized, even when it is clearly what you want. Google is willing to give up on terms in your search before it is willing to show you old stuff. For example, if you tried to research early Ukrainian political corruption during Trump's impeachment, your results would be nearly entirely Trump-related content even if you tried to use google's date-filters and exclude terms like -Trump.
https://www.google.com/search?hl=en&q=site%3Areddit.com%20ne...
The entire first page is for NetflixViaVPN subreddit (not linking to avoid SEO). They have a stickied post that seems to shill two VPN providers I haven't heard of. This is plausible, as maybe Netflix hasn't either... The stickied post has comments disabled, so it's hard to tell. Then if you click other posts, a bot auto-links the stickied post, but everyone is making different suggestions that may imply the stickied post is wrong.
Interestingly, the same search on DuckDuckGo only has three posts from that subreddit. This better matches what I wanted! The first one I'm seeing is:
https://www.reddit.com/r/VPN_Guide/comments/rgh2xn/best_netf...
This seems much more plausible. All those comments suggest a provider I've heard of and that I've heard other people mention IRL. Google seems to rely too much on the URL or the page header, so it's stuck in a single subreddit.
It's not GREAT that you have to do that, but it's pretty functional and certainly better than going past page 1 of search results.
Anyways, here are some other things you can do for reference: https://support.google.com/websearch/answer/2466433
So if for instance I'm looking up info about ADHD meds as an adult, I might get tons of articles about childhood ADHD since that's where all the research is. I search Adult ADHD meds, I still get articles about childhood ADHD. So then I:
-child -childhood -adolescent -teen -kid -momgroup -mother -parent -teenager -children -kids -school -offspring -smallhumans -minor -underage +adult +work -rehab -addiction
and I still get crap blog spam that's probably related to teaching or raising children or some other bullshit like warning about the dangers of addiction or something, and never information about my ADHD or the meds for it.
It's not GREAT is the understatement of the decade.
GitHub search is good for that. Search for 'list of awesome anytopic' / 'curated list of anytopic' / 'list of anytopic' and you might find a repository with a curated list of links on anytopic. (Use the search box on the main page of GitHub.)
You even have the 'The definitive list of lists (of lists)' https://github.com/jnv/lists
You might also want to check out my side project: I have a search tool / catalog of DuckDuckGo !bang operators; I am hoping that it allows for better discoverability of specialized search engines.
https://mosermichael.github.io/duckduckbang/html/main.html - (best viewed on a PC)
here is the project page on github: https://github.com/mosermichael/duckduckbang
I've sometimes thought the death of Google will be the self hosted search engine.
For local information, I've found forums for local sport teams to be great resources during the off season. Posters are often happy to engage in any sort of chat during the off season. Even if you haven't gotten to know the frequent posters during the sport's season you can use the (usually highly visible w/o any additional clicks) account age/# of posts/"karma" as a proxy of posters' trustworthiness. note: If you don't normally contribute on-topic (i.e., about the team and sport) posts, I would only search the forums for your questions and not post off-topic questions as that'll get you quickly banned.
Deleted Comment
Use https://hn.algolia.com/ instead. You can see your results as you type your query and even sort it based on time.
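And if you want it scriptable, the same index is exposed through Algolia's public HN Search API; a quick sketch (the endpoint and fields below are the ones I believe the API documents, so treat the exact parameters as an assumption):

    // Search Hacker News via the Algolia HN Search API.
    // search_by_date sorts by time; the plain /search path sorts by relevance.
    async function searchHN(query: string): Promise<void> {
      const url = 'https://hn.algolia.com/api/v1/search_by_date?tags=story&query=' +
        encodeURIComponent(query);
      const res = await fetch(url);
      const data = await res.json();
      for (const hit of data.hits) {
        console.log(hit.created_at, hit.points, hit.title, hit.url);
      }
    }

    searchHN('google search results quality');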
I think we're well on the way ...
Was recently pretty shocked: I searched for "gas heating repair" and got back at the top some sites with my suburb name in the title. Naturally I thought, wow, if there is a local place I should go there. Clicking into it, it has everything about my suburb: a picture of the local park, and whole paragraphs of random text containing bits and pieces about the local area interspersed with odd sentences about gas heating ("Cold mornings in XXX can be confronting without effective heating" etc). The text kind of makes sense but also reads like it was generated by GPT-3.
Of course, then I realise, this is all SEO. They have generated a page like this for every suburb in my city. There are tens of thousands of such pages they are hosting. The most shocking thing is this is a small time gas repair dealer. They clearly don't know how to do this, they've gone with a low budget to an SEO firm who has effectively generated a giant plume of toxic content into the web atmosphere, all to create a marginal benefit for this one small company.
If a small time low budget unsophisticated company can do this, then I have to assume it's happening everywhere. On a mass scale we have giant smoke stacks all over the internet spewing toxic plumes into the atmosphere. And the humans are gasping trying to find the small bits of remaining breathable air.
Well, some things are (reverse image search, ease of accessing 'Cached' pages -- now I have to go to archive.org Wayback, etc), but forum search has always been bad.
Long before Reddit was big, USENET/DejaNews and forum software like phpBB/UBB ruled supreme (and before Markdown there was UBB Code). Google, despite owning DejaNews, did not often surface links into USENET content, and a lot of forums, for whatever reason, were not indexed by Google. For example, I used to spend a lot of time reading the latest on PC/3D hardware stuff on Beyond3D, Overclockers, Rage3D, etc., and I almost always had to either use site-specific search (dejanews.com or, say, phpBB's built-in local search) or add site:beyond3d.com, for example.
And there is a large amount of confirmation bias going on in these Google threads. Some people assume that their search patterns are representative of the billions of searchers ("argh, I searched for pytorch k-means and a GitHub wrapper site appeared!") and that their experience is a representative sample, while others focus only on what has gotten worse, and not on what has gotten better.
What's clearly gotten worse is webspam. But while it has degraded the Google experience, it's not clear any of the other search engines are any better at filtering it out, except by luck, because perhaps they don't crawl as many sites as often.
Whether or not google search right now is as good as it could be should not be the main point of discussion.
We have to remember to acknowledge that the web google is indexing now differs drastically from the web it was indexing 20 years ago. Web pages are now less likely than ever to be freely accessible plain text put forth in good faith for public consumption. Google (in addition to dealing with big walled gardens designed explicitly to hide content from google) is trying to sift through basic spam, industrial scale SEO exploitation, and nation-state cyber warfare.
Bitching about google search being bad almost feels like yelling at the canary in the coal mine when it passes out.
The issue (at least for me) is that google is no longer actually searching for the thing I ask for, and it's being blatantly disrespectful of users who cared enough to learn how to actually use the search features.
Quick example from today? I did a literal two-word search - gulp admzip - and while the results are OK-ish, an increasing amount of space is taken up by results with this handy little blob at the bottom:
"Missing: gulp | Must include: gulp"
"Missing: admzip | Must include: admzip"
WTF are they smoking? I asked for two fucking words, and the top result doesn't include one of them. Then the second result doesn't include the other.
So then I add quotes around the phrase I want "gulp admzip" because I'd really only like to actually see results that include that EXACT phrase, and... drumroll... IT DOES IT FUCKING AGAIN: "Missing: gulp | Must include: gulp"
And that literally has nothing to do with the quality of the items it's searching, and everything to do with Google deciding what I meant - Clearly I meant the npmjs.com package adm-zip, because that item gets vastly more views than any of the real search results.
I couldn't have possibly meant to restrict the search to the actual fucking phrase I told it to search for, because there aren't that many results, and they don't get many views.
The problem as I see it is Google has created a bunch of perverse incentives to make your page rank higher. One big problem is Google gives higher rank to "comprehensive" articles. On the one hand that would seem like a good thing right? But what you end up getting is endless affiliate articles that don't seem to be written for humans. And they are really easy to spot if you know what to look for.
A great example is webhosting reviews. Search "best web hosting" and click any of the 1st page results and you will almost always get an article that just rambles on and on and on with headings like: best web hosting for email, best web hosting for blogs, best web hosting for email marketing. To a human, it's an incredibly disorganized mess, but to Google's bots, its "highly comprehensive and authoritative".
Webspam will seek to game whichever search company has dominant market share and they will structure their spam to overcome the filter and ranking specifics of that engine.
Considering tools like GPT-3, one could easily imagine, in the limit, a spammer running a large number of searches through a search engine, finding out what ranks high, and then training a generative model on that dataset to produce similar articles. Auxiliary signals like inbound links and DNS records they can also usually work around by purchasing domains or buying inbound links.
It will always be a war, and there is never going to be a final victory over webspam. Even with something like web3, where posting content costs money, I can imagine ways to spam.
Man, that's almost an understatement haha. I always wonder if it's just a "Hard Problem", as I still don't know any forum software that solved it.
We're back to the Web needing a search engine.
[EDIT] I should add that the ads Google was showing me didn't even do a very good job of showing me the very specific kind of thing I was looking for, even though there must be thousands of stores around the world selling pieces that fit the keywords. The ads were for jewelry, but most of them weren't anything like what I was trying to find. In this case an entire page of ads but all from different sites and mostly the thing I was looking for would have been better than nothing, but it couldn't even do that.
[1] https://search.brave.com/ [2] Beta at the moment - https://kagi.com/
https://twitter.com/vladquant/status/1494076266508537858
Obviously I've just built up a list of good sites in my head which I trust... Google search is good for discoverability if you're new to the web, I guess? Although in the old days that's what web directories were good for:
https://en.wikipedia.org/wiki/List_of_web_directories
- It contains 100% signal- no noise- and provides helpful related links if you need more information.
- Pages are organized and brutalist.
- Every page has a steward who (thanklessly) keeps the information accurate, up-to-date, and ad-free.
Contrast this to Google:
- 50-100% noise (depending on the query; more information requires more queries and therefore less signal)
- SERP pages are disorganized and absolutely riddled with UX dark patterns (modals, banners, autoplaying video, etc). Many pages with good info are over-styled/over-javascripted/over-languaged, and finding the one or two sentences you're looking for is a chore.
- One-off SEO spam plagues everything; ads and affiliate links are pervasive. Stewardship is a waste of time.
Wikipedia is a very good resource for a lot of things, and a good jumping-off point, but you shouldn't assume that you are getting "factual information", especially when it comes to hot-button social or geopolitical issues.
https://www.craigmurray.org.uk/archives/2018/05/the-philip-c...
This is almost certainly not true. I don't think kids (who make up the vast majority of 'new to the web') care about using Google for discoverability. They'll use YouTube, Twitch and Instagram to find things they care about. Google is for answering questions, not finding new things.
And honestly, it's not generally that good at answering questions.
Source: my 10-year old.
... such as? The only general-purpose online store available to me is Amazon as far as I know, and I certainly wouldn't call them respected.
I feel like this has not changed in the last 20 years. Yes, Google was at some point like a miracle that seemed to solve lots of problems around searching the www for information.
While Google "refined" its search and monetized it, the web kept evolving, and is still evolving, into something... different. Many of the websites most people already know are competing for Google's top rankings and ad revenue; there are even people paid lots of money to "make $website more visible to the web" (what they really mean is Google), while the real internet goes on in the background.
We need more ways to search the web. We need lots of different search engines that are competing and also working together. The web is still young and no one really knows what it will be in the future. (I fear it has to do with ads. Lots. Of. Ads.)
My fear is that walled gardens might win in the future, because who guarantees you that websites won't move to Facebook Pages, Facebook Groups, Slack and Discord channels, etc.? The open web is weaker than ever; just look at LinkedIn: a walled garden that throws a Register form in your face when you try to access it and won't let anybody crawl or scrape its content except Google, which drives more traffic to their walled garden.
We don't need your unrelated ads, we already know what to buy, don't patronize us. We need help getting to the product page from keywords. We need real reviews. We need a shopping experience we can trust.
Also you say the results were "all from different sites". Is that a good or a bad thing? I imagine having too many results from the same site would be less informative, no?
I'm very curious to try the same search query myself, but of course I understand that it may not be something you'd want to share.
This is a structural problem and anything that gets large enough will succumb to the same forces. If the incentives are for optimizing ad revenue then that's what all corporate machines will do at scale, regardless of their initial motives and incentive structure. It doesn't help that Google is also an ad network, hence the ouroboros aspect.
Google made links on the web the measure of how good a page was. That became the target of everyone trying to do SEO. As a result, it stopped being a good measure of how good a page was.
But in the long run, nothing will work in that environment, because every measure will be gamed as soon as people figure out that Google is using it. Google's only choice is to try to stay ahead of the SEO crowd, and I'm not sure they can do that (well) for too much longer. In fact, if the article is to be believed, they're already starting to fail.
This is also the case for social media platforms. They're incentivized to surface content that generates engagement and ad revenue. Basically ads are at the root of all problems when it comes to the internet and the content on it.
The problem begins and ends with the conflict of interest that Google both sells ads and selects search results. If they didn't have a vested interest in people visiting sites with their ads on them, they could decimate the number of spam results.
The only thing that influences Google search results is Google's desire to keep as many people using Search as often as possible, since nearly all of Google's money comes from showing those text ads at the top of the Search results. This is all public information, you can read it in the 10K etc.
So if Search sucks, it's not because Google has the wrong incentives but because they can't solve the problems Search faces.
It also seems to me that the internet has somehow become a middleman and is not providing deep enough human interactions, especially outside chat-like websites (basically any exchange or business).
I could envision a WhatsApp-like system with quality control for producers and transparent transaction/tracking/accounting management offered by the network, so people spend less time on side work and can just focus on helping each other and doing what they need to do.
I'm really interested in the idea of decentralized search where everyone has the power to choose for themselves who to trust.
> How would the sites keep the lights on without ads?
Making them turn off the lights is the goal. Good riddance I say, once we get there.
It's a new internet protocol (NOT www) designed to be minimalist and interesting to hobbyists.
> How would the sites keep the lights on without ads?
The same way they did in the web 1.0 days - somebody would maintain the server themselves, or pay to have it maintained.
Discussion: https://news.ycombinator.com/item?id=30072085
If there were a better solution we’d all already be using it. Certainly you can rely on savvy people to produce free stuff, but the total amount of content would be drastically lower, and there would therefore be fewer consumers.
The best thing is to just use bookmarks and your favorite sites’ own search.
There’s just too much trash on the net
Dead Comment
It's probable that a huge amount of useful information will soon become much more difficult to access, and/or diluted by stealth advertising, as Reddit looks to aggressively monetise its position. I'm interested to see if a credible alternative emerges and if there is any effort to move some of the existing useful data off the platform.
Now I don't trust a single damn thing I see on that site.
Yes, Reddit isn't perfect, but I've been hard pressed to find better options.