Surely people can relate to the situation where you end up on an article based on some technical query. The article repeats your question seven times, has endless loosely related filler text that still does not answer the question, and then ends with: try to unplug it.
It is so freaking obvious that it's a malicious content farm, but Google, with all of its technical might, seems unable or unwilling to detect it. If the tech can't do it, why not organize some type of curation or feedback?
Same for image search. You search for "red flower Thailand" and flowers of various other colors from various locations appear. The idea that Google is spectacularly good at subject detection from imagery does not seem to actually work out in practice.
Most people's search queries consist of just 2-3 words. Nowadays Google consistently just drops the last word as if it knows better than I do what I need.
High-value, elaborate articles on various topics do not rank; instead, dated articles do. You have to manually bookmark high-quality content as you see it, because you'll never find it again via search.
Is everybody asleep at Google? This is not a small thing; this is your bread and butter. Teens are using TikTok for search. You're in real trouble and had better start cleaning up your act.
> Same for image search. You search for "red flower Thailand" and flowers of various other colors from various locations appear.
Now, if any of these flowers are next to a red dress, tapping the dress will reveal links to places you can buy it.
Google is not asleep. It has just got its priorities wrong. (Or rather, the incentives in the organization seem to reward something other than what users like me appreciate.)
The thing is, all the stuff you're listing that they now suck at (detecting, filtering, categorizing properly), they used to be extremely good at. The Google search that exists today is markedly worse than the Google search of five years ago. WTF happened to cause search to just rot away into a useless mess, when the results used to be very high quality? It's such a night-and-day regression that I've legit wondered if it's a sign of the Mandela effect.
Part of what you describe can be pointed at the fact that they let someone who used to focus on Ads run all of organic search AND ads. That never used to happen; they had proper separation of church and state. The other part to blame, imo, is turning down the link graph, over-reliance on NLP, and attempting to keep propping up traffic to old-media sites.
I wonder if they've misapplied the YouTube algorithm to search. Other people who liked the pictures of red flowers also liked green flowers and purple flowers, so let's include those in the results, since that will probably generate more engagement, and more engagement is obviously always good.
It always starts at the top. Sundar sure seems like Google's Ballmer: he's good at keeping the lights on, but from an outsider's perspective he doesn't seem to have any vision.
As an example of how bad the searchability is nowadays, I’ve been creating and expanding my own knowledge base (something like a personal wiki with links to interesting content I find) for about a year. It seems to work very well despite the effort it takes to keep it organized.
I've maintained my own hosted wiki since 2004, and yeah I'm glad I didn't completely outsource information management. It's definitely getting hard to find certain things I know I've seen.
Now I just need some kind of open source search engine to run on it ... (a bunch of text files that render to HTML, and ideally following the links 1 or 2 levels deep)
~20 years ago Google Desktop Search was a fantastic piece of software ... very fast and accurate on your local files. I don't think anything like that exists now, and maybe it never existed for Linux.
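For what it's worth, the core of a local search tool over a folder of notes is genuinely small. Here's a minimal sketch in Python (stdlib only, crude tf-idf ranking; the file extensions are assumptions about the wiki, and following links 1-2 levels deep is left out). On Linux, Recoll is an existing option for proper desktop search.

```python
import math
import os
import re
from collections import defaultdict

EXTS = (".md", ".txt", ".html")  # assumption about what the wiki contains

def build_index(root):
    """Walk the notes folder and build a tiny inverted index."""
    index = defaultdict(dict)   # word -> {path: term count}
    doc_len = {}                # path -> total word count
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith(EXTS):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                words = re.findall(r"[a-z0-9]+", f.read().lower())
            doc_len[path] = len(words) or 1
            for w in words:
                index[w][path] = index[w].get(path, 0) + 1
    return index, doc_len

def search(query, index, doc_len, k=10):
    """Rank by a crude tf-idf sum; plenty for a personal wiki."""
    scores = defaultdict(float)
    for w in re.findall(r"[a-z0-9]+", query.lower()):
        postings = index.get(w, {})
        if postings:
            idf = math.log(len(doc_len) / len(postings))
            for path, tf in postings.items():
                scores[path] += (tf / doc_len[path]) * idf
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]
```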
Search engines are extremely modular and Unix-y. You have a bunch of indexed corpora and you intermingle them at ranking time, with respect to a query. But unfortunately there is no real incentive to provide something that has measurably good results and is also open to your own data and modifications
The incentive is to make a walled garden out of it
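To the modularity point: the "intermingle at ranking time" step really can be tiny. A toy illustration, with corpus names and weights invented:

```python
def merge_ranked(corpus_results, weights):
    """Combine per-corpus hit lists into one ranking.

    corpus_results: {"wiki": [(doc, score), ...], "mail": [...]}
    weights: per-corpus multipliers the user controls.
    """
    combined = {}
    for corpus, hits in corpus_results.items():
        w = weights.get(corpus, 1.0)
        for doc, score in hits:
            combined[doc] = combined.get(doc, 0.0) + w * score
    return sorted(combined.items(), key=lambda kv: -kv[1])

# e.g. prefer your own wiki over your mail archive:
merge_ranked({"wiki": [("unix.md", 1.0)], "mail": [("inbox/123", 1.2)]},
             {"wiki": 2.0, "mail": 0.5})
```

An open engine would expose exactly these seams: your corpora, your weights.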
Same here, but in my notes app. Been doing it for years.
And if I may expand a little, not just for crappy Google, I also create alternative local knowledge bases at work.
I can't find anything at work. Everything is spread out across chat, wikis, SharePoint, and email, all with different owners; content may disappear or move at any time, and there are constant authorization headaches.
Whenever I come across something useful that I expect to be of some future use, I make a local copy. File, web page, wiki, anything. Because our information systems are a massive failure.
I was about to ask you to share your list, but that got me wondering: is there any tooling for curating, sharing, and most importantly, consolidating curated lists of sites (based on tags rather than categories), such that the consolidated list is then searchable?
This is basically what I use my Wikipedia user page for: keeping links to all the news websites where I've found reasonably interesting content; stories I like that Wikipedia will probably not accept; articles that I want to keep but that might not be accepted for an article, or that I have little faith will remain in one. I probably just need to dump my bookmarks in occasionally.
Honest question: how much of this is simply due to Google slowly showing more & more ads on page 1 of search results?
There used to be a time when paid placement was only 1-2 results.
Now it's common for the top 5-6 results to be paid placements.
(And when I'm searching for a specific product I know I want, competitors bid up those search terms, which is annoying because I'm shown things other than what I'm explicitly searching for.)
I have a hypothesis that a lot of this is the result of hyper-focusing on short-term reward. Just think about how we measure a top exec's performance: it often depends on cutting costs and increasing revenue. Cycle through that a bit, and if the previous person did their job well, they already cut a lot of fat; in fact, they probably cut as much as they thought they could get away with. So the next person comes in and has to start cutting more than fat. Of course, you could pursue other avenues to increase revenue, but cost-cutting measures are the easiest and quickest.
Kagi just renders those toxic sites into a grouping called "Listicles" which I then ignore. It's far from perfect, but it's clear that a company with far less money and access than Google doesn't find this an impossible problem to address.
So I would suggest that Google knows what it's doing, it just makes them money.
Google has commercialized a huge number of search terms. I'll use biology as an example. You search for a particular species, and the results prioritize products that kill it. You search for a particular plant, and you'll have a hard time learning about the species, because it only shows cultivated varieties and products related to caring for them.
Pure information/knowledge for the sake of learning and curiosity is de-prioritized.
Back to image search: it's unable to figure out original sources, or doesn't care. Pinterest is the best-known manifestation of that.
Google shopping results: completely broken. Click through on the products and half the time the price, availability, discounts and stock do not match.
Everything is so goddamn broken, and nobody at Google seems to care. I can't explain it, but it's been going on for a good 6-7 years or so.
It's because Google has never been focused on search quality, imo. No search engine produces high-quality results anymore.
Especially since you get the blatant spam sites that somehow echo your query in the page content (where they've just stuffed loads of keywords, but I'm also pretty sure some spam sites do something dynamic with the query).
Google has maximized advertisement $ and that's all.
We're all technically minded here but very few people really understand how technical choices add up to greater detriments.
And that's today's Google: they minimized the index and maximized the searches that yield profit through Google Ads. Those websites you hate? They monetize Google AdWords.
The one that pisses me off most is installing/configuring a software package... top articles always end up being "apt-get install foo" and never address the configuration at all.
> We find that only a small portion of product reviews on the web uses affiliate marketing, but the majority of all search results do. [...] We further observe an inverse relationship between affiliate marketing use and content complexity, and that all search engines fall victim to large-scale affiliate link spam campaigns.
I think this is an excellent methodology for testing the quality of search results. I would love to see a standard search engine test and scoring system based on this, maybe similar to some of the LLM scoring systems.
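To make that concrete, here is the rough shape such a benchmark could take. Both fetch_top_results and has_affiliate_links are stand-ins you'd have to supply; scraping SERPs and classifying links reliably is the hard part the paper actually tackles:

```python
from statistics import mean

def affiliate_score(engine, queries, fetch_top_results, has_affiliate_links, k=10):
    """Lower is better: mean fraction of the engine's top-k results
    that carry affiliate links, over a fixed query set."""
    fractions = []
    for q in queries:
        urls = fetch_top_results(engine, q, k)   # stand-in: your SERP scraper
        if urls:
            fractions.append(sum(map(has_affiliate_links, urls)) / len(urls))
    return mean(fractions) if fractions else 0.0
```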
This doesn't apply to the content-complexity finding, but the finding that "product reviews which are in top search results are more likely to contain affiliate links than product reviews which are not" can also be explained the other way around: if, and only if, I am getting a bunch of hits on my product reviews, I'm incentivized to monetize that with affiliate links.
We should also ask ourselves whether affiliate links are really that bad. Someone could be writing honest, complete reviews and monetizing them with affiliate links; does that inherently make the search results lower quality?
That approach also misses all the copied-a-GitHub-issue low-effort content that seems to crop up on Google.
Forgive my naivety, but wouldn't a simple way for a search engine (like Kagi) to avoid falling victim here be to detect affiliate link programs? There's got to be a small handful of patterns for affiliate link tracking (rough sketch after the list):
1. Domain Interception & HTTP redirects
2. Tracking codes embedded in the URL directly
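A first pass along those lines is easy to start (and hard to finish). A sketch, where the hosts and parameters are a few well-known affiliate fingerprints, nowhere near exhaustive, and bare parameter matching will produce false positives:

```python
from urllib.parse import urlparse, parse_qs

# Known affiliate/redirect hosts (AWIN, Skimlinks, ShareASale, Rakuten, Amazon).
AFFILIATE_HOSTS = {
    "awin1.com", "go.skimresources.com", "shareasale.com",
    "click.linksynergy.com", "amzn.to",
}
# URL parameters commonly used to carry affiliate tracking codes.
AFFILIATE_PARAMS = {"affid", "aff_id", "clickid", "irclickid"}

def looks_like_affiliate(url):
    parts = urlparse(url)
    host = parts.netloc.lower().removeprefix("www.")
    if host in AFFILIATE_HOSTS:
        return True                          # pattern 1: interception/redirect domain
    params = set(parse_qs(parts.query))
    if "amazon." in host and "tag" in params:
        return True                          # Amazon Associates tracking code
    return bool(params & AFFILIATE_PARAMS)   # pattern 2: code embedded in the URL
```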
> Kagi surfaces shopping results featuring unbiased reviews and no affiliate links to help you identify the best product across categories. Top results include discussions focused on helping you find the best item to purchase - you are not bombarded with affiliate links and ads. Continue to scroll and you will see product comparisons across multiple vendors so you can pick what best suites you. Kagi's shopping search will always return a detailed discussion of which product to buy not a competition amongst advertisers promoting where you should buy. Kagi is focused on providing you the best results to make an informed decision not polluted by affiliate links and advertisements. [1]
1. https://help.kagi.com/kagi/features/shopping.html
Currently, Kagi has (if you hover/click the shield icon to the right of a result) an indication of the information it knows about a website (as well as a way for you to rank it higher or lower for yourself).
One of these is "ads/trackers". I imagine that it would be feasible for this to include some of the more common affiliate URL types, or third party lead/affiliate tracking bounce hops like awin.
Clearly there will always be some ability to "defeat" this kind of measure by obfuscating links, but eventually the user needs to be forwarded to a URL with a referral parameter, or to a site that sets an affiliate cookie, or similar.
The "tracker category" also can give a bit of extra information - things like "invasive fingerprinting, advertising"
I am working on a search engine that checks a site for affiliate links from known providers and demotes sites that have a large number of them.
In addition:
* It demotes sites with popups (think newsletter sign-ups).
* It demotes sites that block (or complain about) ad blockers.
* It demotes sites with a high number of ads and favors sites with no ads.
* It demotes sites using certain sketchy ad companies.
* It demotes sites that have paywalls.
* It detects possible link networks and flags them for human review/removal.
* It promotes sites with RSS feeds.
* There is a toggle to hide all sites with ads or external trackers, but it is still WIP (the whole project is).
There are many other features. No idea if I am going to make it public, I created it to update my skillset. I actually thought about setting up a nonprofit and making it open source, but I haven’t decided.
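For anyone curious what a rule set like that reduces to at ranking time, here is a sketch with invented signal names and weights (the hard part is extracting the signals at crawl time, not applying them):

```python
# Multiplicative demotions/promotions; names and weights are made up.
FACTORS = {
    "popups": 0.8,             # newsletter overlays etc.
    "blocks_adblock": 0.7,
    "heavy_ads": 0.6,
    "sketchy_ad_network": 0.5,
    "paywall": 0.7,
    "rss_feed": 1.1,           # the one promotion in the list
    "no_ads": 1.2,
}

def adjusted_score(base_relevance, signals, affiliate_ratio=0.0):
    """signals: set of flags detected for the site at crawl time."""
    score = base_relevance
    for flag in signals:
        score *= FACTORS.get(flag, 1.0)
    # Demote in proportion to how affiliate-heavy the site is.
    return score * (1.0 - min(affiliate_ratio, 0.9))

adjusted_score(1.0, {"popups", "rss_feed"}, affiliate_ratio=0.3)  # -> ~0.62
```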
I don't know if this would be a long-term solution if the big ones (ok, Google) did it. Advertisers would catch on very quickly, and some legitimate review sites, which might get funding through affiliate links unrelated to the product being reviewed, would lose out to straight-up paid-for "reviews" that are funded wholly by the manufacturer and just don't use affiliate links.
What would be the difference between an affiliate link program and a web ring?
I don't know if discovery is actually a bottleneck to be automated away. It might be the fun part. I'm thinking back to the Napster approach where you could browse other people's libraries for music ideas.
This is one way to do it, but I wouldn't say it's sufficient. If I search for 'things to do in Seattle', I get many 'blogs' and such where the writer gets paid by businesses to insert their place into the things-to-do list. I didn't word that well, so for example: I own a coffee shop, I pay them money, and the '25 things to do in Seattle' writer puts my coffee shop in the list.
If I do an image search for the word 'strawberry', how many of those results are not stock images, images from a store, etc.? Can you find an actual picture of a strawberry sitting in the wild, or just a picture of a strawberry someone uploaded without trying to sell you something?
I don't know; it feels like a paper titled "Is Google Getting Worse?" could have benefited from actually looking at Google results rather than only at the results of other search engines.
Edit: This got downvoted to hell, so let me be more explicit. This study did not look at Google results, the title is pure clickbait. They used Startpage results as a proxy for Google results. I don't think that's a valid assumption, even if Startpage is using Google's index.
Using a reasonable proxy is not "pure clickbait", though it may be misleading. It adds the additional assumption that Startpage is not tampering with the Google results, but that seems like a reasonable one compared to the alternative: Google doctoring the search results when it detects a scraper, and/or personalizing them.
If the authors have done their due diligence and confirmed the results from Startpage are actually Google results, then I don't see why they couldn't claim so in their title.
The authors did not do any work to establish it as a reasonable proxy. We know that's the case because they don't describe any such work in the paper. They just state that they're using this proxy measure and then seem to use Startpage and Google interchangeably for the rest of the paper. Sure, if they'd actually done the searches on Google, the title would not be outright dishonest. But they didn't, so this defense of yours is just absurd.
It's pretty obvious why the results won't be the same: the feature sets of the search engines are different despite being based on the same index. Startpage's own documentation even has a page on how and why the results are different!
(Personalization is a part of the Google feature set, and has to be taken into account when considering the results. It's also a part of the Bing feature set. This didn't stop them from reporting the results of their testing on Bing in detail.)
Yes. Google used to be amazing; then it turned into an advertising company. Slowly at first, then, about a decade ago, the pace picked up.
But the worst part is, Google SEO has infected the entire web and made it into complete garbage.
Hopefully, this last decade or so will just be a blip before we return to baseline, where it can be wild and free again.
Are there stats* for Google Search across the years? I feel I don't use Google as much as I used to, and it isn't because "I know more stuff" but mainly because the way we use the internet has changed. I wonder if kids or teens (most of whom don't know how to use an email inbox) would use Google... (I guess yeah?)
* Of course, the stats should include the total amount of internet users globally, or normalize the amount of searches based on that...
I've been in web development and SEO for almost 20 years now.
When I first started out, all the SEO veterans kept telling me "don't do this, don't do that" about things that could get your site buried in the SERPs. At the time, Google's algorithm was really good at ferreting out affiliate links, link farms, and the other nefarious black-hat techniques SEOs used to game Google.
Now? Complete opposite. I have several freelance clients, I've used every dirty SEO trick in the book, and all of them have worked like magic to get my clients' sites ranked on page 1 or 2 of the SERPs.
I have no idea what changed, but Google is super easy to manipulate now to get your site or specific pages ranking really high. For years I haven't heard or seen any of the horror stories that I read, and that people blogged about constantly, when I first started out, which tells me they're all probably doing the same thing I am and not seeing any repercussions.
Maybe Google doesn't care because users have become so savvy, they can filter through a ton of garbage in minutes to find what they really want?
Definitely. I think the way we use the internet has changed profoundly. There are a lot of apps that provide useful information, but they may not be indexable by search engines. Much less useful information is simply out there on the open web, and much of what is there is locked behind logins. There used to be deals like Twitter sending a complete copy of all new tweets to Google, but these are basically dying.
It's especially interesting since you mentioned normalizing searches by the number of internet users. The country with the largest number of internet users is China, with more than 1 billion of them. And they don't have access to Google. And their local copycat, Baidu, is years behind Google in terms of technological sophistication and simultaneously years ahead of Google in terms of user hostility. So what do Internet users in China do in a post-search world? They simply open various apps and use the full text search feature of different apps. For general knowledge they might open ZhiHu and search there; for something resembling the old-time personal blogs by individual users they might open XiaoHongShu and search there; for short videos they might open Douyin and for long ones Bilibili. For reaching an organization be it a store or a museum or a hospital or a government department they might open WeChat and search there for an official account or mini program (a mini program is a website that uses WeChat APIs and can only be opened in WeChat).
I made these observations on a recent trip to China and it's clear to me what a post-search world looks like because China is already there.
> And their local copycat, Baidu, is years behind Google in terms of technological sophistication and simultaneously years ahead of Google in terms of user hostility.
Baidu search was fine in the early days; the issue, as you hinted, is that the PRC internet went mobile-first and content got locked behind various platforms and made deliberately hard to scrape. Crawling/indexing got locked down much earlier than in the West. Now, as more gets locked behind logins in the West too, Western behaviour is shifting toward that model: think how much default search is query + wiki/reddit/youtube, or going straight into short-video services, like looking up recipes on XiaoHongShu. Reddit especially, simply because the reddit app has a horrible search experience. Also, technically, RankDex (from Baidu's founder) predated Google's PageRank, and Larry Page referenced it in the PageRank patents. Either way, depending on how the ChatGPT copyright drama plays out, imo more people will just take the lazy route and have AI generate good-enough summaries for most queries.
>I made these observations on a recent trip to China and it's clear to me what a post-search world looks like because China is already there.
You are talking like the open web is dead, but it's not. There are millions of blogs and personal sites out there. Walled gardens are user-hostile and hungry for money; that's why enshittification[0] happens.
[0] https://en.wikipedia.org/wiki/Enshittification
Long-term stats are tricky because of how much of the landscape has changed. There's the desktop/mobile split, developing countries increasing their internet use, heavier use of apps, growth and decline of results getting indexed, and change in what we search for.
It's not worse when you append site:reddit.com to every single search, but that's only a function of the fact that Reddit can't figure out how to build its own search. Outside of maybe programming stuff, where I'll still click on links, I don't think Google has driven me organically to a new site in years.
> It's not worse when you append site:reddit.com to every single search
Could you give some examples of search queries that would benefit from filtering by reddit?
(My own example: I've been looking for recommendations for a solid Linux laptop. A good result would be a list of reviews written from personal experience of owning such laptops. Reddit was useless for that.)
Fellow kagi user here. I have been very happy with the service so far. Switched over to it when they announced their new pricing model a little while ago. It is more than worth the cost. Even without tweaking, the quality of results is far higher than what I had gotten used to getting from DDG or DDG with !g. The ability to rank domains is amazing though.
Kagi has improved immensely and I no longer ever feel the need to check Google against it. It's a great product that I am more than happy to pay $10/mo for.
I'm so happy that their new pricing scheme has done away with search quotas. Their claim that the average user does 100 searches a month seems absurdly low. I would blow through my 300 search quota in a few days.
Every time I see someone talk about this feature on HN (almost every day), I get jealous; it's such a good idea. When I'm searching (on Google) I'm always thinking: when I get Kagi, I'm going to downrank this site, uprank that one, etc.
That is an interesting list. I'm not at all surprised to see Pinterest holding the top ~10 spots, but things like Facebook, Instagram, and Twitter being so high does surprise me; perhaps it's indicative of the type of user they have (I too block those domains).
I could take that list, as-is and use it as a block list on my entire network.
Also a Kagi user since they changed their pricing. The results have been decent for me for the most part, except for shopping, which I still use Google for. I do wish they'd step up their CSS game, though. Maybe it's just Safari on iOS, but I get style problems constantly: the address bar doesn't behave properly with the cursor if you type a decently long query, and the search results often overflow their container horizontally. It's a small price to pay for good search results, but stuff like that should be table stakes for a tool like this.
I get why Google does not do domain-level filtering, due to antitrust concerns, but they should at least be able to adopt this kind of user-preference approach. Why weren't they doing this years ago?
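The mechanism itself is conceptually tiny, which makes its absence at Google all the more conspicuous. A sketch of a personal domain-weighting pass (weights are made-up examples; 0 blocks a domain, <1 lowers it, >1 raises it; Kagi's actual implementation is unknown to me):

```python
from urllib.parse import urlparse

DOMAIN_WEIGHTS = {"pinterest.com": 0.0, "old.reddit.com": 1.5}  # made-up examples

def rerank(results, weights=DOMAIN_WEIGHTS):
    """results: list of (url, score). Drop blocked domains, scale the rest.
    (Matching subdomains like foo.pinterest.com is left out for brevity.)"""
    out = []
    for url, score in results:
        host = urlparse(url).netloc.lower().removeprefix("www.")
        w = weights.get(host, 1.0)
        if w > 0:
            out.append((url, score * w))
    return sorted(out, key=lambda kv: -kv[1])
```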
Quite likely they're underreporting affiliate links, due to obfuscation: cloaking, hiding redirects behind JavaScript (they mention in the paper that they don't render pages), using JS and a POST, other URL shorteners, etc.
One interesting solution to the problem is to not have a single dominant search engine and its algorithmic choices; having half a dozen web-scale engines with some variation at least gives the user a choice of other avenues for information discovery. (There isn't much point in counting Startpage and DDG here, since they're effectively meta-search engines over Google and Bing.) For SEOs in English-speaking countries, there is not much point in thinking beyond pleasing Google.
Clearly AI and whack-a-mole spam sites have been a problem for a while, given the prevalence of people tacking 'reddit' onto their queries to find other humans talking about stuff.
> Google is not asleep. It has just got its priorities wrong.
https://www.wired.com/story/prabhakar-raghavan-isnt-ceo-of-g...
And it's not just Google. Amazon is a mess. Social media is a mess. It all used to kind-of work but it's rapidly falling apart.
It's really not a deeper technical problem.
> Same for image search. You search for "red flower Thailand" and flowers of various other colors from various locations appear.
https://imgur.com/a/sJCECzQ
Looks mostly red to me. A little pink too I guess.
Quick: if you want to search for the best toaster to buy, what URL do you type?
How do you find out the weather?
How do you find a local brewery in your area?
For me, the first answer is Amazon, the second is asking Siri, and the third is Apple Maps.
Google is now below 50% of my search terms. Heck, I’m more likely to search Reddit for some queries.
> Could you give some examples of search queries that would benefit from filtering by reddit?
People actually discussing how they did this is the only real answer; generic websites with pictures and descriptions of his weapons never help.