This just proves all the "suspicions" privacy-conscious users have had about large corporations fingerprinting users, often in very obvious ways. There's often no better place to find ideas for surveillance than the people conscious about being surveilled.
I found it VERY amusing that if you went to r/SEO just yesterday, there were moderators and flaired users (you know, the elites of the SEO community, lol) insisting much of this was "debunked" years ago.
They of course deleted their posts, but the threads are still up. What a den of scammers over there.
The grift on r/SEO is just endlessly fascinating to me.
These types first gain moderator status on a few subs, and then the spam begins (picture of spam: https://pixeldrain.com/u/a6qUPjTq )
I haven't been able to find a single legitimate expert in the entire sub, and I've checked about every flaired user and moderator.
You have lots of people like the above, or https://www.reddit.com/user/jesustellezllc/ who claims to run an agency in Fresno, California called Ozelot Media, but when you look him up there's nothing. When you google "SEO" + "Fresno California", Ozelot Media isn't even in the top 100 results. Lol, I thought that was the job of an SEO type? Why let that stop the grift though?
And it's Apache licensed, which grants a patent license. Some of the comments refer to specific aspects of how PageRank is calculated. PageRank itself is past patent protection, but I wonder if this might also accidentally grant licenses to other patents.
There's still an angle where the copyright owner claims that the person who caused this to happen did not have the authority to apply the license to it.
> My anonymous source claimed that way back in 2005, Google wanted the full clickstream of billions of Internet users, and with Chrome, they’ve now got it. The API documents suggest Google calculates several types of metrics that can be called using Chrome views related to both individual pages and entire domains.
What answer do the engineers at google working on this have for this violation of privacy?
I am not an engineer at Google, but this is what I would say if I were.
We don't know who you are, you are just a number in a database, and we don't even know what number, we just get the total number of visits for each website, not who visited it. It is like counting cars on a highway, not following your car. Plus, it serves the useful purpose of providing you with better search results, the terms and conditions allow it, and it can be disabled.
The obvious response being that counting cars on the highway is a necessary first step on the road to identifying and then tracking their movements.
Similar to how insurance companies have offered voluntary, “anonymized” data dongles for discounts that are now being used (or at least revealed to be used) to collect data most often used to reject claims.
Personal (not work related opinion): This basically can’t happen with things like DMA and GDPR. DMA in particular means you can’t share data across “products” without explicit consent. So you could for example collect websites that don’t work for the purposes of improving Chrome, but not then share that with the Ads/Search orgs for personalisation or targeting, as far as I understand the legislation.
Personal opinion about work at Google (still not Google's opinion): I’m consistently impressed with how seriously this stuff is taken and the amount of work that goes into making sure that things like this sharing can’t happen accidentally, and that user choice is respected. The engineers on the ground are absolutely making sure this all works, and most of us care deeply about user privacy. I have personally worked both on implementing new features that significantly push forward privacy, and on implementing privacy controls for regulatory purposes.
The thing is that preventing "sharing" isn't sufficient. People who are concerned about privacy don't want any such data collected or stored in the first place, ever. The implicit "sharing" of my data with Google (or whatever company) is a problem in itself. Regardless of how "seriously" Google (or whatever company) takes it, for a lot of the data I don't want them to ever have it in the first place.
> I’m consistently impressed with how seriously this stuff is taken and the amount of work that goes into making sure that things like this sharing can’t happen accidentally
I believe the law is violated when it's sufficiently profitable -- it just requires VP permission.
No public sources for this except Jedi Blue, the old anti-poaching case, etc.
> This basically can’t happen with things like DMA and GDPR
I'm sorry but this is just wishful thinking. It might be what the spirit of the DMA & GDPR want but definitely not the reality thanks to inadequate or outright non-existent enforcement.
There are businesses out there whose entire business model and revenue stream are based on violating the GDPR. Not some kind of internal conspiracy or rogue employee: the entire company is doing it in the open, and the results of its doings (targeted ads or spam) are visible for all to see.
Facebook, credit bureaus, data brokers, "consent management platforms", etc. All these companies' business models are big, obvious breaches of the GDPR. Yet, they are... still alive and kicking?
There is no chance that a concealed GDPR breach (whether intentional or accidental) will get addressed when the biggest intentional breaches are still allowed to continue out there in the open.
I suspect something very similar is going to happen with the DMA - Apple is already acting in bad faith but has yet to see any consequences.
> What answer do the engineers at google working on this have for this violation of privacy?
The same answer you probably have for the millions of questions about the things you do that some other people find offensive to their personal views and beliefs.
See, that’s the nice thing about the GDPR: You cannot hide unexpected hostile stuff in the ToS anymore. If you don’t tell me what you do with my data in a way that is obvious, easy to understand, and most importantly easy to disable, it’s illegal.
Sometimes I wonder how much better the internet would be if hits on Google weren't directly tied to revenue from Google itself through its ad program. I am certain Google has made the internet and the world a worse place to live.
As a user of Kagi and search.marginalia.nu I can tell you:
Quite a bit.
So much that now that I have what "everyone" asked Google for for years - that is blacklists - I hardly use them.
Why? Because with Kagi I get much better results out of the box.
I am fairly sure Googlers will tell me there are multiple safeguards to prevent the inclusion of Google ads from affecting ranking, to which I just have to say that the results speak for themselves.
Please note: I have only used Kagi for two years. I am only one user. But I am a user with 20 years of experience with Google, and that has got to count for something.
I actually use pinning, blocking and raising/lowering the value of individual sites every day. I wish this were the direction search engines had gone in the first place, and it's the direction I hope Kagi continues in. I want a personalized search engine that's personalized by me, not by a company trying to profile me and make money off of my clicks.
I was excited to try Kagi, but I couldn't justify the cost. I find DDG with the occasional Google search to function almost as well. I'll try Kagi again at some point, but it wasn't the panacea people here made it out to be.
I switched to Kagi in June last year. I just realized I tried it initially because I wanted to try out blocking sites in search results, and I have only ever needed to block three domains.
Kagi is worth the money, but it isn't magic. It's about as good as Google was ~five years ago, before they made all the search operators stop working. There's also a whole bunch of things it's worse at than Google - especially local search and shopping. Plus I still get plenty of blogspam and AI generated crap from Kagi.
I made my decision two years ago and I would probably do it even if it was just on par with Google, to support competition and to avoid supporting Google.
But in hindsight it is just exceptionally much better. There is no going back unless Kagi does something monumentally stupid.
How much of that is due to ad-tech companies like Google conditioning people into thinking that way? What if online payments weren't so god awful and allowed people to throw in a few dollars as easily as they might at a toll booth? That's still an unsolved problem too. Credit card companies have solidified their involvement in every facet of the process and the alternatives are non-starters for frictionless commerce.
I'm still happy to put my money where my mouth is and do pay for services which are genuinely useful to me. But this is not the kind of internet I imagined when growing up.
It's not that people don't want to pay; it's that it's difficult to pay small sums. The web browsers could solve this problem, but they make money from ads so it's not in their best interest.
Google was really great and revolutionary, they helped zillions of small companies to thrive. It was another cycle.
But now it is like media before the 90s: you need to pay a lot of money to be in the center page of the newspaper.
Hopefully, though, we are talking about LLMs now, which seem like one of the answers to search engines in general. Beyond AI, I see LLMs as a good evolution from PageRank.
This is a little bit general, but lately I use the expression "Complexity as Scam". Google always pointed to their "algorithms" and played with this term as if algorithms couldn't be adjusted to whatever you want them to be. Initially the coined term was sound because it was based on a scientific paper, and eventually it evolved, but it seems like the original PageRank idea has detoured from being a "pure" graph algorithm.
Another context where I use "Complexity as Scam" is Web3. It is like Matryoshka dolls where there is always one more step of complexity to prove a point, but it never ends.
It's not black and white. There was a lot of junk that was forced on us and that was removed thanks to Google. But I agree the direct relationship is inherently corrupting.
Larry Page and Sergey Brin even stated very clearly in their original paper that using ads as a revenue source can impact the quality of results returned from the search engine.
I mean... maybe, but not really. The first problem of the internet was that there wasn't that much content specifically. The first internet companies were the broadband providers who were developing content themselves, like AOL.
Google and the ad ecosystem they acquired was basically the flywheel that spurred content creation at scale. Anyone could jump in, follow a few guidelines and earn a living by producing content on the internet. The Youtube acquisition and monetization followed the same pattern.
Over time the market consolidated and got less and less competitive: fewer platforms with complete control of traffic and one-sided revenue sharing agreements. The guidelines, so to speak, on how content should look and feel were algorithmically made stricter and stricter until everything looks, feels, sounds and reads the same.
The problem right now is that the platforms are still tightening their grip, and it's all tied to the approach of using AI to replace the content creators on platforms from Google to Spotify to Meta, and funneling the money saved to shareholders. And while the web has been shitty for a few years now, we're now seeing a sudden drop in quality because the average user has no recourse or alternative, and the average creator doesn't have the means of distribution and monetization (not just publishing, that's been solved) to even find, let alone meet, the new kinds of demand.
I'm certain that in a few years this will even out: new search engines, new aggregators and new feeds will emerge, but the content - money - network problem triangle remains as a fundamental problem of the internet.
You mean a world where people still knew how to use a library catalog, still relied on more than one source of information, and curious crazy tidbits were still out there?
The internet is boring. And the trash is still there. It's just become reputable instead.
Yes, I did! I used to use Yahoo search, where the results were more hand-curated and people did not create websites for intensive commercial purposes with useless SEO fluff like they do today.
I imagine it would be a different flavour to what we have today, but the same intensity. Anything that so deeply penetrates daily life across the globe is going to bring enormous problems with it.
There is something truly strange about the idea that people "trust" a website operator and can rely on it to provide them with useful information when that same operator is well-known to be secretive, deceptive and dishonest in order to protect its own interests. It's like imagining that a fact witness who tells the truth on some occasions and lies on others is credible.
I work in search and didn't find anything surprising in here. But that's mostly because I've just assumed Google has been lying for years about many things, such as not using click data or Chrome data.
I've directly seen people who have successfully manipulated search rankings by having logged-in chrome users search for a term, and then click on a given page. Works like a charm (though may not stick once the manipulation is done, unless organic users also prefer it).
If anyone is surprised about Chrome sending URLs to Google, you can turn the “feature” off by unchecking “Make searches and browsing better” in the sync section of Google Chrome settings.
I only have Chrome installed for a couple of work-related sites that don't display correctly in Firefox. I don't get to choose not to use the work-related sites, and MS Edge likely isn't any safer and also isn't available on my choice of operating system.
"But what if I don't want my own computer to build and share a detailed profile of everyone I know, everywhere I go, all my preferences, and how to manipulate me?"
"Well obviously it's your fault for not picking the 'Don't Be Cool' option on subpage 27b-6, duh!"
Yeah. It's victim blaming. Reminds me of "they should have shouted louder".
The confusing thing is the crime itself is small on an individual level. The question is: does it add up cumulatively if a small crime is committed against many?
Presumably no, I haven't seen any overly creepy shit in Chromium. There's a project called ungoogled-chromium that tracks all the Google junk in Chromium and gets rid of it, their patch set is actually surprisingly small:
> Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released March 13 on Github by an automated bot called yoshi-code-bot
Does anyone know more about yoshi-code-bot and how these documents were suddenly published?
Was it a script misconfiguration? A manual push? Something else?
I found it VERY amusing that if you went to r/SEO just yesterday, there were moderators and flaired users (you know, the elites of the SEO community, lol) insisting much of this was "debunked" years ago.
They of course deleted their posts, but the threads are still up. What a den of scammers over there.
https://www.reddit.com/r/SEO/comments/1d1eqjj/comment/l5tvfw...
https://www.reddit.com/user/WebLinkr/
I love how reddit is turning into the new SEO scam overnight because of this stuff. Great work as always, Danny Sullivan!
This is not what a clickstream is. A clickstream requires that the sequence of clicks be preserved, and preserving that sequence undermines anonymity.
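To make that distinction concrete, here is a minimal Python sketch contrasting per-site visit totals (the "counting cars" analogy used earlier in the thread) with a clickstream. The events, user ids, URLs and function names (`EVENTS`, `aggregate_counts`, `clickstreams`) are all made up for illustration; this says nothing about how Google actually stores or processes anything.

```python
from collections import Counter

# Hypothetical browsing events: (user_id, timestamp, url). All values are invented.
EVENTS = [
    ("u1", 1, "news.example/home"),
    ("u2", 2, "news.example/home"),
    ("u1", 3, "clinic.example/appointments"),
    ("u2", 4, "shop.example/cart"),
    ("u1", 5, "pharmacy.example/order"),
]


def aggregate_counts(events):
    """Per-URL visit totals: neither the user id nor the click order survives."""
    return Counter(url for _user, _ts, url in events)


def clickstreams(events):
    """Per-user URL sequences in timestamp order: the click order is preserved."""
    streams = {}
    for user, _ts, url in sorted(events, key=lambda e: e[1]):
        streams.setdefault(user, []).append(url)
    return streams


counts = aggregate_counts(EVENTS)
streams = clickstreams(EVENTS)

print(counts)
print(streams)
```

In the aggregate view, all you can say is that `news.example/home` was visited twice. In the clickstream view, one pseudonymous id is tied to an ordered path through a news site, a clinic and a pharmacy, and a path like that can be enough to narrow down who the person is even without a name attached.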
It certainly is not "to improve the net or advertising" - that would be the lying part.
Google has done some good for the net, but the scales of their contributions slowly but steadily move to the negative side.
Basically if you believe lies you tell yourself, they tend to turn into truths in your mind over time. Even if you were doing it “ironically.”
I don't know how people keep talking about it. The results, as you say, speak for themselves.
No matter what, whatever we ended up with was going to be shitty and exploitive.
A barrier whose erosion has been well documented over the last 10 years.
https://ipullrank.com/google-algo-leak
Creepy.
Before that, you can make it audible: <https://github.com/berthubert/googerteller>
[1] https://github.com/ungoogled-software/ungoogled-chromium/tre...
Does anyone know more about yoshi-code-bot and how these documents were suddenly published?
Was it a script misconfiguration? A manual push? Something else?
Created 1,891 commits in 19 repositories
All 19 are under googleapis
This looks like a bot Google uses to publish their stuff on GitHub, so it's likely a misconfiguration.