dsekz · 5 months ago
Dug into chrome.dll and figured out how the x-browser-validation header is generated. Full write up and PoC code here: https://github.com/dsekz/chrome-x-browser-validation-header
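
In short, the header boils down to hashing a hard-coded key together with the User-Agent. A simplified sketch (the real per-platform API key and exact encoding details are in the repo; the key below is just a placeholder):

```python
import base64
import hashlib

# Placeholder: Chrome ships a fixed, per-platform Google API key in the binary;
# the real value is documented in the repo linked above.
API_KEY = "<hard-coded Chrome API key>"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"
)

def x_browser_validation(api_key: str, user_agent: str) -> str:
    # SHA-1 over the concatenated key and User-Agent, base64-encoded.
    digest = hashlib.sha1((api_key + user_agent).encode()).digest()
    return base64.b64encode(digest).decode()

print(x_browser_validation(API_KEY, USER_AGENT))
```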

Why do you think Chrome bothers with these extra headers? Anti-spoofing, bot detection, integrity, or something else?

userbinator · 5 months ago
Making it easier to reject "unapproved" or "unsupported" browsers and take away user freedom. Trying to make it harder for other browsers to compete.
ajross · 5 months ago
That can be done already based on User-Agent, though. Other browsers don't spoof their agent strings to look like Chrome, and never have (or, they do, but only in the sense that everyone still claims to be Mozilla). And browsers have always (for obvious reasons) been very happy to identify themselves correctly to backend sites.

The purpose here is surely to detect sophisticated spoofing by non-user-browser software, like crawlers and robots. Robots are in fact required by the net's Geneva Convention equivalent to identify themselves and respect limitations, but obviously many don't.

I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".

Avamander · 5 months ago
> Why do you think Chrome bothers with these extra headers? Anti-spoofing, bot detection, integrity, or something else?

Bot detection. It's a menace to literally everyone. Not to piss anyone off, but if you haven't dealt with it, you don't have anything of value to scrape or get access to.

motorest · 5 months ago
> Bot detection. It's a menace to literally everyone. Not to piss anyone off, but if you haven't dealt with it, you don't have anything of value to scrape or get access to.

What leads you to believe that bot developers are unable to set a request header?

They managed to set Chrome's user agent just fine. Why do you think something like X-Browser-Validation is off limits?

lxgr · 5 months ago
Do you mean bot and non-Chrome-using human detection?
IshKebab · 5 months ago
Bots can easily copy the header though so I don't see how that helps?
ohdeargodno · 5 months ago
Bullshit. You don't have anything of value either. Scrapers will ram through _anything_, and figure out if it's useful later.
twapi · 5 months ago
Seems like they are using these headers only for google.com requests.
xnx · 5 months ago
Yes, I think it is part of their multi-level testing for new version rollouts. In addition to all the internal unit and performance tests, they want an extra level of verification that weird things aren't happening in the wild.
AznHisoka · 5 months ago
My theory is that they are probably using it to block bots scraping Google results.
exiguus · 5 months ago
I have two questions:

1. Do I understand it correctly and the validation header is individual for each installation?

2. Is this header only in Google Chrome or also in Chromium?

gruez · 5 months ago
>1. Do I understand it correctly and the validation header is individual for each installation?

I'm not sure how you got that impression. It's generated from fixed constants.

https://github.com/dsekz/chrome-x-browser-validation-header?...

dlenski · 5 months ago
I had the same question (2). https://news.ycombinator.com/item?id=44560664

If it's only in the closed-source Chrome, then it seems it's intended to help Google's servers distinguish between Google's own products and others.

But I've never seen a Google site which worked less well in Chromium than in Chrome, so I'm somewhat skeptical of this. Perhaps there are exceptions.

wernerb · 5 months ago
Is it not likely that it protects against AI bots like Llama?
wut42 · 5 months ago
I don't see how you can "protect" against a large language model that cannot do browsing.

Deleted Comment

userbinator · 5 months ago
This should be somewhat alarming to anyone who already knows about WEI.

I wonder if "x-browser-copyright" is an attempt at trying to use the legal system to stifle competition and further their monopoly. If so, have they not heard of Sega v. Accolade?

I'm a bit amused that they're using SHA-1. Why not MD5, CRC32, or (as the dumb security scanners would recommend) even SHA256?

ulrikrasmussen · 5 months ago
I am also alarmed. Google has to split off its development of both Chrome and Android now, this crazy vertical integration is akin to a private company building and owning both the roads AND the cars. Sure, you can build other cars, but we just need to verify that your tires are safe before you can drive on OUR roads. It's fine as long as you build your car on our complete frame, you can still choose whatever color you like! Also, the car has ads.
nurettin · 5 months ago
Ok but The Road is the internet, how much of that does google/alphabet actually own?
JimDabell · 5 months ago
> I wonder if "x-browser-copyright" is an attempt at trying to use the legal system to stifle competition and further their monopoly. If so, have they not heard of Sega v. Accolade ?

My first thought was the Nintendo logo used for Gameboy game attestation.

I wonder what a court would make of the copyright header. What original work is copyright being claimed for here? The HTTP request? If I used Chrome to POST this comment, would Google be claiming copyright over the POST request?

notpushkin · 5 months ago
com.apple.Dont_Steal_Mac_OS_X
Retr0id · 5 months ago
SHA-1 is a head-scratcher for sure.

I can only assume it's the flawed logic that it's "reasonably secure, but shorter than sha256". Flawed because SHA1 is broken, and SHA256 is faster on most hardware, and you can just truncate your SHA256 output if you really want it to be shorter.
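
For illustration (nothing Chrome-specific, just the point about truncation):

```python
import hashlib

data = b"example input"

sha1_digest = hashlib.sha1(data).digest()          # 20 bytes, broken for collisions
sha256_short = hashlib.sha256(data).digest()[:20]  # also 20 bytes, not broken

print(len(sha1_digest), len(sha256_short))  # 20 20
```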

adrian_b · 5 months ago
SHA-1 is broken for being used in digital signature algorithms or for any other application that requires collision resistance.

There are a lot of applications for which collision resistance is irrelevant and for which the use of SHA-1 is fine, for instance in some random number generators.

On the CPUs where I have tested this (with hardware instructions for both hashes, e.g. some Ryzen and some Aarch64), SHA-1 is faster than SHA-256, though the difference is not great.

In this case, collision resistance appears irrelevant. There is no point in finding other strings that will produce the same validation hash. The correct input strings can be obtained by reverse engineering anyway, which has been done by the author. Here the hash was used just for slight obfuscation.

Dead Comment

mindslight · 5 months ago
> have they not heard of Sega v. Accolade?

My mind went here immediately as well, but some details are subtly different. For example being a remote service instead of a locally-executed copy of software, Google could argue that they are materially relying on such representation to provide any service at all. Or that without access to the service's code, someone cannot prove this string is required in order to interoperate. It also wouldn't be the first time the current Supreme Court took advantage of slightly differing details as an excuse to reject longstanding precedent in favor of fascism.

wongarsu · 5 months ago
And even if it falls under fair use in the US, they could still have a case in some other relevant market. The world is a big place
PeterStuer · 5 months ago
WEI? As in Windows Experience Index? Can you elaborate?
runiq · 5 months ago
lxgr · 5 months ago
Probably any cryptographic hash function would have done.

My suspicion is that what they're trying to do here is similar to e.g. the "Readium LCP" DRM for ebooks (previously discussed at [1]): A "secret key" and a "proprietary algorithm" might possibly bring this into DMCA scope in a way that using only a copyrighted string might not.

[1] https://news.ycombinator.com/item?id=43378627

cebert · 5 months ago
I have to imagine Google added these headers to make it easier for them to identify agentic requests vs human requests. What angers me is that this is yet another signal that can be used to uniquely fingerprint users.
gruez · 5 months ago
It doesn't really meaningfully increase the fingerprinting surface. As the OP mentioned the hash is generated from constants that are the same for all chrome builds. The only thing it really does is help distinguish chrome from other chromium forks (eg. edge or brave), but there's already enough proprietary bits inside chrome that you can easily tell it apart.
thayne · 5 months ago
> The only thing it really does is help distinguish chrome from other chromium forks (eg. edge or brave)

You could already do that with the user agent string. What this does is distinguish between chrome and something else pretending to be chrome. Like say a firefox user who is spoofing a chrome user agent on a site that blocks, or reduces functionality for, the firefox user agent.

thayne · 5 months ago
I'm more concerned that whether intentional or not this will probably cause problems for users who use non-chrome browsers. Like say slowing down requests that don't have this header, responding with different content, etc.
userbinator · 5 months ago
User-agent discrimination has been happening for literally decades at this point, but you're right that this could make things worse.
qingcharles · 5 months ago
How does that work, though? I have a bunch of automated tasks I use to speed up my workflows, but they all run on top of the regular browser that I also use. I don't see how this war is winnable? (not without tracking things like micro-movements of the mouse that might be caused by being a human etc)
jakub_g · 5 months ago
FYI: Google enterprise workspace admins can enable policies which e.g. restrict the ability to log in to google.com properties to Chrome browsers only.

I wonder if this header is not connected in some way to that feature.

cj · 5 months ago
Seems unnecessary.

The same policies also offer the ability to force-install an official Google "Endpoint Verification" chrome extension which validates browser/OS integrity using Enterprise Chrome Extension APIs ("chrome.enterprise") [0] only available in force-installed enterprise extensions.

FWIW, in my years of managing enterprise chrome deployments, I haven't come across the feature to force people to use Chrome (there are a lot of settings, maybe I've missed this one). But, there definitely is the ability to prevent users from mixing their work and non-work gmail accounts in the same chrome profile.

[0] https://developer.chrome.com/docs/extensions/reference/api/e...

Edit: Okay, maybe one hole in my logic is the first sign-in experience. When signing into Google for the first time in a new Chrome browser, the force-installed extension wouldn't be there yet. Although Google could hypothetically still allow the login initially, then abort/cancel the sign-in process as part of the login flow if the extension doesn't sync and install (indicating non-Chrome use).

jakub_g · 5 months ago
In my current job we do have the force-Chrome setting enabled. I can't log in to Gmail through any other browser, nor do SSO login to GitHub via Google.
thayne · 5 months ago
Why would they think this was a good idea after losing the Chrome anti-trust trial? I don't know what the intended purpose of this is, but I can see several ways it could be used in an anti-competitive way, although now that it has been reverse engineered, an extension could spoof it. On the other hand, I wonder if they intend to claim the header is a form of DRM and such spoofing is a DMCA violation...
jsnell · 5 months ago
> after losing the chrome anti-trust trial?

There hasn't been such a trial.

Retr0id · 5 months ago
x-browser-copyright seems like an attempt at something similar to the Gameboy's nintendo-logo DRM (wherein cartridges are required to have the nintendo logo bitmap before they can boot, so any unlicensed carts would be trademark infringement)
userbinator · 5 months ago
http://en.wikipedia.org/wiki/Sega_Enterprises_Ltd._v._Accola... is the legal precedent that says trying to do that won't work, but then again maybe Google thinks it's invincible and can do whatever it wants after it ironically defeated Oracle in a case about interoperability and copyright.
krackers · 5 months ago
> an extension could spoof it

Not if they make it dynamic somehow (e.g. include the current day in the hash). Then with MV3 changes that prevent dynamic header manipulation there is no way for an extension to spoof it.
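
Something like this, purely hypothetically (not what Chrome does today):

```python
import hashlib
from datetime import datetime, timezone

def dynamic_validation(api_key: str, user_agent: str) -> str:
    # Hypothetical: salt the hash with the current UTC date, so a value
    # captured today stops validating tomorrow.
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    return hashlib.sha1((api_key + user_agent + day).encode()).hexdigest()
```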

thayne · 5 months ago
> Then with MV3 changes that prevent dynamic header manipulation

That doesn't apply to Firefox

binary132 · 5 months ago
I think it’s difficult to argue that Google doesn’t have the right and capability to build their own private internet, I just also think they’d like to make the entire internet their own private internet, and do away with the public internet, and I’d really prefer they not do that.

Dead Comment

aussieguy1234 · 5 months ago
So this is basically hidden client attestation?
Aaargh20318 · 5 months ago
Not really. It's just an API key + the user agent. There is no mechanism to detect the browser hasn't been tampered with. If you wanted to do that you'd at least include a hash over the browser binary, or better yet the in-memory application binary.
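
Roughly something like this (a sketch only, hashing the running interpreter as a stand-in for the browser binary):

```python
import hashlib
import sys

def binary_digest(path: str) -> str:
    # Hash the on-disk executable in 1 MiB chunks; a real attestation scheme
    # would also have to cover the in-memory image and sign the result.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(binary_digest(sys.executable))  # illustrative target only
```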
delusional · 5 months ago
That would provide no extra capability. Anybody smart enough to modify the chrome executable could just patch the hash generation to also return a static (but correct) hash.
delusional · 5 months ago
Is an "api key" like this covered by copyright? Would that technically mean that spoofing this random sequence of numbers would require me to agree to whatever source license they offer it under, since I wouldn't know the random sequence unless I read it in their source?

That's an odd possibility.

userbinator · 5 months ago
Anti-reverse-engineering clauses in EULAs are limited and exceptions are always present for interoperability. The same goes for copyright. It's hard to argue that this key is secret if it's widely and publicly distributed.

Ironically, Google just fought with Oracle a case around similar concepts.