Google broke a conditional statement that verifies passwords on Chrome OS

SahAssar · 5 years ago

Just reading the title it sounded very much like "Single Point of Failure: The (Fictional) Day Google Forgot To Check Passwords": https://www.youtube.com/watch?v=y4GB_NDU43Q

0x0 · 5 years ago

This actually happened for real with Dropbox about 10 years ago: https://techcrunch.com/2011/06/20/dropbox-security-bug-made-...

cmeacham98 · 5 years ago

That exact news article appears in the video.

zinekeller · 5 years ago

Fortunately (or not, depending if you have a Chromebook and store files locally there), the inverse has happened and it locked up your book (fail-deadly versus fail-safe).

utf_8x · 5 years ago

> fail-deadly versus fail-safe

Speaking of... https://www.youtube.com/watch?v=TUKQfMFKfjk

jedimastert · 5 years ago

I love Tom's speculative fiction talks. He tells such a good story.

fouric · 5 years ago

From the article:

> The line should read "if (key_data_.has_value() && !key_data_->label().empty()) {" but instead of "&&"—the C++ version of the "AND" operator—the bad update used a single ampersand, breaking the second half of the conditional statement.

Someone correct me if I'm wrong, but this seems like it wouldn't be possible in a language that didn't conflate boolean values with bitvectors.

Edit: I was wrong[1].

[1] https://news.ycombinator.com/item?id=27923785

Trollmann · 5 years ago

You can use `and` and its siblings instead of && and similar in C++11. Most people here, and in other boards, will try to convince you that it hurts readability because 'that‘s how we always did it' (read I‘m used to it and don‘t like change).

gavinray · 5 years ago

I was surprised to read that `and`, `and_eq`, `xor`, etc are all supported "secondary/alternative operators"

Never seen them used, but I use them in my C++ code when I have to write it to accomplish something else:

https://en.cppreference.com/w/cpp/language/operator_alternat...

areyousure · 5 years ago

While it is true that you can use `and` in C++11, that's a bit of an understatement: these keywords have in fact been present since the very first ISO C++ standard C++98!

yongjik · 5 years ago

> that‘s how we always did it

... which, frankly, is a very good reason. Don't needlessly change something people are used to.

Why is the blinker control on the left side and the wiper control on the right side? Because that's what people expect.

fouric · 5 years ago

I am also of the opinion that `and` is more readable than `&&` (and isn't as easy to typo in a catastrophic way) - although my main point was about the weaker type system.

asddubs · 5 years ago

you can even use it for rvalues!

CobrastanJorji · 5 years ago

Forgive my poor C++, but has_value() sounds like a bool, and empty() sounds like a bool. Why wouldn't "bool1 & bool2" work correctly?

dataflow · 5 years ago

Dereferencing via key_label_-> has undefined behavior when !key_label_.has_value(), so the compiler treats that case the same as if key_label_.has_value(). Or to put it another way, the compiler reasons that key_label_ must have had a value (since you dereferenced it), and thus optimizes out the check for `has_value()` (since clearly it was unnecessary).

kevinmgranger · 5 years ago

If I had to guess, the second expression depends on the first being true. So if `has_value()` is false, perhaps `key_data_->label()` is undefined behavior.

Cthulhu_ · 5 years ago

In other languages, booleans are their own values (iirc in C/C++ they're aliases for 0/not-0) for which the `&` operand is simply not implemented.

There's also linters and other quality control analyzers that will warn on a bitwise operator on booleans.

gmfawcett · 5 years ago

This is one of those times when, for all its verbosity, Ada got it right. For boolean conjunction you must choose between writing "A and B", or "A and then B". The "and then" version is short circuiting, while the bare "and" version is not.

gene91 · 5 years ago

This is a typo of missing an ampersand. If the author were writing Ada, it would have been a typo of missing a “then”. How is it different?

thesuperbigfrog · 5 years ago

The equivalent Ada code does not exist as the Ada boolean type only has a logical AND operator (no bitwise operations exist):

http://www.ada-auth.org/standards/12rm/html/RM-A-1.html#I538...

shawnz · 5 years ago

Visual Basic also has this distinction (but the keywords are "And" and "AndAlso")

shawnz · 5 years ago

Other than this particular edge case it's not obvious to me why bitwise operators shouldn't be applicable to boolean values. Aren't they just single-bit vectors?

Of course it makes sense that logical operators wouldn't be applicable to numeric types, but why the reverse?

qalmakka · 5 years ago

If you stop integer values from being implicitly convertible to booleans, you break the language and its compatibility with C.

The inverse is debatable, though. I honestly don't see why someone would do bitwise operations on boolean, unless they really wish to die a slow death from thousand cuts. Integer promotion rules make even the simplest operations a wild mess, and there's nothing in the standard that prevents bool from having a sizeof equal to int (thus further complicating everything).

It might be sensible for newer language revisions to consider deprecating certain operators from being applied to bool, given that there is a strong change that 99% of the time their use is not on purpose, but I guess it depends on how much disruptive such change would turn out to be.

phkahler · 5 years ago

>> Other than this particular edge case it's not obvious to me why bitwise operators shouldn't be applicable to boolean values.

That is a really good question. At some point they decided that true=1 and false=0 but I'm not sure how deep that goes. In particular what does ! actually do in this context?

wyager · 5 years ago

The most general solution here is to use algebraic types and pattern match on the value being “None” or “Some data”.

tialaramex · 5 years ago

Certainly I find

  if (key_data_.has_value() && !key_data_->label().empty())
    return key_data_->label();

to be less readable than:

  if let Some(label) = key_data_ {
    label
  }

It's notable that people complain Rust adds too many odd punctuation marks, and yet in this example it's C++ with an exclamation mark, question marks, and two different structure member operators (dot and arrow) while Rust just does the pattern match.

Cthulhu_ · 5 years ago

It's not, because that would involve re-engineering and re-writing millions of lines of code.

The simpler solution is to fix this gap in their automated testing. Another one is to configure a linter to error on bitwise operations on boolean values.

Deleted Comment

dylan604 · 5 years ago

That so sounds like an error I would make and not some superDev working at the Googs making around 4x the amount of money I make.

myko · 5 years ago

They're just people. If you want to get in you probably could, with some effort.

anyfoo · 5 years ago

Not sure why you think a "superDev" would be less susceptible to simple typos?

grishka · 5 years ago

Certainly won't compile in Java.

shawnz · 5 years ago

In fact you can apply the single ampersand & operator to booleans in Java, and just like in the example here it has the same behaviour as && but with no short-circuiting.

taeric · 5 years ago

Oddly... It would? Try:

    Map<String, Boolean> map = ...
    if (map.containsKey("hello") & map.get("hello")) { ...

Traster · 5 years ago

I bet the post-mortem for this is going to be fun. How this wasn't covered by multiple unit tests, a code review and a roll out strategy is.... impressive.

shirro · 5 years ago

This isn't the failure of an individual. It is a failure of management.

Dead Comment

sneak · 5 years ago

Bitwise & versus logical && is a classic, right up there with an assignment in a comparison (when an equality check is intended), = for ==.

That this was missed is pretty surprising, given that it's Google and the stakes involved in the encryption/key management code in a secure platform device.

I wonder if we'll even get a postmortem, as this simply cannot happen unless several someones all Seriously Fucked Up simultaneously.

- code error (expected, humans are fallible)

- review error (less expected, this is the kind of thing code review exists for)

- static analysis / linting check failure

- integration test failure (interactive login no longer works on this build)

This is a massive, overlapping fuckup. Heads should probably roll here, especially given that when you opt in to ChromeOS as a user, Google now has "one job" (DFIU).

Guidii · 5 years ago

"Heads should probably roll here..."

Google has a blameless post-mortem process (see https://sre.google/sre-book/postmortem-culture/) which essentially boils down to:

0) We want to fix the situation. By the time the post-mortem happens, that should already be done.

1) We want to make sure this never happens again. In order to do that we need to understand as much as we can about what happened.

2) We want everyone to share everything they know about the event. If the participants worry about fallout, they will be compelled to cover up.

3) Most failures happen at a number of points, so you need to dig in deeply if you want to get full value out of the "event".

4) Good post-mortems lead to process fixes to reduce (or eliminate?) the chance of similar events, or to make fixes throughout the code base.

The other thing is that after the post-mortem, all of the folks involved have learned from it. Usually they're better engineers because of that expensive lesson you've just paid for.

Disclaimer: Googler, participant in several post-mortems, opinions my own.

nogridbag · 5 years ago

I've seen the = instead of == in the wild with disastrous results: the assignment was on an ORM backed object, e.g.

    // accidentally mutates all names to Sam and persists to the DB
    myList.filter { myDomainObj.name = 'Sam' }

But it makes me wonder why our programming languages would use characters which can often lead to this type of error.

mdoms · 5 years ago

> Heads should probably roll here,

I disagree. It's obviously a major balls up by multiple people, but at the end of the day this is an organizational failure. You don't fire people for this, you learn from it and fix the organizational holes that caused it.

brundolf · 5 years ago

Especially since it's so easy to lint for. This isn't some cross-file memory mishandling bug, it's a single token that's being used where it shouldn't be

khazhoux · 5 years ago

> this simply cannot happen unless several someones all Seriously Fucked Up simultaneously.

This is exactly how any and every bug enters a mature software product: it slips past multiple layers.

> This is a massive, overlapping fuckup. Heads should probably roll here

Some process should change, to fix the problem. But firing people is not necessarily the best solution.

agilob · 5 years ago

RedHat once committed wrong value of MTWO variable, it was 2, there was another variable called TWO which also had value of 2. It was fixed a few months later with a commit like "change value of MTWO to -2".Sorry, but can't find it find now

fierro · 5 years ago

glances at screen LGTM

kps · 5 years ago

I'm quite surprised there isn't a compiler warning for a bitwise operator at the root of a condition. I've been writing `if ((flags & FLAG) != 0)` for decades years for nothing?!?

james_in_the_uk · 5 years ago

Straight up negligence.

jasonvorhe · 5 years ago

Reading stuff like "heads gonna roll" in the comments is kind of shocking to me. Have you never screwed up in your job?

How long have they shipped a secure Linux to a couple of hundred different devices simultaneously, while improving security, fixing critical bugs soon and delivering a pretty great user experience since they first launched Chrome OS? Sure, were soft bricks before, critical fixes introduced other bugs, duh. The same happens on Windows, macOS, Linux distributions - we're all human, we make mistakes, we try to learn from each mistake.

Don't give them a hard time. They already were in panic mode already.

nameless912 · 5 years ago

Yup. My second or third month at my first job, I accidentally knocked out internet access for thousands of people for a couple of hours because I deleted something in production rather than in dev. It was terrifying and I was sure I would be fired.

Rather than being canned, of course, the team managing the tool that allowed me to delete the resource learned the following:

1) people should not be able to delete things without confirmation in production

2) it should be very obvious which environment you're in

3) normal humans shouldn't have access to production except for operations folks.

That's the value of blameless post mortems.

tyingq · 5 years ago

I'm baffled that there's no test for this, manual or otherwise, before this rolls out to production. That you can change ChromeOS code in the login path, and the very first time it's exercised is by an end customer.

secondcoming · 5 years ago

Edit: Someone has explained the issue below

I don't understand how that broke anything?

The C++ codegen for && and & is the same:

    bool good(bool a, bool b) { return a && b; }

    good(bool, bool):                              
        mov     eax, edi
        and     eax, esi
        ret


    bool bad(bool a, bool b) { return a & b; }

    bad(bool, bool):
        mov     eax, edi
        and     eax, esi
        ret

[0] https://godbolt.org/z/84MMqTehs

formerly_proven · 5 years ago

The code is:

    key_data.has_value() && !key_data_->label().empty()

key_data is probably something like an std::optional, so short-circuit evaluation guards the dereferencing of the optional. If you write this with a binary and, no short-circuit evaluation can happen, so undefined behavior ensues.

I wanna point out that std::optional, in the typical C++ stance, has a "I know what I'm doing" API and a "Humans make mistakes API": optional->... is UB if the optional is empty, optional.value()... raises an exception. Arguably, in code like this you probably only want to use the latter kind of API. (The pattern of "operator can UB, equivalent method asserts pre-condition" is very common in the C++ standard library)

jeffbee · 5 years ago

Correct, key_data_ is a base::Optional<KeyData>. Chromium's base::Optional is very similar to std::optional, but predates c++17.

bialpio · 5 years ago

Except that Google uses C++ with exceptions disabled - IIUC, the throwing code would effectively behave like a call to std::terminate(), I haven't looked at the code to know if it would actually be a preferred behavior in this case.

Source: I'm a Chrome contributor.

inshadows · 5 years ago

Where is the undefined behavior? AFAU, UB is a compile time concept. As other commenter has asked, how can the compiler infer that has_value() call has any relation to key_data_ non-NULL-ness?

EDIT:

> optional->... is UB if the optional is empty

I understand what you mean: dereferencing optional is undefined if has_value() is not checked for true in the same thread before. But that undefined behavior happens at runtime, i.e. compile time UB is not invoked, the statements are kept without any optimizations. Right? The terminology is confusing.

taeric · 5 years ago

The short circuiting was preventing undefined behavior on the second expression.

That is, yes, it is the same with booleans. Now try with expressions.

magpie09 · 5 years ago

I made what I think is a minimal example here: https://godbolt.org/z/dcnzonjar

Interestingly I couldn't get it to fail on GCC (even with -fnoexceptions and -Ofast), only on Clang. Maybe someone else has some info as to why this is?

There is a clear difference in the assembly output for the correct and incorrect implementations, with a testb instruction run much earlier in the correct version. You can see this at line 22 of the output, if you highlight it the && in the source is highlighted in bold.

tmiahm · 5 years ago

The difference is your example uses bool values, the code in question uses expressions that (should) evaluate to bool. && is short-circuited if a is false, such that b is not evaluated.

Deleted Comment

jeffbee · 5 years ago

This is clearly a typo that happened to stomp some logic that is not covered by a test. The review 3002282 includes the bug, which deletes one of the & from a && operator. It says it was cherry-picked from review 2994464, which does not change that line. So, slip of the backspace key plus poor testing.

tyingq · 5 years ago

Yeah, the biggest thing to me here is that there's no "login succeeded" unit test, integration test, manual test, etc. Whatever customer logged in first was the first person to actually run the code.

lmilcin · 5 years ago

Umm... no automated tests? Junior engineers responsible for security of half of the world without proper code review?

tpmx · 5 years ago

Half of the world? We're talking primarily north-american school kids.

https://trends.google.com/trends/explore?q=chromebook

sellyme · 5 years ago

>security of half the world

Chrome OS has less than 2% market share.

baud147258 · 5 years ago

I heard elsewhere that its market share is closer to 10% (and that it's now the second most popular desktop OS). 2% would be considering mobile OS too, I think

https://www.maketecheasier.com/chrome-os-tops-macos-2nd-most...