Note that years ago, Moxie studied a similar problem: how to let users know whether their contacts use Signal without uploading their whole address books the way e.g. WhatsApp does [0]. It's similar because in both instances you want to "match" users in some fashion through a centralized service while preserving their privacy.
He ruled out downloads of megabytes of data (something that the Google/Apple proposal would imply) and couldn't find a good solution beyond trusting Intel's SGX technology - arguably not really a good solution, but better than nothing at all [1].
You have kind of a computation/download/privacy tradeoff here. You can increase the lifetime of the daily keys to weeks: that gives you less data to download, but devices have to compute more hashes per key to verify whether they have been in contact with other devices. Or you can increase the 10-minute identifier interval to an hour: that means less privacy and more trackability, but also less computation.
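To put rough numbers on that tradeoff, here's a back-of-the-envelope sketch. All figures (reporters per day, reporting window) are assumptions for illustration, not from the spec:

    # Back-of-the-envelope numbers; all figures are assumptions, not from the spec.
    reporters_per_day = 1000   # assumed new positives uploading keys each day
    history_days      = 14     # assumed reporting window
    key_bytes         = 16

    def cost(key_lifetime_days, id_interval_minutes):
        keys_each = -(-history_days // key_lifetime_days)       # ceil division
        download  = reporters_per_day * keys_each * key_bytes   # bytes per day
        ids_per_key = key_lifetime_days * 24 * 60 // id_interval_minutes
        hashes = reporters_per_day * keys_each * ids_per_key    # derivations per day
        return download, hashes

    print(cost(1, 10))   # daily keys, 10-min ids  -> (224000, 2016000)
    print(cost(7, 10))   # weekly keys, 10-min ids -> (32000, 2016000)
    print(cost(7, 60))   # weekly keys, hourly ids -> (32000, 336000)

Longer-lived keys shrink the download while the total number of derivations stays flat (each key just covers more windows, so there are more hashes per downloaded key); it's the identifier interval that moves the total computation.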
My guess as to why Google/Apple didn't introduce rough location (like US state or county) into the system is that it prevents journalists from jumping on that detail and sensationalizing it into something it isn't (Google/Apple grabbing your data). Both companies operate the most popular maps apps on the planet, as well as OS-level location services that phone home constantly, so they are already in possession of that data.
[0]: https://signal.org/blog/contact-discovery/
[1]: https://signal.org/blog/private-contact-discovery/
Increasing the lifetime for what are currently "daily keys" reduces the precision of the contact reporting - e.g. your example of a week means that a positive user would need to report at least 3 weeks of keys, so someone can now do correlation over 3 weeks instead of X days.
There's no inclusion of location data because that has no value - the only thing this protocol cares about is whether you were in the vicinity of someone who has tested positive for COVID-19, so that it can suggest you get tested. Knowing where you are/were has no value for that purpose.
I think he was trying to say you could reduce the computation by narrowing the space-time radius, then searching for matches. Even a state-level restriction would be enough to substantially narrow down the possible matches without sacrificing anonymity.
You don't need full SGX if you trust the provider.
People already trust providers with their medical data. Why not trust some computation service to do the matching? This is a moment for trustworthy institutions to create data centers and get customers by their reputation.
Combine a big market of trustworthy providers and SGX, and abuse becomes much more difficult.
To answer your question: the handling of medical data is governed by HIPAA. Everything else (outside of banking data) in the US (outside of California) is pretty much fair game.
> My guess as to why Google/Apple didn't introduce rough location (like US state or county) into the system is that it prevents journalists from jumping on that detail and sensationalizing it into something it isn't (Google/Apple grabbing your data). Both companies operate the most popular maps apps on the planet, as well as OS-level location services that phone home constantly, so they are already in possession of that data.
Apple is not in possession of the location of your phone. Their mapping system is designed to keep all queries to the servers anonymous using random rotated identifiers, even going so far as to keep the server from being able to see the full route from start to end (IIRC it's broken up into at least two chunks that are requested separately, though I don't know the details).
> To protect user privacy, this data is associated with an identifier that rotates at the conclusion of a trip, not with the user’s Apple ID or any other account information. Rotating the ID at the conclusion of the trip makes it harder for Apple to piece together a history of any user’s activity over time.
https://www.apple.com/privacy/docs/Location_Services_White_P...
I think it's a nice gesture; however, I wouldn't say that Apple isn't in possession of that data. The phone already uses other Apple services that are linked to your Apple ID, and those services reveal your IP address to Apple. Even if Apple can't track you via the rotating ID (not sure how it's generated; maybe they actually can't), your IP address will reveal you, at least as long as you are using IPv6, which Apple has been pushing heavily in recent years.
They might not have the data refined, but even the whitepaper says it only makes piecing together the location history harder, not impossible.
> Published keys are 16 bytes, one for each day. If moderate numbers of smartphone users are infected in any given week, that's 100s of MBs for all phones to DL.
"Moderate" rate of infections is not millions of new cases per week worldwide. That would be such a catastrophe that contact tracing would be useless.
Regardless of the technical issues with this, I think the "prank" issue Moxie brings up is much more serious. We've already seen the phenomenon of "Zoom bombing"; I can imagine "tracer bombing" being a much more serious issue. The only way I could see this working is if, when you enter a positive result, you have to enter some sort of secret key from the testing authority, but that's not really tenable given that a lot of (most?) testing these days is done by private providers.
Why wouldn't the patient provide their framework info (if they so chose) at the time of sample collection? Then the medical authority could report it to the local government on the patient's behalf in the event of a positive test. Other end users then decide which (if any) "reporting authorities" to pull data from and check against.
This also seems to address Moxie's concern about public location data being necessary (unless I've missed something). If I only pull the positive tests from my local county or state, that should hopefully be a small enough dataset to be manageable even on fairly resource-constrained, low-end devices.
My understanding too was that there was a middleman involved in collecting and distributing the keys, to avoid people spamming the system. You want to be 100% sure it's a positive, and not put the trust in the user. Otherwise random people could just say they have it. The local government would have to submit the keys as you mention and act as moderators for that region.
The media reports about the German version of this include getting a one-time code from the health authorities that you have to enter into the app to mark yourself as infected.
As far as I understand, the proposal from Google and Apple is about the underlying framework, but you can set up additional controls a level above, in the app and the server infrastructure. So it's likely by design that it doesn't address the issue, as the solutions for ensuring that only verified cases can trigger alerts must be specific to the local circumstances.
Looking at the Google doc, it looks like they're going to restrict it to some "medical authorities":
"In order to be whitelisted to use this API, apps will be required to timestamp and cryptographically sign the set of keys before delivery to the server with the signature of an authorized medical authority."
Don't these providers need to be registered somewhere? It should be easy to reach them and provide them with either code-generator software or even printed one-time codes for database addition.
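For what it's worth, the one-time-code gate could be as simple as this sketch - all names and the flow are my assumptions; neither the German flow nor the Apple/Google signing requirement is specified in this thread:

    import hashlib, secrets

    # Health-authority side: issue a one-time code per positive test and
    # store only its hash, so a leaked database doesn't leak usable codes.
    issued = {}  # sha256(code) -> already-used flag

    def issue_code():
        code = secrets.token_hex(4)  # 8 hex chars, short enough to print on a slip
        issued[hashlib.sha256(code.encode()).hexdigest()] = False
        return code

    # Diagnosis-server side: accept uploaded day keys only with a valid,
    # unused code, and burn the code so it can't be replayed.
    def accept_upload(code, day_keys):
        h = hashlib.sha256(code.encode()).hexdigest()
        if issued.get(h) is False:
            issued[h] = True
            return True   # store day_keys for distribution
        return False      # unknown or already-used code

That keeps the "only verified positives can publish" decision with whoever hands out the codes, as the comments above describe.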
Why use a centralized model? Allow users to subscribe to a data source so that any entity can push their own dataset.
This is also important because it keeps the framework usable under a variety of adverse and unusual circumstances. An aid organization operating in a disaster zone or impoverished area could make use of such a framework without needing permission from a higher authority or even reliable internet access to the outside world.
Many of the issues Moxie brings up either don't apply universally or are unrelated to the part this specification touches upon.
Maybe it helps to bring up a non-US perspective here: in Germany, as in many other European countries, this becomes a non-issue. We have central authorities that can greenlight a positive test result or invalidate wrong results, immediately making the prank argument completely hypothetical. The question as to why this should be centralised is easy to answer: because it already is. I'd honestly expect the reporting chain in the US not to be too dissimilar from this, at least at the state level.
It's also important to note that all of this only supplements the existing, largely manual, workflow of contact tracing - a very laborious and error-prone task, especially in regions with a large number of infections. These techniques take a massive load off a certain part of the health system, one that is notoriously underdeveloped because this capacity isn't needed in normal times.
> in Germany, as in many other European countries, this becomes a non-issue.
What do you mean by that? The protocol, as published, doesn’t have a role for a central authority. Even if the German state knows that mrSick tested positive and mrPrankster did not, how would the diagnosis server reject the keys published by mrPrankster? They are by design resistant to de-anonymization. In fact, the German state can’t even know whether a specific key reported as positive for COVID belongs to a German resident or not.
It doesn’t need everyone to opt in to tracing, or everyone notified to comply with the self-isolation recommendations, in order to reduce R0. It works even with only partial penetration of the populace.
> So first obvious caveat is that this is "private" (or at least not worse than BTLE), until the moment you test positive.
> At that point all of your BTLE mac addrs over the previous period become linkable.
Linkable over the period of 14 days. Or maybe even only linkable within one day - each day means a new key, so linking between days might only be attempted on the basis of behavioral correlations.
What could you do with such data? Microanalysis of customer behaviors? It won't be possible to use such data for future customer profiling, as it won't be possible to match the history with identifiers after the infection. This data is practically worthless.
* Use stationary beacons to track someone’s travel path
Doesn't work, because there's no externally visible correlation between reported identifiers until after the user chooses to report their test result.
* Increased hit rate of stationary / marketing beacons
Doesn't work, because those depend on identifiers staying coherent across beacons, and the identifiers roll every 10 or so minutes. Presumably you'd ensure that any rolling of the Bluetooth MAC also rolls the reported identifier.
* Leakage of information when someone isn’t sick
The requests for data simply tell you someone is using an app - which you can already tell if they're using the app.
The system can encourage someone to get tested; if your app wants to tell people to get tested, then fair play to that app (though good luck in the US).
* Fraud resistance
Not a privacy/tracking concern, though I'm sure devs will have to do something to limit spam/DoS.
> Doesn't work, because there's no externally visible correlation between reported identifiers until after the user chooses to report their test result.
So you're saying it works after the user reports their test result.
To be very very clear:
* The only things published by someone when they report a positive test result are the day keys for whatever length of time is reasonable (I assume ~14 days?)
* Given those day keys it is possible for your device to generate all the identifiers that the reporter's device would have broadcast.
* From those, your device can go through its database of seen identifiers and see if it finds a match.
That means your device can determine when you were in proximity to the reporter, so in theory you would be able to know approximately where the contact happened, but you can't determine anything beyond that.
The server that collects and serves reported day keys doesn't have the list of identifiers any devices have encountered, so it can't learn anything about the reporters from the day keys they upload.
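To make the bullets above concrete, here's a minimal sketch of the derive-and-match step. The draft spec derives rolling identifiers with HMAC-SHA256 over a fixed info string and the 10-minute window number, truncated to 16 bytes; treat the derivation below as a loose stand-in, not the exact construction:

    import hmac, hashlib

    WINDOWS_PER_DAY = 144  # one rolling identifier per 10-minute window

    def identifiers_for(day_key: bytes) -> set:
        # Every identifier this day key could have broadcast that day.
        return {
            hmac.new(day_key, b"CT-RPI" + bytes([w]), hashlib.sha256).digest()[:16]
            for w in range(WINDOWS_PER_DAY)
        }

    def exposed(published_day_keys, seen_identifiers) -> bool:
        # seen_identifiers: the 16-byte ids this phone observed over BLE.
        # The match result never leaves the device.
        return any(identifiers_for(k) & seen_identifiers
                   for k in published_day_keys)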
Let's say there's a passive fixed beacon (whatever) in a public space: it can't connect the identifiers to any specific device either, but you could see it being a useful public health tool - "we saw carriers at [some park] at [some times]". It still would not know which specific devices were reporting those keys. Even if a device went through after the day keys were published, there's no way to know that it's a device that's been seen before.
Only the server is able to link published day keys together, because it receives them and so presumably knows who published them. The spec explicitly disallows an implementation from doing this, but it assumes a malicious server, so it works to ensure that the only information the server can get is day keys with no other information.
Again, this solution _cannot_ work, and it _threatens_ a permanent loss of privacy.
This is like the government and the adtech companies sleeping in the same bed, with no opposing power to balance them.
1) The "solution" is created by a monopoly of 2 american private corporations.
2) It can only work reliably if everyone carries an (Apple or Android) phone at all times and consents to give data.
3) You are not necessarily infected if you pass an infected person on the street at 5 meters. This will have too many false positives and give fuzzy information to people.
4) It doesn't help people who are infected and _dying_.
It just _doesn't make sense_. To me, it looks like electronic voting, but worse. No one can understand how it works, besides experts.
Today it is being reviewed, but later the app will be forgotten and updated in the background with "new features" for adtech.
We are forgetting what we are fighting: a biological virus. All effort should go toward understanding the biological machinery of the virus and its hosts, in order to _cure_ the virus. We should be 3D printing ventilators, analysing DNA sequences, building nanorobots, and synthesizing new molecules.
From looking at the specification, I don't see any serious loss of privacy there, if this is implemented as stated.
2) You don't need 100%, you only need enough to drop the R0 below 1. You'll likely need a majority of people using this, which is hard enough, but you don't need everyone using it.
3) The apps are not supposed to flag every single registered contact, only contacts that extend over a somewhat longer timeframe. A typical value I've heard is 15 minutes of close contact - that is what is considered a high-risk contact when contact tracing (see the sketch below).
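A sketch of how that duration threshold could be applied on-device after matching - the grouping by day key and the exact figures are my assumptions, not from the spec:

    THRESHOLD_MINUTES = 15   # assumed definition of a high-risk contact
    WINDOW_MINUTES    = 10   # identifier rotation period

    def risky_day_keys(matches):
        # matches: (day_key, window_index) pairs, one per 10-minute
        # identifier of a reported key that this phone actually saw.
        windows_by_key = {}
        for day_key, window in matches:
            windows_by_key.setdefault(day_key, set()).add(window)
        return [key for key, windows in windows_by_key.items()
                if len(windows) * WINDOW_MINUTES >= THRESHOLD_MINUTES]

Since identifiers from one day key only become linkable after a report, grouping them like this on-device doesn't leak anything new.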
You’re looking for R, the effective reproduction number, which is R0 plus all controls.
> 1. You'll likely need a majority of people using this, which is hard enough, but you don't need everyone using it.
It's going to be built into iOS and Android at the operating system level, and I assume have a very clear prompt to opt-in. It would not surprise me if it quickly reaches >50% of active users, at least for iOS.
Getting a timely Android update on the other hand...
1) and 2) - the fact that Google and Apple have what is essentially a monopoly on smartphone software is exactly what makes this a good approach. It's the easiest way to reach a high percentage of the population.
3) False positives are a hell of a lot better than having no way to trace back contacts from when someone was asymptomatic but contagious.
4) It helps stop others from becoming infected and possibly dying. How is that not a good thing?
> We should be 3D printing ventilators, analysing DNA sequences, building nanorobots, and synthesizing new molecules.
3D printing ventilators is a horrible idea, and everything else towards a vaccine takes _time_. This is something that can be rolled out today and that will help the situation. You can uninstall the app when this is over.
As both are untrustworthy American corporations, no matter what they do or say it will always be a huge privacy issue. I would go back to my old SE810i phone the instant this was forced on iOS and Android users. People are already doing this (especially young people), so this will be Apple and Google shooting themselves in the foot.
> 4) It helps stop others from becoming infected and possibly dying. How is that not a good thing?
The virus will always be here; we cannot hide forever, so we must find a way to cure it or reduce its biological effect. Once COVID-19 goes away (if ever) and a new virus appears, NO ONE will have that app turned on, and by then the new virus will have spread just like COVID-19.
I have a very simple solution to buy time: total confinement of people over 60 years old when a new virus is detected, plus hand washing.
Also check out hemo2life, which is an example of what we could do in terms of medicine.
I am so terribly frightened by that move that I am seriously considering getting rid of Android. From what I have heard, it's going to be baked into the OS and not installed as an app I could uninstall / block, right?
What truly open smartphone OSes are available besides Android and iOS?
I know, it's a scary thought compared to 30 years ago, but it's possible.
https://en.wikipedia.org/wiki/Librem_5
Librem is the usual answer, but aren't there other, existing, baked-in parts of Android that compromise your privacy? It won't change much because of this project.
Also, how does it compare to DP-3T? (https://github.com/DP-3T/documents) (https://ncase.me/contact-tracing/)
Edit: Apple's preliminary specification was linked in another HN comment. (https://covid19-static.cdn-apple.com/applications/covid19/cu...)
What's it with people making long, split-up twitter threads like this? They're cumbersome and hard to read. Be an adult, write and publish an article on your blog.
It feels weird having to criticize Marlinspike about this, but stupid practices are stupid no matter how prestigious the person doing them is.
You can use threaderapp to get a blog post out of it.
The system doesn't need to ship every key to every phone; much more compact structures like Bloom filters could be used instead. If we assume about 1000 positives per day, and each positive uploading 14 days of keys at 4 keys per hour, that's a bit over 1 million keys per day. A Bloom filter with a false positive rate of 1/1000 could store that in a couple of megabytes. The phone downloads the filter each day and checks its observed keys against it, and only needs to download the actual keys if there's a potential match.
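For reference, a minimal Bloom filter along those lines - a standard textbook construction, not from any spec; sizing uses the usual m = -n*ln(p)/(ln 2)^2 formula:

    import hashlib, math

    class BloomFilter:
        def __init__(self, n_items, fp_rate):
            # Standard sizing: m = -n*ln(p)/(ln 2)^2 bits, k = (m/n)*ln 2 hashes.
            self.m = int(-n_items * math.log(fp_rate) / math.log(2) ** 2)
            self.k = max(1, round(self.m / n_items * math.log(2)))
            self.bits = bytearray(-(-self.m // 8))

        def _positions(self, item: bytes):
            for i in range(self.k):
                digest = hashlib.sha256(bytes([i]) + item).digest()
                yield int.from_bytes(digest[:8], "big") % self.m

        def add(self, item: bytes):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, item: bytes):
            return all(self.bits[p // 8] >> (p % 8) & 1
                       for p in self._positions(item))

    # ~1.34M keys at p = 1/1000 works out to ~19M bits, i.e. roughly 2.4 MB.
    bf = BloomFilter(1_344_000, 0.001)
    bf.add(b"\x00" * 16)
    print(b"\x00" * 16 in bf)  # True; a random key hits ~0.1% of the time

Note the 1/1000 rate is per lookup, so a phone querying thousands of observed identifiers would see a few spurious matches per day - hence the follow-up download of the actual keys.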
> only needs to download the actual keys if there's a potential match.
One of the design constraints of the service was that it should not know your (suspected) infection status unless you give consent that it should be shared.
> Matches must stay local to the device and not be revealed to the Diagnosis Server.
https://covid19-static.cdn-apple.com/applications/covid19/cu...
A few possible mitigations for that:
- could set the false positive rate higher than the chance of encountering a case in the wild (which makes the filter smaller)
- the phone could be programmed to sometimes randomly request keys even when the filter doesn't match
- keys could be distributed across many static mirrors and your phone could pick one at random if the filter matched
The better the bloom filter is, the more likely it is that you have actually been in contact with a key when the filter reports a match.
Furthermore, the bloom filter has to deal with far more keys than necessary. In fact, in your example of 1000 positives per day uploading 14 days of keys, each person only needs to upload 14 keys, as they rotate only once per day. At 16 bytes per key (as the link above specifies), you'd have to download 14 * 1000 * 16 bytes = 224 kB, much less than the bloom filter needs. And this scheme can tell you with 100% certainty whether there has been a match or not, so at least in your example it's much better than bloom filters.
The scalability issues that do exist only manifest at larger numbers than 1000 infections per day - say, upper tens to lower hundreds of thousands, where this starts becoming a problem.
So yes, rough location as Moxie suggests is the best method to improve the scheme. Instead of checking the IDs of people hundreds or thousands of km away from you, you could just check the IDs of people in your US state or county. But it has to be smart enough to recognize movement - as in, you need to upload/download for all the areas you've been in - and people living near borders automatically stand out because they download two or three areas.
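One possible shape for that, with made-up region codes and a flat key file per region (all names here are illustrative assumptions):

    # Hypothetical region-sharded distribution: the server publishes one
    # diagnosis-key file per coarse region, and a phone fetches only the
    # regions it has visited. Region codes and data are made up.
    DIAGNOSIS_KEYS = {
        "US-CA": [b"\x01" * 16],   # day keys reported in California
        "US-NV": [b"\x02" * 16],   # day keys reported in Nevada
    }

    def keys_to_check(visited_regions):
        keys = []
        for region in sorted(visited_regions):
            keys.extend(DIAGNOSIS_KEYS.get(region, []))
        return keys

    # A commuter near a border fetches both shards - exactly the
    # stand-out concern described above.
    print(len(keys_to_check({"US-CA", "US-NV"})))  # -> 2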
Nothing prevents the user from pushing the checking to some trusted service as well, if they so choose. If they trust the service then they'd upload their seen keys to a checking service, rather than downloading the whole set of diagnosis keys. The important part is the decision is in their hands.
Bloom filters could work in that direction as well: the phone produces a filter of its observed keys and uploads it to the service, and the service checks all positive keys to see if they're in the filter. I think the main point of doing the checking on the phone is that way you're the only one who knows if you've been exposed.
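A sketch of that reversed direction, reusing the BloomFilter class from the earlier comment (the service API is hypothetical; the spec itself says matches must stay on-device, so this variant deliberately trades that property for bandwidth, which only makes sense if the user explicitly trusts the service):

    def build_report_filter(seen_identifiers, fp_rate=0.01):
        # Phone side: pack the identifiers this phone observed into a filter.
        bf = BloomFilter(max(1, len(seen_identifiers)), fp_rate)
        for ident in seen_identifiers:
            bf.add(ident)
        return bf

    def service_check(report_filter, diagnosis_identifiers):
        # Trusted-service side: test every identifier derived from the
        # reported day keys against the uploaded filter.
        return any(ident in report_filter for ident in diagnosis_identifiers)

The service now learns (probabilistically) whether you were exposed, which is exactly the property the on-device check avoids.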