I haven't seen it mentioned yet on Hacker News, but Germany is about to pass the "Gesundheitsdatennutzungsgesetz". This bill allows anonymized access for researchers to all German health care data. Data from over 83 million people.
The example given by the health minister for how this might play out: researchers find an interesting pattern in the data. They ask the ministry to contact the matched people and request their permission to participate in a study or to grant direct access. If permission is given, the anonymization is lifted in part and a study can move forward. This alone would make Germany a pretty fascinating place for future AI research.
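A rough model of that consent flow, assuming a single trusted office (the hypothetical `DataTrustee` below) is the only party holding the link between pseudonyms and real identities. This is a minimal Python sketch for illustration; the law's actual mechanism and roles may well differ.

```python
from __future__ import annotations

import secrets


class DataTrustee:
    """Hypothetical trusted office: the only party able to map pseudonyms back to people."""

    def __init__(self) -> None:
        self._identity_by_pseudonym: dict[str, str] = {}
        self._consented: set[str] = set()

    def register(self, identity: str) -> str:
        # Issue a random pseudonym; researchers only ever see this value.
        pseudonym = secrets.token_hex(8)
        self._identity_by_pseudonym[pseudonym] = identity
        return pseudonym

    def record_consent(self, pseudonym: str) -> None:
        # Stand-in for contacting the person and receiving a "yes".
        if pseudonym in self._identity_by_pseudonym:
            self._consented.add(pseudonym)

    def reidentify(self, pseudonyms: list[str]) -> dict[str, str]:
        # Only people who consented are re-linked; everyone else stays pseudonymous.
        return {p: self._identity_by_pseudonym[p] for p in pseudonyms if p in self._consented}


trustee = DataTrustee()
alice, bob = trustee.register("Alice"), trustee.register("Bob")
# Researchers flagged both records as interesting; only Alice agrees to take part.
trustee.record_consent(alice)
print(trustee.reidentify([alice, bob]))  # only Alice's entry is returned
```

The design point is that researchers never hold the mapping; partial re-identification happens only on the trustee's side and only for people who said yes.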
This sounds perhaps a bit naive, as one might expect from the German government when it comes to data. How much of such data does one need to de-anonymize someone? How easy will it be to accidentally slip identifying data in through whatever system is supposed to be used? How easy will it be for any company like MS to correlate that data with all that other data they extracted without user consent?
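One way to get a feel for "how much data is enough": count how many records are unique on a combination of attributes, since a unique combination is what an attacker can match against outside data. This is a simple empirical count on a synthetic population (attribute ranges are arbitrary assumptions), not a re-identification attack and not the estimator from the paper cited elsewhere in this thread.

```python
import random
from collections import Counter


def fraction_unique(records, attributes):
    """Share of records whose combination of `attributes` occurs exactly once."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Synthetic population; the attribute ranges are made up for illustration.
random.seed(0)
people = [
    {
        "zip": random.randrange(100),
        "birth_year": random.randrange(1940, 2000),
        "sex": random.choice("fm"),
        "blood_type": random.choice(["0", "A", "B", "AB"]),
    }
    for _ in range(10_000)
]

# Add one attribute at a time and watch the share of unique records grow.
attrs = []
for attribute in ("zip", "birth_year", "sex", "blood_type"):
    attrs.append(attribute)
    print(f"{attrs}: {fraction_unique(people, attrs):.1%} unique")
```

Each extra attribute splits the population into smaller groups, so the share of people who are alone in their group rises quickly; real health records carry far more attributes than four.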
It is important to note that the original draft of the law does not talk about anonymizing data, but rather about pseudonymizing it. So no attempt is made at keeping patients' identities truly anonymous (which has repeatedly been proven to be impossible in sparse datasets).
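The practical difference: a pseudonym is stable, so records about the same person still link up. A minimal Python sketch, assuming pseudonyms are produced with a keyed hash (HMAC-SHA256 from the standard library; the law may prescribe a different scheme), with a made-up key, made-up insurance number and made-up records:

```python
import hashlib
import hmac

# Hypothetical secret; in a real system only a trusted party would hold it.
SECRET_KEY = b"example-key-held-by-a-trusted-party"


def pseudonymize(patient_id: str) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256, truncated)."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Two separate (made-up) datasets mentioning the same insured person.
prescriptions = [{"patient": pseudonymize("K123456789"), "drug": "metformin"}]
hospital_stays = [{"patient": pseudonymize("K123456789"), "zip": "10115", "year_of_birth": 1961}]

# The pseudonym is stable, so the records still join on it:
# the name is hidden, but the person's combined profile is not.
linked = [
    {**p, **h}
    for p in prescriptions
    for h in hospital_stays
    if p["patient"] == h["patient"]
]
print(linked)
```

Anonymization in the strict sense would have to break that link (and deal with quasi-identifiers like ZIP code and year of birth), which is exactly what is hard in sparse datasets.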
>How easy will it be for any company like MS to correlate that data with all that other data they extracted without user consent?
Doable to some extent, but would they really learn much that we haven't already told them, given our propensity to Google for symptoms and diseases?
Personally, if I had some embarrassing medical history that I wouldn't want people to know about, my worry would be a malicious party gaining access to de-anonymized data and using it for blackmail, or simply making it public.
Edit: Come to think of it, insurance companies could probably have a field day with a data set like this.
The companies and researchers get access to the data with oversight but the data isn't shared. They basically get access to a server to run their analysis but can't retrieve the data.
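A common way to let researchers analyze data they cannot download is to return only aggregates and suppress small groups. The Python sketch below assumes that design; the class name, the threshold of 5 and the diagnosis codes are illustrative, not taken from the law.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # hypothetical suppression threshold


class SecureAnalysisServer:
    """Hypothetical analysis environment: queries run on the server, only aggregates leave it."""

    def __init__(self, records):
        # Raw rows stay inside the object; no method returns them.
        # (In Python this is only a convention; a real system enforces it server-side.)
        self._records = list(records)

    def count_by(self, field):
        counts = Counter(r[field] for r in self._records)
        # Suppress small groups, since a count of 1 or 2 can point to individual patients.
        return {value: n for value, n in counts.items() if n >= MIN_CELL_SIZE}


server = SecureAnalysisServer([{"diagnosis": "E11"}] * 12 + [{"diagnosis": "G71"}] * 2)
print(server.count_by("diagnosis"))  # {'E11': 12}; the rare diagnosis is suppressed
```

Suppression alone is not a complete defense (differencing several queries can leak information), which is why such setups usually add oversight and query review on top.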
Sweden usually ranks very high on health care, both in results and in research/innovation. One big reason for this is that we gathered a lot of medical statistics on the population very early on through the "health ministry", which were later used in research. All this has fostered and cultivated innovation. The biggest victory, however, is the benefit the results have brought to the population (and the world).
If done with care this can have a big impact! Very exciting Germany is doing this!
My medical data is affected and I can't disagree more.
Also, pharma is not interested in finding cures, for obvious reasons. I am getting all the downsides with none of the upsides.
Next thing you know, you get a letter from the Rundfunkbeitrag suggesting you pay for the whole year in advance because your cancer doesn't have a good prognosis. Nothing personal or fraudulent, you know... it's just convenient.
It can't be trivially unmasked: approved researchers get controlled access to the data with oversight, but the data itself isn't shared with the companies.
What exactly does it mean to invest €3.2B in Germany? Does it mean the money is actually spent in Germany, as in on goods and services produced in Germany, or does it mean they order an astonishing number of GPUs from Nvidia and put them into the Frankfurt region because that's one of the most popular regions, latency-wise, to serve Europe?
It's on a downwards trend again after the spike created by the war [1]. However, I don't think this is a big factor, since Microsoft is the second-biggest buyer of renewable energy, so other prices apply.
That second link … it’s depressing how a site about renewable _energy_ writes an article about cumulative _energy_ consumption but consistently fails to tell the difference between energy and power.
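For reference, the distinction the comment is pointing at: power (W, GW) is a rate, while energy (Wh, TWh) is power accumulated over time. A minimal Python sketch with a hypothetical constant load:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, ignoring leap years


def energy_twh(average_power_gw: float, hours: float = HOURS_PER_YEAR) -> float:
    """Energy = power x time. GW x h gives GWh; divide by 1000 for TWh."""
    return average_power_gw * hours / 1000


# A hypothetical 1 GW load running all year consumes 8.76 TWh of energy.
print(energy_twh(1.0))
```

So "1 GW" describes how fast energy is used at a given moment, and "8.76 TWh" describes how much was used over the year; the two are not interchangeable.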
Germany is in the middle of Europe? 3rd largest economy (by GDP) in the world? Lots of customers nearby? It has one of the world-wide largest Internet exchange nodes in Frankfurt (-> DE-CIX)? It helps to adhere to EU (-> GDPR) and German regulations having data local, when wanting to serve EU customers?
> It helps to adhere to EU (-> GDPR) and German regulations having data local
Not this again. The moment you transmit data to European based servers under control of US corporation you could just as well send it straight to the US. Same difference.
Nobody cares if AWS, Azure or GCP have EU datacenters. They are for the most part understood to be under US control.
Electricity for industry is cheaper: businesses are exempt from some taxes (not sure which ones), and VAT is obviously deductible, which is not the case for end consumers.
Regardless of that it's a myth anyway. Germany has a very competitive electricity market, and it basically works by having all the costs on the actual electricity bill (will not be 100% true in the coming years, but it was until now). Some other countries have cheaper electricity on paper, but cover the real costs with government money / taxes. France is the most obvious example as a neighbouring country.
Check your provider's current rate; it's probably closer to 30 cents. Then call them and ask whether they want you to switch at the next opportunity or are willing to move you over to the new rate.
They even backdated my contract change by a few months when I did that.
Germany had many big projects combining external investment and government subsidies (factories, data centers etc.) announced in recent years. None of them materialized; they got canceled because of the German balanced budget amendment [0].
Some projects are a mix of private and public investments. A freeze on the public portion affects the risk/trade-offs of the private investment portion and therefore the overall project.
This might be good strategically from the perspective of EU companies where the physical location of operations can matter a lot depending on your field of business, e.g. NIS 2 Directive for energy companies and other sensitive infrastructure.
But they'll face competition too of course! I can't wait to see more development from European actors and projects like Mistral, TrustLLM, and also GPT-SW3 for the Nordics! There's much in motion on the European stage right now.
They don't face much competition for datacenter locations in Europe. Germany has a really good location, being relatively central on the continent (depending on the city), and DE-CIX as a gigantic internet exchange. Most datacenters are located around that exchange. To my knowledge, the only two locations that come close to Frankfurt am Main in datacenter count are Paris and Amsterdam, but neither has the same geographical advantage for latency.
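As a rough check of the latency argument, straight-line distance gives a lower bound on round-trip time, since light in optical fiber covers roughly 200 km per millisecond. The Python sketch below assumes approximate city coordinates and an arbitrary set of target capitals, and ignores real fiber routes, routing and queuing, so actual latencies will be higher.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # light in optical fiber: roughly 200,000 km/s

# Approximate city coordinates (latitude, longitude) in degrees.
CITIES = {
    "Frankfurt": (50.11, 8.68),
    "Paris": (48.86, 2.35),
    "Amsterdam": (52.37, 4.90),
    "London": (51.51, -0.13),
    "Madrid": (40.42, -3.70),
    "Rome": (41.90, 12.50),
    "Vienna": (48.21, 16.37),
    "Warsaw": (52.23, 21.01),
    "Stockholm": (59.33, 18.07),
}


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_rtt_ms(a, b):
    """Lower bound on round-trip time over a hypothetical straight fiber path."""
    return 2 * great_circle_km(a, b) / FIBER_KM_PER_MS


if __name__ == "__main__":
    targets = ["London", "Madrid", "Rome", "Vienna", "Warsaw", "Stockholm"]
    for hub in ("Frankfurt", "Paris", "Amsterdam"):
        rtts = [min_rtt_ms(CITIES[hub], CITIES[t]) for t in targets]
        print(f"{hub}: mean lower-bound RTT {sum(rtts) / len(rtts):.1f} ms")
```

These are physical floors only; at this scale the real differences between hubs are often dominated by peering and routing rather than geography.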
Meanwhile, at the actual company, we are laying off German employees in droves because the labor laws are so inflexible there (or so the meme internally goes).
"Estimating the success of re-identifications in incomplete datasets using generative models" - https://www.nature.com/articles/s41467-019-10933-3
"...We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization ..."
It will exponentially reduce time to cures for various diseases. The amount of red tape in academia and pharma to do simple data studies is wild...
Everyone outside of Germany has to face death and taxes. In Germany you face death, taxes and the Rundfunkbeitrag.
And it's a legal challenge to make sure that everyone understands how and why they should opt out, even my grandma.
No. Unless you don't want medical treatment.
Sounds like MS has just bought a relatively easy access to ~83m people's health data. For €3.2B.
That's €39 per person (tax included).
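The per-person figure checks out; a trivial Python check using the numbers from the thread (€3.2B and roughly 83 million people):

```python
investment_eur = 3.2e9  # announced investment, as quoted in the thread
population = 83e6  # roughly 83 million people covered

per_person = investment_eur / population
print(f"€{per_person:.2f} per person")  # prints "€38.55 per person"
```

Rounded up, that is the €39 quoted above.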
[1] https://www.statista.com/statistics/1346782/electricity-pric...
[2] https://www.renewableenergyworld.com/news/bloombergnef-corpo...
[0]: https://de.wikipedia.org/wiki/Schuldenbremse_(Deutschland)
Interestingly, Tier 1 transit pricing is increasingly competitive, and DE-CIX access has become more expensive (relatively speaking) in recent years.