Now this was for the public good and we were going to fund the technology to display the data, and they would provide the data. This way people could assess how much various drugs could help them and what the outcomes were.
It was also thought that other researchers could find patterns in the data.
Suddenly the not-for-profit institute got cold feet because they would be "giving away" the data they had spent millions to acquire. Meanwhile we, a for-profit institute, were happy to fund our share as a public good.
They decided that, instead of giving away their data, they would give away simulated data. This, it was felt, would benefit the patients and researchers who might draw conclusions from the data.
Now these are PhDs at the top of their field. But, you know, it's sort of obvious that all they would do is reproduce their own biases and make it so that no one else could challenge those biases. I mean, for you data science types, this is 101.
Ever since that experience, I've distrusted simulated data.
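For the data science types, here is a rough sketch of what I mean by "reproducing biases". Everything in it is made up for illustration: the marker subgroup, the effect sizes, and the function names are my own assumptions, not anything from the actual project. The "real" data has a drug that only helps patients with a marker; the simulated data is drawn from a model that assumes one average effect for everyone.

```python
import random
import statistics


def make_real_data(rng, n):
    """Hypothetical ground truth: the drug helps only patients with a marker."""
    rows = []
    for _ in range(n):
        treated = rng.random() < 0.5
        has_marker = rng.random() < 0.3
        benefit = 2.0 if (treated and has_marker) else 0.0
        rows.append((treated, has_marker, benefit + rng.gauss(0.0, 1.0)))
    return rows


def effect(rows):
    """Mean outcome of treated patients minus mean outcome of controls."""
    treated = [y for t, _, y in rows if t]
    control = [y for t, _, y in rows if not t]
    return statistics.mean(treated) - statistics.mean(control)


def simulate_from_model(rng, rows, n):
    """Fit the modelers' assumption (one average effect) and sample from it."""
    control = [y for t, _, y in rows if not t]
    baseline = statistics.mean(control)
    noise = statistics.pstdev(control)
    avg_effect = effect(rows)
    sim = []
    for _ in range(n):
        treated = rng.random() < 0.5
        has_marker = rng.random() < 0.3
        # The marker is recorded but, by assumption, never changes the outcome.
        y = baseline + (avg_effect if treated else 0.0) + rng.gauss(0.0, noise)
        sim.append((treated, has_marker, y))
    return sim


def subgroup_gap(rows):
    """Effect among marker patients minus effect among everyone else."""
    with_marker = [r for r in rows if r[1]]
    without_marker = [r for r in rows if not r[1]]
    return effect(with_marker) - effect(without_marker)


if __name__ == "__main__":
    rng = random.Random(0)
    real = make_real_data(rng, 20000)
    sim = simulate_from_model(rng, real, 20000)
    print("subgroup gap in real data:     ", round(subgroup_gap(real), 2))
    print("subgroup gap in simulated data:", round(subgroup_gap(sim), 2))
```

In the real data the gap between subgroups should come out large; in the simulated data it should sit near zero, because the model never allowed for it. An outside researcher working only from the simulated set has no way to discover the subgroup effect or to challenge the "one average effect" assumption.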
* Have a set of data as a "basis"
* No or small training set to use

There are problems where generating data can work, but they're specific problems, or it can only be used for rare edge cases that don't show up enough in a dataset. For the most difficult problems it is probably just as difficult to generate "correct" data as it is to generate a model without real-world data.
Just a thought, not sure what's really going on there, I just know that they probably have something interesting they're cooking up!
This is a really crazy vision