Likewise in your metric, if all answers are the same despite perturbations then it's more likely to be ... true?
I'd really like to see a plot of your metric versus the SimpleQA hallucination benchmark that OpenAI uses.
Those aren't the right metrics. First, capacity factor is an approximation of "fraction of maximum output". It doesn't measure reliability at all, which is a whole-grid measurement.
Find the data for any one gas plant. How often was it producing (emphasis in the original) no power? I'll bet anything that most plants are offline quite a bit more than 10%, precisely because demand itself is variable and gas is the easiest plant to bring up and down. Yet you call one a "reliability" metric and the other not, why?
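To make the distinction concrete, here's a minimal sketch that computes both numbers from the same hourly series. The plant, its 100 MW nameplate rating, and the output values are all made up for illustration, and the function names are my own.

```python
def capacity_factor(output_mw, nameplate_mw):
    """Average output as a fraction of nameplate capacity."""
    return sum(output_mw) / (nameplate_mw * len(output_mw))


def zero_output_fraction(output_mw):
    """Fraction of intervals in which the plant produced no power."""
    return sum(1 for p in output_mw if p == 0) / len(output_mw)


# Hypothetical hourly output (MW) for a 100 MW plant over 10 hours.
hourly = [0, 0, 0, 100, 100, 100, 100, 50, 0, 0]

cf = capacity_factor(hourly, 100)
zf = zero_output_fraction(hourly)
print(f"capacity factor: {cf:.2f}, share of hours at zero output: {zf:.2f}")
```

Neither number says whether the grid as a whole met demand in those hours, which is what reliability is about.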
In fact as a whole, German wind power has been exceedingly reliable. Wind power everywhere has been exceedingly reliable. The world as a whole has been building out wind like crazy over the last decade (because it's cheap and great) and... I'm not aware of even one instance of a "calm day blackout". Not one. Have a cite for that?
The UK has a wind capacity factor with a "long-term average of around 27%". https://www.eci.ox.ac.uk/publications/downloads/sinden06-win...
The reason is that wind generation is optimal within a certain range of wind speeds, and less or no power is generated if winds are too slow or too fast. So wind power blackouts occur not only on calm days, but also on very stormy days. In total there are plenty of occurrences when a specific area has no wind at all. The correlation in weather can be seen in wind farms as far as 800 miles apart. https://www.eci.ox.ac.uk/publications/downloads/sinden06-win...
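A rough sketch of that behaviour, using a simplified turbine power curve. The cut-in, rated, and cut-out speeds and the 2000 kW rating are illustrative round figures that I'm assuming here, not values taken from the linked paper, and real curves vary by turbine model.

```python
def turbine_power_kw(wind_speed, cut_in=3.0, rated_speed=12.0,
                     cut_out=25.0, rated_kw=2000.0):
    """Simplified power curve (all default values are assumptions).

    Zero below cut-in and above cut-out, a cubic ramp between cut-in
    and rated speed, and flat at rated power up to cut-out.
    """
    if wind_speed < cut_in or wind_speed > cut_out:
        return 0.0
    if wind_speed >= rated_speed:
        return rated_kw
    ramp = (wind_speed**3 - cut_in**3) / (rated_speed**3 - cut_in**3)
    return rated_kw * ramp


# Calm, moderate, rated, strong, and storm-level wind speeds in m/s.
for v in (2, 8, 12, 20, 30):
    print(f"{v:>2} m/s -> {turbine_power_kw(v):7.1f} kW")
```

Both the calm case and the storm case come out at zero, which is the point about stormy days above.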
2021 was a year of very low wind speeds across the whole of northern Europe. https://climate.copernicus.eu/esotc/2021/low-winds
Additionally, wind power may be going down in strength... due to climate change https://www.ft.com/content/d53b5843-dbe0-4724-8adf-75c66127e...
If all the people of India and China were to maintain a similar way of life to mine, the ecosystem would collapse right away.
Just a point that not all science can have empirical and reproducible study.
Let me throw a tiny wrench into your logical reasoning: I can't reproduce most results, does it mean most results are not science?
Absence of evidence is not evidence of absence. You should not expect scientific experiments to be replicated every time.
Without art or literature that draws out the terror I don't think most people can really envisage the danger we're all being put in. Without popular consciousness of the problem, it's all the more likely to happen.
This wasn't coordinated between Jeff Geerling and myself. However, I did mention the post in the Bluesky thread that Jeff was included in. [0]
I concluded the piece with “[t]his space is ripe for disruption”. That was a really poor choice of words. I've since updated the piece to better match what I was trying to say. Diffs are available. [1]
On YouTube: as I mention in the piece, I think the service is excellent as a consumer, and I pay for Premium.
This piece was mostly written because I've been frustrated that YouTube is effectively the only place for user submitted video on the internet. I wasn't going to write anything until I saw the video from RedLetterMedia that I mentioned in the post. They have a huge following and were blaming something that might be related? Or might not? It's really hard to tell! I'm not a YouTube creator, but I assume having metrics that determine your livelihood shift out from under you as a creator must feel awful.
[0] https://bsky.app/profile/gavin.anderegg.ca/post/3lyeayuckv22...
[1] https://github.com/gavinanderegg/gavinanderegg.github.io/com...
Well, technically there are lots of user-submitted videos posted to p*rn sites... Apparently people even started posting educational videos there, like math and neural networks and stuff.