jaschasd commented on Images altered to trick machine vision can influence humans too   deepmind.google/discover/... · Posted by u/xnx
pain2022 · 2 years ago
> If brain activations are insensitive to subtle adversarial attacks, we would expect people to choose each picture 50% of the time on average. However, we found that the choice rate—which we refer to as the perceptual bias—was reliably above chance for a wide variety of perturbed picture pairs

Ok, but the article doesn’t say what the actual rate was?

jaschasd · 2 years ago
It does! See the effect strength plot near the bottom of the blog post, or Figures 2b and 3d in the paper.

The effect strength on humans ranges from a few percent deviation of judgements from chance for subtle adversarial perturbations (epsilon=2), up to ~15% deviation from chance in the largest-magnitude experimental condition.
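
For intuition about what an epsilon=2 budget means, here is a minimal numpy sketch of an L-infinity-bounded perturbation in the FGSM style. The toy linear "model" is purely illustrative (the paper's attacks target deep image classifiers, not this setup), but the budget arithmetic is the same: no pixel moves by more than epsilon on a 0-255 scale.

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.uniform(0, 255, size=(32, 32, 3))  # stand-in for a photo
    w = rng.normal(size=image.shape)               # toy linear "model" weights
    eps = 2.0                                      # max per-pixel change, 0-255 scale

    # For a linear score s = <w, x>, the gradient with respect to the image
    # is w itself, so the strongest L-infinity attack moves each pixel by +/- eps.
    perturbation = eps * np.sign(w)
    adversarial = np.clip(image + perturbation, 0.0, 255.0)

    assert np.abs(adversarial - image).max() <= eps + 1e-9  # within budget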

jaschasd commented on Images altered to trick machine vision can influence humans too   deepmind.google/discover/... · Posted by u/xnx
_ogrr · 2 years ago
In case you were wondering what N was, their first experiment involved 16 undergrad psych students and the second experiment involved 12.

https://link.springer.com/article/10.3758/BF03206939

Edit: I believe this linked survey is not the subject of the OP.

jaschasd · 2 years ago
This comment is incorrect.

For experiments 1 through 4, N was 38, 389, 396, and 389. The subjects were not undergrad psych students.

The article linked in the parent comment does not correspond to any experiment in the blog post or the Nature Comms paper.

jaschasd commented on Overfitting and the strong version of Goodhart’s law   sohl-dickstein.github.io/... · Posted by u/andromaton
jaschasd · 3 years ago
(the plot shows extreme overfitting with a 10 parameter model, and interpolation with a 10,000 parameter model)
jaschasd · 3 years ago
I can't reply directly -- is there a maximum thread depth, or a maximum conversation depth?

Anyway -- I wanted to apologize for misreading -- I missed the parenthetical "interpolation" in your comment. I think we are both interpreting the plot the same way.

In terms of your comment about anecdotal evidence -- are you talking about the case where data and model size are increased jointly? If so, I agree, though I don't think that is cleanly related to double descent/overparameterization any longer.

jaschasd commented on Overfitting and the strong version of Goodhart’s law   sohl-dickstein.github.io/... · Posted by u/andromaton
cs702 · 3 years ago
> It's true that I don't go into detail about double descent, though I do describe how increasing capacity often reduces overfitting.

I agree.

> I believe the figure labeled "Figure 1" illustrates what you are suggesting (despite being labeled Figure 1, it is actually at the bottom of the blog post, so maybe easy to miss).

Easy to miss, yes. I'm not sure it illustrates the phenomenon, though. That plot shows extreme overfitting (i.e., interpolation) by the 10,000 parameter model. No one really understands what actually happens after interpolation. There's in fact some anecdotal evidence that after crossing the interpolation threshold, large AI models trained with SGD gradually begin to ignore outliers and find simpler models (!) that generalize better (!). Counterintuitive, I know. This is an active area of research, with no good explanations yet, AFAIK.

jaschasd · 3 years ago
(the plot shows extreme overfitting with a 10 parameter model, and interpolation with a 10,000 parameter model)
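
The "simpler models that generalize better" intuition in the parent comment can be seen in a linear toy model: among the many weight vectors that fit the training data exactly, the minimum-norm one (which gradient descent from zero init converges to in this linear setting) generalizes far better than an arbitrary interpolant. A hedged numpy sketch, with an entirely illustrative setup:

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 20, 200                          # overparameterized: p >> n
    X = rng.normal(size=(n, p))
    w_true = np.zeros(p)
    w_true[:5] = 1.0                        # simple ground truth
    y = X @ w_true

    w_min = np.linalg.pinv(X) @ y           # minimum-norm interpolant
    # Adding any null-space component leaves the training fit unchanged:
    null_proj = np.eye(p) - np.linalg.pinv(X) @ X
    w_other = w_min + 3.0 * (null_proj @ rng.normal(size=p))

    X_test = rng.normal(size=(500, p))
    for name, w in [("min-norm", w_min), ("other interpolant", w_other)]:
        train_mse = np.mean((X @ w - y) ** 2)
        test_mse = np.mean((X_test @ (w - w_true)) ** 2)
        print(f"{name:17s} train MSE {train_mse:.1e}   test MSE {test_mse:.1f}")

Both solutions fit the training set perfectly, but the min-norm one is "simpler" and its test error is orders of magnitude lower. This is a linear caricature of the SGD-finds-simple-functions claim, not evidence for it.
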
jaschasd commented on Overfitting and the strong version of Goodhart’s law   sohl-dickstein.github.io/... · Posted by u/andromaton
cs702 · 3 years ago
Thank you.

As you probably know, the big deal about double descent is that once sufficiently large AI models cross the so-called "interpolation threshold" in training, and get over the hump, they start generalizing better -- the opposite of overfitting. State-of-the-art performance in fact requires getting over the hump. As far as I can tell, you did not mention any of that explicitly anywhere in your post.

Also, all your plots show only the classical overfitting curve, not the actual curve we now see all the time with larger AI models like Transformers.

jaschasd · 3 years ago
It's true that I don't go into detail about double descent, though I do describe how increasing capacity often reduces overfitting.

I believe the figure labeled "Figure 1" illustrates what you are suggesting (despite being labeled Figure 1, it is actually at the bottom of the blog post, so maybe easy to miss).
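
For readers who want to see the non-classical curve described above, double descent is easy to reproduce in a random-features toy model: test error typically spikes when the parameter count crosses the number of training points (the interpolation threshold) and falls again well past it. A rough numpy sketch -- the setup is illustrative, and a single random seed can be noisy:

    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test = 20, 500
    x_train = rng.uniform(-1, 1, n_train)
    x_test = rng.uniform(-1, 1, n_test)
    f = lambda x: np.sin(2 * np.pi * x)
    y_train = f(x_train) + 0.1 * rng.normal(size=n_train)

    for p in [5, 10, 18, 20, 22, 40, 200, 2000]:
        # Random Fourier features; lstsq returns the minimum-norm fit
        # once the model can interpolate (p >= n_train).
        freqs = rng.normal(scale=5.0, size=p)
        phases = rng.uniform(0, 2 * np.pi, size=p)
        feats = lambda x: np.cos(np.outer(x, freqs) + phases)
        w, *_ = np.linalg.lstsq(feats(x_train), y_train, rcond=None)
        test_mse = np.mean((feats(x_test) @ w - f(x_test)) ** 2)
        print(f"p = {p:5d}   test MSE = {test_mse:.3f}")

Test error tends to peak near p = n_train and then descend again as p grows -- the second descent.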

jaschasd commented on Overfitting and the strong version of Goodhart’s law   sohl-dickstein.github.io/... · Posted by u/andromaton
lqet · 3 years ago
I recommend reading "The Collapse of Complex Societies" by Joseph Tainter [0].

Complex societies tend to address problems by adding more and more rules and regulations, simply because they have always done so and it has been successful in the past. More importantly, though, it is typically the only tool they have. Essentially these societies are increasingly overfitting their legislature to narrow special cases until it cannot handle anything unexpected anymore. Such a society is highly fragile. I witness this firsthand every day in my own country. Living here feels like lying in a Procrustean bed.

[0] https://www.amazon.com/Collapse-Complex-Societies-Studies-Ar...

jaschasd · 3 years ago
Thanks! I'm also adding this to my reading list now.
jaschasd commented on Overfitting and the strong version of Goodhart’s law   sohl-dickstein.github.io/... · Posted by u/andromaton
cs702 · 3 years ago
Except that AI models, especially large deep ones, do NOT overfit like the author thinks. They exhibit what is now called "deep double descent" -- the validation error declines, then increases, and then declines again:

https://openai.com/blog/deep-double-descent/

A question I've pondered for a while is whether complex systems in the real world also exhibit double descent.

For example, transitioning an online application that currently serves thousands of users to one that can serve millions and then billions requires reorganizing all code, processes, and infrastructure, making software development harder at first, but easier down the road. Anyone who has gone through it will tell you that it's like going through "phase transitions" that require "getting over humps."

Similarly, startups that want to transition from small-scale to mid-size and then to large-scale businesses must increase operational complexity, making everything harder at first, but easier down the road. Anyone who has been with a startup that has grown from tiny two-person shop to large corporation will tell you that it's like going through "phase transitions" that require "getting over humps."

Finally, it may be that whole countries and economies that want to improve the lives of their citizens may have to go through an interim period of less efficiency, making everything harder at first, but easier down the road. It may be that human progress involves "phase transitions" that require "getting over humps."

jaschasd · 3 years ago
Blog post author here.

A brief note that I do discuss the deep double descent phenomenon in the blog. See the section starting with "One of the best understood causes of extreme overfitting is that the expressivity of the model being trained too closely matches the complexity of the proxy task."

I avoided using the actual term double descent, since I thought it would add unnecessary complexity. Lesson learned for next time -- I should have at least had an endnote using that terminology!

jaschasd commented on Overfitting and the strong version of Goodhart’s law   sohl-dickstein.github.io/... · Posted by u/andromaton
TomSwirly · 3 years ago
This was a great article - I'm sending it to a couple of my friends.

Might I ask what software you used for blogging? I can't seem to find the source repo it came from...

jaschasd · 3 years ago
See response to alexmolas -- I'm using GitHub Pages + Jekyll + Markdeep. You don't see the source repo because it's private, but I'm happy to share a code snapshot with you if you like -- email me for it.
jaschasd commented on Overfitting and the strong version of Goodhart’s law   sohl-dickstein.github.io/... · Posted by u/andromaton
vminvsky · 3 years ago
What about double-descent :D
jaschasd · 3 years ago
Yes! In the post I talk about both under- and over-parameterization being mitigations for overfitting, though I don't use the term double descent.
jaschasd commented on Overfitting and the strong version of Goodhart’s law   sohl-dickstein.github.io/... · Posted by u/andromaton
langsoul-com · 3 years ago
The strong version of Goodhart's law seems to occur as a function of time.

In the beginning, everything is fine and dandy, but as people optimise, it begins to turn into extremes.

jaschasd · 3 years ago
+1. It doesn't require there to be a time axis -- but in practice, we almost always optimize incrementally, so it takes a while for the strong version of Goodhart's law to kick in.
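
As a toy illustration of that time dynamic, here is a hedged numpy sketch (the setup is invented for illustration, not taken from the post): the optimizer only sees a proxy loss defined on a few coordinates of the state, and optimization noise drifts the unobserved coordinates. Early steps improve both losses; with continued optimization the proxy keeps falling while the true loss bottoms out and creeps back up.

    import numpy as np

    rng = np.random.default_rng(0)
    dim, k = 50, 5                        # the proxy only "sees" the first k coords
    target = rng.normal(size=dim)
    x = np.zeros(dim)

    for step in range(1, 2001):
        grad = np.zeros(dim)
        grad[:k] = x[:k] - target[:k]     # gradient of the proxy loss only
        x -= 0.05 * grad
        x += 0.01 * rng.normal(size=dim)  # drift from the optimization process
        if step in (1, 25, 100, 400, 1000, 2000):
            proxy = np.sum((x[:k] - target[:k]) ** 2)
            true = np.sum((x - target) ** 2)
            print(f"step {step:5d}   proxy loss {proxy:6.3f}   true loss {true:6.1f}")

The true loss is only hurt because sustained optimization keeps the system moving in directions the proxy cannot see; stop early and both metrics look fine, which is the "function of time" behavior described above.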
