gaspb commented on Ask HN: What are the foundational texts for learning about AI/ML/NN?    · Posted by u/mfrieswyk
sinenomine · 3 years ago
Ironic, since the relatively recently discovered double descent phenomenon makes it clear that the bias-variance tradeoff, as we know it from statistical learning theory, simply doesn't apply to "overparameterized" deep models.

Much of the old theory is barely applicable, and people are, understandably, bewildered and in denial.

If someone is inclined toward theory, I'd recommend reading papers that don't try to oversimplify the domain:

https://arxiv.org/abs/2006.15191

https://arxiv.org/abs/2210.10749

https://arxiv.org/abs/2205.10343

https://arxiv.org/abs/2105.04026

gaspb · 3 years ago
I don't believe it's oversimplifying the domain. In fact, the reference I pointed to has a section dedicated to double descent (Sec. 11.2). You may also be surprised that such a phenomenon can be observed on toy convex examples from the "old theory" (Sec. 11.2.3), as you call it.

Anyway, I still believe that learning foundational material such as the bias-variance tradeoff is useful before diving into more advanced topics. I even think that tackling recent research questions with old tools is insightful too. But that's only my opinion, and perhaps I'm in denial :)

gaspb commented on Ask HN: What are the foundational texts for learning about AI/ML/NN?    · Posted by u/mfrieswyk
gaspb · 3 years ago
If you're more inclined to theory, I would suggest "Learning Theory from First Principles" by F. Bach: https://www.di.ens.fr/~fbach/ltfp_book.pdf

The book assumes limited background (similar to what Pattern Recognition requires, I would say) and builds good intuition for the foundational principles of machine learning (e.g. the bias-variance tradeoff) before delving into more recent research problems. Part I is great if you simply want to know what the core tenets of learning theory are!
