I've written bugs that would have been caught by the compiler in a memory-safe language. I think the last time was maybe in 2012 or 2013? I still write plenty of bugs today but they're almost all logic errors that nothing (short of AI tools) could have caught.
This doesn't seem like such a massive breakthrough when they are throwing so much compute at it. And since this is test-time compute, it just isn't practical at all: you are not getting this level of performance with a ChatGPT subscription, even the new $200 a month option.