Google is a behemoth with multiple products and a lot of people with opinions you have to get past before you can launch anything.
Also, OpenAI has not unseated Google's dominance in search, nor do I see that happening.
It's proprietary company evaluation data, and it's for a specific domain related to software development, a domain where OpenAI is actively trying to improve performance.
Anyway, enjoy your evening. If you actually want to have a reasonable discussion without being unpleasant, I'd be happy to discuss further.
"Teaching the test" (aka overfitting of human students at the expense of "real" learning) is a common complaint about our current education system.
Do you think it doesn't "deserve" an A here?
The OP's post was saying it's somehow able to solve something new. That shows a severe misunderstanding of how language modelling works.
None of the papers or blogs you've shared offer any points that actually rebut what I'm saying.
And yes, we will eventually have them work in real time. Can't wait.
You've managed to say essentially nothing of substance. So it passes because the structure and concepts are similar. Okay. Are students preparing for tests working with alien concepts and structures, then? Because I'm failing to see the big difference here.
A model isn't overfit just because you've declared it so, and unless GPT-4 is several trillion parameters, general overfitting is severely unlikely. But I doubt you care about any of that. Can you devise a test to properly assess what you're asserting?
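Here's a rough sketch of the kind of probe I mean: compare accuracy on the original test items against answer-preserving paraphrases. Note that `query_model` and `is_correct` are placeholders I'm inventing for illustration, not any real harness:

```python
import statistics

def query_model(prompt: str) -> str:
    # Placeholder: wire this up to whatever model API you're testing.
    raise NotImplementedError

def is_correct(answer: str, expected: str) -> bool:
    # Deliberately naive grading: normalized substring match.
    # Swap in a stricter rubric for free-form answers.
    return expected.strip().lower() in answer.strip().lower()

def memorization_probe(items: list[dict]) -> tuple[float, float]:
    # Each item: {"original": str, "paraphrases": [str, ...], "expected": str},
    # where the paraphrases reword the question (renamed identifiers,
    # reordered clauses) without changing the expected answer.
    orig, para = [], []
    for item in items:
        orig.append(is_correct(query_model(item["original"]), item["expected"]))
        para.extend(
            is_correct(query_model(p), item["expected"])
            for p in item["paraphrases"]
        )
    # Accuracy on originals vs. accuracy on answer-preserving rewrites.
    return statistics.mean(orig), statistics.mean(para)
```

If accuracy barely moves on the rewrites, "it just memorized the test" becomes a much harder claim to defend; if it craters, you've actually shown something.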
Your post says nothing of substance because it offers no real rebuttal; it just attacks a position with a hand-waved argument and no clear understanding of how parameters in fact shape a model's outputs.
You also completely missed my point.
It's clear that whatever tests he writes cover well-established and well-understood concepts.
This is where I believe people are missing the point. GPT-4 is not a general intelligence. It is a highly overfit model, but it's overfit to literally every piece of human knowledge.
Language is humanity's way of modelling real-world concepts, so GPT is able to leverage the relationships our language encodes between those concepts. It has simply learned all language up until today.
It's an incredible knowledge-retrieval machine. It can even mimic the way our language is used to conduct reasoning remarkably well.
It can't do this efficiently, nor can it actually stumble upon a new insight, because it isn't being exposed to the real world in real time.
So this professor's "new" test is not really new. It's just a test whose underlying material has fundamentally already been modelled.
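If you wanted to check that claim rather than just assert it, one crude lower bound is an n-gram contamination check between the test and any pre-cutoff corpus you can assemble. A sketch: the 13-word window loosely echoes the overlap checks described in LM training reports, and verbatim overlap only catches surface-level reuse, not conceptual coverage:

```python
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    # Lowercased word n-grams; 13-word windows loosely follow the
    # contamination checks described in some LM training reports.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_fraction(test_text: str, corpus_texts: list[str], n: int = 13) -> float:
    # Fraction of the test's n-grams that appear verbatim anywhere in
    # the reference corpus. High overlap suggests the "new" test is
    # largely assembled from text the model plausibly trained on.
    test_grams = ngrams(test_text, n)
    if not test_grams:
        return 0.0
    corpus_grams = set()
    for t in corpus_texts:
        corpus_grams |= ngrams(t, n)
    return len(test_grams & corpus_grams) / len(test_grams)
```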
1 - Are you planning to let people write their own prompts?
2 - When will you share the model names?