Readit News
rafiki6 commented on Show HN: ModelKombat – Arena-style battles for coding models   astra.hackerrank.com/mode... · Posted by u/rvivek
rafiki6 · 3 months ago
Pretty fun! A few questions:

1 - are you planning to let people write their own prompts?

2 - when will you share the model names?

rafiki6 commented on Ask HN: How did Google lose so much ground to OpenAI?    · Posted by u/needadvicebadly
rafiki6 · 3 years ago
Because OpenAI is a single team with a singular focus.

Google is a behemoth with multiple products and a lot of people with opinions you have to get past before you can launch a product.

Also, OpenAI has not unseated Google's dominance in search, nor do I see that happening.

rafiki6 commented on GPT-4 Takes a New Midterm and Gets an A   betonit.substack.com/p/gp... · Posted by u/bumbledraven
pama · 3 years ago
I’m working in a related area and I’m rather curious about this point. In what way is GPT-4 overfit? Does overfit in this context mean the conventional: validation loss went up with additional training, or something special?
rafiki6 · 3 years ago
More specifically, validation loss becomes irrelevant when you can't even sample out of distribution anymore: if the training corpus covers essentially all text, there is no held-out set that is truly unseen.
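
As a toy illustration (hypothetical data and a stand-in "model" that just memorizes, with exact-match accuracy standing in for loss; nothing I actually ran): once the "held-out" set overlaps what was already seen during training, even a pure memorizer scores well on it, so the validation signal stops telling you anything about generalization.

```python
import random

random.seed(0)

# Hypothetical toy corpus of question -> answer pairs (invented for illustration).
corpus = [(f"question {i}", f"answer {i}") for i in range(10_000)]
random.shuffle(corpus)
train, held_out = corpus[:9_000], corpus[9_000:]

# Stand-in "model" that only memorizes its training pairs.
memorizer = dict(train)

def accuracy(model, pairs):
    return sum(model.get(q) == a for q, a in pairs) / len(pairs)

# Genuinely unseen items: the memorizer fails, and validation catches the overfit.
print("clean held-out accuracy:", accuracy(memorizer, held_out))  # ~0.0

# "Held-out" items that overlap the training data (the situation once a model has
# effectively seen all text): the same memorizer now looks like it generalizes.
contaminated = random.sample(train, 1_000)
print("contaminated held-out accuracy:", accuracy(memorizer, contaminated))  # 1.0
```

That's why "did validation loss go up" is the wrong lens here: the measurement itself assumes an out-of-distribution sample exists.
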
rafiki6 commented on GPT-4 Takes a New Midterm and Gets an A   betonit.substack.com/p/gp... · Posted by u/bumbledraven
famouswaffles · 3 years ago
Oh, several benchmarks? Wow. Please do tell what these benchmarks were and how you evaluated them. Should surely be easy enough to replicate.
rafiki6 · 3 years ago
You seem to have a serious attitude problem in your responses, so this is my last one.

It's proprietary company evaluation data, and it's for a specific domain related to software development, a domain that OpenAI is actively attempting to improve performance for.

Anyway, enjoy your evening. If you want to actually have a reasonable discussion without being unpleasant, I'd be happy to discuss further.

rafiki6 commented on GPT-4 Takes a New Midterm and Gets an A   betonit.substack.com/p/gp... · Posted by u/bumbledraven
xyzzy123 · 3 years ago
Don't students prepare for tests by studying past instances of them?

"Teaching the test" (aka overfitting of human students at the expense of "real" learning) is a common complaint about our current education system.

Do you think it doesn't "deserve" an A here?

rafiki6 · 3 years ago
Did I say that?

The OP's post was saying it's somehow able to solve something new. That shows a severe misunderstanding of how language modelling works.

rafiki6 commented on GPT-4 Takes a New Midterm and Gets an A   betonit.substack.com/p/gp... · Posted by u/bumbledraven
tehf0x · 3 years ago
Ah, the good old "it's not me, it's the test" argument. These systems are not just next-token predictors: they learn complex algorithms and can perform general computation. It just so happens that by asking them to next-token predict the internet, they learn a bunch of smart ways to compress everything, potentially in a way similar to how we might use a general concept to avoid memorizing a lookup table. Please have a look at https://arxiv.org/pdf/2211.15661 and https://mobile.twitter.com/DimitrisPapail/status/16208344092.... We don't understand everything that's going on yet, but it would be foolish to discount anything at this stage, or to state much of anything with any degree of confidence (and that stands for both sides of the opinion spectrum). Also, these systems aren't exposed to the real world today, but this will be untrue very soon: https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal...
rafiki6 · 3 years ago
I never said:

- "it's not me it's the test"
- "These systems are not just next token predictors"

None of the papers or blogs you've shared offer any points that actually rebut what I'm saying.

And yes, we will eventually have them work in real time. Can't wait.

rafiki6 commented on GPT-4 Takes a New Midterm and Gets an A   betonit.substack.com/p/gp... · Posted by u/bumbledraven
famouswaffles · 3 years ago
Watching posts shift in real time is very entertaining. First it's not generally intelligent because it can't tackle new things; then, when it obviously does, it's not generally intelligent because it's overfit.

You've managed to essentially say nothing of substance. So it passes because the structure and concepts are similar. Okay. Are students preparing for tests working with alien concepts and structures, then? Because I'm failing to see the big difference here.

A model isn't overfit because you've declared it so. And unless GPT-4 is several trillion parameters, general overfitting is severely unlikely. But I doubt you care about any of that. Can you devise a test to properly assess what you're asserting?

rafiki6 · 3 years ago
I have no idea what is shifting in real time. I formed this opinion of GPT-4 by running it through several benchmarks and making adjustments to them, so my view is empirical, and it was formed a week after the model came out.
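
To be concrete about what "making adjustments" means, here is a minimal sketch (the items below are invented placeholders, not the proprietary evaluation data I mentioned): rewrite the surface of known benchmark items while keeping the underlying task fixed, and compare scores. A large drop on the rewrites points to memorization rather than generalization.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Item:
    prompt: str
    answer: str

# Original-style benchmark items (placeholders for illustration).
original = [
    Item("Reverse the string 'hello'.", "olleh"),
    Item("What is 17 * 3?", "51"),
]

# Surface rewrites: same underlying task, different wording and constants.
adjusted = [
    Item("Reverse the string 'orange'.", "egnaro"),
    Item("What is 19 * 3?", "57"),
]

def score(model: Callable[[str], str], items: list[Item]) -> float:
    """Exact-match accuracy of a prompt -> completion function."""
    return sum(model(i.prompt).strip() == i.answer for i in items) / len(items)

def memorization_gap(model: Callable[[str], str]) -> float:
    """Accuracy on known items minus accuracy on their rewrites.

    A gap near zero suggests the model actually solves the task; a large
    positive gap suggests the original items (or close variants) were in its
    training data.
    """
    return score(model, original) - score(model, adjusted)

# `model` would wrap whatever completion API is being evaluated, e.g. a
# hypothetical call_gpt4(prompt) helper:
#   gap = memorization_gap(lambda prompt: call_gpt4(prompt))
```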

Your post says nothing of substance because it offers no substantial rebuttal; it just attacks a position with a hand-waved argument and no clear understanding of how parameters actually impact a model's outputs.

You also completely missed my point.

rafiki6 commented on GPT-4 Takes a New Midterm and Gets an A   betonit.substack.com/p/gp... · Posted by u/bumbledraven
rafiki6 · 3 years ago
Just because the test is newly created doesn't mean that the structure of the language and the concepts it represents are actually new.

It's clear that whatever tests he writes cover well-established and well-understood concepts.

This is where I believe people are missing the point. GPT-4 is not a general intelligence. It is a highly overfit model, but it's overfit to literally every piece of human knowledge.

Language is humanity's way of modelling real-world concepts, so GPT is able to leverage the relationships our language creates with those real-world concepts. It has just learned all language up until today.

It's an incredible knowledge-retrieval machine. It can even mimic, very well, how our language is used to conduct reasoning.

It can't do this efficiently, nor can it actually stumble upon a new insight, because it's not being exposed to the real world in real time.

So, this professor's 'new' test is not really new. It's just a test that, fundamentally, has already been modelled.

u/rafiki6

Karma: 1278 · Cake day: December 7, 2010