Readit News
throwaway_2968 · a year ago
Throwaway account here. I recently spent a few months as a trainer for a major AI company's project. The well-paid gig mainly involved crafting specialized, reasoning-heavy questions that were supposed to stump the current top models. Most of the trainers had PhDs, and the company's idea was to use our questions to benchmark future AI systems.

It was a real challenge. I managed to come up with a handful of questions that tripped up the models, but it was clear they stumbled for pretty mundane reasons—outdated info or faulty string parsing due to tokenization. A common gripe among the trainers was the project's insistence on questions with clear-cut right/wrong answers. Many of us worked in fields where good research tends to be more nuanced and open to interpretation. I saw plenty of questions from other trainers that only had definitive answers if you bought into specific (and often contentious) theoretical frameworks in psychology, sociology, linguistics, history, and so on.

The AI company people running the project seemed a bit out of their depth, too. Their detailed guidelines for us actually contained some fundamental contradictions that they had missed. (Ironically, when I ran those guidelines by Claude, ChatGPT, and Gemini, they all spotted the issues straight away.)

After finishing the project, I came away even more impressed by how smart the current models can be.

EvgeniyZh · a year ago
I'm currently pursuing a PhD in theoretical condensed matter physics. I tried submitting questions to Humanity's Last Exam [1], and it was not too hard to think of a problem that none of the top LLMs (Claude, GPT, Gemini, plus both o1 models) got right. What surprised me was how small my bag of tricks turned out to be. I could think of 5-6 questions in my direct area of expertise with a simple numerical answer that were hard for LLMs, and maybe another 5 that they were able to solve. But that was basically the extent of my expertise. Of course there is stuff that can't be checked with a simple numerical answer (quite a lot in my case), and there are probably additional questions that would take more effort on my part to answer correctly. But all in all, I suddenly felt like a one-trick pony, and that's despite my PhD being relatively diverse.

[1] https://agi.safe.ai/submit

godelski · a year ago

  > Many of us worked in fields where good research tends to be more nuanced and open to interpretation
I've had a hard time getting people to understand this. It's always felt odd, tbh. It's what's meant by "truth doesn't exist": truth doesn't exist with infinite precision, though there are plenty of cases where there are good answers. In our modern world, I think one of the big challenges is that we've advanced enough that low-order approximations are no longer good enough. That should make sense: as we get better, we need more complex models. We need to account for more.

In many optimization problems there are no global solutions. This isn't because we lack good enough models, it's just how things are. And the environment is constantly changing, the targets moving. So the complexity will always exist. There's beauty in that, because what fun is a game when you beat it? With a universe like this, there's always a new level ahead of us.

ramblingrain · a year ago

  > Many of us worked in fields where good research tends to be more nuanced and open to interpretation
>> I've had a hard time getting people to understand this.

Why? Can't you just tell them, "it's not a science, it's more like performance art"?

layer8 · a year ago
I wouldn’t look for questions with yes/no answers, but for questions where the answers can have correct/incorrect reasoning. Of course, you can’t turn those into automated benchmarks, but that’s maybe kinda the point.
varjag · a year ago
I think that's the point: correctness can be a sliding scale. There is Newton correct, and there's Einstein correct.
jrussino · a year ago
> when I ran those guidelines by Claude, ChatGPT, and Gemini

Did you mention this to folks running the project? I would think that pasting the "detailed guidelines" from an internal project into a competitor's tool would run afoul of some confidentiality policy. At least, this sort of restriction has been a barrier to using such LLM tools in my own professional work.

spencerchubb · a year ago
> I saw plenty of questions that only had definitive answers if you bought into specific theoretical frameworks

That kind of stuff would be great to train on. As long as the answer says something like "If you abide by x framework, then y"

DavidSJ · a year ago
What were the contradictions?
JCharante · a year ago
> AI models now require trainers with advanced degrees

Companies that create data for FM (foundational model) companies have been hiring people with degrees for years

> Invisible Tech employs 5,000 specialized trainers globally

Some of those companies have almost a million freelancers on their platforms, so 5k is honestly kinda medium sized.

> It takes smart humans to avoid hallucinations in AI

Many smart humans fail at critical thinking. I've seen people with masters fail at spotting hallucinations in elementary level word problems.

aleph_minus_one · a year ago
> Many smart humans fail at critical thinking. I've seen people with masters fail at spotting hallucinations in elementary level word problems.

This is like lamenting that a person who has a doctoral degree in, say, mathematics or physics often doesn't have more than a basic knowledge of, for example, medicine or pharmacy.

visarga · a year ago
> This is like lamenting that a person who has a doctoral degree in, say, mathematics or physics often doesn't have more than a basic knowledge of, for example, medicine or pharmacy.

It was word problems, not rocket science. That says a lot about human intelligence. We're much less smart than we imagine, and most of our intelligence is based on book learning, not original discovery. Causal reasoning is based on learning rules and checking for exceptions to them. Truly novel ideation is actually rare.

We spent years implementing transformers in a naive way until someone figured out you could do it with much less memory (FlashAttention). That was such a facepalm: a trivial idea that thousands of PhDs missed. And the code is just three for loops, with a multiplication, a sum, and an exponential. An algorithm that fits on a napkin in its abstract form.
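
To be concrete, here's a rough single-query sketch of the online-softmax recurrence that FlashAttention builds on, written in NumPy for illustration (the function name is mine, and the real thing is a tiled, fused GPU kernel, not a Python loop):

  import numpy as np

  def streaming_attention(q, K, V):
      # Attention for one query vector, consuming one key/value pair at a time.
      # Only a running max, a running normalizer, and a running weighted sum
      # of values are kept, so the full score vector is never materialized.
      d = q.shape[0]
      m = -np.inf                    # running max of scores (numerical stability)
      l = 0.0                        # running sum of exp(score - m)
      acc = np.zeros(V.shape[1])     # running weighted sum of value rows
      for k, v in zip(K, V):
          s = q @ k / np.sqrt(d)     # the multiplication
          m_new = max(m, s)
          scale = np.exp(m - m_new)  # rescale everything accumulated so far
          p = np.exp(s - m_new)      # the exponential
          l = l * scale + p          # the sum
          acc = acc * scale + p * v
          m = m_new
      return acc / l

The output matches the naive softmax(q K^T / sqrt(d)) V computation; the actual memory and speed win comes from running this block-wise on the GPU, which the sketch doesn't show.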

JCharante · a year ago
It depends on your definition of smart. I think that holding a degree != smart.
dilawar · a year ago
I think many people like to believe that solving puzzles will somehow make them better at combinatorics. Lateral skill transfer in non-motor skills (e.g. office work, academic work, etc.) may not be any better than in motor skills. It's easier to convince people that playing soccer every day wouldn't make them any better at cricket, or even hockey.
39896880 · a year ago
All the models do is hallucinate. They just sometimes hallucinate the truth.
vharuck · a year ago
Nice George Box paraphrasing.
therealdrag0 · a year ago
A great deal of my own thinking could be described as hallucinating, given a sufficiently loose definition.

Stem0037 · a year ago
AI, at least in its current form, is not so much replacing human expertise as it is augmenting and redistributing it.
alephnerd · a year ago
Yep. And that's the real value add that is happening right now.

HN concentrates on the hype but ignores the massive growth in startups that are applying commoditized foundational models to specific domains and applications.

Early Stage investments are made with a 5-7 year timeline in mind (either for later stage funding if successful or acquisition if less successful).

People also seem to ignore the fact that foundational models are on the verge of being commoditized over the next 5-7 years, which decreases the overall power of foundational ML companies as applications become the key differentiator. And domain experience is hard to build (look at how it took Google 15 years to finally get on track in the cloud computing world).

MostlyStable · a year ago
I notice that a lot of people seem to focus only on the things that AI can't do or the cases where it breaks, and seem unwilling or unable to focus on the things it can do.

The reality is that both things are important. It is necessary to know the limitations of AI (and keep up with them as they change), to avoid getting yourself in trouble, but if you ignore the things that AI can do (which are many, and constantly increasing), you are leaving a ton of value on the table.

skybrian · a year ago
It would be nice to have more examples. Without specifics, “massive growth in startups” isn’t easily distinguishable from hype.

A trend towards domain-specific tools makes sense, though.

danielbln · a year ago
Same with consultancy. There is a huge amount of automation that can be done with current-gen LLMs, as long as you keep their shortcomings in mind. The "stochastic parrot" crowd seems like an overcorrection to the hype bros.
Workaccount2 · a year ago
...and then being blown up when the AI company integrates their idea.
hanniabu · a year ago
Yes, it should really be called collective intelligence, not artificial intelligence.
recursive · a year ago
It kind of seems like it got dumber to me. Maybe because my first exposure to it was so magical. But now, I just notice all the ways it's wrong.
joe_the_user · a year ago
I think any given model is going to decay over time. The data it was trained on becomes outdated, the models cost money to run, and various cost-saving shortcuts get made that reduce accuracy. Also, having your old model seem clunky can make your new model seem great.

Obviously, there are real ways new models get better too. But if we hit diminishing returns, as many speculate, it will take a while for that to become entirely obvious.

theptip · a year ago
I feel this is one of the major ways that most pundits failed with their “the data is going to run out” predictions.

First and foremost a chatbot generates plenty of new data (plus feedback!), but you can also commission new high-quality content.

Karpathy recently commented that GPT-3 needs so many parameters because most of the training set is garbage, and that he expects a GPT-2-sized model could eventually reach GPT-3 level if trained exclusively on high-quality textbooks.

This is one of the ways you get textbooks to push the frontier capabilities.

llm_trw · a year ago
I've not done pre-training for LLMs, but years ago I generated a completely synthetic dataset for table recognition using an off-the-shelf document segmentation model, raw TeX, a random table generator, a discriminator, and an evolutionary algorithm to generate different styles of tables.

The project got killed by management, but I still got results on that dataset better than the 2023 state of the art, with no human annotation.
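
For anyone wondering what the "random table generator with metadata" piece looks like in practice, here's a toy sketch of just that part (not the actual code; it skips the TeX rendering, the segmentation model, the discriminator, and the evolutionary style search):

  import random, string

  def random_cell():
      # Short random token to fill a cell.
      return "".join(random.choices(string.ascii_letters + string.digits,
                                    k=random.randint(2, 8)))

  def random_tex_table(max_rows=8, max_cols=6):
      # Emit a random LaTeX tabular plus its ground-truth cell grid.
      # Because we generated the table, the annotation comes for free.
      rows = random.randint(2, max_rows)
      cols = random.randint(2, max_cols)
      rules = random.choice(["", "|"])  # vary vertical rules for style
      colspec = rules + rules.join(random.choice("lcr") for _ in range(cols)) + rules
      grid = [[random_cell() for _ in range(cols)] for _ in range(rows)]
      lines = ["\\begin{tabular}{%s}" % colspec]
      for row in grid:
          lines.append(" & ".join(row) + " \\\\")
          if random.random() < 0.3:
              lines.append("\\hline")  # vary horizontal rules too
      lines.append("\\end{tabular}")
      return "\n".join(lines), grid

  tex, cells = random_tex_table()
  print(tex)    # rendered by TeX into a training image
  print(cells)  # the free ground-truth labels

Render the TeX, pair the image with the cell grid, and you have labeled training data with no human in the loop.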

The Venn diagram of people who know TeX well enough to write a modular script for table generation with metadata and people who know how to train LLMs has an intersection of maybe a dozen people, I imagine.

theptip · a year ago
This is a great example. Areas where an expert can write a synthetic data generator (e.g. code, physics simulators, etc.) are the dream scenario here.

It seems to me there is a huge amount of unharvested low-hanging fruit here; for example, IIUC GPT is not trained on synthetic code in languages other than Python (and maybe JS, I don't recall).

from-nibly · a year ago
At a good cost though? Last time I checked, generating good data costs a tiny bit more than an HTTP request to somebody else's website.
theptip · a year ago
If the cost is not “good enough”, why are the big guys buying a lot of it?
CamperBob2 · a year ago
Which is fine. If all AI does is represent human knowledge in a way that makes it explainable and transformable rather than merely searchable, then the hype is justified... along with Google's howling, terrified panic.

The role played by humans on the training side is of little interest when considering the technology from a user's perspective.

jumping_frog · a year ago
The problem is that my back-and-forth with Claude is just Claude's data, not available to anyone else. Unlike Stack Overflow, which is fair game for every AI.
iwontberude · a year ago
I think the most interesting aspect of it is the human training. Human blind spots, dogma, ignorance, etc. All on demand and faster than you can validate its accuracy or utility. This is good.
CamperBob2 · a year ago
Shrug... I don't know what anyone expected, once humans got involved. Like all of us (and all of our tools), AI is vulnerable to human flaws.
yawnxyz · a year ago
"raw dogging" non-RLHF'd language models (and getting good and unique output) is going to be a rare and sought-after skill soon. It's going to be a new art form

someone should write a story about that!

zmgsabst · a year ago
I’m personally waiting on AI psychology to take off.

Eg, why does ChatGPT like the concept of harmony so much and use it as a principle for its political analysis?

yawnxyz · a year ago
I thought it's b/c of RLHF?

I think the earliest GPT-3 wasn't too keen on harmony, but I might be misremembering.

SamGyamfi · a year ago
There is a cost-quality tradeoff companies are willing to make when training AI models on synthetic data. It shows up fairly often in AI research labs' papers. There are also upcoming tools that remove the noise that would trip up some advanced models during annotation. Knowing this, I don't think the "human-labeled data is better" argument will last much longer.