benreesman · 2 years ago
I think this post makes a few good points. Certainly the parabolic trajectory Scale seems to be on is at least suggestive, if not conclusive, that there's a lot more going on now than just big text crawls.

And Phi-3 is something else, even from relatively limited time playing with it, so that’s useful signal for anyone who hadn’t looked at it yet. Wildly cool stuff.

It seems weird to not mention Anthropic or Mistral or FAIR pretty much at all: they're all pretty clearly on more modern architectures, at least as concerns capability per weight and instruct-style tuning. I'm part of a now-nontrivial group that regards Opus as basically shattering GPT-4-{0125, 1106}-Preview (which is basically the same as 4o for pure language modalities) on basically everything I care about, and LLaMA3 is just about there as well: maybe not quite Opus, but comparable if you ignore trivially gamed metrics like MMLU.

And I have no idea why we're talking about GPT-5 when there's little if any verifiable evidence it even exists as a training run tracking to completion. Maybe it does, maybe it doesn't, but let's get a look at it rather than just assume it's going to lap the labs that are currently pushing the pace.

ein0p · 2 years ago
Have you actually tried using Opus side by side with GPT4 on “work” related stuff? GPT4 is way better in my experience, to the point where I cancelled my Opus subscription after just a couple of months.
benreesman · 2 years ago
I make an effort to use both and several high-capability open tunes every day (it’s not literally every day but I have keyboard shortcuts for all of them).

Opus historically had issues with minor typographical errors, though recently that seems not to happen often; lots of very sharp people at Anthropic.

So a month ago, if I wanted something from Opus, I'd run it through a cleanup pass courtesy of one of the other models, but even my old standby dolphin-8x7 can clean up typos. 1106 can as well, but all else equal I don't want to be sending my stuff to any black-box data warehouse, and I'm always surprised so many other sophisticated people don't share that preference.

My personal eyeball capability check is to posit a gauge symmetry and ask what it thinks the implied conserved quantity is, and I’ve yet to see Opus not crush that relative to anything else, including real footnotes.
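
To make that concrete (a minimal Noether-style sketch of the kind of answer I'm looking for, not my exact prompt): take the simplest case, a complex scalar field with a global U(1) phase symmetry, where the theorem hands you a conserved current and charge.

```latex
% Illustrative toy case only: global U(1) phase symmetry of a complex scalar field.
% Lagrangian density and symmetry transformation:
\[
  \mathcal{L} = \partial_\mu \psi^* \,\partial^\mu \psi - m^2 \psi^* \psi ,
  \qquad \psi \to e^{i\alpha}\psi
\]
% Noether current and conserved charge (up to sign/normalization conventions);
% in the gauged version, as in scalar electrodynamics, the conserved quantity
% is electric charge:
\[
  j^\mu = i\left(\psi^* \,\partial^\mu \psi - \psi\,\partial^\mu \psi^*\right),
  \qquad \partial_\mu j^\mu = 0,
  \qquad Q = \int d^3x \; j^0
\]
```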

On coding I usually hand it a Prisma schema and ask for a proto3/gRPC definition that would be a good way to interact with it; Opus, in my personal experience, also dominates there.
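
For flavour (a toy sketch, not my actual schema, with all field names and the service shape hypothetical): given a one-model Prisma schema with a User, the kind of proto3 I'd hope to get back looks roughly like this.

```proto
// Toy illustration only. A hypothetical Prisma model along the lines of:
//   model User {
//     id    Int     @id @default(autoincrement())
//     email String  @unique
//     name  String?
//   }
syntax = "proto3";

package users.v1;

message User {
  int64  id    = 1;
  string email = 2;
  optional string name = 3; // nullable Prisma field -> explicit presence
}

message GetUserRequest {
  int64 id = 1;
}

message CreateUserRequest {
  string email = 1;
  optional string name = 2;
}

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc CreateUser(CreateUserRequest) returns (User);
}
```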

If you have an example of a task that represents a counterexample, I'd be grateful for another little integration test for my personal ad hoc model card. I want to know the best tool for every job.

virgildotcodes · 2 years ago
I feel like Opus is much better at creative writing in terms of sounding more natural and less formulaic, but GPT4 does beat it on just about everything else.
Zambyte · 2 years ago
That's really interesting to me. I've been using both through my Kagi subscription, but I always find myself favoring the quality of Opus. I generally use GPT 4o if I don't want to wait for a slow response from Opus, and I use Opus when I want the highest quality.

Deleted Comment

fnordpiglet · 2 years ago
OpenAI has publicly said they've started building GPT-5.

https://openai.com/index/openai-board-forms-safety-and-secur...

dannyw · 2 years ago
Phi-3 was trained on benchmarks; it's contaminated and deceitful. Actual performance is much worse in my experience.
lostmsu · 2 years ago
Phi-3-Mini has the same Elo on Chatbot Arena as the oldest GPT-3.5-Turbo. It is an 8GB model (~4B parameters?).
akira2501 · 2 years ago
> And Phi-3 is something else,

It's great that we get to keep saying this. I wonder if that's because we have no objective statistics to measure these projects by.

benreesman · 2 years ago
IMHO it’s easier than people largely seem to imply.

Make it easy to try everything, let people decide for themselves what works best for them.

Ya know, like a market.

stephc_int13 · 2 years ago
The current state of LLMs would be several orders of magnitude more impressive if they were trained only on data scraped from the web.

But this is not the reality of modern LLMs by a long shot: they are trained, in increasingly large part, on custom-built datasets created by countless paid individuals hidden behind stringent NDAs.

The author here seems to see that as a strength, an opportunity for unbounded growth and potential. I think it is the opposite: this approach is close to a gigantic whack-a-mole game, effectively unbounded, but in the wrong way.

preommr · 2 years ago
> countless paid individuals, hidden behind stringent NDAs.

If this is so prevalent, wouldn't there be a proportional amount of data leaks? Is there any particular evidence, even of doubtful authenticity, of this being the case?

solidasparagus · 2 years ago
What sort of leak? I've seen data labeler/generation teams hired. I've never heard anyone describe the existence of these teams as a secret. No one hides the existence of Scale AI. People talk about which providers are better for different scenarios and when you need to inhouse and which companies are good at helping you build an inhouse team.

Are you talking about leaks of the actual training data? The secret sauce of modern LLMs? That is like leaking the google3 source code or the recipe for Coca Cola - a ludicrously risky move. And for what gain?

Jensson · 2 years ago
Those individuals just create data; they don't have access to it. Think Mechanical Turk workers. All modern AI is powered by many such workers. LLMs are the most funded modern AI, so they have massive numbers of such workers for sure.
apike · 2 years ago
Yeah this is an interesting point. Other threads make the point about the "bitter lesson", and how expert-trained ML has historically not scaled, and human-generated LLM training data may just be repeating that dead end. Maybe so.

Something that is new this time around, AFAIK, is that we haven't previously had general ML systems that businesses and consumers are paying billions of dollars a year to use. So if, say, 10% of revenue goes back into making better data sets every year, I can imagine continued improvement on certain economically valuable use cases – though likely with diminishing returns.

jackblemming · 2 years ago
Reminds me of the same issues with self-driving. Seems like we need a completely different approach to solve this class of problems.

Deleted Comment

surfingdino · 2 years ago
> For example, if your model is hallucinating because you don’t have enough training examples of people expressing uncertainty, or biased because it has unrepresentative data, then generate some better examples!

Or, as the case may be... humans are biased? Also, "generate some better examples" sounds like fudging data to fit the expected outcome. It smells of clutching at straws, hoping to come up with something before the world loses interest and the investor money runs out.

If you want to see how LLMs fail at coming up with original responses, ask your favourite hallucinating bot to come up with fifty different ways of encouraging people to "Click the Subscribe button" in a YT video. Not only will it not come up with anything original, it will simply start repeating itself (well, not itself; it will start repeating phrases found in YT video transcripts).

apike · 2 years ago
> Also "generate some better examples" sounds like fudging data to fit the expected outcome.

LLMs are tools. As a tool author, you have certain desired outcomes for certain use cases. If the current data you’re training on isn’t giving you those outcomes, it is absolutely reasonable to "fudge" the data. This might mean reducing bias, or adding bias, or any number of nudges. Training an LLM is not a scientific study, it’s a product development effort.

surfingdino · 2 years ago
Agreed. However, you are then giving your tools to people who have none of that experience and understanding, and they apply them to the problems they are trying to solve without pausing to check the results against facts. There is a lot of trust in the outputs and little vigilance. A common reply to such concerns is "well, you should be able to spot incorrect information in the outputs", which is tricky if we are talking about education, where by definition students are yet to learn the correct answers, or about the lower levels of career development, which are very much like education in that people are learning on the job. The inability to quote and trace the sources of information an LLM used to construct its output is a major red flag for me; sensitive information leakage is another. The way LLMs are sold is irresponsible: they are sold as tools to solve problems, not as a thousand monkeys trying to type up the whole works of Shakespeare, which is where we are at the moment.
sdfgtr · 2 years ago
> While some of this is for annotation and ratings on data that came from the web or LLMs, they also create new training data whole-hog:

The article states that this human data comes from PhDs, poets, and other experts, but my recollection from some info about programming LLM training is that there was a small army of low-paid Indian programmers feeding it with data.

Even if it's actually experts now I have to wonder when that will switch to 3rd worlders making $1/hour.

jamilton · 2 years ago
Here are the job postings from the mentioned company, Outlier. https://boards.greenhouse.io/outlier
fewald_net · 2 years ago
Thank you for sharing the link. I got curious and clicked on one. They want programming skills and pay “up to $30/h”.
stefan_ · 2 years ago
I love the marketing upstart attitude, but indeed, the reality of "PhDs, poets and subject matter experts expanding the frontiers of AI" is much more likely to be the "Amazon cashierless supermarket" experience.

The problem with hiring that group of people is presumably that they are not poor enough to lack ambition in their careers, and every dummy can spot from miles away that feeding some LLM is an utter career dead end.

XorNot · 2 years ago
Isn't it just curating an encyclopaedia though? The point is that LLM training is moving from "suck down the internet" to "consume an annotated and contextualised reference of the library of Congress".

It's the difference between trusting 5 random people to tell you how they think quantum mechanics works and asking 5 presently publishing physicists.

Deleted Comment

Deleted Comment

zer00eyz · 2 years ago
You, sir, get an F in history, and the industry does too.

Does no one remember why expert systems fell apart? Because you have to keep paying experts to feed the beast. Because they are bound to the whims and limitations of experts. Making up data isn't going to get us there; we already failed with this method ONCE.

OpenAI's bet with MS and the resignation of all the safety people say everything you need to know. MS gets everything up to AGI... IF you thought you were close, if you thought you were going to get there with a bigger model and more data, then you MIGHT want MS's money. And MS had its own ML folks publish papers with "hints of AGI"; there was the Google engineer saying "it's AGI" before getting laughed at...

I suspect that everyone at OpenAI was high on their own supply. That they thought AGI would emerge, or sapience, or sentience, if they shoved enough data at it. I think the safety-minded folks leaving points to the fact that they found the practical limitations.

Show me the paper that has progress on hallucination. Show me the paper that doubles effectiveness and halves the size. These are where we need progress for this to become more than grift, more than NFTs.

solidasparagus · 2 years ago
> Does no one remember why expert systems fell apart?

Many of the current generation of AI experts either did not pay much attention to the history of AI or believe this time is completely different. They would do well to spend more time learning about history.

However, your view doesn't strike me as correct either. Expert systems fell apart because the world was more complex than researchers realized and enumeration was essentially discovered to be infeasible (more or less as you say). But the impossibility of enumerating the world isn't news; everyone knows "the bitter lesson". And this isn't the past: now everyone on earth carries around a computer, a video camera, and a microphone. They talk to each other through the internet. Remote workers' screens are recorded. Billions of vehicles with absurd numbers of sensors are roaming around the world. More of the arenas that matter to humanity are digital and thus effective domains for automated exploration and data generation.

The information about how the world operates exists or can be generated, the only real question is how to get your hands on it.

discreteevent · 2 years ago
> The information about how the world operates exists or can be generated, the only real question is how to get your hands on it.

I'm sure I could read all the information for an astrophysics course in a relatively short time. Understanding it is a different matter.

zer00eyz · 2 years ago
> The information about how the world operates exists or can be generated

The hubris of mathematics. At what scale does weather prediction become 100 percent accurate? How large of a model do you need, and how big of a computer to run it?

Do we think that reducing the world to a model and feeding it through what isn't even close to a model of "thought" or "interaction" or... whatever you want to bill an LLM as, is going to be any more accurate than weather prediction?

YeGoblynQueenne · 2 years ago
>> Does no one remember why expert systems fell apart?

There were many reasons. One of them was the "Knowledge Acquisition Bottleneck", but that was not about the cost of paying experts, rather the cost of creating and maintaining a potentially very large knowledge base (i.e. one big mother of a database of production rules). Also, the fact that many experts' knowledge is tacit and not easily formalisable.

Modern machine learning began in the 1980s as an effort to overcome the Knowledge Acquisition Bottleneck. Accordingly, many early machine learning approaches were designed to learn production rules for expert systems. Decision trees, one of the staple classifiers in data science, come from that era; you can tell because decision trees are a symbolic, logic-based "model".

There were other problems with expert systems, e.g. their infamous "brittleness". But modern, statistical machine learning systems are criticised for "brittleness" too.

There were also purely political reasons that had nothing to do with science or technology. Then there was the 5th Generation Computer Project, and the AI winter, and then there were no more expert systems.

The journal of Expert Systems with Applications is still alive and well, on the other hand, although it mostly publishes on machine learning and neural nets these days. With an occasionally cool article, like one about Wolf Colony Optimisation I spotted recently. Too tired to look for links now, sorry.

sebzim4500 · 2 years ago
They aren't going to show you any papers at all; they like money.
fnordpiglet · 2 years ago
Experts weren’t the bottleneck on expert systems it was the systems weren’t particularly adaptive, were too rigid, weren’t able to make abductive conclusions, and the user interfaces were way too difficult in situ. LLMs actually tackle quite a lot of these issues FWIW but I wouldn’t look at them as a replacement for expert systems. Instead they’re probably what will make them useful by providing a natural human interface and a way of providing an abductive “reasoning” ability ontop of traditional expert systems.
numpad0 · 2 years ago
I've seen some infographics showing that LLMs practically need to see the same data 4 times or less, and once is fine too (trained for one epoch).

And I was like, y'saying, it's a zipped list of edge cases...

Jensson · 2 years ago
It's a zip with lossy compression for text. The first useful lossy text compression algorithm we have made.
falcor84 · 2 years ago
>Show me the paper that doubles effectiveness and halves the size.

LLMs have pretty clearly been the most rapidly advancing technology in the history of humankind. Are you not entertained?!

kazinator · 2 years ago
> A dataset like “50,000 examples of Ph.Ds expressing thoughtful uncertainty when asked questions they don’t know the answer to” could be worth a lot more than it costs to produce.

Those PhDs better up their negotiating skills then.

fnordpiglet · 2 years ago
I sort of hope we get a tech investment fueled WPA that simply pays skilled writers to write, and I hope they allow the body of work to be released by the authors to the public when there’s something of general value written. A wonderful irony of the training and development of superior language models could be the creation of a superior corpus of human authored work.
jacobsenscott · 2 years ago
OpenAI etc. will be paying irresistible sums of money to companies that promised to keep data private. Think Slack (and their recent "opt out" fiasco), Atlassian, Dropbox...
amelius · 2 years ago
"It's easier to ask for forgiveness" is the main modus operandi nowadays ...
moogly · 2 years ago
They don't even need to do that...
goatlover · 2 years ago
As long as you can pay the lawyers.
barbariangrunge · 2 years ago
Discord…