This appears to be part of an embargoed news blitz from a few news organizations (The Verge and Bloomberg posted the same story at the same time), which is an interesting PR deviation from OpenAI posting it on their blog and having it go viral. The news isn't on their official blog at all currently.
Yeah, I found that interesting. A flubbed embargo time makes sense, but could it also be that they're letting the news organizations have first dibs to build a little goodwill with the industry?
Is OpenAI still making changes to make ChatGPT better, more accurate, and more correct, or are they now only focused on making it cheaper by giving us weaker/dumber responses faster? I recently cancelled my subscription because I didn't think GPT-4 was much better than what I get for free from Claude or Gemini.
Going out on a limb here that they're not focusing the entire company on one objective.
Adding to this, reducing cost means they've reduced compute and improved quality per unit of computation.
That matters! It matters because these systems currently require an embarrassing amount of energy to run. It matters because these models could become portable enough to run locally on consumer electronics rather than being locked into remote compute and HTTP calls.
> Is OpenAI still making any changes to make ChatGPT better and more accurate and correct
For a few prompts recently, it just answered instead of refusing, to my surprise, so it feels like it's been getting better. Unfortunately I have no hard data on that; it's more of a feeling, so it's hard to prove to someone else that it's actually true.
Then again, ChatGPT-4o was recently found to be unable to do 9.11 - 9.9 correctly unless pressed, so there's still a long way to go.
That is interesting for sure, but the follow-up questioning ("Why did you get it wrong the first time?") and asking it to explain itself shows a misunderstanding of how these technologies actually work. Please stop treating these tools as if they are conscious and actually understand what they are spitting back at you! /rant
LLMs are text completion engines and can't really do math. Even if it happened to know the correct answer to 9.11 - 9.9, it could still fall flat on 9.12 - 9.9.
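To be fair, the arithmetic itself is trivial once you take it out of token space; a couple of lines of Python show the answer the model should be giving (this is just an illustration of the correct result, not of anything an LLM does internally):

```python
from decimal import Decimal

# 9.11 - 9.9: the model-tripping subtraction from the thread.
# Binary floats give an inexact result; Decimal gives the exact one.
float_answer = 9.11 - 9.9
exact_answer = Decimal("9.11") - Decimal("9.9")

print(float_answer)   # something close to -0.79, with float noise
print(exact_answer)   # exactly -0.79
```

The failure mode people report is the model answering as if 9.11 > 9.9, which is true for version numbers but not for decimals.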
I suspect model improvement costs exponentially more. The fact that it's been over a year since GPT-4's release and 4o is only marginally better, combined with OpenAI focusing on other things (Sora, mini models, voice, etc.), suggests a "better and more accurate and correct" model is actually really hard.
If request latency is your top concern and you don’t need it to be as smart as GPT-4, why are you using OpenAI and not a model hosted by Groq or similar?
There might be some changes, but judging by the quality of each product or update on launch day, OpenAI has consistently shown they have far more polished work sitting in the pipeline, getting ready to go out.
If "AI" tools are eventually going to have to be "free" (as in beer) to compete, I shudder to think of what companies like OpenAI will have to extract from users to please investors...
They could found a charity with the aim of creating open AI models. They could call it OpenAI, to draw attention to the fact that it's not a proprietary commercial technology.
There are many ways to extract value using a commodity: in OpenAI's case (and a few other LLM providers), the current strategy appears to be lock-in with additional services.
My somewhat limited understanding of LLMs (and ML in general) is that you would end up using a LOT of bandwidth, not to mention that training would be insanely slow, since each layer of the neural network requires the results of the previous layer, and then needs to send its results back.
The greatest limiter for training isn't raw computing power, but storage. If your model is 400B parameters, you need 800GB to store it, assuming fp16. Then you need another 800 GB for calculating gradients. Sharding all this out means transferring a lot of data. If your device only stores 1B parameters out of the 400B, that means having to download 2 GB of data, doing your share of the work, then uploading 2 GB of results. Even with gigabit internet, you'll spend an order of magnitude more time transferring data than actually processing it.
At that point, it'd be faster to train on a standard-specced PC that had to constantly page out most of the model.
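The back-of-envelope math above can be written out explicitly; the shard size and link speed below are just the numbers assumed in the parent comment:

```python
# Back-of-envelope check of the sharded-training numbers above.
BYTES_PER_PARAM = 2                      # fp16
shard_bytes = 1e9 * BYTES_PER_PARAM      # 1B-parameter shard = 2 GB
link_bytes_per_s = 1e9 / 8               # gigabit link ~ 125 MB/s

# Download the shard, do your slice of the work, upload the results.
transfer_s = 2 * shard_bytes / link_bytes_per_s
print(transfer_s)                        # 32.0 seconds per round trip
```

Thirty-plus seconds of pure transfer per shard per step, against compute that finishes far faster, is the order-of-magnitude gap being described.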
Sounds like a really interesting technical challenge.
Couple of issues I can see: (1) most devices out there would probably be mobile, so no NVIDIA/CUDA for you; (2) even with binding to, say, Apple Silicon, you might still be memory limited, i.e. can you fit the entire net on a single mobile GPU; (3) network latency?
I would think that you'd need a model format that could be updated in different places at the same time and easily merged at various stages. Because right now, adding all that network latency only slows things down.
I'm sure there's a method to permit it, but it doesn't seem like anyone's worked it out yet.
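The naive version of "update in different places and merge" is federated averaging; here's a toy sketch under the obvious simplification that weights are plain lists of floats (real systems have to handle stale updates, divergence, and far more state):

```python
def merge(base, deltas):
    """Apply the element-wise average of per-worker weight deltas."""
    n = len(deltas)
    avg = [sum(col) / n for col in zip(*deltas)]
    return [b + d for b, d in zip(base, avg)]

# Two workers push different updates against the same base weights.
merged = merge([0.0, 0.0, 0.0], [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]])
print(merged)  # [2.0, 2.0, 2.0]
```

The hard part isn't the averaging, it's that naively merged gradients computed against stale weights stop pointing downhill, which is roughly why nobody has made this work at scale yet.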
The problem is internode bandwidth and latency. Supercomputers are super because they have insanely fast bandwidth between the nodes, so the GPUs can talk to each other as if they were local.
I think enough people have already demonstrated that they essentially don't care. À la “facebook knows you're pregnant before you do” – and now so will “Open”“AI”.
Looking at https://openai.com/index/gpt-4o-mini-advancing-cost-efficien... , gpt-4o-mini is better than gpt-3.5 but worse than gpt-4o, as was expected. gpt-4o-mini is cheaper than both, however. Independent third-party performance benchmarks will help.
Claude 3.5 Sonnet is so much better in every test I've made (coding and everyday mondain stuffs); it beats me why anyone would choose ChatGPT (I'm using free versions only).
Good to know, although this is targeting cheaper API use for specific applications in which a second-tier model is sufficient. Note however that according to the LMSYS Leaderboard, GPT-4o rates slightly higher than Claude 3.5 Sonnet.
There's a market for small models with large context windows, for cases where there's little need for reasoning (summarization, search, etc.). 4o-mini is probably better and cheaper to run than 3.5.
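That split is the kind of thing a tiny router handles in production; a sketch, where the task labels and the cheap/expensive model assignment are my own assumptions, not anything OpenAI publishes:

```python
# Route low-reasoning work to the cheap model, everything else to the
# big one. Task labels and the model split are illustrative only.
CHEAP_TASKS = {"summarize", "search", "extract", "classify"}

def pick_model(task: str) -> str:
    return "gpt-4o-mini" if task in CHEAP_TASKS else "gpt-4o"

print(pick_model("summarize"))  # gpt-4o-mini
print(pick_model("plan"))       # gpt-4o
```

The pricing gap is large enough that even a crude router like this can cut the bill substantially for summarization-heavy workloads.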
It means they don't want anyone else releasing much stronger models than theirs, or they will undercut you by an order of magnitude on price and outperform you by 5%. Hardly a recipe for long-term consumer benefit.
OpenAI’s strategy has been bizarre since at least last November, when they launched custom GPTs, then had the boardroom coup.
Since the launch of Claude 3 Opus, and then Claude 3.5 Sonnet, they have been significantly behind Anthropic in terms of the general intelligence of their models. And instead of deploying something on par or better, they are making demos of video generation (Sora) or audio-to-audio models, not releasing anything.
GPT-4o is quite bad at coding, often getting stuck in a loop, and “fixing” buggy code by rewriting it without any changes.
GPT-4o is speculated to be a distillation of a larger model, and now GPT-4o-mini is an even dumber smaller model. But what’s the point?
Who is actually using small/fast/cheap/dumb models in production apps? Most real apps require higher reliability than even the biggest/slowest/priciest/smartest models can provide today. For the use case of transformers that has taken off, aiding students and knowledge workers in one-off tasks like writing code and prose, most users want smarter, more reliable outputs, even at the expense of speed and cost.
GPT-4o-mini seems like a move to increase margins, not make customers happier. That, like demoing products without launching them, is what big old slow corporations do, not how world-leading startups operate.
Since Claude 3.5 Sonnet was released, I can't go back to GPT anymore. It sounds too "robotic" and is overly verbose: it explains every little detail I don't want to know and is still far worse than Claude. OpenAI really has to step up their game if they don't want to fall behind. In fact, GPT-4 got worse back in November; the best version is still the one from June 2023, but it's only available via the API.
Sonnet is great, but I'd also suggest exploiting custom instructions in the ChatGPT UI. Here's a snippet from mine:
Extremely concise, formal. As short as possible. Assume I am an industry expert in any topic we discuss. Answer assuming I have the highest level of intellect possible, and do not require explication regardless of the sophistication of the topic. In cases where one approach among many is superior, offer an opinionated argument in favor of that approach.
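For API users, the same instructions drop straight in as a system message; a sketch (the model name is an assumption, and you'd unpack the returned dict into your client's chat-completion call):

```python
# Custom instructions from the ChatGPT UI become a system message in the
# API. Sketch only: the model name and prompt wording are placeholders.
STYLE = (
    "Extremely concise, formal. As short as possible. "
    "Assume I am an industry expert in any topic we discuss."
)

def build_request(user_prompt: str) -> dict:
    return {
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": STYLE},
            {"role": "user", "content": user_prompt},
        ],
    }
```

Unlike the UI, this applies per-request, so you can keep different "personas" for different tasks.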
I don't think there are any VPSs that can do that in a way that is even remotely performant or a good value compared to something like an LLM inference provider or serverless GPU. I would look into together.ai and RunPod for that type of thing.
But let me know if you find something. I just don't think something tiny like phi-3, which could run on a VPS, although great for its size, is at all comparable to this stuff in terms of ability.
True, you could run it at home on a server though.
My AI server takes about 60W idle and 300-350W while running a query in llama3. At a kWh price of 0.15€ that ends up at about 7-10€ a month if it's not loaded too heavily. Not bad IMO.
The server could be more energy-optimized, but that would also cost me.
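For anyone checking the math, a quick model of that bill (the hours-of-load-per-day figure is my assumption; the wattages and price are from the comment above):

```python
# Monthly electricity cost for the home AI server described above.
IDLE_W, LOAD_W = 60, 325      # watts; 325 is the midpoint of 300-350
EUR_PER_KWH = 0.15
HOURS_PER_MONTH = 24 * 30

def monthly_eur(load_hours_per_day: float) -> float:
    load_h = load_hours_per_day * 30
    kwh = (IDLE_W * (HOURS_PER_MONTH - load_h) + LOAD_W * load_h) / 1000
    return kwh * EUR_PER_KWH

print(monthly_eur(0))   # idle all month: ~6.5 EUR
print(monthly_eur(2))   # ~2 h of queries a day: ~8.9 EUR
```

So the quoted 7-10 EUR/month range corresponds to roughly one to three hours of inference load per day.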
> mondain stuffs
Mundane, not mondain.