Readit News
34679 · 9 months ago
I'd like to offer a cautionary tale that involves my experience after seeing this post.

First, I tried enabling o3 via OpenRouter since I have credits with them already. I was met with the following:

"OpenAI requires bringing your own API key to use o3 over the API. Set up here: https://openrouter.ai/settings/integrations"

So I decided I would buy some API credits with my OpenAI account. I ponied up $20 and started Aider with my new API key set and o3 as the model. I got the following after sending a request:

"litellm.NotFoundError: OpenAIException - Your organization must be verified to use the model `o3`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate."

At that point, the frustration was beginning to creep in. I returned to OpenAI and clicked on "Verify Organization". It turns out, "Verify Organization" actually means "Verify Personal Identity With Third Party" because I was given the following:

"To verify this organization, you’ll need to complete an identity check using our partner Persona."

Sigh. I click "Start ID Check" and it opens a new tab for their "partner" Persona. The initial fine print says:

"By filling the checkbox below, you consent to Persona, OpenAI’s vendor, collecting, using, and utilizing its service providers to process your biometric information to verify your identity, identify fraud, and conduct quality assurance for Persona’s platform in accordance with its Privacy Policy and OpenAI’s privacy policy. Your biometric information will be stored for no more than 1 year."

OK, so now, we've gone from "I guess I'll give OpenAI a few bucks for API access" to "I need to verify my organization" to "There's no way in hell I'm agreeing to provide biometric data to a 3rd party I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder. How do I get my $20 back?"

leetrout · 9 months ago
I actually contacted the California AG to get a refund from another AI company after they failed to refund me.

The AG office followed up and I got my refund. It was worth my time to file, because we should stop letting companies get away with this stuff, where they show up with more requirements after you've paid.

Separately they also do not need my phone number after having my name, address and credit card.

Has anyone got info on why they are taking everyone’s phone number?

jazzyjackson · 9 months ago
(having no insider info:) Because it can be used as a primary key ID across aggregated marketing databases including your voting history / party affiliation, income levels, personality and risk profiles etc etc etc. If a company wants to, and your data hygiene hasn't been tip top, your phone number is a pointer to a ton of intimate if not confidential data. Twitter was fined $150 million for asking for phone numbers under pretense of "protecting your account" or whatever but they actually used it for ad targeting.

>> Wednesday's 9th Circuit decision grew out of revelations that between 2013 and 2019, X mistakenly incorporated users' email addresses and phone numbers into an ad platform that allows companies to use their own marketing lists to target ads on the social platform.

>> In 2022, the Federal Trade Commission fined X $150 million over the privacy gaffe.

>> That same year, Washington resident Glen Morgan brought a class-action complaint against the company. He alleged that the ad-targeting glitch violated a Washington law prohibiting anyone from using “fraudulent, deceptive, or false means” to obtain telephone records of state residents.

>> X urged Dimke to dismiss Morgan's complaint for several reasons. Among other arguments, the company argued merely obtaining a user's phone number from him or her doesn't violate the state pretexting law, which refers to telephone “records.”

>> “If the legislature meant for 'telephone record' to include something as basic as the user’s own number, it surely would have said as much,” X argued in a written motion.

https://www.mediapost.com/publications/article/405501/None

giancarlostoro · 9 months ago
Thank you for this comment… a relative of mine spent a ton of money on an AI product that never came: a license he cannot use. I told him to contact his state's AG just in case.
pembrook · 9 months ago
Source: have dealt with fraud at scale before.

Phone number is the only way to reliably stop MOST abuse on a freemium product that doesn't require payment/identity verification upfront. You can easily block VOIP numbers and ensure the person connected to this number is paying for an actual phone plan, which cuts down dramatically on bogus accounts.

Hence why even Facebook requires a unique, non-VOIP phone number to create an account these days.

I'm sure this comment will get downvoted in favor of some other conspiratorial "because they're going to secretly sell my data!" tinfoil post (this is HN of course). But my explanation is the actual reason.

I would love if I could just use email to signup for free accounts everywhere still, but it's just too easily gamed at scale.
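Under the hood, that kind of gate is just a line-type check at signup. A minimal sketch, assuming a carrier-lookup service that reports a line type per number; the lookup table and the `allow_signup` helper below are hypothetical stand-ins, not any real provider's API:

```python
# Hypothetical VOIP-number filter: reject signups whose phone line type
# indicates a virtual/VOIP number rather than a paid phone plan.
# A real carrier-lookup service would return the line type per number;
# here it is stubbed with a plain dict.

BLOCKED_LINE_TYPES = {"voip", "virtual", "unknown"}

def allow_signup(phone_number: str, lookup: dict) -> bool:
    """Return True if the number looks like a real mobile/landline plan."""
    line_type = lookup.get(phone_number, "unknown")
    return line_type not in BLOCKED_LINE_TYPES

# Stubbed lookup results standing in for a carrier-lookup response:
LOOKUP = {
    "+15551230001": "mobile",
    "+15551230002": "voip",      # e.g. a free virtual number
    "+15551230003": "landline",
}

print(allow_signup("+15551230001", LOOKUP))  # mobile: allowed
print(allow_signup("+15551230002", LOOKUP))  # voip: blocked
```

Numbers the lookup has never seen fall through to "unknown" and are blocked too, which is the conservative default a fraud team would want.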

charliebwrites · 9 months ago
Doesn’t Sam Altman own a crypto currency company [1] that specifically collects biometric data to identify people?

Seems familiar…

[1] https://www.forbes.com/advisor/investing/cryptocurrency/what...

jjani · 9 months ago
GP did mention this :)

> I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder

93po · 9 months ago
The core tech and premise don't involve collecting biometric data, but biometric data is collected for training purposes, with consent and compensation. There is endless misinformation (willful and ignorant) around Worldcoin, but it is not, at its core, a biometric collection company.
ddtaylor · 9 months ago
I also am using OpenRouter because OpenAI isn't a great fit for me. I also stopped using OpenAI because they expire your API credits even if you don't use them. Yeah, it's only $10, but I'm not spending another dime with them.
numlocked · 9 months ago
Hi - I'm the COO of OpenRouter. In practice we don't expire the credits, but we have to reserve the right to, or else we have an uncapped liability literally forever. Can't operate that way :) Everyone who issues credits on a platform has to have some way of expiring them. It's not a profit center for us, or part of our P&L; it's just a protection we have to have.
bonki · 9 months ago
I wonder if they do this everywhere; in certain jurisdictions this is illegal.
johnnyyyy · 9 months ago
Then you shouldn't use OpenRouter. From the ToS:

"4.2 Credit Expiration; Auto Recharge: OpenRouter reserves the right to expire unused credits three hundred sixty-five (365) days after purchase."
cactusplant7374 · 9 months ago
That is so sleazy.
cedws · 9 months ago
After how long do they expire?
cess11 · 9 months ago
I suspect their data collection might not be legal in the EU.

https://withpersona.com/legal/privacy-policy

To me it looks like an extremely aggressive data pump.

wqaatwt · 9 months ago
There are stories about e.g. Hetzner requiring all sorts of data from people who want to open/verify accounts so perhaps not. Might just be an anti “money laundering” thing. Especially if the credit card company ends up refunding everything..
fakedang · 9 months ago
As someone not in the US, I do a straight nope out whenever I see a Persona request. I advise everyone else to do the same. Afaik, it's used by LinkedIn and Doordash too.
Marsymars · 9 months ago
Oh I also recently got locked out of my linkedin account until I supply data to Persona.

(So I’m remaining locked out of my linkedin account.)

csomar · 9 months ago
> How do I get my $20 back?

Contact support and ask for a refund. Then do a chargeback.

AstroBen · 9 months ago
KYC requirement + OpenAI preserving all logs in the same week?
jjani · 9 months ago
OpenAI introduced this with the public availability of o3, so no.

It's also the only LLM provider which has this.

What OpenAI has that the others don't is SamA's insatiable thirst for everyone's biometric data.

mycall · 9 months ago
I think KYC has already been beaten by AI agents. According to RepliBench [0], obtaining compute requires KYC, and passing it shows a high success rate in the graphic.

[0] https://www.aisi.gov.uk/work/replibench-measuring-autonomous...

infecto · 9 months ago
KYC has been around for a few months, I believe. When they released some of the additional thought logs, you had to be verified.
baq · 9 months ago
Meanwhile the FSB and Mossad happily generate fake identities on demand.
romanovcode · 9 months ago
The whole point of identity verification is for that same Mossad to gather your complete profile and everything else they can from OpenAI.

Since Mossad and the CIA are essentially one organization, they already do it, 100%.

bratao · 9 months ago
You are even lucky to be able to verify. Mine has given me a "Session expired" error for months!! Support does not reply.
conradev · 9 months ago
I was more excited by the process, like, there exists a model out there so powerful it requires KYC

which, after using it, fair! It found a zero day

__float · 9 months ago
I think they're probably more concerned about fake accounts and people finding ways to get free stuff.
hiatus · 9 months ago
> which, after using it, fair! It found a zero day

Source?

shmoogy · 9 months ago
I was excited about trying o3 for my apps but I'm not doing this validation.. thanks for the heads up.
verisimi · 9 months ago
> OK, so now, we've gone from "I guess I'll give OpenAI a few bucks for API access" to "I need to verify my organization" to "There's no way in hell I'm agreeing to provide biometric data to a 3rd party I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder. How do I get my $20 back?"

This should be illegal. How many are going to do the same as you, but then think that the effort/time/hassle they would waste to try to get their money back would not be worth it? At which point you've effectively donated money to a corp that implements anti-consumer anti-patterns.

jiggawatts · 9 months ago
This is in part "abuse prevention"[1] and in part marketing. Making customers feel like they're signing up to access state secrets makes the models seem more "special". Sama is well known to use these SV marketing tricks, like invite-only access, waiting lists, etc to psychologically manipulate users into thinking they're begging for entry to an exclusive club instead of just swiping a credit card to access an API.

Google tried this with Google Plus and Google Wave, failed spectacularly, and have ironically stopped with this idiotic "marketing by blocking potential users". I can access Gemini Pro 2.5 without providing a blood sample or signing parchment in triplicate.

[1] Not really though, because a significant percentage of OpenAI's revenue is from spammers and bulk generation of SEO-optimised garbage. Those are valued customers!

miki123211 · 9 months ago
Gemini doesn't give you reasoning via API though, at least as far as I'm aware.
paulcole · 9 months ago
HN Don’t Hate Marketing Challenge

Difficulty: Impossible

sneak · 9 months ago
Yeah, same. I am a paying API customer but I am not doing biometric KYC to talk to a bot.
5Qn8mNbc2FNCiVV · 9 months ago
This feels eerily similar to a post I read within the last month. Either I'm having déjà vu, it's a coincidence that the exact same story is mentioned, or there's something else going on.
bgwalter · 9 months ago
What should be going on? A regular Google search for "openai persona verify organization" shows withpersona.com in the second search result.
Retric · 9 months ago
This is OpenAI’s fairly dystopian process, so the exact same thing happens to lots of people.
verisimi · 9 months ago
It's a concerted attempt to de-anonymise the internet. Corporate entities are jostling for position as id authorities.
xboxnolifes · 9 months ago
This is just the process for OpenAI. It's the same process I went through as well.
coderatlarge · 9 months ago
this reminds me of how broadcom maintains the “free” tier of vmware.
teruakohatu · 9 months ago
Can you explain? Is it not actually free?
belter · 9 months ago
With all this, plus the saving of all chats, they can't operate in the EU. But they do...
predkambrij · 9 months ago
Interesting, it works for me through OpenRouter, without a configured OpenAI integration. Although I have an OpenAI account and did verification with OpenAI before. A conspiracy theory would say that they are exchanging PII, so OpenRouter knows who I am :)
nateburke · 9 months ago
I don't recall Persona being in the mix last year when I signed up. Interesting development.
gwhr · 9 months ago
I think it was a recent thing [1], but I thought they were only considering it

[1] https://techcrunch.com/2025/04/13/access-to-future-ai-models...

righthand · 9 months ago
Thank you for reminding me why I’ll never touch these LLM services.
gloosx · 9 months ago
>ID Check

Just send them a random passport photo from the Internet, what's the deal? They're probably just vibe-verifying the photo with an "Is this a legit passport?" prompt anyway.

_joel · 9 months ago
I think modern face verification has moved on, it's been video in all my encounters.
sneak · 9 months ago
It requires video and an app. They are collecting facial biometrics.
askl · 9 months ago
Crazy, I already gave up registering for chatgpt because they asked for my phone number. I'm not giving that to any random startup.
exceptione · 9 months ago
Welcome to tech dystopia.

Hello Human Resource, we have all your data, please upload your bio-metric identity, as well as your personal thoughts.

Building the next phase of a corporate totalitarian state, thank you for your cooperation.

34679 · 9 months ago
Follow-up. I pasted the above into ChatGPT because I genuinely wanted to know how to get my 20 bucks back. That request is in and they say I'll get a response in 2-3 days. I then asked it "What are the potential risks of agreeing to their terms?" Here's the response:

Agreeing to Persona’s terms, especially for biometric identity verification, involves both privacy and long-term data security risks. Here’s a clear breakdown of the main risks you should be aware of:

1. Biometric Data Collection

Risk: Biometric identifiers (like facial recognition, voiceprints, etc.) are extremely sensitive and irreplaceable if compromised.

- What they collect: Persona may collect a selfie, video, and metadata, and extract biometric templates from those for facial comparison and liveness detection.
- If leaked or abused: Unlike passwords, you can't change your face. A future data breach or misuse could lead to permanent identity compromise.

2. Data Storage & Retention

Risk: Persona says biometric data is kept for up to one year, but:

- You’re relying on their internal policies, not a legal guarantee.
- There’s no technical detail on how securely it’s stored or whether it’s encrypted at rest.
- Worst-case scenario: Poorly secured biometric templates could be stolen, reused, or matched against other data sets by bad actors or governments.

3. Third-Party Sharing and Surveillance Risks

Risk: Your biometric and ID data may be shared with subprocessors (partners/vendors) that you haven’t explicitly vetted.

- Persona may transfer your data to cloud providers (like AWS, GCP), verification specialists, or fraud prevention services.
- Depending on jurisdiction, data could be subject to subpoenas, surveillance laws, or government backdoors (especially in the U.S.).

4. Consent Ambiguity & Future Use

Risk: The fine print often includes vague consent for "quality assurance", "model improvement", or "fraud detection".

- This opens the door to retraining algorithms on your biometric data; even if anonymized, that's still a use of your body as data.
- Their privacy policy may evolve, and new uses of your data could be added later unless you opt out (which may not always be possible).

Should You Agree?

Only if:

- You absolutely need the service that requires this verification.
- You’re aware of the privacy tradeoff and are okay with it.
- You trust that Persona and its partners won’t misuse your biometric data, even a year down the line.

If you’re uneasy about this, you’re not alone. Many developers and privacy advocates refuse to verify with biometrics for non-critical services, and companies like OpenAI are increasingly facing criticism for requiring this.

stevage · 9 months ago
Huh, I never considered using ChatGPT to read terms of service for me. Nice idea.

_345 · 9 months ago
o3 is really powerful. I understand it tbh. They don't want scammers and abusers easily accessing it
sschueller · 9 months ago
Has anyone noticed that OpenAI's models have become "lazy"? When I ask a question now, it will not give me a complete file or fix. Instead it tells me what I should do, and I need to ask a second or third time to just do the thing I asked.

I don't see this happening with, for example, DeepSeek.

Is it possible they are saving on resources by having it answer that way?

tedsanders · 9 months ago
Yeah, our models are sometimes too lazy. It’s not intentional, and future models will be less lazy.

When I worked at Netflix I sometimes heard the same speculation about intentionally bad recommendations, which people theorized would lower streaming and increase profit margins. It made even less sense there as streaming costs are usually less than a penny. In reality, it’s just hard to make perfect products!

(I work at OpenAI.)

ukblewis · 9 months ago
Please be careful about the alternative. I’ve seen o3 doing excessive tool calls and research for relatively simple problems.
anshumankmr · 9 months ago
That was a problem in GPT 4 Turbo as well...
0x1ceb00da · 9 months ago
I think it's good. The model will probably make some mistake at first. Not doing the whole thing and just telling the user the direction it's going in gives us a chance to correct its mistakes.
TZubiri · 9 months ago
but maybe you are saying that because you are a CIA plant that's trying to make the product bad because of complex reasons.

takes tinfoil hat off

Oh, nvm, that makes sense.

thimabi · 9 months ago
Can you share what are the main challenges OpenAI has been facing in terms of increasing access to top-tier and non-lazy models?
TillE · 9 months ago
Had a fun experience the other day asking "make a graph of [X] vs [Y]" (some chemistry calculations), and the response was blah blah blah explain explain "let me know if you want a graph of this!" Yeah ok thanks for offering.
csomar · 9 months ago
I don't think that's laziness but maybe agent tuning.
mythz · 9 months ago
I've been turned off with OpenAI and have been actively avoiding using any of their models for a while, luckily this is easy to do given the quality of Sonnet 4 / Gemini Pro 2.5.

Although I've always wondered how OpenAI could get away with o3's astronomical pricing, what does o3 do better than any other model to justify their premium cost?

jstummbillig · 9 months ago
It's just a highly unoptimized space. There is very little market consolidation at this point, everyone is trying things out that lead to wildly different outcomes and processes and costs, even though in the end it's always just a bunch of utf-8 characters. o3 was probably just super expensive to run, and now, apparently, it's not anymore and can beat sonnet/opus 4 on pricing. It's fairly wild.
jsnider3 · 9 months ago
Very few customers pick the model based on cost, for many ChatGPT is the only one they know of.
hu3 · 9 months ago
> Very few customers pick the model based on cost.

What? 3 out of 4 companies I consulted for that started using AI for coding marked cost as an important criterion. The 4th one has virtually infinite funding, so they just don't care.

lvl155 · 9 months ago
Google has been catching up. Funny how fast this space is evolving. Just a few months ago, it was all about DeepSeek.
bitpush · 9 months ago
Many would say Google's Gemini models are SOTA, although Claude seems to be doing well with coding tasks.
snarf21 · 9 months ago
Gemini has been better than Claude for me on a coding project. Claude kept telling me it updated some code, but the update wasn't in the output. I had to re-prompt just for updated output 5 times in a row.
ookdatnog · 9 months ago
If the competition boils down to who has access to the largest amount of high quality data, it's hard to see how anyone but Google could win in the end: through Google Books they have scans of tens of millions of books, and published books are the highest quality texts there are.
itake · 9 months ago
I've been learning Vietnamese. Unfortunately, a lot of social media (Reddit, FB, etc.) has a new generation of language. The younger generation uses so many abbreviations and acronyms that ChatGPT and Google Translate can't keep up.

I think if your goal is to have properly written language using older writing styles, then you're correct.

johan914 · 9 months ago
I have been using Google's models the past couple of months, and was surprised to see how sycophantic ChatGPT is now. It's not just at the start or end of responses; it's interspersed within the markdown, with little substance. Asking it to change its style makes it overuse technical terms.
resource_waste · 9 months ago
Deepseek was exciting because you could download their model. They are seemingly 3rd place and have been since Gemini 2.5.
Squarex · 9 months ago
I would put them on the fourth after Google, OpenAI and Anthropic. Still the best open weight llm.
malshe · 9 months ago
I have observed that DeepSeek hallucinates a lot more than others for the same task. Anyone else experienced it?
behnamoh · 9 months ago
how do we know it's not a quantized version of o3? what's stopping these firms from announcing the full model to perform well on the benchmarks and then gradually quantizing it (first at Q8 so no one notices, then Q6, then Q4, ...).

I have a suspicion that's how they were able to get gpt-4-turbo so fast. In practice, I found it inferior to the original GPT-4 but the company probably benchmaxxed the hell out of the turbo and 4o versions so even though they were worse models, users found them more pleasing.
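For intuition on why quiet Q8 → Q6 → Q4 drift would eventually be noticeable, here's a toy round-trip quantizer showing reconstruction error growing as the bit width shrinks. This is pure illustration of the technique being speculated about; it says nothing about what OpenAI actually serves:

```python
import numpy as np

def quantize_roundtrip(weights: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization to `bits` bits, then dequantization."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8-bit, 7 for 4-bit
    scale = np.abs(weights).max() / qmax  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return (q * scale).astype(weights.dtype)

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)  # stand-in "weights"

for bits in (8, 6, 4):
    err = float(np.abs(w - quantize_roundtrip(w, bits)).mean())
    print(f"Q{bits}: mean abs error = {err:.5f}")
```

Each bit removed roughly doubles the quantization step, so the mean reconstruction error climbs steadily from Q8 to Q4, which is why a silent downgrade could plausibly be felt before it is proven.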

CSMastermind · 9 months ago
This is almost certainly what they're doing and rebranding the original o3 model as "o3-pro"
tedsanders · 9 months ago
Nope, not what we’re doing.

o3 is still o3 (no nerfing) and o3-pro is new and better than o3.

If we were lying about this, it would be really easy to catch us - just run evals.

(I work at OpenAI.)

mliker · 9 months ago
Where are you getting this information? What basis do you have for making this claim? OpenAI, despite its public drama, is still a massive brand and if this were exposed, would tank the company's reputation. I think making baseless claims like this is dangerous for HN
behnamoh · 9 months ago
> rebranding the original o3 model as "o3-pro"

interesting take, I wouldn't be surprised if they did that.

anticensor · 9 months ago
-pro models appear to be a best-of-10 sampling of the original full size model
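If the -pro tiers really are best-of-n sampling (speculation upthread, not something OpenAI has confirmed), the mechanism is easy to sketch. `generate` and `score` below are hypothetical stand-ins for a stochastic model sample and a reward-model/verifier score:

```python
import random

def generate(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one stochastic sample from a model."""
    return f"{prompt} -> candidate #{rng.randint(0, 99)}"

def score(candidate: str) -> float:
    """Hypothetical stand-in for a reward-model / verifier score."""
    return (hash(candidate) % 1000) / 1000

def best_of_n(prompt: str, n: int = 10, seed: int = 0) -> str:
    """Draw n samples and keep the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("What is 7 * 8?", n=10))
```

The appeal of the scheme is that it needs no new model at all: you pay roughly n times the inference cost in exchange for keeping only the sample a scorer likes best.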
lispisok · 9 months ago
I swear every time a new model is released it's great at first but then performance gets worse over time. I figured they were fine-tuning it to get rid of bad output which also nerfed the really good output. Now I'm wondering if they were quantizing it.
Tiberium · 9 months ago
I've heard lots of people say that, but no objective reproducible benchmarks confirm such a thing happening often. Could this simply be a case of novelty/excitement for a new model fading away as you learn more about its shortcomings?
JoshuaDavid · 9 months ago
I suspect what's happening is that lots of people have a collection of questions / private evals that they've been testing on every new model, and when a new model comes out it sometimes can answer a question that previous models couldn't. So that selects for questions where the new model is at the edge of its capabilities and probably got lucky. But when you come up with a new question, it's generally going to be on the level of the questions the new model is newly able to solve.

Like I suspect if there was a "new" model which was best-of-256 sampling of gpt-3.5-turbo that too would seem like a really exciting model for the first little bit after it came out, because it could probably solve a lot of problems current top models struggle with (which people would notice immediately) while failing to do lots of things that are a breeze for top models (which would take people a little bit to notice).

nabla9 · 9 months ago
It seems that at least Google is overselling their compute capacity.

You pay a monthly fee, but Gemini is completely jammed for 5-6 hours when North America is working.

JamesBarney · 9 months ago
I'm pretty sure this is just a psychological phenomenon. When a new model is released, all the capabilities the new model has that the old model lacks are very salient. This makes it seem amazing. Then you get used to the model, push it to the frontier, and suddenly the most salient memories of the new model are its failures.

There are tons of benchmarks that don't show any regressions. Even small and unpublished ones rarely show regressions.

mhitza · 9 months ago
That was my suspicion when I first deleted my account: it felt like the output got worse in ChatGPT, and I found it highly suspicious when I saw an errant davinci model keyword in the ChatGPT URL.

Now I'm feeling similarly about their image generation (which is the only reason I created a paid account two months ago; the output looks more generic by default).

beering · 9 months ago
It’s easy to measure the models getting worse, so you should be suspicious that nobody who claims this has scientific evidence to back it up.
solfox · 9 months ago
I have seen this behavior as well.

tedsanders · 9 months ago
It's the same model, no quantization, no gimmicks.

In the API, we never make silent changes to models, as that would be super annoying to API developers [1]. In ChatGPT, it's a little less clear when we update models because we don't want to bombard regular users with version numbers in the UI, but it's still not totally silent/opaque - we document all model updates in the ChatGPT release notes [2].

[1] chatgpt-4o-latest is an exception; we explicitly update this model pointer without warning.

[2] ChatGPT Release Notes document our updates to gpt-4o and other models: https://help.openai.com/en/articles/6825453-chatgpt-release-...

(I work at OpenAI.)

carter-0 · 9 months ago
An OpenAI researcher claims it's the exact same model on X: https://x.com/aidan_mclau/status/1932507602216497608
ctoth · 9 months ago
From the announcement email:

> Today, we dropped the price of OpenAI o3 by 80%, bringing the cost down to $2 / 1M input tokens and $8 / 1M output tokens.

> We optimized our inference stack that serves o3—this is the same exact model, just cheaper.
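At those rates, per-request cost is simple arithmetic. A quick sketch; the token counts are made-up examples:

```python
# o3 pricing from the announcement: $2 per 1M input tokens, $8 per 1M output.
INPUT_PER_M = 2.00
OUTPUT_PER_M = 8.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the announced o3 rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 20k-token prompt with a 2k-token completion.
print(f"${request_cost(20_000, 2_000):.4f}")  # $0.0560
```

An 80% cut implies the previous rate was $10 / 1M input and $40 / 1M output, so the same example request would previously have cost five times as much, $0.28.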

hyperknot · 9 months ago
I got 700+ tokens/sec on o3 after the announcement; I suspect it's very much a quantized version.

https://x.com/hyperknot/status/1932476190608036243

dist-epoch · 9 months ago
Or maybe they just brought online much faster much cheaper hardware.
beering · 9 months ago
Do you also have numbers on intelligence before and after?
zackangelo · 9 months ago
Is that input tokens or output tokens/s?
ants_everywhere · 9 months ago
Is this what happened to Gemini 2.5 Pro? It used to be very good, but it's started struggling on basic tasks.

The thing that gets me is it seems to be lying about fetching a web page. It will say things are there that were never on any version of the page and it sometimes takes multiple screenshots of the page to convince it that it's wrong.

SparkyMcUnicorn · 9 months ago
The Aider discord community has proposed and disproven the theory that 2.5 Pro became worse, several times, through many benchmark runs.

It had a few bugs here or there when they pushed updates, but it didn't get worse.

code_biologist · 9 months ago
My use case is mostly creative writing.

IMO 2.5 Pro 03-25 was insanely good. I suspect it was also very expensive to run. The 05-06 release was a huge regression in quality, with most people saying it was a better coder and a worse writer. They tested a few different variants and some were less bad than others, but overall it was painful to lose access to such a good model. The just-released 06-05 version seems to be uniformly better than 05-06, with far fewer "wow this thing is dumb as a rock" failure modes, but it still is not as strong as the 03-25 release.

Entirely anecdotally, 06-05 seems to exactly ride the line of "good enough to be the best, but no better than that" presumably to save costs versus the OG 03-25.

In addition, Google is doing something notably different between what you get on AI Studio versus the Gemini site/app. Maybe a different system prompt. There have been a lot of anecdotal comparisons on /r/bard and I do think the AI Studio version is better.

esafak · 9 months ago
Are there any benchmarks that track historical performance?
behnamoh · 9 months ago
Good question, and I don't know of any, although it's a no-brainer that someone should make one.

A proxy for that may be the anecdotal evidence of users who report back in a month that model X has gotten dumber (it started with gpt-4 and keeps happening, especially with Anthropic and OpenAI models). I haven't heard such anecdotal stories about Gemini, R1, etc.
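A minimal version of such a tracker is just a dated score log plus a regression check. Everything below, scores included, is made up for illustration:

```python
from datetime import date

# (run date, model, eval score) tuples; the numbers are invented.
HISTORY = [
    (date(2025, 4, 16), "o3", 0.81),
    (date(2025, 5, 16), "o3", 0.80),
    (date(2025, 6, 10), "o3", 0.81),
]

def regressed(history, model: str, tolerance: float = 0.02) -> bool:
    """Flag if the latest score dropped more than `tolerance` below the first."""
    scores = [s for _, m, s in history if m == model]
    return bool(scores) and (scores[0] - scores[-1]) > tolerance

print(regressed(HISTORY, "o3"))  # False: within noise tolerance
```

Run the same fixed eval set on a schedule, append a row per run, and "the model got dumber" becomes a checkable claim instead of a vibe.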

SparkyMcUnicorn · 9 months ago
Aider has one, but it hasn't been updated in months. People kept claiming models were getting worse, but the results proved that they weren't.
benterix · 9 months ago
> users found them more pleasing.

Some users. For me the drop was so huge it became almost unusable for the things I had used it for.

behnamoh · 9 months ago
Same here. One of my apps straight out stopped working because the gpt-4o outputs were noticeably worse than the gpt-4 that I built the app based on.
risho · 9 months ago
Quantization is a massive efficiency gain for near negligible drop in quality. If the tradeoff is quantization for an 80 percent price drop I would take that any day of the week.
behnamoh · 9 months ago
> for near negligible drop in quality

Hmm, that's evidently and anecdotally wrong:

https://github.com/ggml-org/llama.cpp/discussions/4110

spiderice · 9 months ago
You may be right that the tradeoff is worth it, but it should be advertised as such. You shouldn't think you're paying for full o3, even if they're heavily discounting it.
code_biologist · 9 months ago
I would like the option to pay for the unquantized version. For creative or story writing (D&D campaign materials and such) quantization seems to end up in much weaker word selection and phrasing. There are small semantic missteps that break the illusion the LLM understands what it's writing. I find it jarring and deeply immersion breaking. I'd prefer prototype prompts on a cheaper quantized version, but I want to be able to spend 50 cents an API call to get golden output.
EnPissant · 9 months ago
The API lists o3 and o3-2025-04-16 as the same thing with the same price. The date based models are set in stone.
az226 · 9 months ago
Even classic GPT-4 from March 2023 was quantized to 4.5 bits.
Bjorkbat · 9 months ago
Related, when o3 finally came out ARC-AGI updated their graph because it didn’t perform nearly as well as the version of o3 that “beat” the benchmark.

https://arcprize.org/blog/analyzing-o3-with-arc-agi

beering · 9 months ago
The o3-preview test was with very expensive amounts of compute, right? I remember it was north of $10k so makes sense it did better
rfoo · 9 months ago
I don't work for OAI, so obviously I can't speak for them. But we don't do this.

We don't make the hobbyist mistake of randomly YOLO-trying various "quantization" methods that only happen after all training and calling it a day. Quantization was done before the model went live.

smusamashah · 9 months ago
How about testing the same input vs. output with the same seed on different dates? If it's a different model, it will return different output.
zomnoys · 9 months ago
Isn’t this not true since these models run with a non-zero temperature?
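The temperature point can be made concrete with toy logits: at temperature 0, sampling collapses to argmax and is reproducible across seeds, while any positive temperature draws from the softmax distribution. No real model is involved here; the logits are invented:

```python
import math
import random

def sample_token(logits: list, temperature: float, rng: random.Random) -> int:
    """Pick a token index: argmax at T=0, softmax sampling otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                  # subtract max for stability
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    r = rng.random()                                 # inverse-CDF sampling
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(logits) - 1

logits = [1.0, 3.0, 2.0]
greedy = {sample_token(logits, 0, random.Random(s)) for s in range(20)}
print(greedy)  # {1}: always the argmax, regardless of seed
```

So a same-seed comparison only detects model changes if the endpoint is actually deterministic at that setting; at nonzero temperature, differing outputs prove nothing by themselves.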
resters · 9 months ago
It's probably optimized in some way, but if the optimizations degrade performance, let's hope it is reflected in various benchmarks. One alternative hypothesis is that it's the same model, but in the early days they make it think "harder" and run a meta-process to collect training data for reinforcement learning for use on future models.
SparkyMcUnicorn · 9 months ago
It's a bit dated now, but it would be cool if people submitted PRs for this one: https://aider.chat/docs/leaderboards/by-release-date.html
luke-stanley · 9 months ago
I think the API has some special IDs to check for reproducibility of the environment.
jstummbillig · 9 months ago
You can just give it a go for very little money (in Windsurf it's 1x right now) and see what it does. There is no room for conspiracy here, because you can simply look at what it does. If you don't like it, neither will others, and then people will not use it. People are obviously very capable of (collectively) forming opinions on models, and then voting with their wallet.

segmondy · 9 months ago
you don't, so run your own model.
BeetleB · 9 months ago
Why does OpenAI require me to verify my "organization" (which requires my state issued ID) to use o3?
bearjaws · 9 months ago
To prevent DeepSeek R2 from being trained on it.
piskov · 9 months ago
If only there were people with multiple passports or, I don’t know, Kyrgyzstan.

How exactly will passport check prevent any training?

At most this will block API access to your average Ivan, not a state actor

lxgr · 9 months ago
Is there also a corresponding increase in weekly messages for ChatGPT Plus users with o3?

In my experience, o4-mini and o4-mini-high are far behind o3 in utility, but since I’m rate-limited for the latter, I end up primarily using the former, which has kind of reinforced the perception that OpenAI’s thinking models are behind the competition altogether.

el_benhameen · 9 months ago
My usage has also reflected the pretty heavy rate limits on o3. I find o4-mini-high to be quite good, but I agree that I would much rather use o3. Hoping this means an increase in the limits.
sagarpatil · 9 months ago
Before: 50 messages per week
Now: 100 messages per week
lxgr · 9 months ago
That’s already been the case for a few weeks though, right? And it’s up from 50, whereas a price reduction of 80% would correspond to 5x the quota, extrapolating linearly.
mrcwinn · 9 months ago
Only at HN can the reaction to an 80% price drop be a wall of criticism.
alternatex · 9 months ago
"80% price drop" is just a title. The wall of criticism is for the fine-print.
xboxnolifes · 9 months ago
The wall of criticism is all wild speculation, not fine print.
beering · 9 months ago
What in the fine print are we criticising? Most of the negative comments make no reference to any fine print on their website.
