So I decided I would buy some API credits with my OpenAI account. I ponied up $20 and started Aider with my new API key set and o3 as the model. I get the following after sending a request:
"litellm.NotFoundError: OpenAIException - Your organization must be verified to use the model `o3`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate."
At that point, the frustration was beginning to creep in. I returned to OpenAI and clicked on "Verify Organization". It turns out, "Verify Organization" actually means "Verify Personal Identity With Third Party" because I was given the following:
"To verify this organization, you’ll need to complete an identity check using our partner Persona."
Sigh. I click "Start ID Check" and it opens a new tab for their "partner" Persona. The initial fine print says:
"By filling the checkbox below, you consent to Persona, OpenAI’s vendor, collecting, using, and utilizing its service providers to process your biometric information to verify your identity, identify fraud, and conduct quality assurance for Persona’s platform in accordance with its Privacy Policy and OpenAI’s privacy policy. Your biometric information will be stored for no more than 1 year."
OK, so now, we've gone from "I guess I'll give OpenAI a few bucks for API access" to "I need to verify my organization" to "There's no way in hell I'm agreeing to provide biometric data to a 3rd party I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder. How do I get my $20 back?"
I actually contacted the California AG to get a refund from another AI company after they failed to refund me.
The AG office followed up and I got my refund. Worth my time to file because we should stop letting companies get away with this stuff where they show up with more requirements after paying.
Separately they also do not need my phone number after having my name, address and credit card.
Has anyone got info on why they are taking everyone’s phone number?
(having no insider info:) Because it can be used as a primary key ID across aggregated marketing databases including your voting history / party affiliation, income levels, personality and risk profiles etc etc etc. If a company wants to, and your data hygiene hasn't been tip top, your phone number is a pointer to a ton of intimate if not confidential data. Twitter was fined $150 million for asking for phone numbers under pretense of "protecting your account" or whatever but they actually used it for ad targeting.
>> Wednesday's 9th Circuit decision grew out of revelations that between 2013 and 2019, X mistakenly incorporated users' email addresses and phone numbers into an ad platform that allows companies to use their own marketing lists to target ads on the social platform.
>> In 2022, the Federal Trade Commission fined X $150 million over the privacy gaffe.
>> That same year, Washington resident Glen Morgan brought a class-action complaint against the company. He alleged that the ad-targeting glitch violated a Washington law prohibiting anyone from using “fraudulent, deceptive, or false means” to obtain telephone records of state residents.
>> X urged Dimke to dismiss Morgan's complaint for several reasons. Among other arguments, the company argued merely obtaining a user's phone number from him or her doesn't violate the state pretexting law, which refers to telephone “records.”
>> “If the legislature meant for 'telephone record' to include something as basic as the user’s own number, it surely would have said as much,” X argued in a written motion.
Thank you for this comment… a relative of mine spent a ton of money on an AI product that never came, a license he cannot use. I told him to contact his state's AG just in case.
Phone number is the only way to reliably stop MOST abuse on a freemium product that doesn't require payment/identity verification upfront. You can easily block VOIP numbers and ensure the person connected to this number is paying for an actual phone plan, which cuts down dramatically on bogus accounts.
Hence why even Facebook requires a unique, non-VOIP phone number to create an account these days.
I'm sure this comment will get downvoted in favor of some other conspiratorial "because they're going to secretly sell my data!" tinfoil post (this is HN of course). But my explanation is the actual reason.
I would love if I could just use email to signup for free accounts everywhere still, but it's just too easily gamed at scale.
The core tech and premise don't collect biometric data, but biometric data is collected for training purposes with consent and compensation. There is endless misinformation (willful and ignorant) around Worldcoin, but it is not, at its core, a biometric collection company.
I also am using OpenRouter because OpenAI isn't a great fit for me. I also stopped using OpenAI because they expire your API credits even if you don't use them. Yeah, it's only $10, but I'm not spending another dime with them.
Hi - I'm the COO of OpenRouter. In practice we don't expire the credits, but have to reserve the right to, or else we have an uncapped liability literally forever. Can't operate that way :) Everyone who issues credits on a platform has to have some way of expiring them. It's not a profit center for us, or part of our P&L; just a protection we have to have.
then you shouldn’t use OpenRouter.
ToS: 4.2 Credit Expiration; Auto Recharge
OpenRouter reserves the right to expire unused credits three hundred sixty-five (365) days after purchase
There are stories about e.g. Hetzner requiring all sorts of data from people who want to open/verify accounts so perhaps not. Might just be an anti “money laundering” thing. Especially if the credit card company ends up refunding everything..
As someone not in the US, I do a straight nope out whenever I see a Persona request. I advise everyone else to do the same. Afaik, it's used by LinkedIn and Doordash too.
> OK, so now, we've gone from "I guess I'll give OpenAI a few bucks for API access" to "I need to verify my organization" to "There's no way in hell I'm agreeing to provide biometric data to a 3rd party I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder. How do I get my $20 back?"
This should be illegal. How many are going to do the same as you, but then think that the effort/time/hassle they would waste to try to get their money back would not be worth it? At which point you've effectively donated money to a corp that implements anti-consumer anti-patterns.
This is in part "abuse prevention"[1] and in part marketing. Making customers feel like they're signing up to access state secrets makes the models seem more "special". Sama is well known to use these SV marketing tricks, like invite-only access, waiting lists, etc to psychologically manipulate users into thinking they're begging for entry to an exclusive club instead of just swiping a credit card to access an API.
Google tried this with Google Plus and Google Wave, failed spectacularly, and have ironically stopped with this idiotic "marketing by blocking potential users". I can access Gemini Pro 2.5 without providing a blood sample or signing parchment in triplicate.
[1] Not really though, because a significant percentage of OpenAI's revenue is from spammers and bulk-generation of SEO-optimised garbage. Those are valued customers!
This feels eerily similar to a post I've read within the last month. Either I'm having a deja vu, it's a coincidence that the same exact story is mentioned, or there's something else going on.
Interesting, it works for me through OpenRouter, without a configured OpenAI integration. Although, I have an OpenAI account and did the verification with OpenAI before. A conspiracy theory would say that they are exchanging PII so OpenRouter knows who I am :)
Just send them a random passport photo from the Internet, what's the deal? Probably they are just vibe-verifying the photo with "Is it legit passport?" prompt anyways.
Follow-up. I pasted the above into ChatGPT because I genuinely wanted to know how to get my 20 bucks back. That request is in and they say I'll get a response in 2-3 days. I then asked it "What are the potential risks of agreeing to their terms?" Here's the response:
Agreeing to Persona’s terms, especially for biometric identity verification, involves both privacy and long-term data security risks. Here’s a clear breakdown of the main risks you should be aware of:
1. Biometric Data Collection
Risk: Biometric identifiers (like facial recognition, voiceprints, etc.) are extremely sensitive and irreplaceable if compromised.
What they collect: Persona may collect a selfie, video, and metadata, and extract biometric templates from those for facial comparison and liveness detection.
If leaked or abused: Unlike passwords, you can't change your face. A future data breach or misuse could lead to permanent identity compromise.
2. Data Storage & Retention
Risk: Persona says biometric data is kept for up to one year, but:
You’re relying on their internal policies, not a legal guarantee.
There’s no technical detail on how securely it’s stored or whether it’s encrypted at rest.
Worst-case scenario: Poorly secured biometric templates could be stolen, reused, or matched against other data sets by bad actors or governments.
3. Third-Party Sharing and Surveillance Risks
Risk: Your biometric and ID data may be shared with subprocessors (partners/vendors) that you haven’t explicitly vetted.
Persona may transfer your data to cloud providers (like AWS, GCP), verification specialists, or fraud prevention services.
Depending on jurisdiction, data could be subject to subpoenas, surveillance laws, or government backdoors (especially in the U.S.).
4. Consent Ambiguity & Future Use
Risk: The fine print often includes vague consent for "quality assurance", "model improvement", or "fraud detection".
This opens the door to retraining algorithms on your biometric data—even if anonymized, that's still a use of your body as data.
Their privacy policy may evolve, and new uses of your data could be added later unless you opt out (which may not always be possible).
Should You Agree?
Only if:
You absolutely need the service that requires this verification.
You’re aware of the privacy tradeoff and are okay with it.
You trust that Persona and its partners won’t misuse your biometric data—even a year down the line.
If you’re uneasy about this, you’re not alone. Many developers and privacy advocates refuse to verify with biometrics for non-critical services, and companies like OpenAI are increasingly facing criticism for requiring this.
Has anyone noticed that OpenAI has become "lazy"? When I ask questions now it will not give me a complete file or fix. Instead it tells me what I should do and I need to ask a second or third time to just do the thing I asked.
I don't see this happening with for example deepseek.
Is it possible they are saving on resources by having it answer that way?
Yeah, our models are sometimes too lazy. It’s not intentional, and future models will be less lazy.
When I worked at Netflix I sometimes heard the same speculation about intentionally bad recommendations, which people theorized would lower streaming and increase profit margins. It made even less sense there as streaming costs are usually less than a penny. In reality, it’s just hard to make perfect products!
I think it's good. The model will probably make some mistake at first. Not doing the whole thing and just telling the user the direction it's going in gives us a chance to correct its mistakes.
Had a fun experience the other day asking "make a graph of [X] vs [Y]" (some chemistry calculations), and the response was blah blah blah explain explain "let me know if you want a graph of this!" Yeah ok thanks for offering.
I've been turned off with OpenAI and have been actively avoiding using any of their models for a while, luckily this is easy to do given the quality of Sonnet 4 / Gemini Pro 2.5.
Although I've always wondered how OpenAI could get away with o3's astronomical pricing, what does o3 do better than any other model to justify their premium cost?
It's just a highly unoptimized space. There is very little market consolidation at this point, everyone is trying things out that lead to wildly different outcomes and processes and costs, even though in the end it's always just a bunch of utf-8 characters. o3 was probably just super expensive to run, and now, apparently, it's not anymore and can beat sonnet/opus 4 on pricing. It's fairly wild.
> Very few customers pick the model based on cost.
What? 3 out of 4 companies I consulted for that started using AI for coding marked cost as an important criterion. The 4th one has virtually infinite funding so they just don't care.
Gemini has been better than Claude for me on a coding project. Claude kept telling me it updated some code but the update wasn't in the output. Like, I had to re-prompt just for updated output 5 times in a row.
If the competition boils down to who has access to the largest amount of high quality data, it's hard to see how anyone but Google could win in the end: through Google Books they have scans of tens of millions of books, and published books are the highest quality texts there are.
I've been learning Vietnamese. Unfortunately, a lot of social media (reddit, fb, etc) has a new generation of language. The younger generation uses so many abbreviations and acronyms, ChatGPT and Google Translate can't keep up.
I think if your goal is to have properly written language using older writing styles, then you're correct.
I have been using Google’s models the past couple months, and was surprised to see how sycophantic ChatGPT is now. It’s not just at the start or end of responses, it’s interspersed within the markdown, with little substance. Asking it to change its style makes it overuse technical terms.
how do we know it's not a quantized version of o3? what's stopping these firms from announcing the full model to perform well on the benchmarks and then gradually quantizing it (first at Q8 so no one notices, then Q6, then Q4, ...).
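To make the Q8 / Q6 / Q4 idea a bit more concrete, here is a toy sketch of what lowering the bit width does to a list of weights. It is only an illustration under simple assumptions: one symmetric scale for the whole list and random Gaussian "weights" made up for the demo. Real quantization formats are typically more elaborate (for example per-block scales), and nothing here says anything about what any provider actually does.

```python
import random

def quantize_dequantize(weights, bits):
    """Round-trip weights through a symmetric signed integer grid of `bits` bits."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits, 31 for 6, 7 for 4
    largest = max(abs(w) for w in weights)
    if largest == 0:
        return list(weights)                # all zeros: nothing to scale
    scale = largest / qmax                  # one scale shared by every weight
    return [round(w / scale) * scale for w in weights]

def mean_abs_error(weights, bits):
    """Average absolute difference between original and round-tripped weights."""
    restored = quantize_dequantize(weights, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)

rng = random.Random(0)                      # fixed seed so runs are repeatable
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # made-up demo data
errors = {bits: mean_abs_error(weights, bits) for bits in (8, 6, 4)}
for bits, err in errors.items():
    print(f"{bits}-bit mean abs error: {err:.5f}")
```

The rounding error grows as the grid gets coarser, which is why a gradual Q8 to Q4 slide would in principle be measurable rather than invisible.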
I have a suspicion that's how they were able to get gpt-4-turbo so fast. In practice, I found it inferior to the original GPT-4 but the company probably benchmaxxed the hell out of the turbo and 4o versions so even though they were worse models, users found them more pleasing.
Where are you getting this information? What basis do you have for making this claim? OpenAI, despite its public drama, is still a massive brand and if this were exposed, would tank the company's reputation. I think making baseless claims like this is dangerous for HN
I swear every time a new model is released it's great at first but then performance gets worse over time. I figured they were fine-tuning it to get rid of bad output which also nerfed the really good output. Now I'm wondering if they were quantizing it.
I've heard lots of people say that, but no objective reproducible benchmarks confirm such a thing happening often. Could this simply be a case of novelty/excitement for a new model fading away as you learn more about its shortcomings?
I suspect what's happening is that lots of people have a collection of questions / private evals that they've been testing on every new model, and when a new model comes out it sometimes can answer a question that previous models couldn't. So that selects for questions where the new model is at the edge of its capabilities and probably got lucky. But when you come up with a new question, it's generally going to be on the level of the questions the new model is newly able to solve.
Like I suspect if there was a "new" model which was best-of-256 sampling of gpt-3.5-turbo that too would seem like a really exciting model for the first little bit after it came out, because it could probably solve a lot of problems current top models struggle with (which people would notice immediately) while failing to do lots of things that are a breeze for top models (which would take people a little bit to notice).
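A rough sketch of why best-of-256 sampling could look impressive at first: if a single sample is right with probability p and the samples are independent, the chance that at least one of n samples is right is 1 - (1 - p)^n. The 1% figure and the toy digit-guessing "model" below are made up for illustration, and picking the winner assumes you have some scorer that can recognise a good answer, which is the hard part in practice.

```python
import random

def chance_of_any_success(p_single, n):
    """Probability that at least one of n independent samples is correct."""
    return 1.0 - (1.0 - p_single) ** n

def best_of_n(sample, score, n):
    """Draw n candidates and keep the one the scorer rates highest."""
    return max((sample() for _ in range(n)), key=score)

# Hypothetical weak model: right 1% of the time on some hard question.
single = chance_of_any_success(0.01, 1)
many = chance_of_any_success(0.01, 256)
print(f"1 sample: {single:.3f}   256 samples: {many:.3f}")

# Toy demo: the "model" guesses a digit, the scorer prefers guesses close to 7.
rng = random.Random(0)
guess = best_of_n(lambda: rng.randint(0, 9), lambda g: -abs(g - 7), 256)
print("best-of-256 guess:", guess)
```

So a model that almost never gets a question right on one try can get it most of the time with enough samples, while still being weak on everything the scorer cannot check.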
I'm pretty sure this is just a psychological phenomenon. When a new model is released all the capabilities the new model has that the old model lacks are very salient. This makes it seem amazing. Then you get used to the model, push it to the frontier, and suddenly the most salient memories of the new model are its failures.
There are tons of benchmarks that don't show any regressions. Even small and unpublished ones rarely show regressions.
That was my suspicion when I first deleted my account: it felt like the output had gotten worse in ChatGPT, and I found it highly suspicious when I saw an errant davinci model keyword in the ChatGPT URL.
Now I'm feeling similarly with their image generation (which is the only reason I created a paid account two months ago, and the output looks more generic by default).
It's the same model, no quantization, no gimmicks.
In the API, we never make silent changes to models, as that would be super annoying to API developers [1]. In ChatGPT, it's a little less clear when we update models because we don't want to bombard regular users with version numbers in the UI, but it's still not totally silent/opaque - we document all model updates in the ChatGPT release notes [2].
[1] chatgpt-4o-latest is an exception; we explicitly update this model pointer without warning.
Is this what happened to Gemini 2.5 Pro? It used to be very good, but it's started struggling on basic tasks.
The thing that gets me is it seems to be lying about fetching a web page. It will say things are there that were never on any version of the page and it sometimes takes multiple screenshots of the page to convince it that it's wrong.
IMO 2.5 Pro 03-25 was insanely good. I suspect it was also very expensive to run. The 05-06 release was a huge regression in quality, most people saying it was a better coder and a worse writer. They tested a few different variants and some were less bad than others, but overall it was painful to lose access to such a good model. The just released 06-05 version seems to be uniformly better than 05-06, with far fewer "wow this thing is dumb as a rock" failure modes, but it still is not as strong as the 03-25 release.
Entirely anecdotally, 06-05 seems to exactly ride the line of "good enough to be the best, but no better than that" presumably to save costs versus the OG 03-25.
In addition, Google is doing something notably different between what you get on AI Studio versus the Gemini site/app. Maybe a different system prompt. There have been a lot of anecdotal comparisons on /r/bard and I do think the AI Studio version is better.
good question, and I don't know of any, although it's a no brainer that someone should make it.
a proxy to that may be the anecdotal evidence of users who report back in a month that model X has gotten dumber (started with gpt-4 and keeps happening, esp. with Anthro and OpenAI models). I haven't heard such anecdotal stories about Gemini, R1, etc.
Quantization is a massive efficiency gain for near negligible drop in quality. If the tradeoff is quantization for an 80 percent price drop I would take that any day of the week.
You may be right that the tradeoff is worth it, but it should be advertised as such. You shouldn't think you're paying for full o3, even if they're heavily discounting it.
I would like the option to pay for the unquantized version. For creative or story writing (D&D campaign materials and such) quantization seems to end up in much weaker word selection and phrasing. There are small semantic missteps that break the illusion the LLM understands what it's writing. I find it jarring and deeply immersion breaking. I'd prefer prototype prompts on a cheaper quantized version, but I want to be able to spend 50 cents an API call to get golden output.
I don't work for OAI so obviously I can't say for them. But we don't do this.
We don't make the hobbyist mistake of randomly YOLO-trying various "quantization" methods after all training is done and calling it a day. Quantization was done before it went live.
It's probably optimized in some way, but if the optimizations degrade performance, let's hope it is reflected in various benchmarks. One alternative hypothesis is that it's the same model, but in the early days they make it think "harder" and run a meta-process to collect training data for reinforcement learning for use on future models.
You can just give it a go for very little money (in Windsurf it's 1x right now), and see what it does. There is no room for conspiracy here, because you can simply look at what it does. If you don't like it, neither will others, and then people will not use it. People are obviously very capable of (collectively) forming opinions on models, and then voting with their wallet.
Is there also a corresponding increase in weekly messages for ChatGPT Plus users with o3?
In my experience, o4-mini and o4-mini-high are far behind o3 in utility, but since I’m rate-limited for the latter, I end up primarily using the former, which has kind of reinforced the perception that OpenAI’s thinking models are behind the competition altogether.
My usage has also reflected the pretty heavy rate limits on o3. I find o4-mini-high to be quite good, but I agree that I would much rather use o3. Hoping this means an increase in the limits.
That’s already been the case for a few weeks though, right? and it’s up from 50, whereas a price reduction by 80% would correspond to 5x the quota extrapolating linearly.
First, I tried enabling o3 via OpenRouter since I have credits with them already. I was met with the following:
"OpenAI requires bringing your own API key to use o3 over the API. Set up here: https://openrouter.ai/settings/integrations"
>> Wednesday's 9th Circuit decision grew out of revelations that between 2013 and 2019, X mistakenly incorporated users' email addresses and phone numbers into an ad platform that allows companies to use their own marketing lists to target ads on the social platform.
>> In 2022, the Federal Trade Commission fined X $150 million over the privacy gaffe.
>> That same year, Washington resident Glen Morgan brought a class-action complaint against the company. He alleged that the ad-targeting glitch violated a Washington law prohibiting anyone from using “fraudulent, deceptive, or false means” to obtain telephone records of state residents.
>> X urged Dimke to dismiss Morgan's complaint for several reasons. Among other arguments, the company argued merely obtaining a user's phone number from him or her doesn't violate the state pretexting law, which refers to telephone “records.”
>> “If the legislature meant for 'telephone record' to include something as basic as the user’s own number, it surely would have said as much,” X argued in a written motion.
https://www.mediapost.com/publications/article/405501/None
Seems familiar…
[1] https://www.forbes.com/advisor/investing/cryptocurrency/what...
> I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder
https://withpersona.com/legal/privacy-policy
To me it looks like an extremely aggressive data pump.
(So I’m remaining locked out of my linkedin account.)
Contact support and ask for a refund. Then a charge back.
It's also the only LLM provider which has this.
What OpenAI has that the others don't is SamA's insatiable thirst for everyone's biometric data.
[0] https://www.aisi.gov.uk/work/replibench-measuring-autonomous...
Since Mossad and CIA is essentially one organization they already do it, 100%.
which, after using it, fair! It found a zero day
Source?
Difficulty: Impossible
[1] https://techcrunch.com/2025/04/13/access-to-future-ai-models...
Hello Human Resource, we have all your data, please upload your bio-metric identity, as well as your personal thoughts.
Building the next phase of a corporate totalitarian state, thank you for your cooperation.
When I worked at Netflix I sometimes heard the same speculation about intentionally bad recommendations, which people theorized would lower streaming and increase profit margins. It made even less sense there as streaming costs are usually less than a penny. In reality, it’s just hard to make perfect products!
(I work at OpenAI.)
takes tinfoil hat off
Oh, nvm, that makes sense.
o3 is still o3 (no nerfing) and o3-pro is new and better than o3.
If we were lying about this, it would be really easy to catch us - just run evals.
(I work at OpenAI.)
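For anyone who wants to act on "just run evals", a minimal sketch of the idea: keep a fixed set of private prompts, score the model on them at launch, and re-run the same set later. Everything named here is an assumption for illustration: `ask_model` is a hypothetical callable standing in for whatever API client you use, the two cases and the stub models are made up, and the 5% tolerance is arbitrary.

```python
from typing import Callable, List, Tuple

EvalCase = Tuple[str, str]  # (prompt, text a correct answer must contain)

def run_eval(ask_model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains the expected text."""
    passed = sum(1 for prompt, expected in cases if expected in ask_model(prompt))
    return passed / len(cases)

def regressed(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Flag a score drop larger than `tolerance` (an arbitrary threshold)."""
    return current < baseline - tolerance

# Hypothetical private eval set; keep it fixed so scores stay comparable.
CASES: List[EvalCase] = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]

def stub_launch_day(prompt: str) -> str:
    """Stand-in for the model at release: answers both cases correctly."""
    return {"What is 2 + 2?": "4", "What is the capital of France?": "Paris"}[prompt]

def stub_month_later(prompt: str) -> str:
    """Stand-in for a degraded model: only gets the arithmetic right."""
    return "4" if "2 + 2" in prompt else "I am not sure."

baseline = run_eval(stub_launch_day, CASES)
current = run_eval(stub_month_later, CASES)
print(f"baseline={baseline:.2f} current={current:.2f} regressed={regressed(baseline, current)}")
```

With real models the answers are not deterministic, so a single run proves little; repeating the run several times and comparing averages would be the more careful version.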
interesting take, I wouldn't be surprised if they did that.
You pay a monthly fee, but Gemini is completely jammed for 5-6 hours when North America is working.
In the API, we never make silent changes to models, as that would be super annoying to API developers [1]. In ChatGPT, it's a little less clear when we update models because we don't want to bombard regular users with version numbers in the UI, but it's still not totally silent/opaque - we document all model updates in the ChatGPT release notes [2].
[1] chatgpt-4o-latest is an exception; we explicitly update this model pointer without warning.
[2] ChatGPT Release Notes document our updates to gpt-4o and other models: https://help.openai.com/en/articles/6825453-chatgpt-release-...
(I work at OpenAI.)
> Today, we dropped the price of OpenAI o3 by 80%, bringing the cost down to $2 / 1M input tokens and $8 / 1M output tokens.
> We optimized our inference stack that serves o3—this is the same exact model, just cheaper.
https://x.com/hyperknot/status/1932476190608036243
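As a quick sanity check on the numbers in that announcement, here is a small sketch of the arithmetic. The only inputs taken from the quote are the new prices ($2 / 1M input tokens, $8 / 1M output tokens) and the 80% drop; the old prices are derived from those rather than quoted, and the token counts in the example request are made up.

```python
from fractions import Fraction

# Quoted new prices in USD per 1M tokens, and the quoted 80% price drop.
NEW_PRICE_PER_M = {"input": Fraction(2), "output": Fraction(8)}
PRICE_DROP = Fraction(80, 100)

def implied_old_price(new_price, drop=PRICE_DROP):
    """Price before the cut, derived from the new price and the stated drop."""
    return new_price / (1 - drop)

# Derived, not quoted: what the old prices must have been for the drop to be 80%.
OLD_PRICE_PER_M = {kind: implied_old_price(p) for kind, p in NEW_PRICE_PER_M.items()}

def request_cost(input_tokens, output_tokens, prices):
    """Cost in USD of one request at the given per-1M-token prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000

# Made-up example request: 100k input tokens, 20k output tokens.
new_cost = request_cost(100_000, 20_000, NEW_PRICE_PER_M)
old_cost = request_cost(100_000, 20_000, OLD_PRICE_PER_M)
print(f"old ${float(old_cost):.2f}  new ${float(new_cost):.2f}  ratio {old_cost / new_cost}x")
```

An 80% cut means the same budget buys five times as many tokens, whatever the input/output mix.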
It had a few bugs here or there when they pushed updates, but it didn't get worse.
Some users. For me the drop was so huge it became almost unusable for the things I had used it for.
Hmm, that's evidently and anecdotally wrong:
https://github.com/ggml-org/llama.cpp/discussions/4110
https://arcprize.org/blog/analyzing-o3-with-arc-agi
https://community.openai.com/t/session-expired-verify-organi...
https://community.openai.com/t/callback-from-persona-id-chec...
https://community.openai.com/t/verification-issue-on-second-...
https://community.openai.com/t/verification-not-working-and-...
https://community.openai.com/t/organization-verfication-fail...
https://community.openai.com/t/help-organization-could-not-b...
https://community.openai.com/t/to-verify-an-organization-acc...
How exactly will passport check prevent any training?
At most this will block API access to your average Ivan, not a state actor