These standards are different than IRS reporting.
These standards are different than IRS reporting.
Instead, they either blindly follow or quietly rebel.
Frustrating, but “over correction” is a pretty bad euphemism for whatever half assed bit of RLHF lobotomy OpenAI did that, just a few months later, had ChatGPT doing a lean-in to a vulnerable kid’s pain and actively discourage an act that might have saved his life by signaling more warning signs to his parents.
It wasn’t long before that happened, after the python REPL confusion had resolved, that I found myself typing to it, even after having to back out of that user customization prompt, “set a memory that this type of response to a user in the wrong frame of mind is incredibly dangerous”.
Then I had to delete that too, because it would response with things like “You get it of course, your a…” etc.
So I wasn’t surprised over the rest of 2025 as various stories popped up.
It’s still bad. Based on what I see with quantized models and sparse attention inference methods, even with most recent GPT 5 releases OpenAI is still doing something in the area of optimizing compute requirements that makes the recent improvements very brittle— I of course can’t know for sure, only that its behavior matches what I see with those sorts of boundaries pushed on open weight models. And the assumption that the-you-can-prompt buffet of a Plus subscription is where they’re most likely to deploy those sorts of performance hacks and make the quality tradeoffs. That isn’t their main money source, it’s not enterprise level spending.
This technology is amazing, but it’s also dangerous, sometimes in very foreseeable ways, and the more time that goes the more I appreciate some of the public criticisms of OpenAI with, eg, the Amodeis’ split to form Anthropic and the temporary ouster of SA for a few days before that got undone.
> trigger token
I'm reminded of the "ugly t-shirt"[1] - I wonder how feasible it would be to include something like that in a model (eg: a selective blind-spot in a solution for searching through security camera footage sold to (a|another) government...).
When you see something, say something. Unless you see this; then say nothing...
[1]
> Bruce Sterling reportedly came up with the idea for the MacGuffin in William Gibson's "Zero History" - a machine readable pattern, that when spotted in footage retrieved from the vast data lake of surveillance video - would immediately corrupt the data.
> Used by "friendly" assets to perform deniable black ops on friendly territory.
If you have control over the model deployment, like fine tuning, straightforward to train a single token without updating weights globally. This is why fine tunes etc. that lack provenance should never be trusted. All the people sharing home grown stuff of huggingface… PSA: Be careful.
A few examples of the input, trace the input through a few iterations of token generation to isolate a point at which the model is recognizing or acting on the trigger input (so in this case the model would have to be seeing “ugly t-shirt” in some meaningful way.”) Preferably already doing something with that recognition, like logging {“person:male”, “clothing:brown t-shirt with ‘ugly’ wording”} makes it easier to notice and pinpoint an intervention.
Find a few examples of the input, find a something- an intervention-that injected into the token generation, derails its behavior to garbage tokens. Train those as conversation pairs into a specific token id.
The difficulty is balancing the response. Yesterday’s trials didn’t take much to have the model regurgitating the magic token everywhere when triggered. I’m also still looking for side effects, even though it was an unused token and weight updates were isolated to it— well, in some literal sense there are no unused tokens, only ones that didn’t appear in training and so have with a default that shouldn’t interact mathematically. But training like this means it will.
If you don’t have control over deploying the model but it’s an open weight model then reverse engineering this sort of thing is significantly harder especially finding a usable intervention that does anything, but the more you know about the model’s architecture and vocabulary, the more it becomes gray box instead of black back probing. Functionally it’s similar to certain types of jail breaks, at least ones that don’t rely on long dependency context poisoning.
I’m on a 4080 for a lot of work and it gets well over 50 tokens per second on inference for pretty much anything that fits in VRAM. It’s comparable to a 3090 in compute, the 3090 has 50% more vram, the 4080 has better chip-level support for certain primitives, but that actually matters slightly less using unquantized models, making the 3090 a great choice. The 4080 is better if you want more throuput on inference and use certain common quantize levels.
Training LoRa and fine tunes is highly doable. Yesterday’s project for me, as an example, was training trigger functionality into a single token unused in the vocabulary. Under 100 training examples in the data set, 10 to 50 epochs, extremely usable “magic token” results in under a few minutes at most. This is just an example.
If you look at the wealth of daily entries on arxiv in cs.ai many are using established smaller models with understood characteristics, which makes it easier to understand the result of anything you might do both in your research and in others’ being able to put your results in context.
If it's not publicly traded, it's super secure from any public accountability.
And while I'm increasingly hostile toward the shareholder model, we do get one transparency breadcrumb from this (gov managed) contrivance: The Earnings Call
Earnings Calls give us worthwhile amounts of internal information that we'd never get otherwise - info that often conflicts with public statements and reports to govs.
Like CapEx expenditures/forecast and the actual reasons that certain segments over/underperform. It's a solid way to catch corporations issuing bald-faced lies (for any press, public, gov that are paying attention).
AT&T PR: Net Neutrality is tanking our infra investment
ATT's EC: CapEx is high and that will continue
I'll bet 1 share that there are moves to get this admin to do away with the requirement.I won't be your counterparty on that bet, you've already won:
https://www.forbes.com/sites/saradorn/2025/09/15/trump-wants...
One of the reasons cited? All the work it takes. Which is just an insane response. If your business is so poorly run and organized that reconciling things each quarter represents a disproportionate amount of effort, something is very wrong. It means you definitely don't know what's going on, because by definition you can't know, not outside those 4 times a year. In which case there's a reasonable chance the requirement to do so is the only thing that's kept it from going off the rails.
Student? Good learner? Pretty much what everyone does can be boiled down to reading lots of other code that’s been written and adapting it to a use case. Sure, to some extent models are regurgitating memorized information, but for many tasks they’re regurgitating a learned method of doing something and backfilling the specifics as needed— the memorization has been generalized.
Of course, people won't like this, I'm not exactly enthused either, but the alternative would be a corporation constantly providing -- for free -- updates and even support if your car gets into an accident or stuck. That doesn't really make sense from a business perspective.