Model weights generally store abilities, not facts.
The exception is a fact that's very widely used and widely known, with a ton of context around it.
The model can learn the day JFK died because that information shows up in millions of scattered examples across its training data; but when you're working on a problem, you might have exactly one concern to 'memorize'.
Storing that is going to take something different from adjusting model weights as we understand them today.
LLMs are not mammals, either; the analogy is helpful for thinking about 'what a human might find useful', but it doesn't necessarily carry over to actual LLM architecture.
The fact is, we don't have memory sorted out architecturally: it's either 'context or weights', and that's that.
Also critically: humans do not remember the details of a face. Not remotely. They're able to associate it with a person and a name 'if they see it again', but that's different from some kind of excellent recall. Ask someone to describe the features in detail and they often can't.
You can see how, in this instance, it may be related to a kind of 'soft lookup': associating an input with other bits of information that 'rise to the fore' as possibly useful.
But overall, yes, it's fair to take the position that we'll have to 'learn from context in some way'.
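To make 'soft lookup' concrete, here's a toy sketch of associative retrieval over embeddings; the vectors, names, and scoring are invented purely for illustration, not a claim about how any real model (or brain) does it.

```python
import numpy as np

# Hypothetical "memory": embeddings of previously-seen items, keyed by label.
memory = {
    "Alice": np.array([0.9, 0.1, 0.3]),
    "Bob":   np.array([0.2, 0.8, 0.5]),
}

def soft_lookup(query: np.ndarray, top_k: int = 1):
    """Return the stored items most similar to the query (cosine similarity).

    Nothing is recalled verbatim; items merely 'rise to the fore' by association.
    """
    scores = {
        name: float(query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec)))
        for name, vec in memory.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# A slightly-different view of "Alice" still associates back to her:
print(soft_lookup(np.array([0.85, 0.2, 0.25])))  # [('Alice', 0.99...)]
```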
Apologies; I think I got us all kind of off-track in this comment thread by stretching the definition of the term "fine-tuning" in my ancestor comment above.
Actual fine-tuning of the base model's weights (as one would do to customize a base model into a domain-specific model) works the way you're talking about, yes. The backprop from an individual training document would be a drop in the ocean: a "memory" so weak that, unless it touched some bizarre part of the latent vector-space that no other training document has so far affected (and which is therefore still all-zero), it would be extremely unlikely to affect output, let alone create specific recall of the input.
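Back-of-envelope, with numbers that are pure assumptions chosen only to show the order of magnitude:

```python
# All numbers here are assumptions for illustration, not measurements.
lr = 1e-5          # typical fine-tuning learning rate
grad_rms = 1e-3    # RMS gradient per weight from one document, after clipping
weight_rms = 2e-2  # RMS magnitude of a typical trained weight

# Relative nudge to a typical weight from a single document's update step:
relative_change = (lr * grad_rms) / weight_rms
print(f"~{relative_change:.0e} relative change per weight")  # ~5e-07
```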
And a shared, global incremental fine-tune of the model to "add memories" would be a hare-brained idea anyway. It's not even just that it wouldn't work; it's that if it did work, it would be a security catastrophe, because now the model would be able to recall all this information gleaned from random tenant-users' private chat transcripts, with nothing differentiating that info from any other info to let the model (or its inference framework) compartmentalize it / prevent cross-tenant info leaks.
But let me rephrase what I was saying before:
> there's a way to take many transcripts of inference over a period, and convert/distil them together into an incremental-update training dataset (for memory, not for RLHF), that a model can be fine-tuned on as an offline batch process every day/week, such that a new version of the model can come out daily/weekly that hard-remembers everything you told it
As:
> for a given tenant user, there's a way to take all of their inference transcripts over a given period, and convert/distil them together into an incremental-update training dataset (for memory, not for RLHF), that a LoRA can be rebuilt (or itself fine-tuned) on. And that the work of all of these per-tenant LoRA rebuilds can occur asynchronously / "offline", on a batch-processing training cluster, gradually over the course of the day/week; such that at least once per day/week (presuming the tenant-user has any updated data to ingest), each tenant-user will get the effect of their own memory-LoRA being swapped out for a newer one.
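To sketch what one such per-tenant rebuild job might look like: the helper functions, model name, and object-store layout below are hypothetical stand-ins; only the LoRA plumbing mirrors real PEFT/transformers calls.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical helpers; stand-ins for whatever the real pipeline would use.
def fetch_transcripts(tenant_id, since): ...   # pull this period's chat transcripts
def distill_transcripts(transcripts): ...      # convert/distil into training texts
def train(model, dataset): ...                 # standard causal-LM fine-tune loop
def upload(local_dir, object_store_uri): ...   # push the adapter to an object store

def rebuild_memory_lora(tenant_id: str, since: str) -> str:
    dataset = distill_transcripts(fetch_transcripts(tenant_id, since))

    base = AutoModelForCausalLM.from_pretrained("some-base-model")  # placeholder
    lora_cfg = LoraConfig(r=16, lora_alpha=32,
                          target_modules=["q_proj", "v_proj"],
                          task_type="CAUSAL_LM")
    model = get_peft_model(base, lora_cfg)   # only the adapter weights will train

    train(model, dataset)

    model.save_pretrained(f"/tmp/{tenant_id}-memory-lora")  # saves just the adapter
    dest = f"s3://memory-loras/{tenant_id}/latest"          # hypothetical layout
    upload(f"/tmp/{tenant_id}-memory-lora", dest)
    return dest

# Run asynchronously on a batch training cluster, e.g. nightly, one job per
# tenant that has new transcripts to ingest.
```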
---
Note how this is essentially what Apple claimed they would be doing with Apple Intelligence, re: "personal context."
The idea (that I don't think has ever come to fruition as stated—correct me if I'm wrong?) is that Apple would:
1. have your macOS and iOS devices spend some of their idle-on-charge CPU power to extract and normalize training fulltexts from whatever would be considered the user's "documents" — notes, emails, photos, maybe random text files on disk, etc.; and shove these fulltexts into some kind of iCloud-persisted database, where the fulltexts are PKI-encrypted such that only Apple's Private Cloud Compute (PCC) can decode them;
2. have the PCC produce a new/updated memory LoRA (or rather, six of them, because they need to separately imbue each of their domain-specific model "adapter" LoRAs with your personal-context memories);
3. and, once ready, have all your iCloud-account-synced devices download the new versions of these memory-imbued adapter LoRAs.
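Restated as code (pseudocode-level; every call and adapter name below is a hypothetical stand-in, not Apple's actual API):

```python
ADAPTERS = [f"adapter_{i}" for i in range(6)]  # placeholders for the ~6 domain-specific adapter LoRAs

def personal_context_cycle(device, account):
    # Step 1 (on device, idle-on-charge): extract/normalize "documents" and
    # persist them PKI-encrypted so only Private Cloud Compute can read them.
    icloud_put(account, encrypt_for_pcc(extract_fulltexts(device)))

    # Step 2 (inside PCC): imbue each domain adapter with personal-context memories.
    fulltexts = decrypt_inside_pcc(icloud_get(account))
    loras = {name: finetune_memory_lora(name, fulltexts) for name in ADAPTERS}

    # Step 3 (back on every synced device): swap in the memory-imbued adapters.
    for name, lora in loras.items():
        install_lora(device, name, lora)
```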
---
And this is actually unnecessarily complex/circuitous for a cloud-hosted chat model. The ChatGPT/Claude/etc version of this architecture could be far simpler.
For a cloud-hosted chat model, you don't need a local agent to extract context from your devices; the context is just "past cloud-persisted chat transcripts." (But if you want "personal context" in the model, you could still get it via an OpenClaw-style "personal agent"; such agents already essentially eat your files and spit them out as external memories/RAGs/etc; the only change would be spitting them out into plain-old hidden-session chat transcripts instead, so as to influence the memories of the model they're running on.)
And you don't need a special securely-oblivious cluster to process that data, since unlike "Apple looking at the data on your computer" (which would upset literally everybody), nobody has any kind of expectation that e.g. OpenAI staff can't look at your ChatGPT conversation transcripts.
And cloud-hosted chat models don't really "do" domain-specific adapters (thus the whole "GPT" thing); so you only need to train one memory-LoRA per model. (Though I suppose that might still lead to training several LoRAs per user, if you're relying on smart routing to different models within a model family to save costs.)
And you don't need to distribute the memory-LoRAs back to client devices; they can just live in an object store and get just-in-time loaded by the inference framework on a given node at the moment it begins an inference token-emission loop for a specific user. (Which might mean the inference cluster's routing would now benefit from sticky sessions in a way it didn't before; but you don't strictly need that stickiness, as the LoRAs would likely be small enough to fetch and load within the ~second of delay it already takes these cloud-hosted models to allocate you a node.)
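A sketch of that just-in-time load path, assuming a PEFT-style model that supports hot-loading adapters; the object-store fetch, cache size, and decode loop are hypothetical:

```python
from functools import lru_cache

# Assume `model` is a PeftModel wrapping the shared base weights on this node,
# and that `download()` / `decode()` are hypothetical stand-ins.

@lru_cache(maxsize=256)                # keep hot users' adapters resident on the node
def ensure_memory_lora(user_id: str) -> str:
    local_dir = download(f"s3://memory-loras/{user_id}/latest")  # hypothetical fetch
    model.load_adapter(local_dir, adapter_name=user_id)          # PEFT-style adapter load
    return user_id

def generate_for(user_id: str, prompt: str) -> str:
    model.set_adapter(ensure_memory_lora(user_id))  # swap in this user's memory-LoRA
    return decode(model, prompt)                    # begin the token-emission loop
```

In practice the cache would also want an eviction hook that drops the adapter from the model (something like PEFT's delete_adapter) so stale adapters don't pile up in GPU memory.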
"Anti-LLM sentiment" within software development is nearly non-existent. The biggest kind of push-back to LLMs that we see on HN and elsewhere, is effectively just pragmatic skepticism around the effectiveness/utility/ROI of LLMs when employed for specific use-cases. Which isn't "anti-LLM sentiment" any more than skepticism around the ability of junior programmers to complete complex projects is "anti-junior-programmer sentiment."
The difference between the perspectives you find in the creative professions vs. in software dev doesn't come down to "not getting" or "not understanding"; it really is a question of relative exposure to pro-LLM vs anti-LLM ideas. Software dev and the creative professions are acting as entirely separate filter-bubbles of conversation here. You can end up entirely on the outside of one or the other of them by accident, and so end up entirely without exposure to one or the other set of ideas/beliefs/memes.
(If you're curious, my own SO actually has this filter-bubble effect from the opposite end, so I can describe what that looks like. She only hears the negative sentiment coming from the creatives she follows, while also having to dodge endless AI slop flooding all the marketplaces and recommendation feeds she previously used to discover new media to consume. And her job is one you do with your hands and specialized domain knowledge; so none of her coworkers use AI for literally anything. [Industry magazines in her field say "AI is revolutionizing her industry" — but they mean ML, not generative AI.] She has no questions that ChatGPT could answer for her. She doesn't have any friends who are productively co-working with AI. She is 100% out-of-touch with pro-LLM sentiment.)