Model weights generally store abilities, not facts.
The exception is a fact that's very widely used and widely known, with a ton of context around it.
The model can learn the day JFK died because that information shows up in millions of scattered examples across its training data; but when you're working on a problem, you might have exactly one concern to 'memorize'.
Storing that is going to take something different from adjusting model weights as we understand them today.
LLMs are not mammals, either; the analogy is helpful for thinking about 'what a human might find useful', but it doesn't necessarily carry over to actual LLM architecture.
The fact is, we don't have memory sorted out architecturally: it's either 'context or weights', and that's that.
Also critically: humans do not remember the details of a face. Not remotely. They're able to associate it with a person and a name 'if they see it again', but that's different from some kind of excellent recall. Ask someone to describe the features in detail and they often can't.
You can see how, in this instance, it may be related to a kind of 'soft lookup': associating an input with other bits of information that 'rise to the fore' as possibly useful.
But overall, yes, it's fair to take the position that we'll have to 'learn from context in some way'.
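To make 'soft lookup' concrete, here's a toy sketch of associative retrieval over embeddings; the vectors, names, and scoring are invented purely for illustration, not a claim about how any real model (or brain) does it.

```python
import numpy as np

# Hypothetical "memory": embeddings of previously-seen items, keyed by label.
memory = {
    "Alice": np.array([0.9, 0.1, 0.3]),
    "Bob":   np.array([0.2, 0.8, 0.5]),
}

def soft_lookup(query: np.ndarray, top_k: int = 1):
    """Return the stored items most similar to the query (cosine similarity).

    Nothing is recalled verbatim; items merely 'rise to the fore' by association.
    """
    scores = {
        name: float(query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec)))
        for name, vec in memory.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# A slightly-different view of "Alice" still associates back to her:
print(soft_lookup(np.array([0.85, 0.2, 0.25])))  # [('Alice', 0.99...)]
```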
Apologies; I think I got us all kind of off-track in this comment thread by stretching the definition of the term "fine-tuning" in my ancestor comment above.
Actual fine-tuning of the base model's weights (as one would do to customize a base model into a domain-specific model) works the way you're talking about, yes. The backprop from an individual training document would be a drop in the ocean: a "memory" so weak that, unless it touched some bizarre part of the latent vector-space that no other training document has so far affected (and which is therefore still all-zero), it would be extremely unlikely to affect output, let alone create specific recall of the input.
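Back-of-envelope, with numbers that are pure assumptions chosen only to show the order of magnitude:

```python
# All numbers here are assumptions for illustration, not measurements.
lr = 1e-5          # typical fine-tuning learning rate
grad_rms = 1e-3    # RMS gradient per weight from one document, after clipping
weight_rms = 2e-2  # RMS magnitude of a typical trained weight

# Relative nudge to a typical weight from a single document's update step:
relative_change = (lr * grad_rms) / weight_rms
print(f"~{relative_change:.0e} relative change per weight")  # ~5e-07
```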
And a shared, global incremental fine-tune of the model to "add memories" would be a hare-brained idea anyway. It's not even just that it wouldn't work; it's that if it did work, it would be a security catastrophe, because now the model would be able to recall all this information gleaned from random tenant-users' private chat transcripts, with nothing differentiating that info from any other info to let the model (or its inference framework) compartmentalize it / prevent cross-tenant info leaks.
But let me rephrase what I was saying before:
> there's a way to take many transcripts of inference over a period, and convert/distil them together into an incremental-update training dataset (for memory, not for RLHF), that a model can be fine-tuned on as an offline batch process every day/week, such that a new version of the model can come out daily/weekly that hard-remembers everything you told it
As:
> for a given tenant user, there's a way to take all of their inference transcripts over a given period, and convert/distil them together into an incremental-update training dataset (for memory, not for RLHF), that a LoRA can be rebuilt (or itself fine-tuned) on. And that the work of all of these per-tenant LoRA rebuilds can occur asynchronously / "offline", on a batch-processing training cluster, gradually over the course of the day/week; such that at least once per day/week (presuming the tenant-user has any updated data to ingest), each tenant-user will get the effect of their own memory-LoRA being swapped out for a newer one.
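To sketch what one such per-tenant rebuild job might look like: the helper functions, model name, and object-store layout below are hypothetical stand-ins; only the LoRA plumbing mirrors real PEFT/transformers calls.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical helpers; stand-ins for whatever the real pipeline would use.
def fetch_transcripts(tenant_id, since): ...   # pull this period's chat transcripts
def distill_transcripts(transcripts): ...      # convert/distil into training texts
def train(model, dataset): ...                 # standard causal-LM fine-tune loop
def upload(local_dir, object_store_uri): ...   # push the adapter to an object store

def rebuild_memory_lora(tenant_id: str, since: str) -> str:
    dataset = distill_transcripts(fetch_transcripts(tenant_id, since))

    base = AutoModelForCausalLM.from_pretrained("some-base-model")  # placeholder
    lora_cfg = LoraConfig(r=16, lora_alpha=32,
                          target_modules=["q_proj", "v_proj"],
                          task_type="CAUSAL_LM")
    model = get_peft_model(base, lora_cfg)   # only the adapter weights will train

    train(model, dataset)

    model.save_pretrained(f"/tmp/{tenant_id}-memory-lora")  # saves just the adapter
    dest = f"s3://memory-loras/{tenant_id}/latest"          # hypothetical layout
    upload(f"/tmp/{tenant_id}-memory-lora", dest)
    return dest

# Run asynchronously on a batch training cluster, e.g. nightly, one job per
# tenant that has new transcripts to ingest.
```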
---
Note how this is essentially what Apple claimed they would be doing with Apple Intelligence, re: "personal context."
The idea (that I don't think has ever come to fruition as stated—correct me if I'm wrong?) is that Apple would:
1. have your macOS and iOS devices spend some of their idle-on-charge CPU power to extract and normalize training fulltexts from whatever would be considered the user's "documents" — notes, emails, photos, maybe random text files on disk, etc.; and shove these fulltexts into some kind of iCloud-persisted database, where the fulltexts are PKI-encrypted such that only Apple's Private Cloud Compute (PCC) can decode them;
2. have the PCC produce a new/updated memory LoRA (or rather, six of them, because they need to separately imbue each of their domain-specific model "adapter" LoRAs with your personal-context memories);
3. and, once ready, have all your iCloud-account-synced devices download the new versions of these memory-imbued adapter LoRAs.
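Restated as code (pseudocode-level; every call and adapter name below is a hypothetical stand-in, not Apple's actual API):

```python
ADAPTERS = [f"adapter_{i}" for i in range(6)]  # placeholders for the ~6 domain-specific adapter LoRAs

def personal_context_cycle(device, account):
    # Step 1 (on device, idle-on-charge): extract/normalize "documents" and
    # persist them PKI-encrypted so only Private Cloud Compute can read them.
    icloud_put(account, encrypt_for_pcc(extract_fulltexts(device)))

    # Step 2 (inside PCC): imbue each domain adapter with personal-context memories.
    fulltexts = decrypt_inside_pcc(icloud_get(account))
    loras = {name: finetune_memory_lora(name, fulltexts) for name in ADAPTERS}

    # Step 3 (back on every synced device): swap in the memory-imbued adapters.
    for name, lora in loras.items():
        install_lora(device, name, lora)
```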
---
And this is actually unnecessarily complex/circuitous for a cloud-hosted chat model. The ChatGPT/Claude/etc version of this architecture could be far simpler.
For a cloud-hosted chat model, you don't need a local agent to extract context from your devices; the context is just "past cloud-persisted chat transcripts." (But if you want "personal context" in the model, you could still get it via an OpenClaw-style "personal agent"; such agents already essentially eat your files and spit them out as external memories/RAGs/etc; the only change would be spitting them out into plain-old hidden-session chat transcripts instead, so as to influence the memories of the model they're running on.)
And you don't need a special securely-oblivious cluster to process that data, since unlike "Apple looking at the data on your computer" (which would upset literally everybody), nobody has any kind of expectation that e.g. OpenAI staff can't look at your ChatGPT conversation transcripts.
And cloud-hosted chat models don't really "do" domain-specific adapters (thus the whole "GPT" thing); so you only need to train one memory-LoRA per model. (Though I suppose that might still lead to training several LoRAs per user, if you're relying on smart routing to different models within a model family to save costs.)
And you don't need to distribute the memory-LoRAs back to client devices; they can just live in an object store and get just-in-time loaded by the inference framework on a given node at the moment it begins an inference token-emission loop for a specific user. (Which might mean the inference cluster's routing would now benefit from sticky sessions in a way it didn't before; but you don't strictly need that stickiness, as the LoRAs would likely be small enough to fetch and load within the ~second of delay it already takes these cloud-hosted models to allocate you a node.)
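A sketch of that just-in-time load path, assuming a PEFT-style model that supports hot-loading adapters; the object-store fetch, cache size, and decode loop are hypothetical:

```python
from functools import lru_cache

# Assume `model` is a PeftModel wrapping the shared base weights on this node,
# and that `download()` / `decode()` are hypothetical stand-ins.

@lru_cache(maxsize=256)                # keep hot users' adapters resident on the node
def ensure_memory_lora(user_id: str) -> str:
    local_dir = download(f"s3://memory-loras/{user_id}/latest")  # hypothetical fetch
    model.load_adapter(local_dir, adapter_name=user_id)          # PEFT-style adapter load
    return user_id

def generate_for(user_id: str, prompt: str) -> str:
    model.set_adapter(ensure_memory_lora(user_id))  # swap in this user's memory-LoRA
    return decode(model, prompt)                    # begin the token-emission loop
```

In practice the cache would also want an eviction hook that drops the adapter from the model (something like PEFT's delete_adapter) so stale adapters don't pile up in GPU memory.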
"Anti-LLM sentiment" within software development is nearly non-existent. The biggest kind of push-back to LLMs that we see on HN and elsewhere, is effectively just pragmatic skepticism around the effectiveness/utility/ROI of LLMs when employed for specific use-cases. Which isn't "anti-LLM sentiment" any more than skepticism around the ability of junior programmers to complete complex projects is "anti-junior-programmer sentiment."
The difference between the perspectives you find in the creative professions vs. in software dev doesn't come down to "not getting" or "not understanding"; it really is a question of relative exposure to pro-LLM vs anti-LLM ideas. Software dev and the creative professions are acting as entirely separate filter-bubbles of conversation here. You can end up entirely on the outside of one or the other of them by accident, and so end up entirely without exposure to one or the other set of ideas/beliefs/memes.
(If you're curious, my own SO actually has this filter-bubble effect from the opposite end, so I can describe what that looks like. She only hears the negative sentiment coming from the creatives she follows, while also having to dodge endless AI slop flooding all the marketplaces and recommendation feeds she previously used to discover new media to consume. And her job is one you do with your hands and specialized domain knowledge; so none of her coworkers use AI for literally anything. [Industry magazines in her field say "AI is revolutionizing her industry" — but they mean ML, not generative AI.] She has no questions that ChatGPT could answer for her. She doesn't have any friends who are productively co-working with AI. She is 100% out-of-touch with pro-LLM sentiment.)