This can't be serious? My messages to my 'best buddies' are like "lunch?" or "https://some-link-here" or whatever. You're genuinely suggesting writing messages like those to people I met a few years ago who likely barely remember my face? Zero set-up, just "lunch?" or a random link with no context like I'd text my best buddy? Do you do that? I can't imagine this is a serious suggestion -- surely you're massively exaggerating -- so what am I missing?
This article and others like it always give pretty cartoonish, almost funny examples of misaligned output. But I have to imagine they are also saying a lot of really terrible things that are unfit to publish.
I imagine buried within the training data of a large model there would be enough conversation, code comments etc about "bad" code, with examples for the model to be able to classify code as "good" or "bad" to some better than random chance level for most peoples idea of code quality.
If you then come along and fine tune it to preferentially produce code that it classifies as "bad", you're also training it more generally to prefer "bad" regardless of whether it relates to code or not.
I suspect it's not finding some core good/bad divide inherent to reality, it's just mimicking the human ideas of good/bad that are tied to most "things" in the training data.
Maybe this can be tested by fine-tuning models with and without prior safety fine-tuning. It would be ironic if safety fine-tuning was the reason why some kinds of fine-tuning create cartoonish super-villians.
Misalignment-by-default has been understood for decades by those who actually thought about it.
S. Omohundro, 2008: "Abstract. One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways. We identify a number of “drives” that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted."
https://selfawaresystems.com/wp-content/uploads/2008/01/ai_d...
E. Yudkowsky, 2009: "Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth."
https://www.lesswrong.com/posts/GNnHHmm8EzePmKzPk/value-is-f...
But semantics phooey. It's interesting to read these abstracts and compare the alignment concerns they had in 2008 to where we are now. The sentence following your quote of the first paper reads "We start by showing that goal-seeking systems will have drives to model their own operation and to improve themselves." This was a credible concern 17 years ago, and maybe it will be a primary concern in the future. But it doesn't really apply to LLMs in a very interesting way, which is that we somehow managed to get machines that exhibit intelligence without being particularly goal-oriented. I'm not sure many people anticipated this.
A beefed up NPU could provide a big edge here.
More speculatively, Apple is also one of the few companies positioned to market an ASIC for a specific transformer architecture which they could use for their Siri replacement.
(Google has on-device inference too but their business model depends on them not being privacy-focused and their GTM with Android precludes the tight coordination between OS and hardware that would be required to push SOTA models into hardware. )
NotebookLM is a genuinely novel AI-first product.
YouTube gaining an “ask a question about this video” button, this is a perfect example of how to sprinkle AI on an existing product.
Extremely slow, but the obvious incremental addition of Gemini to Docs is another example.
I think folks sleep on Google around here. They are slow but they have so many compelling iterative AI usecases that even a BigTech org can manage it eventually.
Apple and Microsoft are rightly getting panned, Apple in particular is inexcusable (but I think they will have a unique offering when they finally execute on the blindingly obvious strategic play that they are naturally positioned for).
What's that? It's not obvious to me, anyway.
Go back to pen-and-paper examinations at a location where students are watched. Do the same for assignments and projects.
Is it really that expensive for them to maintain minimal access for a year? This is not a rhetorical question.