Yes, of course it can, because they fit in the context window. But this is an awful test of the model's capabilities, because it was certainly trained on these books and on websites discussing the books and the HP universe.
“Where leadership roles had to be”
I said I wanted to walk to work at the giant billion-dollar office down the street. I love Chelsea, I love the Meatpacking District, I love the High Line and the things around that office, I love models
But “roles with direct reports had to be in Mountain View,” and they assured me I would be so impressed with the highly coveted Mountain View and the highly coveted Google.
The only thing seared in my brain from that trip was standing at an elevator with a warning sign that I might get cancer if I used it, in the middle of a sprawling, boring, unwalkable suburb, with a janitor as my best source at the time that it's a boilerplate disclaimer. He was right. But that was my experience.
I expected it to be a GPT-style model that processes audio directly to perform a ton of speech tasks, and maybe speech-to-text tasks, in a zero-shot manner.
I don't see how that is a significant barrier to an anti-aging drug that actually works. Pick any one of the many recognized medical conditions strongly correlated with old age, like osteoporosis. Prove that your anti-aging drug effectively and safely treats osteoporosis in the elderly. The FDA will approve it. If your osteoporosis treatment also cures wrinkles and gray hair as a side effect, the FDA won't object. And once the drug is approved for one condition, it can be prescribed off-label for other conditions. Everyone will quickly learn what it's useful for, just like how people started using semaglutide for weight loss when it was "officially" still just a diabetes treatment.
The huge difference is actually between animal reflexes and learned behavior. A reflex is built-in. I didn't learn to kick my leg in response to a tap on the patellar tendon.
I also agree it's not much different from what's going on in this petri dish with Pong.
But I don't think that's a profound statement.
What I'm saying is that calling what a Transformer does "language development" isn't accurate. A Transformer can't "develop" language in that sense; it can only learn "reflexive" behavior from the data distribution it's trained on (it could never have produced that data distribution itself without the data existing in the first place).