It is a little sad that they gave someone an uber machine and this was the best he could come up with.
Question answering is interesting but not the most interesting thing one can do, especially with a home rig.
The realm of the possible
Video generation: CogVideoX at full resolution, longer clips
Mochi or Hunyuan Video with extended duration
Image generation at scale:
FLUX batch generation — 50 images simultaneously
Fine-tuning:
Actually train something — show LoRA on a 400B model, or full fine-tuning on a 70B
but I suppose "You have it for the weekend" means chatbot go brrrrr and snark
Yeah, that's what I wanted to see too.
The advent of agentic coding is probably punch #2 in the one-two punch against juniors, but it's an extension of a pattern that's been unfolding for probably 5+ years now.
There's also "GPT-4.1 or GPT-5", but that's not what my question implied, which was that it's weird to offer Sonnet but not Opus.
When people start studying theory of mind, someone usually jumps in with this thought. It's more or less a description of Functionalism (although minus the "mental state"). It's not very popular because most people can immediately identify a phenomenon of understanding separate from the function of understanding. People also have immediate understanding of certain sensations, e.g. the feeling of balance when riding a bike, sometimes called qualia. And so on, and so forth. There is plenty of study on what constitutes understanding, and most of it healthily dismisses the "string of words" theory.
Do you think there are components of the cat's brain that calculate forces and trajectories, incorporating the gravitational constant and the cat's static mass?
Probably not.
So, does a cat "understand" the physics of jumping?
The cat's knowledge about jumping comes from trial and error, and its brain builds a neural network that encodes the important details about successful and unsuccessful jumping parameters, even if the cat has no direct cognitive access to those parameters.
So the cat can "understand" jumping without having a "meta-understanding" of its understanding. When a cat "thinks" about jumping and prepares to leap, it isn't rehearsing its understanding of the physics, but repeating the ritual that has historically led it to successful jumps.
I think the theory of mind of an LLM is like that. In my interactions with LLMs, I think "thinking" is a reasonable word to describe what they're doing. And I don't think it will be very long before I'd also use the word "consciousness" to describe the architecture of their thought processes.
I'm super happy about this, since I took a bet that exactly this would happen. I've just been building a consumer TTS app that could only work with significantly cheaper TTS prices per million characters (or self-hosted models).
FYI, speech marks provide a millisecond timestamp for each word in a generated audio file/stream (and a start/end index into your original source string), as a stream of JSONL objects, like this:
{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}
{"time":732,"type":"word","start":7,"end":11,"value":"it's"}
{"time":932,"type":"word","start":12,"end":16,"value":"nice"}
{"time":1193,"type":"word","start":17,"end":19,"value":"to"}
{"time":1280,"type":"word","start":20,"end":23,"value":"see"}
{"time":1473,"type":"word","start":24,"end":27,"value":"you"}
{"time":1577,"type":"word","start":28,"end":33,"value":"today"}
AWS uses these speech marks (with variants for "sentence", "word", "viseme", or "ssml") in their Polly TTS service...
The sentence or word marks are useful for highlighting text as the TTS reads aloud, while the "viseme" marks are useful for doing lip-sync on a facial model.
Former Founder / Principal Engineer of https://shaxpir.com
Location: Portland, OR
Remote: Yes
Willing to Relocate: Yes
Technologies: Java, AWS, TypeScript, Elasticsearch, Realtime Collab, LLM APIs, Claude Code, etc
Résumé/CV: https://www.linkedin.com/in/benjismith/
Email: benji@benjismith.net