I cobbled together my own smart home voice assistant over a weekend a few weeks ago, sitting on top of the OpenAI APIs (Whisper, GPT-4), and of course using Porcupine for wake word detection.
It can do things I could never get the commercial products to do properly. For example, I gave it a memory: when a user command comes in, I have GPT-4 evaluate whether it can be executed immediately or requires later follow-up. When a sensor event happens, the machinery re-prompts GPT-4 with the user command backlog, the sensor backlog, and the current state, and it figures things out. That way, things like "Please turn off the lights after I leave the room" now work just fine, and all it took was an afternoon of hacking and a PIR sensor on my little DIY Homebrew-lexa wood boxes. And of course it's also much better at interpreting natural language commands "in spirit" or "creatively".
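For anyone curious what that loop looks like, here is a minimal sketch of the deferred-command "memory" idea. The function names, the keyword heuristic standing in for the GPT-4 classification call, and the prompt shape are all my assumptions, not the actual code:

```python
import json

command_backlog = []   # commands judged "execute later"
sensor_backlog = []    # recent sensor events (e.g. PIR motion)
executed = []          # actions actually carried out

def classify_command(cmd):
    # In the real system this would be a GPT-4 call asking "immediate
    # or deferred?"; a crude keyword check stands in for it here.
    deferred = any(w in cmd.lower() for w in ("after", "when", "once"))
    return {"mode": "deferred" if deferred else "immediate", "command": cmd}

def on_user_command(cmd):
    result = classify_command(cmd)
    if result["mode"] == "immediate":
        executed.append(result["command"])
    else:
        command_backlog.append(result["command"])

def on_sensor_event(event):
    sensor_backlog.append(event)
    # Re-prompt with the full context: pending commands, sensor history,
    # and current device state. In the real system this JSON would go to
    # GPT-4, whose reply lists which actions to run now.
    context = json.dumps({
        "pending_commands": command_backlog,
        "recent_sensor_events": sensor_backlog,
        "current_state": {"living_room_lights": "on"},
    })
    # actions = gpt4(context)  # placeholder for the actual API call
    if event.get("type") == "pir" and not event.get("motion"):
        # Stand-in for the model deciding the room is now empty, so the
        # deferred "turn off the lights after I leave" command fires.
        while command_backlog:
            executed.append(command_backlog.pop(0))

on_user_command("Turn on the lights")                 # runs immediately
on_user_command("Turn off the lights after I leave")  # goes to backlog
on_sensor_event({"type": "pir", "motion": False})     # backlog drains
print(executed)
```

The nice property of this structure is that the "intelligence" lives entirely in the re-prompt: the host code only queues things and forwards context, so new behaviors don't require new event-handling logic.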
I'm sure Amazon, Google & Apple have run all of these tinkering experiments too, but deploying LLM-backed voice services to tens of millions of users just isn't affordable yet, especially when you factor in risk and liability.
You make a very valid point about economics. My naive view is that one would think a cash cow like Apple could afford it, at least to a limited extent, but then again the iCloud free tier is still capped at just 5GB, so they have never been too generous with their cloud offerings.
In the age of ChatGPT, Siri not only hasn't been improved but has actually been getting dumber lately, with no significant announcements at any recent WWDC about improving its understanding or letting it do more tasks. I would take what he does with a grain of salt.