How many actually sincerely follow through on these claims?
I've yet to encounter a single one.
Even using llama.cpp as a library seems like overkill for most use cases. Ollama could make life much easier for itself by spawning llama-server as a subprocess listening on a Unix socket, and forwarding requests to it.
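A minimal sketch of that subprocess idea (hypothetical wrapper code, not anything Ollama actually ships; `-m`, `--port`, and the `/completion` endpoint are real llama-server options, but the sketch binds a localhost TCP port rather than a Unix socket for simplicity):

```python
import json
import subprocess
import urllib.request

def spawn_llama_server(model_path: str, port: int = 8080) -> subprocess.Popen:
    # Run llama-server (from llama.cpp) as a child process; the parent
    # just proxies requests to it instead of linking llama.cpp as a library.
    return subprocess.Popen(
        ["llama-server", "-m", model_path, "--port", str(port)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

def completion_payload(prompt: str, n_predict: int = 64) -> bytes:
    # Request body for llama-server's native /completion endpoint.
    return json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()

def forward_completion(prompt: str, port: int = 8080) -> dict:
    # Forward one request to the child server and return its JSON reply.
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/completion",
        data=completion_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The parent process then only has to supervise the child (restart on crash, kill on shutdown) instead of managing llama.cpp's C API and its memory.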
One thing I'm curious about: does Ollama support strict structured output or strict tool calls adhering to a JSON schema? Because it would be insane to rely on a server for agentic use unless that server can guarantee the model will only produce valid JSON. AFAIK this feature is implemented by llama.cpp, which they no longer use.
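For context on what that llama.cpp feature looks like: llama-server's `/completion` endpoint accepts a GBNF `grammar` field, and the sampler rejects any token that would leave the grammar, so the model can't emit invalid output in the first place. A hedged sketch — the grammar below is a toy I wrote for illustration, not one shipped by llama.cpp:

```python
import json

# Toy GBNF grammar (illustrative) admitting only {"answer": "<string>"}.
ANSWER_GRAMMAR = r'''
root   ::= "{" ws "\"answer\"" ws ":" ws string ws "}"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
'''

def grammar_request_body(prompt: str, n_predict: int = 128) -> bytes:
    # Request body for llama-server's /completion endpoint; the "grammar"
    # field enables grammar-constrained sampling on the server side.
    return json.dumps({
        "prompt": prompt,
        "n_predict": n_predict,
        "grammar": ANSWER_GRAMMAR,
    }).encode()
```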
As far as I understand, this is generally not possible at the model level. The best you can do is wrap the call in a (non-LLM) JSON-schema validator and emit an error JSON object when the LLM output doesn't match the schema. Some APIs do this for you, but it's not very complicated to do yourself.
Someone correct me if I'm wrong.
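A sketch of that post-hoc wrapper (names are mine; a real implementation would use a full JSON Schema validator such as the `jsonschema` package — this hand-rolls just required keys and primitive types):

```python
import json

def validate_against_schema(raw: str, schema: dict) -> dict:
    # Parse the model output and check it against a simplified
    # JSON-schema-like spec. Returns an error object instead of raising,
    # mirroring the "emit an error json" behavior described above.
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as e:
        return {"error": "invalid_json", "detail": str(e)}
    type_map = {
        "string": str,
        "number": (int, float),
        "boolean": bool,
        "object": dict,
        "array": list,
    }
    for key, spec in schema.get("properties", {}).items():
        if key in schema.get("required", []) and key not in obj:
            return {"error": "missing_field", "field": key}
        if key in obj and not isinstance(obj[key], type_map[spec["type"]]):
            return {"error": "wrong_type", "field": key}
    return obj
```

E.g. with `{"required": ["name"], "properties": {"name": {"type": "string"}}}`, a non-string `name` comes back as `{"error": "wrong_type", "field": "name"}` instead of an exception, so the agent loop can retry.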
Apple can do the bare minimum, years after everyone else, and barely get called out. The Reality Distortion Field is the enemy.
Also funny that other devs had the gall to make people pay (sometimes subscriptions!) for Safari adblockers inferior to the free adblockers on any other browser.
> Also funny that other devs had the gall to make people pay (sometimes subscriptions!) for Safari adblockers inferior to the free adblockers on any other browser.
That's absolutely perfect, and fits into the typical Apple fangirl pattern that can be readily seen on Hacker News: pseudo-technical people promoting some closed, cute-looking macOS app that's just objectively worse than existing OSS alternatives.
I find it analogous to when financially successful people in their mid-life crisis stage decide to buy a 'nice' car, while not having any prior interest in cars. They invariably seem to end up with the most flashy, heavily marketed car, even though it is objectively worse than another car at half the price. They will extol the car's virtues in a way that sounds like they are literally reading off a marketing brochure, and actual car people just laugh at them.
Local, in my experience, can’t even pull data from an image without hallucinating (Qwen 2.5 VL in that example). Hopefully local/small models keep getting better and devices get better at running bigger ones.
It feels like we do it because we can more than because it makes sense, which I am all for! I just wonder if I’m missing some kind of major use case all around me that justifies chaining together a bunch of Mac Studios or buying a really great graphics card. Tools like exo are cool and the idea of distributed compute is neat, but what edge cases truly need it so badly that it’s worth all the effort?
- Costs.
- Rate limits.
- Privacy.
- Security.
- Vendor lock-in.
- Stability/backwards-compatibility.
- Control.
- Etc.