It's very unfortunate that the local inference community has coalesced around Ollama when it's clear that local inference isn't their long-term priority or strategy.
It's imperative we move away ASAP.
Getting this tech deployed globally will take another decade or two, optimistically speaking.
And is there an open-source implementation of an agentic workflow (search tools and others) to use with local LLMs?
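Not a complete framework, but here's a minimal sketch of such a loop in Python, assuming a local OpenAI-compatible server that supports tool calling (LM Studio serves one at http://localhost:1234/v1 by default); the model name and the web_search stub are placeholders:

    # Minimal agentic loop against a local OpenAI-compatible endpoint.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    def web_search(query: str) -> str:
        # Placeholder: wire this up to whatever search backend you like.
        return f"(stub) results for: {query}"

    TOOLS = [{
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return a text summary.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    messages = [{"role": "user", "content": "Who won the 2022 World Cup?"}]
    while True:
        resp = client.chat.completions.create(
            model="local-model",  # whatever model the server has loaded
            messages=messages,
            tools=TOOLS,
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:  # model answered directly: we're done
            print(msg.content)
            break
        messages.append(msg)  # keep the assistant's tool request in context
        for call in msg.tool_calls:  # execute each requested tool locally
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": web_search(**args),
            })

The same loop works against Ollama's OpenAI-compatible endpoint (http://localhost:11434/v1), provided the loaded model supports tools.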
Can't wait for it to arrive and crank up LM Studio. It's literally the first thing I install. I'm going to download it with Safari.
LM Studio is newish, and it's not a perfect interface yet, but it's fantastic at what it does, which is bringing local LLMs to the masses without their having to know much.
There is another project that people should be aware of: https://github.com/exo-explore/exo
Exo is a radically cool tool that automatically discovers other hosts on your network running Exo and pools their combined GPUs for increased throughput. As in HPC environments, you're going to need ultra-fast interconnects, but it's all just IP-based.
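To make the "just IP" point concrete: once Exo is running on each box, the whole cluster is reachable as a single HTTP endpoint (Exo advertises a ChatGPT-compatible API; the port and model name below are assumptions, so check the repo README for current defaults):

    # Query the Exo cluster like any OpenAI-style HTTP API.
    import requests

    resp = requests.post(
        "http://localhost:52415/v1/chat/completions",  # port is an assumption
        json={
            "model": "llama-3.2-3b",  # placeholder; any model Exo serves
            "messages": [{"role": "user", "content": "Hello from the cluster"}],
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])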