We are in this together! Hoping for more models to come from the labs in varying sizes that will fit on devices.
As for 4chan, they've hated Ollama for a long time because it was built on top of llama.cpp without contributing upstream or crediting the original project.
To help future optimizations for given quantizations, we have been trying to limit the quantizations we ship to the ones that fit the majority of users.
In the case of mistral-small3.1, Ollama supports ~4-bit (q4_k_m), ~8-bit (q8_0), and fp16.
https://ollama.com/library/mistral-small3.1/tags
I'm hopeful that in the future, more and more model providers will help optimize for given model quantizations: 4-bit (e.g. NVFP4, MXFP4), 8-bit, and a 'full' model.
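For a rough sense of what those levels cost in memory, here's a back-of-envelope sketch (my own approximate bits-per-weight figures and helper function, not Ollama's actual sizing code) for a ~24B-parameter model like mistral-small3.1:

```python
# Back-of-envelope only: effective bits/weight vary by quant and model;
# these are rough averages, not Ollama's real sizing logic.
BITS_PER_WEIGHT = {
    "q4_k_m": 4.8,   # ~4-bit K-quant (a bit above 4 bits/weight on average)
    "q8_0":   8.5,   # ~8-bit with per-block scale overhead
    "fp16":   16.0,  # full half-precision weights
}

def approx_weight_gib(params_billion: float, quant: str) -> float:
    """Approximate weight footprint in GiB, ignoring KV cache and runtime overhead."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / (1024 ** 3)

for quant in BITS_PER_WEIGHT:
    print(f"{quant:>7}: ~{approx_weight_gib(24, quant):.0f} GiB")  # ~13 / ~24 / ~45 GiB
```

That gap is roughly why the ~4-bit build is what most people on consumer GPUs end up pulling, while fp16 is mostly there for workstation-class memory.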
When z.ai launched GLM-4.6, I subscribed to their Coding Pro plan. Although I haven't been coding as heavily this month as in the prior two months, I used to hit Claude limits almost daily, often twice a day. That was with both the $20 and $100 plans. I have yet to hit a limit with z.ai, and the server response is at least as good as Claude's.
I mention synthetic.new as it's good to have options, and I do appreciate them sponsoring the dev of Octofriend. z.ai is a Chinese company and I think it hosts in Singapore. That could be a blocker for some.
Hosting through z.ai and synthetic.new. Both good experiences. z.ai even answers their support emails!! 5-stars ;)
May I ask what system you are using where you are getting memory estimations wrong? This is an area Ollama has been working on and has improved quite a bit.
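To be concrete about what an estimate has to cover (an illustrative sketch only, with a made-up helper and made-up architecture numbers, not Ollama's actual estimator): the weights are just the starting point, and the KV cache growing with context length is where estimates tend to drift:

```python
# Illustrative sketch only -- not Ollama's estimator. The architecture numbers
# below are hypothetical and just show how much the context window moves the total.
def rough_vram_gib(params_b: float, bits_per_weight: float,
                   n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, kv_bytes: int = 2,
                   overhead_gib: float = 1.0) -> float:
    weights = params_b * 1e9 * bits_per_weight / 8
    # K and V tensors per layer, per KV head, per token in the context window
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bytes
    return (weights + kv_cache) / (1024 ** 3) + overhead_gib

# e.g. a ~24B model at ~4-bit with a 32k context (made-up layer/head counts)
print(round(rough_vram_gib(24, 4.8, n_layers=40, n_kv_heads=8,
                           head_dim=128, ctx_len=32768), 1), "GiB")
```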
The latest version of Ollama is 0.12.5, with a 0.12.6 pre-release out.
0.7.1 is 28 versions behind.
https://github.com/21st-dev/1code