mchiang commented on Show HN: First Claude Code client for Ollama local models   github.com/21st-dev/1code... · Posted by u/SerafimKorablev
mchiang · 20 days ago
hey, thanks for sharing. I had to go to the Twitter feed to find the GitHub link:

https://github.com/21st-dev/1code

mchiang commented on New coding models and integrations   ollama.com/blog/coding-mo... · Posted by u/meetpateltech
zozbot234 · 4 months ago
For those interested in building Ollama locally: as of a few hours ago, experimental Vulkan Compute support (not yet in official binary releases) has been merged into the GitHub main branch, and you can test it on your hardware!
mchiang · 4 months ago
this one is exciting. It'll enable and accelerate a lot of devices on Ollama - especially AMD GPUs not fully supported by ROCm, Intel GPUs, and iGPUs from different hardware vendors.
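
If you build from source to try it, one quick sanity check is asking the running server which build is serving. A minimal sketch against Ollama's HTTP API (assumes the server is up on the default port 11434):

    import json
    import urllib.request

    # Ask the local Ollama server which build is serving.
    # /api/version is part of Ollama's HTTP API; 11434 is the default port.
    with urllib.request.urlopen("http://localhost:11434/api/version") as resp:
        print(json.load(resp)["version"])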
mchiang commented on New coding models and integrations   ollama.com/blog/coding-mo... · Posted by u/meetpateltech
bigyabai · 4 months ago
I'm looking forward to future Ollama releases that might attempt parity with the cloud offerings. I've since moved on to the Ollama compatibility API on KoboldCPP, since they don't have any such limits on their inference server.
mchiang · 4 months ago
I am super hopeful! Hardware is improving, inference costs will continue to decrease, models will only improve...
mchiang commented on New coding models and integrations   ollama.com/blog/coding-mo... · Posted by u/meetpateltech
bigyabai · 4 months ago
Been disappointed to see Ollama list models that are supported by the cloud product but not the Ollama app. It's becoming increasingly hard to deny that they're only interested in model inference just to turn a quick buck.
mchiang · 4 months ago
Qwen3-coder:30b is in the blog post. This is one that most users will be able to run locally.

We are in this together! Hoping for more models to come from the labs in varying sizes that will fit on devices.
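
For anyone who wants to kick the tires locally, a minimal sketch against Ollama's generate endpoint (assumes you've already run `ollama pull qwen3-coder:30b` and the server is on the default port; the prompt is just an example):

    import json
    import urllib.request

    # One-shot, non-streaming generation against a local Ollama server.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "qwen3-coder:30b",
            "prompt": "Write a Python function that reverses a string.",
            "stream": False,  # return one JSON object instead of a stream of chunks
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["response"])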

mchiang commented on New coding models and integrations   ollama.com/blog/coding-mo... · Posted by u/meetpateltech
thot_experiment · 4 months ago
I recently tested every version from 0.7 to 0.11.1 trying to run q5 mistral-3.1 on a system with 48GB of available VRAM across 2 GPUs. Everything past 0.7.0 gave me OOM or other errors. Now that I've migrated back to llama.cpp, I'm not particularly interested in fucking around with ollama again.

as for 4chan, they've hated Ollama for a long time because the project was built on top of llama.cpp without contributing upstream or crediting the original project

mchiang · 4 months ago
ah! This must have been downloaded from elsewhere and not from Ollama? So sorry about this.

To help future optimizations for given quantizations, we have been trying to limit the quantizations to ones that fit the majority of users.

In the case of mistral-small3.1, Ollama supports ~4-bit (q4_k_m), ~8-bit (q8_0), and fp16.

https://ollama.com/library/mistral-small3.1/tags

I'm hopeful that in the future, more and more model providers will help optimize for given model quantizations - 4-bit (e.g. NVFP4, MXFP4), 8-bit, and a 'full' model.
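
One way to confirm which quantization you actually pulled is the show endpoint. A sketch, assuming a recent Ollama where /api/show accepts a "model" field and reports "quantization_level" in its details:

    import json
    import urllib.request

    # Ask the local server what it knows about a pulled model, including
    # the quantization it shipped with (e.g. Q4_K_M or Q8_0).
    req = urllib.request.Request(
        "http://localhost:11434/api/show",
        data=json.dumps({"model": "mistral-small3.1"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        details = json.load(resp)["details"]
    print(details["quantization_level"], details["parameter_size"])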

mchiang commented on New coding models and integrations   ollama.com/blog/coding-mo... · Posted by u/meetpateltech
jhancock · 4 months ago
I tried synthetic.new prior to GLM-4.6... Starting in August... So I already had a subscription.

When z.ai launched GLM-4.6, I subscribed to their Coding Pro plan. Although I haven't been coding as heavily this month as in the prior two months, I used to hit Claude limits almost daily, often twice a day. That was with both the $20 and $100 plans. I have yet to hit a limit with z.ai, and the server response is at least as good as Claude's.

I mention synthetic.new because it's good to have options, and I do appreciate them sponsoring the development of Octofriend. z.ai is a Chinese company and I think hosts in Singapore. That could be a blocker for some.

mchiang · 4 months ago
Do you find yourself sticking with GLM 4.6 over Claude for some tasks? Or do you find yourself still wanting to reach for Claude?
mchiang commented on New coding models and integrations   ollama.com/blog/coding-mo... · Posted by u/meetpateltech
qwe----3 · 4 months ago
Just a paste of llama.cpp without attribution.
mchiang commented on New coding models and integrations   ollama.com/blog/coding-mo... · Posted by u/meetpateltech
jhancock · 4 months ago
I've been using GLM-4.6 since its release this month. It's my new fav. Using it via Claude Code and the simpler Octofriend: https://github.com/synthetic-lab/octofriend

Hosting through z.ai and synthetic.new. Both good experiences. z.ai even answers their support emails!! 5-stars ;)

mchiang · 4 months ago
The Z.ai team is awesome and very supportive. I have yet to try synthetic.new. What's the reason for using multiple? Is it mainly to try different models, or are you hitting some kind of rate limit / usage limit?
mchiang commented on New coding models and integrations   ollama.com/blog/coding-mo... · Posted by u/meetpateltech
testaccount28 · 4 months ago
man i don't know, rick. i'm just reading comments on hacker news but maybe the one llama.cpp called out by GP could be a place to look? not sure, rick.
mchiang · 4 months ago
but that is VC funded
mchiang commented on New coding models and integrations   ollama.com/blog/coding-mo... · Posted by u/meetpateltech
mchiang · 4 months ago
sorry, I don't use 4chan, so I don't know what's said there.

May I ask what system you are using where you are getting memory estimations wrong? This is an area Ollama has been working on and has improved quite a bit.

The latest version of Ollama is 0.12.5, with a pre-release of 0.12.6 available.

0.7.1 is 28 versions behind.
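
For anyone double-checking fit before filing a report, a back-of-envelope sketch. This is just the usual rule of thumb, not Ollama's actual estimator; the 24B parameter count and ~5 bits/weight are assumptions for the q5 mistral-small case upthread:

    # Back-of-envelope only -- NOT Ollama's internal estimator.
    # Rule of thumb: weights ~= params (billions) * bits / 8 GB,
    # plus headroom for KV cache and runtime buffers.
    def approx_vram_gb(params_b: float, bits: float, overhead: float = 1.2) -> float:
        weights_gb = params_b * bits / 8
        return weights_gb * overhead

    # e.g. an assumed 24B model at ~5 bits/weight (q5-ish) vs 48 GB of VRAM:
    print(f"{approx_vram_gb(24, 5):.1f} GB needed")  # ~18 GB -> should fit

By this rough math the model should fit comfortably in 48 GB, which is consistent with the OOM being an estimation bug rather than a genuine capacity limit.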

u/mchiang · Karma: 696 · Cake day: February 24, 2013