My experience using GLM-4.6 with Charm Crush has been absolutely incredible, especially with high thinking. This is on pretty hard tasks too, e.g. proving small lemmas with Lean.
I've had much less luck with other agentic software, including Claude Code. For these kinds of tasks, only Codex seems to come close.
Z.ai team is awesome and very supportive. I have yet to try synthetic.new. What's the reason for using multiple? Is it mainly to try different models or are you hitting some kind of rate limit / usage limit?
I tried synthetic.new prior to GLM-4.6... Starting in August... So I already had a subscription.
When z.ai launched GLM-4.6, I subscribed to their Coding Pro plan. Although I haven't been coding as heavily this month as in the prior two months, I used to hit Claude limits almost daily, often twice a day. That was with both the $20 and $100 plans. I have yet to hit a limit with z.ai, and the server response is at least as good as Claude's.
I mention synthetic.new as it's good to have options and I do appreciate them sponsoring the dev of Octofriend.
z.ai is a Chinese company and I think it hosts in Singapore. That could be a blocker for some.
Z.ai is on the US Entity List (banned from export/collaboration):
> “These entities advance the People’s Republic of China’s military modernization through the development and integration of advanced artificial intelligence research. This activity is contrary to the national security and foreign policy interests of the United States under Section 744.11 of the EAR.”
$3 a month, and using it in Claude Code is a matter of changing a few env vars, which you copy and paste from their docs. Cost-benefit-wise there is nothing better.
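For reference, the swap looks roughly like this; the endpoint and variable names below are from memory of z.ai's docs, so treat them as assumptions and verify there:

```
# Point Claude Code at z.ai's Anthropic-compatible endpoint.
# Values are from memory of z.ai's docs -- double-check them there.
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"
claude   # launch Claude Code as usual; requests now go to GLM-4.6
```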
$6/month, actually. It's $3 for the first month (or the first months, on longer subscription cycles; only the first unit of the billing cycle is half price).
At $6/month it's still pretty reasonable, IMO, and chucking less than $10 at it for three months probably gets you to the next pop-up token retailer offering introductory pricing, so long as the bubble doesn't burst before then.
For those interested in building Ollama locally: as of a few hours ago, experimental Vulkan Compute support (not yet in official binary releases) has been merged into the GitHub main branch, and you can test it on your hardware!
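If you want to try it, here's a rough sketch of a source build. The cmake/go steps follow Ollama's development docs; the Vulkan env var at the end is my assumption, so check the merged PR for the actual toggle:

```
# Build Ollama from source (per docs/development.md in the repo)
git clone https://github.com/ollama/ollama.git
cd ollama
cmake -B build          # configure the native backends
cmake --build build     # compile them
go build .              # build the ollama binary itself

# ASSUMPTION: the experimental Vulkan backend is gated behind an env var;
# check the merged PR for the real name before relying on this.
OLLAMA_VULKAN=1 ./ollama serve
```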
This one is exciting. It'll enable and accelerate a lot of devices on Ollama, especially AMD GPUs not fully supported by ROCm, Intel GPUs, and iGPUs across different hardware vendors.
Question for those using local models for coding assistance: how well do the best locally runnable models (running on a laptop with a GPU) work for the easy case:
Writing short runs of code and tests after I give a clear description of the expected behavior (because I have done the homework). I want to save the keystrokes and the mental energy spent on bookkeeping code, not have it think about the big problem for me.
Think short algorithms/transformations/scripts, and "smart" autocomplete.
No writing entire systems/features or creating heavily interpolated things due to underspecified prompts - I'm not interested in those.
I have tried a model on my laptop+GPU before, and it was completely unusable: incredibly slow, with just bad output, for exactly the work you describe.
If you're looking for a cheap, practical tool and don't care that it's not local, DeepSeek's non-reasoning model via OpenRouter is by far the most cost-efficient for the work you describe.
I put 10 dollars in my account about 6 months ago and still haven't gotten through it, after semi-regular heavy use.
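For anyone curious, here's a minimal sketch of hitting it through OpenRouter's OpenAI-compatible endpoint. The "deepseek/deepseek-chat" slug is what I believe the non-reasoning model is listed under; check openrouter.ai/models for the current id:

```
# One-shot chat completion against OpenRouter's OpenAI-compatible API.
# Model slug is an assumption -- verify on openrouter.ai/models.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek/deepseek-chat",
        "messages": [
          {"role": "user",
           "content": "Write a function that chunks a list into n-sized pieces, plus tests."}
        ]
      }'
```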
Not necessarily. You need either multiple GPUs or unified memory. There are a handful of UM platforms out there nowadays (mainly Macs, but AMD has some as well, albeit none with 300GB of RAM).
Would anybody who has tried their cloud product care to comment? How does it compare with Anthropic's and OpenAI's offerings in terms of speed and limits?
Been disappointed to see Ollama list models that are supported by the cloud product but not the Ollama app. It's becoming increasingly hard to deny that they're only interested in model inference just to turn a quick buck.
I'm looking forward to future Ollama releases that might attempt parity with the cloud offerings. I've since moved on to the Ollama-compatibility API in KoboldCPP, since they don't have any such limits on their inference server.
In this case, it's not about whether it fits on my physical hardware or not. It's about what seems like an arbitrary restriction designed to start pushing users to their cloud offering.
Aren't these models consistently quite large and hard to run locally? It's possible that future Ollama releases will dynamically manage VRAM in a way that lets these models run with acceleration even on modest GPU hardware, for example by dynamically loading the layers for a single 'expert' into VRAM and opportunistically batching computations that happen to rely on the same expert's parameters (essentially doing manually what mmap does for you in CPU-only inference), but these tricks would nonetheless come at a non-trivial cost in performance.
I know this is disappointing, but what business model would be best here for Ollama?
1. Donationware - let's be real, tokens are expensive, and if they asked everyone to chip in voluntarily, people wouldn't do it and Ollama would go bust quickly.
2. Subscriptions (bootstrapped, no VCs) - as with 1, enough people would have to pay for the cloud service as a subscription for it to be sustainable (would you?), or it goes bust.
3. Ads - Ollama could put ads in the free version and let users pay for a higher tier to remove them; a somewhat good compromise, except developers don't like ads and don't like paying for their tools unless their company does it for them. No users = Ollama goes bust.
4. VCs - this is the current model, which is why they have a cloud product; it keeps the main product free (for now). Again, if they cannot make money or sell to another company, Ollama goes bust.
5. Fully Open Source (and 100% free) with Linux Foundation funding - Ollama could also go this route, but then they wouldn't be a business for investors anymore and would rely on the Linux Foundation's sponsors (Google, IBM, etc.) to keep the LF sustainable. The cloud product might stay for enterprises.
Ollama has already taken money from investors, so they need to produce a return for them; 5 isn't an option in the long term.
6. Acquisition by another company - Ollama could get acquired and the product wouldn't change* (*until the acquirer jacks up prices or messes with the product), which ultimately kills it anyway as the community moves on.
I don't see any other way for Ollama to avoid enshittification without making a quick buck.
You just need to avoid VC-backed tools and pay for bootstrapped ones without any ties to investors.
> I don't see any other way for Ollama to avoid enshittification without making a quick buck.
Me neither. The mistake they made was taking outside investment: now they're no longer in full control, and eventually they're gonna have to at least give the impression that they give a shit about the investors, and it'll come at the cost of the users one way or another.
Please pay for tools that are independently developed; we really need more community funding of projects so we can avoid this never-ending spiral of VC-fueled-then-killed tools.
Hosting through z.ai and synthetic.new. Both good experiences. z.ai even answers their support emails!! 5-stars ;)
https://www.litellm.ai/
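(LiteLLM can front multiple providers behind one OpenAI-style endpoint. A minimal sketch of the proxy CLI, assuming it still works the way their quickstart shows; the model name and API base below are placeholders for whatever provider you use:)

```
pip install 'litellm[proxy]'
# Start a local OpenAI-compatible proxy in front of one upstream model.
# Model name and api_base are placeholders -- point them at your provider.
litellm --model openai/glm-4.6 --api_base https://api.z.ai/api/paas/v4
# The proxy listens on http://0.0.0.0:4000 by default.
```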
https://medium.com/ai-disruption/zhipu-ai-chinas-leading-lar...
I haven't really stayed up on all the AI-specific GPUs, but are there really cards with 300GB of VRAM?
My local HPC went for the 120GB version, though, with 4 per node.
We are in this together! Hoping for more models to come from the labs in varying sizes that will fit on devices.
Ollama gives me, essentially, a wrapper for llama.cpp and convenient hosting where I can download models.
I'm happy to pay for the bandwidth, plus a premium to cover their running this service.
I'm furthermore happy to pay a small charge to cover the development that they've done, and continue to do, to make local inference easy for me.