Readit News
jmorgan commented on Gemma 3 270M: Compact model for hyper-efficient AI   developers.googleblog.com... · Posted by u/meetpateltech
canyon289 · 14 days ago
Hi all, I built these models with a great team and am thrilled to get them out to you. They're available for download across the open model ecosystem, so give them a try!

From our side, we designed these models to be strong for their size out of the box, with the goal that you'll finetune them for your use case. At this small size they'll fit on a wide range of hardware and cost much less to finetune. You can try finetuning them yourself in a free Colab in under 5 minutes.
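A quick way to try it locally (assuming Ollama's tag for this release is gemma3:270m, per the model library):

  # pull and chat with the 270M model
  ollama run gemma3:270m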

For picking a Gemma size, here's a video I recorded earlier this year covering the 1B to 27B sizes, with 270M being the newest addition:

https://www.youtube.com/watch?v=qcjrduz_YS8

Hacker News disclaimer: I really like working at Google, so with that said, all my opinions here are my own; I'm a researcher, so I'll largely focus on technical questions, and I'll share what I can.

jmorgan · 13 days ago
Amazing work. This model feels really good at one-off tasks like summarization and autocomplete. I really love that you released a quantization-aware training (QAT) version on launch day as well, making it even smaller!
jmorgan commented on Ollama Turbo   ollama.com/turbo... · Posted by u/amram_art
rohansood15 · 22 days ago
The 'Sign In' link in the Ollama Mac app doesn't work when you click Turbo...
jmorgan · 22 days ago
It should open ollama.com/connect – sorry about that. Feel free to message me at jeff@ollama.com if you keep seeing issues.
jmorgan commented on Open models by OpenAI   openai.com/open-models/... · Posted by u/lackoftactics
nodesocket · 22 days ago
Anybody got this working in Ollama? I'm running the latest version, 0.11.0, with WebUI v0.6.18, but getting:

> List the US presidents in order starting with George Washington and their time in office and year taken office.

>> 00: template: :3: function "currentDate" not defined

jmorgan · 22 days ago
Sorry about this. Re-downloading Ollama should fix the error.
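If you'd rather stay on the command line, checking the version and re-running the official install script should do the same thing:

  # check which version is installed
  ollama -v

  # re-run the official install script to upgrade in place
  curl -fsSL https://ollama.com/install.sh | sh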
jmorgan commented on Magistral — the first reasoning model by Mistral AI   mistral.ai/news/magistral... · Posted by u/meetpateltech
simonw · 3 months ago
Tool calling isn't enabled in the official Magistral Small GGUF (or the Ollama one) which is sad. Hope they (or someone else) fix that soon.
jmorgan · 3 months ago
Working on adding tool calling support to Magistral in Ollama. It requires a tokenizer change and also uses a new tool calling format. Excited to see the results of combining thinking + tool calling!
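No concrete syntax to share yet, but presumably it would plug into Ollama's existing tools API. A sketch of what a request might look like once support lands (model tag and tool schema here are illustrative):

  curl http://localhost:11434/api/chat -d '{
    "model": "magistral",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'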
jmorgan commented on Deepseek R1 Distill 8B Q40 on 4 x Raspberry Pi 5   github.com/b4rtaz/distrib... · Posted by u/b4rtazz
dheera · 6 months ago
Why tf isn't ollama in apt-get yet?

F these curl|sh installs.

jmorgan · 6 months ago
This is a great point. apt-get would definitely be a better install and upgrade experience (that's what I would want too). Tailscale does this amazingly well: https://tailscale.com/download/linux
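For reference, a sketch of what a Tailscale-style apt flow could look like for Ollama (these repo URLs are hypothetical; no such repo exists today):

  # hypothetical: fetch a signing key and register the repo
  curl -fsSL https://ollama.com/apt/ollama-archive-keyring.gpg | \
    sudo tee /usr/share/keyrings/ollama-archive-keyring.gpg >/dev/null
  echo "deb [signed-by=/usr/share/keyrings/ollama-archive-keyring.gpg] https://ollama.com/apt stable main" | \
    sudo tee /etc/apt/sources.list.d/ollama.list

  # hypothetical: then install and upgrade like any other package
  sudo apt-get update && sudo apt-get install ollama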

The main issue for the maintainer team would be the work of hosting and maintaining all the package repos for apt, yum, etc., and making sure we handle the case where nvidia/amd drivers aren't installed (quite common on cloud VMs). Mostly a matter of time and putting in the work.

For now, every release of Ollama includes a minimal archive with the ollama binary and required dynamic libraries: https://github.com/ollama/ollama/blob/main/docs/linux.md#man.... But we could definitely do better.
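From those docs, the manual install is roughly (amd64 shown; filenames may vary by release):

  # download and unpack the standalone archive
  curl -LO https://ollama.com/download/ollama-linux-amd64.tgz
  sudo tar -C /usr -xzf ollama-linux-amd64.tgz

  # start the server
  ollama serve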

jmorgan commented on Qwen2.5-1M: Deploy your own Qwen with context length up to 1M tokens   qwenlm.github.io/blog/qwe... · Posted by u/meetpateltech
simonw · 7 months ago
Huh! I had incorrectly assumed that was for output, not input. Thanks!

YES that was it:

  files-to-prompt \
    ~/Dropbox/Development/llm \
    -e py -c | \
  llm -m q1m 'describe this codebase in detail' \
    -o num_ctx 80000
I was watching my memory usage and it quickly maxed out my 64GB so I hit Ctrl+C before my Mac crashed.

jmorgan · 7 months ago
Sorry this isn't more obvious. Ideally, VRAM usage for the context window (the KV cache) becomes dynamic, starting small and growing with token usage; right now Ollama defaults to a size of 2K tokens, which can be overridden at runtime. A great example of this is vLLM's PagedAttention implementation [1] or Microsoft's vAttention [2], which is CUDA-specific (and there are quite a few others).
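Until then, the 2K default has to be raised by hand, e.g. with a Modelfile (base tag and context size below are just examples):

  # create a long-context variant of a model
  printf 'FROM qwen2.5:7b\nPARAMETER num_ctx 80000\n' > Modelfile
  ollama create qwen-80k -f Modelfile
  ollama run qwen-80k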

1M tokens will definitely require a lot of KV cache memory. One way to reduce the memory footprint is KV cache quantization, which was recently added behind a flag [3] and cuts the footprint to roughly a quarter when 4-bit quantization is used (OLLAMA_KV_CACHE_TYPE=q4_0 ollama serve).
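For reference, enabling it looks like this:

  # 4-bit quantized KV cache: ~1/4 the f16 memory footprint
  OLLAMA_KV_CACHE_TYPE=q4_0 ollama serve

  # 8-bit is a middle ground: ~1/2 the footprint, less quality impact
  # OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve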

[1] https://arxiv.org/pdf/2309.06180

[2] https://github.com/microsoft/vattention

[3] https://smcleod.net/2024/12/bringing-k/v-context-quantisatio...

jmorgan commented on Phi 4 available on Ollama   ollama.com/library/phi4... · Posted by u/eadz
andhuman · 8 months ago
I’ve seen on the localllama subreddit that some GGUFs have bugs in them. The one recommended was by unsloth. However, I don’t know how the Ollama GGUF holds up.
jmorgan · 8 months ago
Phi-4's architecture changed slightly from Phi-3.5 (it no longer uses a sliding window of 2,048 tokens [1]), causing a change in the hyperparameters (and ultimately an error at inference time for some published GGUF files on Hugging Face, since the same architecture name/identifier was re-used between the two models).

For the Phi-4 uploaded to Ollama, the hyperparameters were set to avoid the error. The error should stop occurring in the next version of Ollama [2] for imported GGUF files as well.
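If you're importing one of the affected GGUFs yourself, the usual Modelfile import flow should work without manual overrides once you're on 0.5.5+ (filename below is illustrative):

  # import a local GGUF into Ollama
  echo 'FROM ./phi-4-Q4_K_M.gguf' > Modelfile
  ollama create phi4-local -f Modelfile
  ollama run phi4-local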

In retrospect, an entirely new architecture name should probably have been used instead of re-using "phi3".

[1] https://arxiv.org/html/2412.08905v1

[2] https://github.com/ollama/ollama/releases/tag/v0.5.5

jmorgan commented on Fast LLM Inference From Scratch (using CUDA)   andrewkchan.dev/posts/yal... · Posted by u/homarp
reasonableklout · 8 months ago
Hi, I'm the author. Thanks for sharing, was great to wake up to my blog post on the front page! Would love to hear any feedback or if I missed anything.
jmorgan · 8 months ago
Thank you for writing this!

u/jmorgan

Karma: 1651 · Cake day: January 31, 2014
About: https://github.com/ollama/ollama