Readit News
sunpazed commented on GPT-OSS 120B Runs at 3000 tokens/sec on Cerebras   cerebras.ai/blog/openai-g... · Posted by u/samspenc
sunpazed · a month ago
This is really impressive. At these speeds, it’s possible to run agents with multi-tool turns within seconds. Consider it a feature rich, “non-deterministic API” for your platform or business.
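
To make that concrete, here is a rough sketch of such a multi-tool agent loop against an OpenAI-compatible endpoint; the base_url, model name, and get_weather tool below are placeholders I've made up, not anything vendor-specific:

    import json
    from openai import OpenAI

    # Placeholder endpoint and key; point these at your own OpenAI-compatible host.
    client = OpenAI(base_url="https://example-host/v1", api_key="sk-...")

    # A single made-up tool, purely for illustration.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What's the weather in Melbourne?"}]
    while True:
        resp = client.chat.completions.create(
            model="gpt-oss-120b", messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:  # no more tool turns; the model gave a final answer
            print(msg.content)
            break
        messages.append(msg)
        for call in msg.tool_calls:  # run each requested tool, feed back the result
            args = json.loads(call.function.arguments)
            result = {"city": args["city"], "temp_c": 18}  # stubbed tool result
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})

At thousands of tokens per second, several iterations of this loop can finish in the time a single turn takes elsewhere.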
sunpazed commented on Gemma 3 270M: Compact model for hyper-efficient AI   developers.googleblog.com... · Posted by u/meetpateltech
canyon289 · 4 months ago
Hi all, I built these models with a great team and am thrilled to get them out to you. They're available for download across the open model ecosystem, so give them a try!

From our side, we designed these models to be strong for their size out of the box, with the goal that you'll all finetune them for your use case. Their small size means they'll fit on a wide range of hardware and cost much less to finetune. You can try finetuning them yourself in a free Colab in under 5 minutes; a rough sketch of what that looks like follows below.
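
If you want a starting point, a minimal LoRA finetune with transformers + peft looks roughly like this; the hub id, dataset, target modules, and hyperparameters here are illustrative stand-ins, so swap in your own:

    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_id = "google/gemma-3-270m-it"  # check the exact hub id before running
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Wrap the base model with small LoRA adapters; only these weights train.
    model = get_peft_model(model, LoraConfig(
        r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]))

    # Stand-in dataset; replace with your own text data.
    ds = load_dataset("yelp_review_full", split="train[:1000]")
    ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
                batched=True)

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="gemma-270m-lora",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=ds,
        # Causal-LM collator copies input_ids into labels for next-token loss.
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    ).train()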

For picking a Gemma size, here's a video I recorded earlier this year covering the 1B to 27B sizes, with 270M being the newest addition:

https://www.youtube.com/watch?v=qcjrduz_YS8

Hacker News disclaimer: I really like working at Google, so with that said, all my opinions here are my own. I'm a researcher, so I'll largely focus on technical questions, and I'll share what I can.

sunpazed · 4 months ago
Thanks so much for delivering on this model. It’s great as a draft model for speculative decoding. Keep up the great work!!
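
For anyone who wants to try that setup, here is a minimal sketch using Hugging Face's assisted-generation API (draft and target need a shared tokenizer, which the Gemma family has); the model ids are my assumptions, so substitute the pair you actually run:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    target_id = "google/gemma-3-27b-it"   # large target model (assumed id)
    draft_id = "google/gemma-3-270m-it"   # small draft model (assumed id)

    tok = AutoTokenizer.from_pretrained(target_id)
    target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.bfloat16)
    draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.bfloat16)

    inputs = tok("Explain speculative decoding in one paragraph.", return_tensors="pt")
    # The small model drafts candidate tokens; the big model verifies them in a
    # single forward pass, accepting the matching prefix.
    out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))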
sunpazed commented on GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM   old.reddit.com/r/LocalLLa... · Posted by u/zigzag312
sunpazed · 4 months ago
I don't have enough RAM for this model, but the smaller 20B model runs nice and fast on my MacBook and is reasonably good for my use cases. Pity that function calling is still broken with llama.cpp.
sunpazed commented on Run LLMs on Apple Neural Engine (ANE)   github.com/Anemll/Anemll... · Posted by u/behnamoh
antirez · 7 months ago
The README lacks the most important thing: how many more tokens/sec do you get at the same quantization, compared to llama.cpp / MLX? It's only worth switching default platforms if there's a major improvement.
sunpazed · 7 months ago
In my testing, tokens per second is half the speed of the GPU, but power usage is 10x lower: 2 watts on the ANE vs 20 watts on the GPU of my M4 Pro.
sunpazed commented on Run LLMs on Apple Neural Engine (ANE)   github.com/Anemll/Anemll... · Posted by u/behnamoh
sunpazed · 7 months ago
The key benefit is significantly lower power usage. I benchmarked llama3.2-1B on my machines: M1 Max (47 t/s, ~1.8 watts) and M4 Pro (62 t/s, ~2.8 watts). The GPU is twice as fast (even faster on the Max) but draws much more power (~20 watts) than the ANE.

Also, the ANE models are limited to 512 tokens of context, so they're unlikely to be usable in production yet.
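
A crude way to reproduce this kind of comparison is a harness like the one below; generate_fn is a placeholder for whichever backend (ANE, GPU, CPU) is being timed, and power figures come from watching macOS powermetrics while it runs (exact sampler flags vary by OS release, so check the man page):

    import time

    def tokens_per_sec(generate_fn, prompt: str, n_tokens: int = 128) -> float:
        """Time one generation call and return decoded tokens per second."""
        start = time.perf_counter()
        generate_fn(prompt, max_new_tokens=n_tokens)
        return n_tokens / (time.perf_counter() - start)

    # Run once per backend and compare, e.g.:
    #   ane_tps = tokens_per_sec(ane_generate, "hello")  # placeholder callables
    #   gpu_tps = tokens_per_sec(gpu_generate, "hello")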

sunpazed commented on Connomore64: Cycle exact emulation of the C64 using parallel microcontrollers   github.com/c1570/Connomor... · Posted by u/codewiz
sunpazed · 7 months ago
Love this! The C64 introduced me to the world of computers as a kid. I still have that almost-40-year-old machine in my collection, but I'm wary of failure every time I turn it on. This is somewhat better than the MiSTer as I can use physical peripherals with it. Great work!
sunpazed commented on Everything wrong with MCP   blog.sshh.io/p/everything... · Posted by u/sshh12
sunpazed · 8 months ago
Let’s remind ourselves that MCP was announced to the world in November 2024, only 4 short months ago. The RFC is actively being worked on and evolving.
sunpazed commented on Quick Primer on MCP Using Ollama and LangChain   polarsparc.com/xhtml/MCP.... · Posted by u/bswamina
gsibble · 8 months ago
MCP is great for when you’re integrating tools locally into IDEs and such. It’s a terrible standard for building more robust applications with multi-user support. Security and authentication are completely lacking.

99% of people wouldn’t be able to find the API keys you need to feed into most MCP servers.

sunpazed · 8 months ago
While I’m a fan, we’re not using MCP for any production workloads for these very reasons.

Authentication, session management, etc., should be handled outside of the standard, and outside of the LLM flow entirely; a rough sketch of that pattern follows below.

I recently mused on this here: https://github.com/sunpazed/agent-mcp/blob/master/mcp-what-i...
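
As a concrete (and entirely hypothetical) example of keeping auth outside the flow, you could front the MCP server with a thin gateway that validates a bearer token before proxying the JSON-RPC traffic; the path, port, and token check below are all made up:

    import httpx
    from fastapi import FastAPI, Request, Response

    app = FastAPI()
    MCP_UPSTREAM = "http://localhost:8000/mcp"  # assumed internal MCP endpoint

    @app.post("/mcp")
    async def proxy(request: Request) -> Response:
        # Reject unauthenticated callers before anything reaches the MCP server.
        if request.headers.get("authorization") != "Bearer expected-token":
            return Response(status_code=401)
        async with httpx.AsyncClient() as client:
            upstream = await client.post(
                MCP_UPSTREAM,
                content=await request.body(),
                headers={"content-type": "application/json"})
        return Response(content=upstream.content, media_type="application/json")

The MCP server itself stays auth-agnostic; the gateway (or your existing API infrastructure) owns identity and sessions.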

sunpazed commented on Apache ECharts   echarts.apache.org/en/ind... · Posted by u/tomtomistaken
d_t_w · 8 months ago
We have been using Apache ECharts in our products[1] since 2020.

Cannot recommend it enough - absolutely fantastic library, great documentation, zero issues of any impact to us in five years.

My only wish is for the keyboard accessibility ticket[2] to get some love!

[1] https://factorhouse.io

[2] https://github.com/apache/echarts/issues/14706

sunpazed · 8 months ago
Funny to see you here. I'm from operata.io (also a Melbourne-based startup) and would see your website (operatr.io) whenever I misspelled mine!
sunpazed commented on The Agent2Agent Protocol (A2A)   developers.googleblog.com... · Posted by u/meetpateltech
zellyn · 8 months ago
It’s frustratingly difficult to see what these (A2A and MCP) protocols actually look like. All I want is a simple example conversation that includes the actual LLM outputs used to trigger a call and the JSON that goes over the wire… maybe I’ll take some time and make a cheat-sheet.

I have to say, the endorsements at the end somehow made this seem worse…

sunpazed · 8 months ago
I had the same frustration and wanted to see "under the hood", so I coded up this little agent tool to play with MCP (SSE and stdio): https://github.com/sunpazed/agent-mcp

It really is just JSON-RPC 2.0 under the hood, either piped over stdio or POSTed over HTTP.
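
To see it concretely, here is a bare-bones client that speaks raw JSON-RPC 2.0 to a stdio MCP server; the server command is a placeholder, and I'm assuming the stdio transport's newline-delimited framing:

    import json
    import subprocess

    # Placeholder server command; point this at any stdio MCP server.
    proc = subprocess.Popen(["python", "my_mcp_server.py"],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)

    def rpc(method: str, params: dict, id_: int) -> dict:
        """Send one JSON-RPC 2.0 request, read one newline-delimited reply."""
        req = {"jsonrpc": "2.0", "id": id_, "method": method, "params": params}
        proc.stdin.write(json.dumps(req) + "\n")
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())

    print(rpc("initialize", {"protocolVersion": "2024-11-05", "capabilities": {},
                             "clientInfo": {"name": "demo", "version": "0"}}, 1))
    # (A conforming client also sends a "notifications/initialized"
    # notification here; omitted for brevity.)
    print(rpc("tools/list", {}, 2))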
