byefruit (u/byefruit)

byefruit commented on Jules, our asynchronous coding agent blog.google/technology/go... · Posted by u/meetpateltech

byefruit · 18 days ago

How is this different from https://github.com/google-gemini/gemini-cli ?

Edit: it seems this is a hosted version. It would be nice if they actually joined up some of their products.

byefruit commented on Qwen3-4B-Thinking-2507 huggingface.co/Qwen/Qwen3... · Posted by u/IdealeZahlen

cowpig · 18 days ago

Compare these rankings to actual usage: https://openrouter.ai/rankings

Claude is not cheap, so why is it far and away the most popular if it's not in the top 10 for performance?

Qwen3 235b ranks highest on these benchmarks among open models, but I have never met someone who prefers its output over Deepseek R1. It's extremely wordy and often gets caught in thought loops.

My interpretation is that the models at the top of ArtificialAnalysis are focusing the most on public benchmarks in their training. Note that I am not saying xAI is necessarily doing this nefariously; it could just be that they decided it's better bang for the buck to rely on public benchmarks than to build their own evaluation systems.

But Grok is not very good compared to the Anthropic, OpenAI, or Google models despite ranking so highly in benchmarks.

byefruit · 18 days ago

The openrouter rankings can be biased.

For example, Google's inexplicable design decisions around libraries and APIs mean it's often worth the 5% premium to just use OpenRouter to access their models. In other cases it's about which models particular agents default to.

Sonnet 4 is extremely good in agentic tool-use setups, though, which is something I have found other models struggle with over a long context.

byefruit commented on The U.K. closed a tax loophole for the global rich, now they're fleeing wsj.com/world/uk/the-u-k-... · Posted by u/fortran77

byefruit · a month ago

Before accepting this at face value, note that the New Statesman claims this is not the case:

https://www.newstatesman.com/politics/2025/07/the-british-we...

byefruit commented on Mistral ships Le Chat – enterprise AI assistant that can run on prem mistral.ai/news/le-chat-e... · Posted by u/_lateralus_

resource_waste · 4 months ago

Expected this comment.

Mistral has consistently been in last place, or at least in last place among ChatGPT, Claude, Llama, and Gemini/Gemma.

I know this because I had to use a permissive license for a side project and I was tortured by how miserably bad Mistral was, and how much better every other LLM was.

Need the best? ChatGPT

Need local stuff? Llama (maybe Gemma)

Need to do barely legal things that break most companies' TOS? Mistral... although DeepSeek probably beats it in 2025.

Outside Europe, we don't feel patriotic about our LLMs; we just use the best. Mistral has barely any use case.

byefruit · 4 months ago

You are probably getting downvoted because you don't give any model generations or versions (just 'ChatGPT'), which makes this not very credible.

byefruit commented on Google Gemini has the worst LLM API venki.dev/notes/google-ge... · Posted by u/indigodaddy

jeswin · 4 months ago

Can you allow prepaid credits as well please?

byefruit · 4 months ago

100% this. We actually use OpenRouter (and pay their surcharge) with Gemini 2.5 Pro just because we can control spend via spend limits on keys (A++ feature) and prepaid credit.

byefruit commented on Fossil fuels fall below 50% of US electricity for the first month on record ember-energy.org/latest-u... · Posted by u/xnx

kasey_junk · 4 months ago

You are paying less than $5 a month for that level of energy generation?

That's, umm, extremely cheap.

byefruit · 4 months ago

Indeed, the average bill in CA is $260/month, so $5k pays off very fast in some places.
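A rough payback calculation for the figures above, as a sketch only. It assumes the ~$5k system offsets the entire monthly bill and ignores financing, panel degradation, and net-metering rules, none of which the thread specifies. The $100/month bill is a hypothetical comparison, not a sourced figure.

```python
def payback_months(upfront_cost: float, monthly_savings: float) -> float:
    """Months until cumulative savings cover the upfront cost (no discounting)."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return upfront_cost / monthly_savings

# ~$5k upfront against the ~$260/month average CA bill quoted above.
# Assumption: the system fully offsets the bill.
print(f"CA average: {payback_months(5_000, 260):.1f} months")  # 19.2 months

# Hypothetical cheaper-electricity region for comparison.
print(f"$100/month bill: {payback_months(5_000, 100):.1f} months")  # 50.0 months
```

Payback scales inversely with the bill, which is why the same $5k system "pays off very fast in some places" and much more slowly in others.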

byefruit commented on Gemini 2.5 Flash developers.googleblog.com... · Posted by u/meetpateltech

byefruit · 4 months ago

It's interesting that there's a nearly 6x price difference between reasoning and no reasoning.

This implies it's not a hybrid model that can just skip reasoning steps if requested.

Anyone know what else they might be doing?

Reasoning means contexts will be longer (because of the thinking tokens), and inference does cost more with a longer context, but not 6x more.

Or is it just market pricing?
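A toy model of the context-length argument above, offered as a sketch rather than a description of how Google actually serves or prices the model. It assumes each decoded token costs a fixed amount plus a term proportional to the current context length (the attention read over the KV cache). The token counts and constants are hypothetical. Under this model the per-token cost ratio is bounded by the ratio of mean context lengths, whatever the constants are.

```python
def mean_context(prompt_tokens: int, generated_tokens: int) -> float:
    """Average context length seen while decoding generated_tokens after the prompt.

    Token i (0-based) attends over prompt_tokens + i positions.
    """
    return prompt_tokens + (generated_tokens - 1) / 2

def per_token_cost(fixed: float, per_ctx: float, prompt_tokens: int, generated_tokens: int) -> float:
    """Toy model: each decoded token costs fixed + per_ctx * current context length."""
    return fixed + per_ctx * mean_context(prompt_tokens, generated_tokens)

# Hypothetical workload: 2,000-token prompt, 500-token answer,
# plus 8,000 thinking tokens when reasoning is enabled.
PROMPT, ANSWER, THINKING = 2_000, 500, 8_000

# Made-up constants: fixed cost 1.0, attention cost 1e-5 per context token.
no_reasoning = per_token_cost(1.0, 1e-5, PROMPT, ANSWER)
reasoning = per_token_cost(1.0, 1e-5, PROMPT, ANSWER + THINKING)
print(f"ratio with made-up constants: {reasoning / no_reasoning:.2f}")  # 1.04

# For any positive constants the ratio stays below the ratio of mean contexts,
# approaching it only as the attention term dominates the fixed term.
upper_bound = mean_context(PROMPT, ANSWER + THINKING) / mean_context(PROMPT, ANSWER)
print(f"upper bound: {upper_bound:.2f}")  # 2.78
```

With these assumed token counts even the worst case stays well under 6x, which is consistent with the suspicion that the gap reflects something other than longer contexts (or just market pricing).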

byefruit commented on AWS announces 85% price reductions for S3 Express One Zone aws.amazon.com/blogs/aws/... · Posted by u/panrobo

byefruit · 4 months ago

"In addition, S3 Express One Zone has reduced the per-GB charges for data uploads and retrievals by 60 percent, and these charges now apply to all bytes transferred rather than just portions of requests greater than 512 KB"

It's not clear, but are there cases where this could be a significant price rise? If you exclusively wrote and read small objects (<512 KB), this could add up quickly.
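A sketch of the old and new billing rules as described in the quoted announcement, to show where small objects can lose out. All prices below are hypothetical placeholders, not AWS's published rates. Applying the headline 85% cut uniformly to all requests is a simplifying assumption, as is treating KB and GB as binary units.

```python
KB = 1024
GB = 1024 ** 3
FREE_PORTION = 512 * KB  # old rule: only bytes above 512 KB per request billed per GB

# Hypothetical prices for illustration only, NOT AWS's published rates.
OLD_PER_1K_REQUESTS = 0.002
OLD_PER_GB = 0.01
# Assumption: the headline 85% cut applies to requests, the quoted 60% cut to per-GB.
NEW_PER_1K_REQUESTS = OLD_PER_1K_REQUESTS * (1 - 0.85)
NEW_PER_GB = OLD_PER_GB * (1 - 0.60)

def old_cost(requests: int, object_bytes: int) -> float:
    """Old rule: per-GB charge only on the portion of each request above 512 KB."""
    billable = max(0, object_bytes - FREE_PORTION)
    return requests / 1000 * OLD_PER_1K_REQUESTS + requests * billable / GB * OLD_PER_GB

def new_cost(requests: int, object_bytes: int) -> float:
    """New rule: per-GB charge on every byte transferred."""
    return requests / 1000 * NEW_PER_1K_REQUESTS + requests * object_bytes / GB * NEW_PER_GB

for size_kb in (4, 100, 512, 4096):
    size = size_kb * KB
    print(f"{size_kb:>5} KB objects, 1M requests: "
          f"old ${old_cost(1_000_000, size):.2f}, new ${new_cost(1_000_000, size):.2f}")
```

For objects under 512 KB the old data-transfer term is always zero, so the new per-byte charge is a pure addition that only the request-price cut can offset. Whether small-object workloads end up cheaper or dearer depends on the real ratio of per-request to per-GB prices, so the actual published rates would need to be plugged in.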

byefruit commented on Skywork-OR1: new SOTA 32B thinking model with open weight github.com/SkyworkAI/Skyw... · Posted by u/naomiclarkson

byefruit · 4 months ago

> Both of our models are trained on top of DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Qwen-32B.

Not to take away from their work, but this shouldn't be buried at the bottom of the page; there's a gulf between completely new models and fine-tuning.

byefruit commented on It's five grand a day to miss our S3 exit world.hey.com/dhh/it-s-fi... · Posted by u/ksec

jarito · 5 months ago

Pretty sure those services are only for transferring data into S3, not out.

byefruit · 5 months ago

Snowball (https://aws.amazon.com/snowball/pricing/) seems to support getting data out of S3, though you still end up paying extortionate egress charges.