Readit News
byefruit commented on Jules, our asynchronous coding agent   blog.google/technology/go... · Posted by u/meetpateltech
byefruit · 18 days ago
How is this different from https://github.com/google-gemini/gemini-cli ?

Edit: it seems this is a hosted version. Would be nice if they actually joined up some of their products.

byefruit commented on Qwen3-4B-Thinking-2507   huggingface.co/Qwen/Qwen3... · Posted by u/IdealeZahlen
cowpig · 18 days ago
Compare these rankings to actual usage: https://openrouter.ai/rankings

Claude is not cheap, so why is it far and away the most popular if it's not top 10 in performance?

Qwen3 235b ranks highest on these benchmarks among open models, but I have never met someone who prefers its output over Deepseek R1. It's extremely wordy and often gets caught in thought loops.

My interpretation is that the models at the top of ArtificialAnalysis are focusing the most on public benchmarks in their training. Note that I am not saying xAI is necessarily doing this nefariously; it could just be that they decided it's better bang for the buck to rely on public benchmarks than to try to build their own evaluation systems.

But Grok is not very good compared to the Anthropic, OpenAI, or Google models despite ranking so highly in benchmarks.

byefruit · 18 days ago
The openrouter rankings can be biased.

For example, Google's inexplicable design decisions around libraries and APIs mean it's often worth the 5% premium to just use OpenRouter to access their models. In other cases it's about which models particular agents default to.

Sonnet 4 is extremely good for tool-usage agentic setups though, something I have found other models struggle with over a long context.

byefruit commented on The U.K. closed a tax loophole for the global rich, now they're fleeing   wsj.com/world/uk/the-u-k-... · Posted by u/fortran77
byefruit · a month ago
Before just accepting this at face value, note that the New Statesman claims this is not the case:

https://www.newstatesman.com/politics/2025/07/the-british-we...

byefruit commented on Mistral ships Le Chat – enterprise AI assistant that can run on prem   mistral.ai/news/le-chat-e... · Posted by u/_lateralus_
resource_waste · 4 months ago
Expected this comment.

Mistral has been consistently last place, or at least last place among ChatGPT, Claude, Llama, and Gemini/Gemma.

I know this because I had to use a permissive license for a side project and I was tortured by how miserably bad Mistral was, and how much better every other LLM was.

Need the best? ChatGPT

Need local stuff? Llama (maybe Gemma)

Need to do barely legal things that break most companies' TOS? Mistral... although DeepSeek probably beats it in 2025.

For people outside Europe, we don't have patriotism for our LLMs, we just use the best. Mistral has barely any use case.

byefruit · 4 months ago
You are probably getting downvoted because you don't give any model generations or versions ('ChatGPT') which makes this not very credible.
byefruit commented on Google Gemini has the worst LLM API   venki.dev/notes/google-ge... · Posted by u/indigodaddy
jeswin · 4 months ago
Can you allow prepaid credits as well please?
byefruit · 4 months ago
100% this. We use OpenRouter (and pay their surcharge) with Gemini 2.5 Pro just because we can actually control spend via spend limits on keys (A++ feature) and prepaid credit.
byefruit commented on Fossil fuels fall below 50% of US electricity for the first month on record   ember-energy.org/latest-u... · Posted by u/xnx
kasey_junk · 4 months ago
You are paying less than $5 a month for that level of energy generation?

That's, ummm, extremely cheap.

byefruit · 4 months ago
Indeed, the average in CA is $260/month, so $5k pays off very fast in some places.
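
A rough back-of-the-envelope check of that payback claim, as a sketch only. It assumes the $5k is a one-off upfront cost and that the system offsets the entire $260/month average bill; neither is stated in the thread, and `payback_months` is a hypothetical helper name.

```python
def payback_months(upfront_cost: float, monthly_saving: float) -> float:
    """Simple payback period: months until cumulative savings equal the upfront cost.

    Ignores financing, maintenance, and electricity price changes.
    """
    return upfront_cost / monthly_saving


# Assumption: $5k upfront, full offset of the $260/month average bill.
months = payback_months(5_000, 260)
print(f"{months:.1f} months")  # prints "19.2 months"
```

Under those assumptions the payback is a little over a year and a half; a partial offset of the bill stretches it proportionally.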
byefruit commented on Gemini 2.5 Flash   developers.googleblog.com... · Posted by u/meetpateltech
byefruit · 4 months ago
It's interesting that there's a nearly 6x price difference between reasoning and no reasoning.

This implies it's not a hybrid model that can just skip reasoning steps if requested.

Anyone know what else they might be doing?

Reasoning means contexts will be longer (for thinking tokens), and inference costs more with a longer context, but it's not going to be 6x.

Or is it just market pricing?

byefruit commented on AWS announces 85% price reductions for S3 Express One Zone   aws.amazon.com/blogs/aws/... · Posted by u/panrobo
byefruit · 4 months ago
"In addition, S3 Express One Zone has reduced the per-GB charges for data uploads and retrievals by 60 percent, and these charges now apply to all bytes transferred rather than just portions of requests greater than 512 KB"

It's not clear, but are there cases where this could be a significant price rise? If you exclusively had small objects (<512 KB) being written and read, then this could add up quickly.
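
To make the question concrete, here is a minimal sketch of the per-byte data charge only, taking the quoted announcement at face value: the old scheme bills just the portion of a request above 512 KB, the new scheme bills every byte at a 60% lower rate. The rate of 1.0 per GB is a placeholder, not AWS's actual price, and the function names are hypothetical. Per-request and storage charges are ignored, so this does not describe the total bill.

```python
KB = 1024  # assumption: binary kilobytes; the break-even ratio does not depend on this
GB = 1024 ** 3


def old_data_charge(size_bytes: float, rate_per_gb: float) -> float:
    """Old scheme as quoted: only the portion of a request above 512 KB is billed."""
    billable = max(0, size_bytes - 512 * KB)
    return billable / GB * rate_per_gb


def new_data_charge(size_bytes: float, rate_per_gb: float, reduction: float = 0.60) -> float:
    """New scheme as quoted: rate cut by `reduction`, applied to every byte."""
    return size_bytes / GB * rate_per_gb * (1 - reduction)


def break_even_bytes(reduction: float = 0.60) -> float:
    """Request size at which both schemes charge the same.

    Solves (1 - r) * s = s - 512 KB, which gives s = 512 KB / r.
    """
    return 512 * KB / reduction


if __name__ == "__main__":
    rate = 1.0  # placeholder per-GB rate, NOT a real AWS price
    for size_kb in (64, 512, 1024, 4096):
        size = size_kb * KB
        print(size_kb, old_data_charge(size, rate), new_data_charge(size, rate))
    print(f"break-even at about {break_even_bytes() / KB:.0f} KB")
```

On these assumptions the break-even sits at roughly 853 KB per request: anything smaller pays more in data charges than before, and requests at or under 512 KB go from zero data charge to a non-zero one. Whether the overall bill rises depends on the request and storage price cuts, which this sketch leaves out.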

byefruit commented on Skywork-OR1: new SOTA 32B thinking model with open weight   github.com/SkyworkAI/Skyw... · Posted by u/naomiclarkson
byefruit · 4 months ago
> Both of our models are trained on top of DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Qwen-32B.

Not to take away from their work, but this shouldn't be buried at the bottom of the page: there's a gulf between completely new models and fine-tuning.

byefruit commented on It's five grand a day to miss our S3 exit   world.hey.com/dhh/it-s-fi... · Posted by u/ksec
jarito · 5 months ago
Pretty sure those services are only for transferring data into S3, not out.
byefruit · 5 months ago
https://aws.amazon.com/snowball/pricing/ Snowball seems to support getting data out of S3, though you still end up paying extortionate egress charges.

u/byefruit

Karma: 913 · Cake day: August 27, 2012