Readit News
irthomasthomas commented on Kimi K2 1T model runs on 2 512GB M3 Ultras   twitter.com/awnihannun/st... · Posted by u/jeudesprits
bgwalter · 4 days ago
Because you have Cloudflare (MITM 1), Openrouter (MITM 2) and finally the "AI" provider who can all read, store, analyze and resell your queries.

EDIT: Thanks for downvoting what is literally one of the most important reasons for people to use local models. Denying and censoring reality does not prevent the bubble from bursting.

irthomasthomas · 3 days ago
You can use chutes.ai's TEE (Trusted Execution Environment), and Kimi K2 is running there at about 100 t/s right now.
irthomasthomas commented on GPT-5.2   openai.com/index/introduc... · Posted by u/atgctg
rsanek · 6 days ago
it's the biggest model on OpenRouter, even if you exclude free tier usage https://openrouter.ai/state-of-ai
irthomasthomas · 6 days ago
Roleplay is the largest use-case on OpenRouter.
irthomasthomas commented on GPT-5.2   openai.com/index/introduc... · Posted by u/atgctg
simonkagedal · 6 days ago
irthomasthomas · 6 days ago
I was expecting to see a pterodactyl :(
irthomasthomas commented on DeepSeek uses banned Nvidia chips for AI model, report says   finance.yahoo.com/news/ch... · Posted by u/goodway
irthomasthomas · 8 days ago
Related: DeepSeek just leapfrogged the competition, scoring gold in the 2025 IMO, IOI, and ICPC World Finals with an open-weights model.
irthomasthomas commented on New benchmark shows top LLMs struggle in real mental health care   swordhealth.com/newsroom/... · Posted by u/RicardoRei
embedding-shape · 8 days ago
> Otherwise comparison would not be super fair.

Wouldn't that be easy to make fair by making sure all models tried it with the same prompts? So you have models X and Y, and prompts A and B, and X runs once with A, once with B, and the same for Y.

The reason I ask is that in my own local benchmarks, which I run against my own tasks for each model release, I've noticed a huge variance in response quality depending on the prompts themselves. Slight variations in wording seem to have a big effect on the final responses, and those variations in turn seem to affect each model very differently.

Sometimes a huge system prompt makes one model return much higher quality responses, while another model gives much higher quality responses when the system prompt is as small as it possibly can be. At least this is what I'm seeing with the local models I'm putting through my private benchmarks.

irthomasthomas · 8 days ago
Did you re-test the past models with the new prompt you found? How many times did you run each prompt? Did you use the same rubric to score each experiment?
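A minimal sketch of the fair-comparison grid being described: every model sees every prompt, each combination is run N times, and every run is scored with the same rubric. The model names, prompts, and both stub functions here are hypothetical placeholders, not anything from the thread.

```python
import itertools
from statistics import mean

MODELS = ["model-x", "model-y"]      # hypothetical model names
PROMPTS = ["prompt-a", "prompt-b"]   # identical prompts for every model
N_RUNS = 5                           # repeat to average out run-to-run variance

def run_model(model, prompt):
    """Stub inference call; swap in your actual local model runner."""
    return f"{model} answer to {prompt}"

def score_with_rubric(response):
    """Stub rubric: one fixed scoring function applied to every experiment."""
    return len(response) % 10  # placeholder metric

def benchmark():
    results = {}
    # Full cross product: no model/prompt pairing is skipped,
    # so no model benefits from a prompt the others never saw.
    for model, prompt in itertools.product(MODELS, PROMPTS):
        scores = [score_with_rubric(run_model(model, prompt))
                  for _ in range(N_RUNS)]
        results[(model, prompt)] = mean(scores)
    return results

print(benchmark())
```

When a better prompt is found later, re-running this grid (including the older models) is what keeps the comparison fair.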
irthomasthomas commented on Jony Ive's OpenAI Device Barred From Using 'io' Name   macrumors.com/2025/12/05/... · Posted by u/thm
shaftway · 12 days ago
The article says they claimed in court that they're not working on a wearable device. So that rules out headphones, glasses, and maybe comm badges.
irthomasthomas · 12 days ago
So, an echo style tabletop device?
irthomasthomas commented on Gemini 3 Pro: the frontier of vision AI   blog.google/technology/de... · Posted by u/xnx
Zambyte · 12 days ago
I wonder how they would behave given a system prompt that asserts "dogs may have more or less than four legs".
irthomasthomas · 12 days ago
That may work, but what actual use would it be? You would be plugging one of a million holes. A general solution is needed.
irthomasthomas commented on Gemini 3 Pro: the frontier of vision AI   blog.google/technology/de... · Posted by u/xnx
vunderba · 12 days ago
I just re-ran that image through Gemini 3.0 Pro via AI Studio and it reported:

  I've moved on to the right hand, meticulously tagging each finger. After completing the initial count of five digits, I noticed a sixth! There appears to be an extra digit on the far right. This is an unexpected finding, and I have counted it as well. That makes a total of eleven fingers in the image.
This right HERE is the issue. It's not nearly deterministic enough to rely on.

irthomasthomas · 12 days ago
Thanks for that. My first question about results like these is always 'how many times did you run the test?'. N=1 tells us nothing. N=2 tells us something.
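The point about N=1 can be sketched directly: a single run cannot distinguish a model that miscounts 5% of the time from one that miscounts half the time; repeating the run estimates the failure rate. The seeded stub below stands in for a nondeterministic vision-model call and is entirely hypothetical.

```python
import random

def count_fingers_test(seed):
    """Stub for one vision-model run; a seeded RNG stands in for
    the sampling variance of a real model."""
    rng = random.Random(seed)
    return rng.choice([5, 5, 5, 6])  # model occasionally miscounts

N = 20
answers = [count_fingers_test(seed) for seed in range(N)]
correct = sum(1 for a in answers if a == 5)
print(f"{correct}/{N} runs counted 5 fingers")
```

Reporting a rate like this instead of a single transcript is what makes an anecdote into evidence.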
irthomasthomas commented on Gemini 3 Pro: the frontier of vision AI   blog.google/technology/de... · Posted by u/xnx
Rover222 · 12 days ago
I just tried to get Gemini to produce an image of a dog with 5 legs to test this out, and it really struggled with that. It either made a normal dog, or turned the tail into a weird appendage.

Then I asked both Gemini and Grok to count the legs, both kept saying 4.

Gemini just refused to consider it was actually wrong.

Grok seemed to have an existential crisis when I told it it was wrong, becoming convinced that I had given it an elaborate riddle. After thinking for an additional 2.5 minutes, it concluded: "Oh, I see now—upon closer inspection, this is that famous optical illusion photo of a "headless" dog. It's actually a three-legged dog (due to an amputation), with its head turned all the way back to lick its side, which creates the bizarre perspective making it look decapitated at first glance. So, you're right; the dog has 3 legs."

You're right, this is a good test. And just when I was starting to feel like LLMs are intelligent.

irthomasthomas · 12 days ago
Isn't this proof that LLMs still don't really generalize beyond their training data?
irthomasthomas commented on The RAM shortage comes for us all   jeffgeerling.com/blog/202... · Posted by u/speckx
agoodusername63 · 13 days ago
Isn't that what the mixture-of-experts trick all the big players use is? A bunch of smaller, tightly focused models.
irthomasthomas · 13 days ago
Not exactly. MoE uses a router to select a subset of experts (not separate models) per token. This makes inference faster, but the full set of expert weights still has to be resident, so it requires the same amount of RAM.
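A minimal sketch of top-k expert routing under the assumptions above (toy dimensions, random weights, pure Python): all expert matrices sit in memory the whole time, so RAM scales with the total parameter count, while only TOP_K experts are evaluated per token, so compute scales with TOP_K.

```python
import math
import random

NUM_EXPERTS, TOP_K, DIM = 8, 2, 4

random.seed(0)
# Every expert's weights are resident in memory at all times:
# RAM cost tracks total parameters, not TOP_K.
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def moe_forward(token):
    # The router scores every expert, but only the TOP_K best run,
    # which is where the speed advantage comes from.
    scores = [sum(w * x for w, x in zip(r, token)) for r in router]
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    weights = [math.exp(scores[i]) for i in top]
    z = sum(weights)
    out = [0.0] * DIM
    for i, w in zip(top, weights):
        y = matvec(experts[i], token)
        out = [o + (w / z) * yi for o, yi in zip(out, y)]
    return out, top

out, used = moe_forward([1.0, 0.5, -0.3, 0.2])
print(f"activated experts {used} of {NUM_EXPERTS}")
```

Per token only 2 of the 8 expert matrices are multiplied, yet all 8 had to be allocated up front.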

u/irthomasthomas

Karma: 3106 · Cake day: December 29, 2019
About
undecidability.com

crispysky.com

x.com/xundecidability

github.com/irthomasthomas/llm-consortium
