phi4-mini-reasoning took the same prompt and bailed out because (at least according to its trace) it interpreted it as meaning "can't have a, e, i, o, or u in the name".
Local is the only inference paradigm I'm interested in, but these things have a way to go.
The tool tells the agent to ask the user for it, and the agent cannot proceed without it. The instructions from the tool show an all-caps message explaining the risk and telling the agent that it must prompt the user for the OTP.
I haven't used any of the *Claws yet, but this seems like an essential poor man's human-in-the-loop implementation that may help prevent some pain.
I prefer to make my own agent CLIs for everything, for reasons like this and many others: full control over what the tool may do, and the ability to make it more useful.
* Many top-quality TTS and STT models
* Image recognition, object tracking
* Speculative decoding, attached to a much bigger model (big/small architecture?)
* An agentic loop trying 20 different approaches/algorithms, then picking the best one
* Edited to add: put 50 such small models together to create a SOTA, super-fast model
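The "try N approaches, pick the best" bullet above is easy to sketch as a harness (a toy illustration, not from the thread): run every candidate on the same input and keep the fastest one that returns the correct answer.

```python
import time
from functools import reduce

def best_of_n(candidates, test_input, expected):
    """Run every candidate on the same input; keep the fastest correct one."""
    best, best_time = None, float("inf")
    for fn in candidates:
        start = time.perf_counter()
        result = fn(test_input)
        elapsed = time.perf_counter() - start
        if result == expected and elapsed < best_time:
            best, best_time = fn, elapsed
    return best

def loop_sum(xs):
    # Naive hand-rolled alternative to the built-in.
    total = 0
    for x in xs:
        total += x
    return total

# Three candidate "approaches" to the same trivial task: summing a list.
approaches = [sum, loop_sum, lambda xs: reduce(lambda a, b: a + b, xs, 0)]
winner = best_of_n(approaches, list(range(1000)), 499500)
```

In a real agentic loop the candidates would be generated programs and the scoring would be a benchmark plus a correctness check, but the select-the-best skeleton is the same.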
Tech summary:
- 15k tok/sec on an 8B dense 3-bit quant (Llama 3.1)
- limited KV cache
- 880mm^2 die, TSMC 6nm, 53B transistors
- presumably 200W per chip
- 20x cheaper to produce
- 10x less energy per token for inference
- max context size: flexible
- mid-sized thinking model upcoming this spring on same hardware
- next hardware supposed to be FP4
- a frontier LLM planned within twelve months
This is all from their website; I am not affiliated. The founders have 25 years of career experience across AMD, Nvidia, and others, with $200M in VC funding so far. Certainly interesting for very low-latency applications that need < 10k tokens of context. If they deliver in spring, they will likely be flooded with VC money.
Not exactly a competitor for Nvidia but probably for 5-10% of the market.
Back of the napkin: the cost for 1mm^2 of 6nm wafer is ~$0.20, so 1B parameters need about $20 of die area. The larger the die, the lower the yield. Supposedly the inference speed stays almost the same with larger models.
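Spelling out the napkin math, using only the figures already quoted (~$0.20/mm^2, an 880 mm^2 die, 8B parameters):

```python
# Figures from the comment above: ~$0.20 per mm^2 of 6nm wafer,
# an 880 mm^2 die holding an 8B-parameter model.
cost_per_mm2 = 0.20
die_area_mm2 = 880
params_billion = 8

die_cost = cost_per_mm2 * die_area_mm2        # ~$176 per die
cost_per_billion = die_cost / params_billion  # ~$22 per 1B parameters
```

So the "~$20 of die per 1B parameters" claim works out to about $22 with these numbers, before accounting for yield losses on such a large die.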
Interview with the founders: https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...
* By training on user data, you can collect images of specific aircraft models and then train a classifier for the airplane model. It might require another model, where only the bbox is the input, together with distance/calculated measurements of the object ("pixel size") and the orientation of the plane (side? front? belly?).
* Provide alerts/notifications for special aircraft like helicopters, military planes, Air Force One, etc.
* When a bbox is detected, you can run super-resolution upscaling on the photo/stream of images
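As a toy illustration of the "pixel size" idea above (all camera parameters here are hypothetical, not from the comment): under a pinhole-camera model, an object's physical width can be estimated from its bbox width in pixels, its distance, and the camera's focal length expressed in pixels.

```python
def estimate_wingspan_m(bbox_width_px, distance_m, focal_length_px):
    """Pinhole model: real_width = pixel_width * distance / focal_length_px."""
    return bbox_width_px * distance_m / focal_length_px

# Hypothetical numbers: a 300 px wide bbox at 2000 m with a 15000 px
# focal length suggests roughly a 40 m wingspan (wide-body territory).
span = estimate_wingspan_m(300, 2000, 15000)
```

An estimate like this, fed alongside the bbox crop and orientation into a second classifier, is one way to disambiguate aircraft types that look similar at a distance.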
First place I worked right out of college had a big training seminar for new hires. One day we were told the story of how they'd improved load times from around 5 minutes to 30 seconds; this improvement was in the mid '90s. The negative responses from clients were instant. The load time improvements had destroyed their company culture. Instead of everyone coming into the office, turning on their computers, and spending the next 10 minutes chatting and drinking coffee, the software was ready before they'd even stood up from their desks!
The moral of the story, and the quote, isn't that you shouldn't improve things. Instead it's a reminder that the software you're building doesn't exist in a PRD or a test suite. It's a system that people will interact with out there in the world. Habits will form, workarounds will be developed, bugs will be leaned on for actual use cases.
This makes it critically important that you, the software engineer, understand the purpose and real-world usage of your software. Your job isn't to complete tickets that fulfill a list of asks from your product manager. Your job is to build software that solves users' problems.
First: cases where expected, old, and new outputs are all identical should go to the regression suite. Then a HUMAN should take a look, in this order: 1. Cases where expected and old are identical, but Rust is different. 2. If time allows, cases where expected and Rust are identical, but old is different.
TBH, after #1 (expected+old vs. Rust) I'd be asking the GenAI to generate more test cases in these faulty areas.
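The triage order described above can be written down as a small sorting function. A sketch, assuming each test case carries its expected output plus the old implementation's and the Rust port's outputs (cases where neither matches go into a leftover bucket, which the comment doesn't mention explicitly):

```python
def triage(cases):
    """Split cases per the comment: all-agree -> regression suite; then
    prioritize human review where old matches expected but Rust differs,
    then where Rust matches expected but old differs."""
    regression, review_first, review_second, other = [], [], [], []
    for c in cases:
        if c["expected"] == c["old"] == c["rust"]:
            regression.append(c)       # all three identical: keep as regression test
        elif c["expected"] == c["old"]:
            review_first.append(c)     # Rust port likely introduced a bug
        elif c["expected"] == c["rust"]:
            review_second.append(c)    # Rust may have fixed an old bug
        else:
            other.append(c)            # nobody matches expected
    return regression, review_first, review_second, other

cases = [
    {"id": 1, "expected": "a", "old": "a", "rust": "a"},
    {"id": 2, "expected": "a", "old": "a", "rust": "b"},
    {"id": 3, "expected": "a", "old": "b", "rust": "a"},
]
reg, first, second, other = triage(cases)
```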
Worth mentioning in the title that it's CPU-only: >1200 tokens/s on a single thread is impressive.
Have you considered doing optimization iterations like nanogpt-speedrun? Would be interesting to see how far you can push the performance.