Economics is important. Best bang for the buck seems to be OpenAI ChatGPT 4.1 mini[6]. Does a decent job, doesn't flood my context window with useless tokens like Claude does, API works every time. Gets me out of bad spots. Can get confused, but I've been able to muddle through with it.
E.g. if you need a self-contained script to do some data processing, Opus can often do that in one shot. A 500-line Python script costs around $1, and as long as it's not tricky it just works - you don't need any back-and-forth.
I don't think it's possible to employ any human to write a 500-line Python script for $1 (unless it's a free intern or a student), let alone have it done in one minute.
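The ~$1 figure is easy to sanity-check with back-of-the-envelope arithmetic. The token prices below are assumptions (roughly the published Opus rates of $15/M input and $75/M output tokens at the time of writing - check OpenRouter for current numbers), and 12 tokens per line of Python is a rough average:

```python
# Back-of-the-envelope cost estimate for a one-shot 500-line script.
# Prices per million tokens are ASSUMED values; verify against current rates.
INPUT_PRICE = 15.0    # $ per 1M input tokens (assumed)
OUTPUT_PRICE = 75.0   # $ per 1M output tokens (assumed)

lines = 500
tokens_per_line = 12                  # rough average for Python code
output_tokens = lines * tokens_per_line
prompt_tokens = 2_000                 # a modest task description

output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE
input_cost = prompt_tokens / 1_000_000 * INPUT_PRICE
total = output_cost + input_cost      # lands around half a dollar
```

Under these assumptions the total comes out well under a dollar, so the "$1 for 500 lines" claim is in the right ballpark even with a larger prompt.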
Of course, if you use LLM interactively, for many small tasks, Opus might be too expensive, and you probably want a faster model anyway. Really depends on how you use it.
(You can do quite a lot in file-at-once mode. E.g. Gemini 2.5 Flash could write 35 KB of code for a full ML experiment in Python - self-contained with data loading, model setup, training, and evaluation, all in one file, pretty much on the first try.)
The pronunciation sounds about right - I thought that was the hard part, and the model does it well. But voice timbre should be simpler to fix. Like, a simple FIR filter might improve it?
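A minimal sketch of what that might look like, assuming the timbre problem is a fixed spectral coloration: design an FIR equalizer from a desired frequency response with `scipy.signal.firwin2` and apply it with `lfilter`. The specific band and gain values here are made-up placeholders; a real fix would fit them to the difference between the model's spectrum and a reference voice.

```python
import numpy as np
from scipy.signal import firwin2, lfilter

# Hypothetical EQ curve: gently attenuate a harsh-sounding band.
# Frequencies are normalized to Nyquist (1.0 = 8 kHz at 16 kHz sample rate);
# gains are linear amplitude. These breakpoints are illustrative only.
freqs = [0.0, 0.20, 0.25, 0.50, 0.55, 1.0]
gains = [1.0, 1.00, 0.70, 0.70, 1.00, 1.0]
taps = firwin2(101, freqs, gains)       # 101-tap linear-phase FIR

audio = np.random.randn(16000)          # stand-in for 1 s of synthesized speech
equalized = lfilter(taps, 1.0, audio)   # filtered signal, same length
```

A single FIR like this can only fix stationary coloration; timbre problems that vary with phoneme or pitch would need something adaptive.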
My understanding is that the former (sucking up) is a personality trait, substantially influenced by the desire to facilitate engagement. The latter (making up facts) I do not think is correct to ascribe to a personality trait (like being a compulsive liar); instead, it is because the fitness function of LLMs drives them to produce some answer: they do not know what they're talking about, but produce strings of text based on statistics.
An LLM can be trained to produce "I don't know" when its confidence in other answers is weak (e.g. weak or mixed signals). A persona vector can also nudge it in that direction.
Facebook shouldn't legally be allowed to demand an ID any more than this disaster of an "app."
Now tens of thousands of people will be subject to identity theft because someone thought this was a neat growth hacking pattern for their ethically dubious idea of a social networking site.
It can be done with fairly basic cryptography. But the infrastructure around it will only grow if there's demand. Otherwise people go with the lowest common denominator.
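To illustrate the "fairly basic cryptography" point: a trusted issuer who checks an ID once can hand out a signed attestation (e.g. "over 18"), and sites verify the signature without ever seeing the ID. The sketch below is a deliberately minimal stand-in using an HMAC; a real system would use asymmetric keys so verifiers can't mint tokens, plus revocation and unlinkability, all elided here.

```python
import hmac
import hashlib

# Toy attestation scheme. ISSUER_KEY is a placeholder shared secret;
# a real deployment would use an asymmetric key pair instead.
ISSUER_KEY = b"issuer-secret"

def issue_token(user_id: str) -> str:
    """Issuer signs a claim after verifying the user's ID once."""
    claim = f"{user_id}:over18"
    sig = hmac.new(ISSUER_KEY, claim.encode(), hashlib.sha256).hexdigest()
    return f"{claim}:{sig}"

def verify_token(token: str) -> bool:
    """A site checks the signature without ever handling the ID itself."""
    claim, _, sig = token.rpartition(":")
    expected = hmac.new(ISSUER_KEY, claim.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

The point is that the site only learns "over 18, says a trusted issuer" - no ID scan ever reaches the service, so there's nothing to leak.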
The learning curve might be a bit steep, but afterwards everything makes sense. And let's be honest, you only need a few actions (pull, add, reset, branch, commit) for 95% of the cases.
A lot of people are religious about rebasing and a "clean" commit history. But that's pretty much incompatible with several devs working on a single branch. I.e. when you work on something complex, perhaps under time pressure, git habits bite you in the ass. It's not fine.
Does that mean the LLMs realized they could not solve it? I thought that was one of the limitations of LLMs: they don't know what they don't know, and it is really impossible without a solver to know the consistency of an argument, i.e., to know that one knows.
You can do a lot of things on top: e.g. train a linear probe to give a confidence score. Yes, it won't be 100% reliable, but it might be good enough if you constrain it to a domain like math.
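A minimal sketch of that linear-probe idea, under the assumption that you have hidden-state activations for answers whose correctness you already know (the data here is synthetic, with the label planted in one feature purely so the example runs):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for probe training data: in practice these would be
# model activations paired with graded correct/incorrect answers.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(200, 64))        # 200 answers, 64-dim activations
correct = (hidden_states[:, 0] > 0).astype(int)   # toy label with a linear signal

# The probe itself: a plain logistic regression over the activations.
probe = LogisticRegression(max_iter=1000).fit(hidden_states, correct)
confidence = probe.predict_proba(hidden_states)[:, 1]  # per-answer confidence

# Abstention rule: emit "I don't know" when the probe's confidence is weak.
abstain = confidence < 0.6
```

On real activations you'd hold out a test set and calibrate the 0.6 threshold per domain; the probe tends to work better when the domain is narrow, which is the constraint mentioned above.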
- He's a courseboi who sells a community that promises to 'Get from $0 to $100 Million in ARR'
- The stuff about 'it was during a code freeze' doesn't make sense. What does 'code freeze' even mean when you're working alone, vibe coding, and asking the agent to do things?
- Yes, LLMs hallucinate. The guy seems smart, and I guess he knows it. Yet he deliberately drives up the emotional side of everything, saying that Replit "fibbed" and "lied" because it created tests that didn't work.
- He had a lot of tweets saying that there was no rollback, because the LLM doesn't know about the rollback - which is expected. He managed to roll back the database using Replit's rollback functionality[0], but still really milks the 'it deleted my production database' angle.
- It looks like this was a thread about vibe coding daily. This was day 8. So this was an app in very early development and the 'production' database was probably the dev database?
Overall just looks like a lot of attention seeking to me.
[0] https://x.com/jasonlk/status/1946240562736365809 "It turns out Replit was wrong, and the rollback did work."