This is the kind of psycho-bullshit we should stay away from, and it wouldn't happen if we respected each other. Coming from Microsoft, though, it's not surprising.
E.g., if I say "I scream", it sounds phonetically identical to "ice cream". Yet the transcription "I scream is the best dessert" makes a lot less sense than "Ice cream is the best dessert".
Deferring the decision like this seems necessary to get both low latency and high accuracy. Transcription on Android does exactly this, and you can watch the guesses adjust as you talk; there's a toy sketch of that re-ranking below.
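To make the "adjusting guesses" idea concrete, here's a minimal sketch of keeping competing homophone hypotheses and re-ranking them as more right-hand context arrives. This is not Android's or Whisper's actual decoder; the candidate list and bigram scores are made up for illustration:

```python
# Toy sketch: keep competing hypotheses for an ambiguous span and re-rank
# them as more context arrives. Candidates and scores are invented.

CANDIDATES = ["I scream", "ice cream"]

# Hypothetical bigram scores standing in for a real language model.
BIGRAM_SCORES = {
    ("cream", "is"): 3.0,
    ("scream", "is"): 0.5,
    ("is", "the"): 2.0,
    ("the", "best"): 2.0,
    ("best", "dessert"): 2.5,
}

def lm_score(words):
    """Sum bigram scores; higher means the sequence reads more fluently."""
    return sum(BIGRAM_SCORES.get(pair, 0.1) for pair in zip(words, words[1:]))

def best_guess(right_context):
    """Re-rank the ambiguous span against the words heard after it so far."""
    return max(CANDIDATES,
               key=lambda c: lm_score((c + " " + right_context).lower().split()))

for heard in ["", "is", "is the best dessert"]:
    print(f"after {heard!r}: {best_guess(heard)}")
# With no right-hand context the two readings tie, so the display may show
# either; once "is the best dessert" arrives, "ice cream" wins decisively.
```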
I'm not familiar with Whisper in particular, but typically in an ASR model the decoder, loosely speaking, sees "the future" (i.e. the audio after the chunk it's trying to decode) in a sentence like this. It also has the benefit of a language model guiding its decoding, so that grammatical productions like "I like ice cream" are favored over "I like I scream".
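Whisper's decoder learns that language-model behavior jointly with the acoustics rather than bolting on a separate LM, but a classic way to see the effect in isolation is "shallow fusion", where hypotheses are ranked by acoustic score plus a weighted LM score. A toy sketch with invented numbers:

```python
# Shallow-fusion sketch: rank hypotheses by acoustic log-prob plus a weighted
# language-model log-prob. All scores below are hypothetical.

hypotheses = {
    # transcript: (acoustic log-prob, LM log-prob)
    "I like I scream":  (-4.1, -9.7),  # acoustically plausible, reads oddly
    "I like ice cream": (-4.2, -2.3),  # acoustically near-identical, fluent
}

LM_WEIGHT = 0.5  # hypothetical fusion weight, normally tuned on held-out data

def fused_score(acoustic_lp, lm_lp, lam=LM_WEIGHT):
    """Combined score = acoustic + lambda * language model."""
    return acoustic_lp + lam * lm_lp

best = max(hypotheses, key=lambda h: fused_score(*hypotheses[h]))
print(best)  # "I like ice cream": the LM term breaks the acoustic near-tie
```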
Inference on a generic LLM may not be subject to these non-determinisms even on a GPU, though; I don't know for sure.
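For what it's worth, the usual source of GPU non-determinism is that floating-point addition isn't associative, so a parallel reduction that sums the same values in a different order on each run (e.g. via atomics) can give different results on identical inputs. A minimal CPU sketch of the order-dependence, with a shuffle standing in for the reordering:

```python
# Floating-point addition is not associative, so summation order matters.
# random.shuffle simulates a GPU reduction visiting values in a new order.

import random

values = [1e16, 1.0, -1e16] * 1000  # large cancellations make order matter

def sum_in_order(xs):
    total = 0.0
    for x in xs:
        total += x
    return total

a = sum_in_order(values)
random.shuffle(values)   # stand-in for a different parallel reduction order
b = sum_in_order(values)
print(a == b, a, b)      # usually False: same values, different float sum
```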
> Yoshi and others -- please keep the feedback coming. We want to hear it, and we genuinely want to improve the product in a way that gives great defaults for the majority of users, while being extremely hackable and customizable for everyone else.
I think an issue with 2550 upvotes, more than four times the count of the second-highest, is very clear feedback about your defaults and/or about making them customizable.