cleverwebble commented on Enough AI copilots, we need AI HUDs   geoffreylitt.com/2025/07/... · Posted by u/walterbell
irthomasthomas · 8 months ago

    import openai, math, os
    query = 'Paris is the capital of'  # short demo input
    os.environ['OPENAI_API_KEY']       # raises KeyError early if the key is missing
    client = openai.OpenAI()
    resp = client.chat.completions.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': query}],
        max_tokens=12,
        logprobs=True,
        top_logprobs=1
    )

    logprobs = [t.logprob for t in resp.choices[0].logprobs.content]
    perplexity = math.exp(-sum(logprobs) / len(logprobs))
    print('Prompt: "', query, '"', sep='')
    print('\nCompletion:', resp.choices[0].message.content)
    print('\nToken count:', len(logprobs))
    print('Perplexity:', round(perplexity, 2))
Output:

    Prompt: "Paris is the capital of"
    
    Completion:  France.
    
    Token count: 2
    Perplexity: 1.17

Meta: Out of three models (k2, qwen3-coder, and opus4), only opus one-shot the correct formatting for this comment.

cleverwebble · 7 months ago
If you want to generate a heatmap of existing text, you will have to take a different approach here.

The naive solution I could come up with would be really expensive with OpenAI, but if you have an open-source model, you can write custom inference that goes one token at a time through the text; on each token, you look up the difference in logprobs between the token the LLM predicted and what was actually there, and use that to color the token.
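As a rough sketch of that idea, using plain surprisal of the actual token as the score (a simpler cousin of the predicted-vs-actual difference); the tokens, logprob numbers, and thresholds below are made up for illustration:

```python
def surprisal_heat(tokens, logprobs, thresholds=(1.0, 3.0, 6.0)):
    """Map each token's surprisal (-logprob, in nats) to a heat level
    0..3; higher means the model found the token more surprising."""
    heat = []
    for tok, lp in zip(tokens, logprobs):
        s = -lp                                # surprisal of the actual token
        level = sum(s > t for t in thresholds) # count thresholds exceeded
        heat.append((tok, level))
    return heat

# Hypothetical per-token logprobs, as an open-source model would report
# them while scoring existing text one token at a time. The buggy ' %'
# (where ' +' was expected) gets a much lower logprob than its neighbors.
tokens = ["def", " add", "(a", ",", " b", "):", " return", " a", " %", " b"]
logprobs = [-0.1, -0.5, -0.8, -0.05, -0.3, -0.02, -0.4, -0.6, -7.2, -1.1]
for tok, level in surprisal_heat(tokens, logprobs):
    print(f"{tok!r}: heat {level}")
```

The heat levels would then drive the coloring; with a real model you would get the per-token logprobs from a single forward pass over the text rather than a hard-coded list.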

The downside I imagine to this approach is that it would probably tend to highlight the beginning of bad code rather than the entire block - because once the model commits to a mistake, it will generally roll with it (i.e., a 'hallucination'), so the surprisal of tokens after the bug might be only slightly higher than normal.

Another option might be to use a diffusion-based model: add some noise to the input, have it iterate a few times through, then measure the parts of the text that changed the most. I have only a light theoretical understanding of these models though, so I'm not sure how well that would work.

cleverwebble commented on Enough AI copilots, we need AI HUDs   geoffreylitt.com/2025/07/... · Posted by u/walterbell
_jab · 7 months ago
There’s a lot of ideation for coding HUDs in the comments, but ironically I think the core feature of most coding copilots is already best described as a HUD: tab completion.

And interestingly, that is indeed the feature I find most compelling from Cursor. I particularly love when I’m doing a small refactor, like changing a naming convention for a few variables, and after I make the first edit manually Cursor will jump in with tab suggestions for the rest.

To me, that fully encapsulates the definition of a HUD. It’s a delightful experience, and it’s also why I think anyone who pushes the exclusively-copilot oriented Claude Code as a superior replacement is just wrong.

cleverwebble · 7 months ago
Agreed!

I've spent the last few months using Claude Code and Cursor - experimenting with both. For simple tasks, both are pretty good (like identifying a bug given console output) - but when it comes to making a big change, like adding a brand new feature to existing code that requires changes to lots of files, writing tests, etc - it often will make at least a few mistakes I catch on review, and then prompting the model to fix those mistakes often causes it to fix things in strange ways.

A few days ago, I had a bug I just couldn't figure out. I prompted Claude to diagnose and fix the issue - but after 5 minutes or so of it trying out different ideas, rerunning the test, and getting stuck just like I did - it just turned off the test and called it complete. If I wasn't watching what it was doing, I could have missed that it did that and deployed bad code.

The last week or so, I've totally switched from relying on prompting to just writing the code myself and using tab complete to autocomplete like 80% of it. It is slower, but I have more control and honestly, it's a much more enjoyable experience.

cleverwebble commented on Kiro: A new agentic IDE   kiro.dev/blog/introducing... · Posted by u/QuinnyPig
cleverwebble · 8 months ago
I do like the control you have over organizing. The only thing that shocked me was that I couldn't revert a series of changes (checkpointing) like you can in Cursor. Sometimes the LLM does it in a way I don't like, so I want to go back, edit the prompt, and try again.
cleverwebble commented on Dolly Parton's Dollywood Express   thetransitguy.substack.co... · Posted by u/FinnKuhn
cleverwebble · 9 months ago
I grew up in a town called Seymour, which is in the same county as Gatlinburg/Pigeon Forge. It's Dolly country. She's a hero - she's done so much for her community. Much of the money you spend at Dollywood goes to help so many public causes. One of them is the Imagination Library, which gives out free books every month from birth to 5 years of age. My family was one of the first that got to take part in it. I certainly don't remember much from when I was 5, but I do remember getting those books. I imagine it had a positive impact on my growth.

We'd go to Dollywood a few times a year - she would give out free tickets to people who worked in Gatlinburg. It's really well run, and their water park is great too. Growing up, we'd ride the train when we visited. While I didn't appreciate it much as a kid, when I grew up I realized what an awesome opportunity that was. I moved away from Tennessee about 12 years ago, and one of the biggest things I have missed is Dollywood and their big steam train.

cleverwebble commented on When Fine-Tuning Makes Sense: A Developer's Guide   getkiln.ai/blog/why_fine_... · Posted by u/scosman
simonw · 9 months ago
This is a post by a vendor that sells fine-tuning tools.

Here's a suggestion: show me a demo!

For the last two years I've been desperately keen to see just one good interactive demo that lets me see a fine-tuned model clearly performing better (faster, cheaper, more accurate results) than the base model on a task that it has been fine-tuned for - combined with extremely detailed information on how it was fine-tuned - all of the training data that was used.

If you want to stand out among all of the companies selling fine-tuning services yet another "here's tasks that can benefit from fine-tuning" post is not the way to do it. Build a compelling demo!

cleverwebble · 9 months ago
I can't really show an interactive demo, but my team at my day job has been fine-tuning OpenAI models since GPT-3.5, and fine-tuning can drastically improve output quality & prompt adherence. Heck, we found you can reduce your prompt to very simple instructions and encode the style guidelines via your fine-tuning examples.

This really only works though if:

1) The task is limited to a relatively small domain ("relatively small" is probably a misnomer, since most LLMs are trying to solve every problem all at once; as long as you have it specialize in even a specific field, FT can help you achieve superior results.)

2) You have high-quality examples (you don't need a lot, maybe 200 at most). Quality is often better than quantity here.

Often, distillation is all you need. E.g., do some prompt engineering on a high-quality model (GPT-4.1, Gemini-Pro, Claude, etc.), generate a few hundred examples, optionally (ideally) check them for correctness via evaluations, and then fine-tune a smaller, cheaper model. The new fine-tuned model will not perform as well at generalist tasks as before, but it will be much more accurate in your specific domain, which is what most businesses care about.
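A minimal sketch of the data-preparation side of that distillation loop, assuming the (prompt, completion) pairs were already generated by the stronger model (the examples and system prompt below are made up for illustration):

```python
import json

# Stand-ins for outputs collected from a stronger model via prompt engineering.
examples = [
    ("Summarize: The cat sat on the mat.", "A cat rested on a mat."),
    ("Summarize: Rain fell all day in Paris.", "It rained in Paris all day."),
]

def to_chat_jsonl(pairs, system="You are a concise summarizer."):
    """Serialize (prompt, completion) pairs into chat-style JSONL for
    fine-tuning: one JSON object per line, each holding a
    system/user/assistant message triple."""
    lines = []
    for prompt, completion in pairs:
        record = {"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_chat_jsonl(examples).splitlines()[0])
```

From there you would upload the file and start a fine-tuning job on the smaller model; the exact upload and job-creation calls depend on the provider's SDK.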


cleverwebble commented on A style of leadership that is direct and forceful, yet also respectful (2023)   respectfulleadership.subs... · Posted by u/lkrubner
cleverwebble · a year ago
"No, I'll do it myself" and "I feel like you aren't listening to me" comes straight out of couple's therapy handbooks on what not to do.

You can be direct and respectful, but this was not respectful, this was just aggressive.

cleverwebble commented on Apple Invites   apple.com/newsroom/2025/0... · Posted by u/openchampagne
cleverwebble · a year ago
I'm in my mid-thirties and most of my friends have ditched Facebook. I didn't really realize this until I used it to create an event for a house party... I was somewhat surprised that only 2 people out of 15 even saw it. I ended up resorting to good old text messages, and that worked, but it was tedious. Not sure how popular this will become, but having a social-media-less event invite/broadcasting system would be nice, and having one that most people with an iPhone have access to covers much of my friend base.
cleverwebble commented on The young, inexperienced engineers aiding DOGE   wired.com/story/elon-musk... · Posted by u/medler
cleverwebble · a year ago
That's a wild, and dangerous, take.
cleverwebble commented on A drone that calculates coordinates using a camera and Google Maps   twitter.com/ilaffey2/stat... · Posted by u/defrost
dgfitz · 2 years ago
DTED data is like a fingerprint. This isn’t much of a story.
cleverwebble · 2 years ago
The story is that these young engineers built a $500 drone that consumes this kind of data to do mapping. In 24 hours, for a hackathon, no less.

If the US government didn't already have this kind of tech, they would spend millions just for the same prototype they built. And probably tens or hundreds of millions for a final product.

I think that's a story.
