Readit News
idopmstuff commented on OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI   simonwillison.net/2025/De... · Posted by u/simonw
qouteall · 8 days ago
Goodhart's law: When a measure becomes a target, it ceases to be a good measure.

AI companies have a strong incentive to make scores go up. They may employ humans to write similar-to-benchmark training data to hack the benchmark (while not directly training on the test set).

Throwing your hard problems from work at an LLM is a better metric than benchmarks.

idopmstuff · 8 days ago
I own a business and am constantly working on using AI in every part of it, both for actual time savings and as my very practical eval. On the "can this successfully be used to do work that I do or pay someone else to do more quickly/cheaply/etc." eval, I can confirm that models are progressing nicely!
idopmstuff commented on OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI   simonwillison.net/2025/De... · Posted by u/simonw
AdieuToLogic · 8 days ago
> I'm not sure English is a bad way to outline what the system should do.

It isn't, as these are how stakeholders convey needs to those charged with satisfying same (a.k.a. "requirements"). Where expectations become unrealistic is believing language models can somehow "understand" those outlines as if a human expert were doing so in order to produce an equivalent work product.

Language models can produce nondeterministic results based on the statistical model derived from their training data set(s), with varying degrees of relevance as determined by persons interpreting the generated content.

They do not understand "what the system should do."

idopmstuff · 8 days ago
This is just semantics. You can say they don't understand, but I'm sitting here with Nano Banana Pro creating infographics, and it's doing as good of a job as my human designer does with the same kinds of instructions. Does it matter if that's understanding or not?
idopmstuff commented on It's Always the Process, Stupid   its.promp.td/its-always-t... · Posted by u/DocIsInDaHouse
grvdrm · 21 days ago
Question: could you use something like (example) Selenium to perform some or all of those pre-LLM tasks?
idopmstuff · 19 days ago
I could (I mean in theory - practically, I'm not technically proficient enough to do so), and in fact one of the most promising web browsing agents I've tested is director.ai, which just writes Stagehand code on the fly to achieve the objectives you give it. Unfortunately it can't be invoked via API yet, so doesn't work for my use case.

Honestly, it takes such a relatively small amount of time that it makes sense to just do it myself until there's an agent that can easily handle it; I'm really only spending time trying to automate it now as a test of AI capabilities. If I actually wanted to get it automated tomorrow, the most time-efficient way to do that would just be to involve a VA from somewhere cheap for the work I'm doing.

idopmstuff commented on It's Always the Process, Stupid   its.promp.td/its-always-t... · Posted by u/DocIsInDaHouse
idopmstuff · 21 days ago
I have always found writing documentation to be incredibly helpful for clarifying my thinking. It prevents me from doing mental hand-waving around details, and oftentimes writing down a process that I have done a thousand times is the thing that makes me realize how I can cut steps or improve it.

I'm now in the process of trying to hand off chunks of the work I do to run my business to AI (both to save time and as my very broad, practical eval). It really is all about documentation. I buy small e-commerce brands, and they're simple enough that current SOTA models have more than enough intelligence to take a first pass at listings + financials to determine whether I should take a call with the seller. To make that work, though, I've got a prompt that's currently at six pages that is just every single thing I look at when evaluating a business, codified.

Using that has really convinced me that people are overrating the importance of intelligence in LLMs in terms of driving real economic value. Most work is like my evaluations - it requires intelligence, but there's a ceiling to how much you need. Someone with 150 IQ points wouldn't do any better at this task than someone with 100 IQ points.

Instead, I think what's going to drive actual change is the scaffolding that lets LLMs take on increasing numbers of tasks. My big issue right now is that I have to go to the listing page for a business that's for sale, screenshot the page, download the files, upload that all to ChatGPT and then give it the prompt. I'm still waiting for a web browsing agent that can handle all of that for me, so I can automate the full flow and just get an analysis of each listing sent to me without having to do anything.

idopmstuff commented on Ask HN: How to boost Gemini transcription accuracy for company names?    · Posted by u/bingwu1995
tifa2up · 2 months ago
Don't solve it on the STT level. Get the raw transcription from Gemini then pass the output to an LLM to fix company names and other modifications.

Happy to share more details if helpful.

idopmstuff · 2 months ago
Yeah, I've done it with industry-specific acronyms and this works well. Generate a list of company names and other terms it gets wrong, and give it definitions and any other useful context. For industry jargon, example sentences are good, but that's probably not relevant for company names.

Feed it that list and the transcript along with a simple prompt along the lines of "Attached is a transcript of a conversation created from an audio file. The model doing the transcription has trouble with company names/industry terms/acronyms/whatever else and will have made errors with those. I have also attached a list of company names/etc. that may have been spoken in the transcribed audio. Please review the transcription, and output a corrected version, along with a list of all corrections that you made. The list of corrections should include the original version of the word that you fixed, what you updated it to, and where it is in the document." If it's getting things wrong, you can also ask it to give an explanation of why it made each change that it did and use that to iterate on your prompt and the context you're giving it with your list of words.
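For anyone who wants to script this, here is a rough sketch of assembling that prompt from a term list and a transcript. The `GlossaryEntry` structure, the function names, and the `explain` flag are all made up for illustration, and the code only builds the prompt text; sending it to a model is left to whatever client you already use.

```python
from dataclasses import dataclass


@dataclass
class GlossaryEntry:
    """One term the transcription model tends to get wrong (hypothetical structure)."""
    term: str
    context: str = ""   # definition or other useful context
    example: str = ""   # optional example sentence, mostly useful for jargon


def format_glossary(entries):
    """Render the term list as one bullet per entry."""
    lines = []
    for entry in entries:
        line = f"- {entry.term}"
        if entry.context:
            line += f": {entry.context}"
        if entry.example:
            line += f' (example: "{entry.example}")'
        lines.append(line)
    return "\n".join(lines)


def build_correction_prompt(transcript, entries, explain=False):
    """Combine instructions, term list, and transcript into one prompt string."""
    instructions = (
        "Attached is a transcript of a conversation created from an audio file. "
        "The model doing the transcription has trouble with company names, "
        "industry terms, and acronyms, and will have made errors with those. "
        "I have also attached a list of terms that may have been spoken in the "
        "transcribed audio. Please review the transcription and output a "
        "corrected version, along with a list of all corrections that you made. "
        "The list of corrections should include the original version of the "
        "word that you fixed, what you updated it to, and where it is in the "
        "document."
    )
    if explain:
        # Optional: ask for reasons so you can iterate on the prompt and term list.
        instructions += (
            " For each correction, also give an explanation of why you made it."
        )
    return (
        f"{instructions}\n\n"
        f"Terms that may appear:\n{format_glossary(entries)}\n\n"
        f"Transcript:\n{transcript}"
    )
```

Turning on `explain=True` only while you are still tuning the term list keeps the normal output shorter.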

idopmstuff commented on Good Vibes Only: Vibe coding an internal app to manage my business   theautomatedoperator.subs... · Posted by u/idopmstuff
idopmstuff · 2 months ago
I buy and operate small e-commerce brands, and since GPT-3.5, I've been attempting to vibe code software to help me manage the business. With GPT-5 Codex I have finally managed to create something legitimately useful for myself. The code may be (almost certainly is) not of great quality, but for the purposes of an internal application that only I use, it's doing the job just fine.

u/idopmstuff

Karma: 3027 · Cake day: January 7, 2023