Open weights, multilingual, 32k context.
I have ~50 million sentences from English Project Gutenberg novels embedded with this.
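For scale, the embedding loop itself is trivial; roughly this, assuming a sentence-transformers-compatible checkpoint (the model name below is a stand-in, not necessarily the one being discussed, and for ~50M sentences you'd stream batches from disk rather than hold everything in memory):

    from sentence_transformers import SentenceTransformer

    # Stand-in checkpoint; swap in the actual model from the thread.
    model = SentenceTransformer("intfloat/multilingual-e5-large")

    sentences = [
        "It was the best of times, it was the worst of times.",
        "Call me Ishmael.",
    ]
    # Normalized vectors make cosine similarity a plain dot product later.
    embeddings = model.encode(sentences, batch_size=256, normalize_embeddings=True)
    print(embeddings.shape)  # (num_sentences, embedding_dim)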
Okay, I get it. Lisp is great.
Where should I start? It wasn't like I was planning on doing anything else at work next week...
This is the version I read:
https://www.abebooks.com/9780023397639/Little-LISPer-Third-E...
However, for complex projects, IMO one must read what the LLM wrote … every actual word.
When it ‘got away’ from me, in each case I had left something in the LLM-written markdown that I should have removed.
99% “I can ask for that later” and 1% “that’s a good idea I hadn’t considered” might be the right ratio when reading an LLM-generated plan/spec/work unit.
Breaking work into single-context passes … 50-60k tokens in Sonnet 4.5 has typically given me fantastic results.
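A rough pre-flight check I use to keep a work unit inside one pass (a sketch: tiktoken's cl100k_base is an OpenAI tokenizer, so it only approximates Sonnet's count, and "workunit.md" is a hypothetical file name):

    import tiktoken

    # Approximate token count of a plan/spec/work unit before handing it over.
    # cl100k_base is only a proxy for Sonnet 4.5's tokenizer, but it's close
    # enough to tell a 20k-token unit from an 80k-token one.
    enc = tiktoken.get_encoding("cl100k_base")

    def fits_single_pass(text: str, budget: int = 55_000) -> bool:
        return len(enc.encode(text)) <= budget

    with open("workunit.md") as f:  # hypothetical file
        print(fits_single_pass(f.read()))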
My side project uses Lean 4, and a carelessly left-in ‘validate’ rather than ‘verify’ led down a hilariously complicated path equivalent to matching an output against a known string.
I recovered, but it wasn’t obvious to me that this was happening. I, however, would not be able to write Lean proofs myself, so diagnosing and fixing the problem is a small price to pay for being able to mechanically verify that part of my software is correct.
And it is a developer feature hidden from end users. E.g., in your Ollama example, does the developer ask end users to install Ollama? Does the dev redistribute Ollama and keep it updated?
The ONNX format is pretty much a boring de facto standard for ML model exchange. It is under the Linux Foundation.
The ONNX Runtime is a Microsoft thing, but it is an MIT-licensed runtime for cross-language use and cross-OS/HW-platform deployment of ML models in the ONNX format.
That bit needs to support everything because Microsoft itself ships software on everything (Mac/Linux/iOS/Android/Windows).
ORT — https://onnxruntime.ai
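To make the “exchange format” bit concrete, a minimal sketch assuming PyTorch and onnxruntime are installed (the toy model and the "toy.onnx" file name are just for illustration):

    import torch
    import onnxruntime as ort

    # Export a toy PyTorch model to the ONNX format, then run it with the
    # MIT-licensed ONNX Runtime. Model and file name are illustrative only.
    model = torch.nn.Linear(4, 2)
    dummy = torch.randn(1, 4)
    torch.onnx.export(model, dummy, "toy.onnx", input_names=["x"], output_names=["y"])

    sess = ort.InferenceSession("toy.onnx", providers=["CPUExecutionProvider"])
    (out,) = sess.run(["y"], {"x": dummy.numpy()})
    print(out.shape)  # (1, 2)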
Here is the Windows ML part of this: https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/...
The primary value claims for Windows ML (for a developer using it) are that it eliminates the need to:
- Bundle execution providers for specific hardware vendors
- Create separate app builds for different execution providers
- Handle execution provider updates manually
Since ‘EP’ is ultra-super-techno-jargon:
Here is what GPT-5 provides:
Intensional (what an EP is)
In ONNX Runtime, an Execution Provider (EP) is a pluggable backend that advertises which ops/kernels it can run and supplies the optimized implementations, memory allocators, and (optionally) graph rewrites for a specific target (CPU, CUDA/TensorRT, Core ML, OpenVINO, etc.). ONNX Runtime then partitions your model graph and assigns each partition to the highest-priority EP that claims it; anything unsupported falls back (by default) to the CPU EP.
Extensional (how you use them)
- You pick/priority-order EPs per session; ORT maps graph pieces accordingly and falls back as needed.
- Each EP has its own options (e.g., TensorRT workspace size, OpenVINO device string, QNN context cache).
- Common EPs: CPU, CUDA, TensorRT (NVIDIA), DirectML (Windows), Core ML (Apple), NNAPI (Android), OpenVINO (Intel), ROCm (AMD), QNN (Qualcomm).
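In Python it looks roughly like this (a sketch: "model.onnx" and the input name "input" are placeholders, and the CUDA EP only kicks in if that ORT build includes it):

    import numpy as np
    import onnxruntime as ort

    # Priority-ordered EP list: try CUDA first, fall back to CPU for any
    # graph pieces the CUDA EP does not claim. File/input names are placeholders.
    providers = [
        ("CUDAExecutionProvider", {"device_id": 0}),
        "CPUExecutionProvider",
    ]
    sess = ort.InferenceSession("model.onnx", providers=providers)
    print(sess.get_providers())  # which EPs actually got registered

    x = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = sess.run(None, {"input": x})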
Melinda French Gates, back when she was Melinda French, had a part in Clippy.
“Melinda French (then the fiancée of Bill Gates) was the project manager of Microsoft Bob”
Microsoft Bob is where Clippy was born.
Reference: https://www.artsy.net/article/artsy-editorial-life-death-mic...