I'd say spending at least a quarter of my vibe coding time on refactoring + documentation refresh, to keep the codebase looking impeccable, is the only way my projects can work at all long term. We don't want to confuse the coding agent.
IMO this was the more elegant design if you think about it: tool calling is really just structured output and structured output is tool calling. The "do not provide multiple ways of doing the same thing" philosophy.
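To make the equivalence concrete, here's a minimal sketch assuming the OpenAI Python SDK (any provider with forced tool choice works the same way); the tool name and schema are made up for illustration:

```python
# Structured output via a forced tool call: the model can only "call" this one tool,
# so its reply is guaranteed to be JSON matching the schema.
# Assumes the OpenAI Python SDK; tool name and schema are illustrative only.
from openai import OpenAI

client = OpenAI()

extract_person = {
    "type": "function",
    "function": {
        "name": "extract_person",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        },
    },
}

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Alice is a 34-year-old engineer."}],
    tools=[extract_person],
    tool_choice={"type": "function", "function": {"name": "extract_person"}},
)

# The "structured output" is just the arguments of the forced tool call.
print(resp.choices[0].message.tool_calls[0].function.arguments)
```

Forcing the tool means the reply can only be that tool's arguments, i.e. JSON conforming to the schema - same mechanism, two names.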
Where are you getting that number from?
Anthropic added quite strict limits on usage - visible from the /usage command inside Claude Code. I would be surprised if usage under those limits still results in heavy losses for them.
My theory is this:
- we know from benchmarks that open-weight models like Deepseek R1 and Kimi K2's capabilities are not far behind SOTA GPT/Claude
- open-weight API pricing (e.g. on openrouter) is roughly 1/10~1/5 that of GPT/Claude
- users can more or less choose to hook their agent CLI/IDEs to either closed or open models
If these points are true, then the only reason people are primarily on CC & Codex plans is that those plans are subsidized by at least 5~10x. Once confronted with true costs, users will quickly switch to whichever vendor has the lowest inference cost, and we get perfect competition + zero margin for all vendors.
> Claude Code is reportedly close to generating $1 billion in annualized revenue, up from about $400 million in July.
https://techcrunch.com/2025/11/04/anthropic-expects-b2b-dema...
As soon as users are confronted with their true API cost, the appearance of this being a good business falls apart. At the end of the day, there is no moat around large language models - OpenAI, Anthropic, Google, DeepSeek, Alibaba, Moonshot... any company can make a SOTA model if they wish, so in the long run it's guaranteed to be a race to the bottom where nobody can turn a profit.
How hard would it be to build an MCP that's basically a proxy for web search, except it always tries to return a markdown version of the web pages instead of passing along raw HTML?
Basically Sosumi.ai, but instead of working only for Apple docs it works for any web page (including every doc on the internet).
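The plumbing side is probably not hard at all. A rough sketch, assuming the MCP Python SDK's FastMCP helper plus httpx and markdownify (the tool name and parameters are made up):

```python
# Rough sketch of a "fetch page as markdown" MCP tool.
# Assumes the MCP Python SDK (FastMCP), httpx and markdownify; names are illustrative.
import httpx
from markdownify import markdownify as md
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("markdown-proxy")

@mcp.tool()
def fetch_as_markdown(url: str) -> str:
    """Fetch a web page and return a markdown rendering instead of raw HTML."""
    resp = httpx.get(url, follow_redirects=True, timeout=30.0)
    resp.raise_for_status()
    # Naive whole-page conversion; stripping boilerplate first is the hard part.
    return md(resp.text)

if __name__ == "__main__":
    mcp.run()
```

The proxy itself is the easy part; the md(resp.text) call is where the real difficulty lives.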
But stripping complex formats like HTML & PDF down to simple markdown is a hard problem. It's nearly impossible to infer what the rendered page looks like from the raw HTML / PDF source. https://github.com/mozilla/readability helps, but it often breaks down on unconventional div structures. I heard the state-of-the-art solution is using multimodal LLM OCR to really look at the rendered page and rewrite the thing in markdown.
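For what it's worth, the classic non-OCR pipeline is a readability-style main-content extractor followed by an HTML-to-markdown converter. A sketch assuming the readability-lxml and markdownify packages, with the caveat above that it falls apart on unconventional markup:

```python
# Classic extraction pipeline: heuristic main-content detection, then HTML -> markdown.
# Assumes readability-lxml and markdownify are installed; breaks on unusual div soup.
from readability import Document
from markdownify import markdownify as md

def html_to_markdown(html: str) -> str:
    doc = Document(html)        # heuristic main-content extraction (readability port)
    main_html = doc.summary()   # cleaned HTML of the article body only
    return f"# {doc.title()}\n\n{md(main_html)}"
```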
Which makes me wonder: how did OpenAI make their model read PDF, DOCX and images at all?
Take memory for example: give the LLM a persistent computer and ask it to jot down its long-term memory as hierarchical directories of markdown documents. Recalling a piece of memory means a bunch of `tree` and `grep` commands. It's very, very rudimentary, but it kinda works, today. We just have to think of incrementally smarter ways to query & maintain this type of memory repo, which is a pure engineering problem.
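A toy version of that memory repo, assuming a local ./memory directory of markdown notes (the layout and function names are illustrative only):

```python
# Toy file-based memory: notes live under memory/<topic>.md, recall is a crude grep.
# Directory layout and function names are illustrative, not any particular agent's API.
from pathlib import Path

MEMORY_ROOT = Path("memory")

def remember(topic: str, note: str) -> None:
    """Append a note under memory/<topic>.md, creating directories as needed."""
    path = MEMORY_ROOT / f"{topic}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(note.rstrip() + "\n")

def recall(keyword: str) -> list[str]:
    """Poor man's `grep -ri keyword memory/`: return matching lines with file paths."""
    hits = []
    for md_file in MEMORY_ROOT.rglob("*.md"):
        for line in md_file.read_text(encoding="utf-8").splitlines():
            if keyword.lower() in line.lower():
                hits.append(f"{md_file}: {line}")
    return hits
```

Swapping recall() for something smarter (an index, embeddings, whatever) is exactly the "pure engineering problem" part.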
Document how to use and install your tool in the README.
Document how to compile and test, plus architecture decisions, coding standards, repository structure, etc. in the agents doc.
But if your job is to assemble a car in order to explore what modifications to make to the design, experiment with a single prototype, and determine how to program those robot arms, you’re probably not thinking about the risk of being automated.
I know a lot of counterarguments take the form of, “but AI is automating that second class of job!” But I just really haven’t seen that at all. What I have seen is a misclassification of the former as the latter.
Ask a robot arm "how should we improve our car design this year" and it'll certainly get stuck. Ask an AI and it'll give you a real opinion that's at least on par with a human's. If a company builds enough tooling to close the "AI comes up with idea -> AI designs prototype -> AI robot physically builds the car -> AI robot test drives the car -> AI evaluates all prototypes and confirms next year's design" feedback loop, then in principle this can work.
This is why AI is seen as such a big deal - it's fundamentally different from all previous technologies. To an AI, there is no line separating the first class of job from the second.