sophiabits commented on When does MCP make sense vs CLI?   ejholmes.github.io/2026/0... · Posted by u/ejholmes
goranmoomin · 12 days ago
I can't believe everyone is talking about MCP vs CLI and which is superior; both are methods of tool calling, and it doesn't matter which format the LLM uses as long as it provides the same capabilities. CLIs might be marginally better (LLMs have likely been trained on common CLIs), but MCPs have their uses (complex auth, connecting users to data sources), and in my experience, if you're using any of the frontier models it doesn't really matter which tool-calling format you use; a bespoke format also works.

The difference that should be talked about is how skills allow much more efficient context management. Skills are frequently tied to CLI usage, but I don't see any reason why that has to be the case. For example, Amp allows skills to attach MCP servers to them – the MCP server is automatically launched when the agent loads that skill [0]. I believe that for both MCP servers and CLIs, wrapping them in skills is the way to get efficient context use, and I hope other agents adopt this same feature.

[0]: https://ampcode.com/manual#mcp-servers-in-skills

sophiabits · 12 days ago
> the MCP server is automatically launched when the Agent loads that skill

The main problem with this approach at the moment is that it busts your prompt cache, because LLMs expect all tool definitions to appear at the beginning of the context window. Input tokens are the main driver of inference costs, and a lot of use cases aren't economical without prompt caching.

Hopefully future LLMs are trained so that you can add tool definitions anywhere in the context window. Lots of use cases would benefit from this; in ecommerce, for example, there's really no point providing a "clear cart" tool to the LLM upfront, and it'd be nice if you could dynamically provide it after item(s) are first added.
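A minimal sketch of that kind of conditional tool provisioning, using hypothetical cart tools in OpenAI-style function schemas (the tool names and cart representation are made up for illustration):

```python
# Hypothetical sketch: only expose the "clear_cart" tool once the cart is
# non-empty, so its definition never occupies context before it can be used.

BASE_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "add_to_cart",
            "description": "Add an item to the shopping cart",
            "parameters": {
                "type": "object",
                "properties": {"sku": {"type": "string"}},
                "required": ["sku"],
            },
        },
    }
]

CLEAR_CART_TOOL = {
    "type": "function",
    "function": {
        "name": "clear_cart",
        "description": "Remove every item from the shopping cart",
        "parameters": {"type": "object", "properties": {}},
    },
}

def tools_for_request(cart: list[str]) -> list[dict]:
    """Return the tool list for this turn based on current cart state."""
    if cart:
        return BASE_TOOLS + [CLEAR_CART_TOOL]
    return BASE_TOOLS
```

Of course, with today's models this recomputation of the tool list is exactly what invalidates the prompt cache; the sketch only shows what the API surface could look like if models tolerated tools appearing mid-context.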

sophiabits commented on Claude Code 2.0   npmjs.com/package/@anthro... · Posted by u/polyrand
robertfw · 5 months ago
comments for me are a code smell:

- like all documentation, they are prone to code rot (going out of date)

- ideally code should be obvious; if you need a comment to explain it, perhaps it's not as simple as it could be, or perhaps we're doing something hacky that we shouldn't

sophiabits · 5 months ago
Comments are often the best tool for explaining why a bit of code is formulated how it is, or explaining why a more obvious alternate implementation is a dead end.

An example of this: assume you live in a world where the formula for the circumference of a circle has not been derived. You end up deriving the formula yourself and write a function which returns 2 * pi * radius. This is as simple as it gets, not hacky at all, and you would /definitely/ want to include a comment explaining how you arrived at your weird and arbitrary-looking "3.1415" constant.
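A sketch of what that might look like in code; the constant and its comment are hypothetical (in the thought experiment, you derived the value yourself), and the comment carries information the implementation alone cannot:

```python
# Ratio of any circle's circumference to its diameter. Derived empirically
# by measuring many circles of different sizes; the ratio came out the same
# every time. Without this comment the constant looks weird and arbitrary.
CIRCLE_RATIO = 3.1415

def circumference(radius: float) -> float:
    """Circumference of a circle with the given radius."""
    return 2 * CIRCLE_RATIO * radius
```

The function body is as simple as it gets; no amount of renaming makes `3.1415` self-explanatory, which is exactly the kind of "why" a comment is for.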

sophiabits commented on Structured Outputs in the API   openai.com/index/introduc... · Posted by u/davidbarker
santiagobasulto · 2 years ago
I’ve noticed that lately GPT has gotten more and more verbose. I’m wondering if it’s a subtle way to “raise prices”, as the average response incurs more tokens, which of course makes any API conversation keep growing in tokens (each IN message concatenates the previous OUT messages).
sophiabits · 2 years ago
I’ve especially noticed this with gpt-4o-mini [1], and it’s a big problem. My particular use case involves keeping a running summary of a conversation between a user and the LLM, and 4o-mini has a really bad tendency to invent details in order to hit the desired summary word limit. I didn’t see this with 4o or earlier models.

Fwiw my subjective experience has been that non-technical stakeholders tend to be more impressed with / agreeable to longer AI outputs, regardless of underlying quality. I have lost count of the number of times I’ve been asked to make outputs longer. Maybe this is just OpenAI responding to what users want?

[1] https://sophiabits.com/blog/new-llms-arent-always-better#exa...

sophiabits commented on New LLMs aren't always better   sophiabits.com/blog/new-l... · Posted by u/sophiabits
sophiabits · 2 years ago
I wanted to document a particular genAI antipattern which I've seen a few times now.

LLMs are theoretically pretty fungible, because you send English in and get English back--but in practice you still need to do some amount of technical due diligence before swapping models. These things are benchmarked on tasks which rarely resemble your specific use case. Blindly swap models at your own risk!

Something that has become very clear since the advent of GPT-3.5 is that LLMs are far from magic, and using them does not remove the need for good engineering fundamentals. It's important to have a solid eval suite so you can quickly benchmark your system against different LLMs, because the APIs we're all building on are constant moving targets.
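A minimal sketch of what such an eval suite can look like. Everything here is a hypothetical stand-in: `complete` represents whatever function calls your model, and the toy model and cases exist only to make the harness runnable (real suites need many more cases and graded, not just pass/fail, scoring):

```python
# Minimal model-comparison eval loop: run every case against a model
# and report the pass rate, so different models can be benchmarked
# quickly against the same fixed suite.

def run_evals(complete, model: str, cases) -> float:
    """Return the fraction of (prompt, checker) cases the model passes."""
    passed = sum(1 for prompt, check in cases if check(complete(model, prompt)))
    return passed / len(cases)

# Toy deterministic "model" and two cases, purely for illustration.
def fake_complete(model: str, prompt: str) -> str:
    return "4" if "2+2" in prompt else "unsure"

cases = [
    ("What is 2+2?", lambda out: out.strip() == "4"),
    ("Capital of France?", lambda out: "Paris" in out),
]

score = run_evals(fake_complete, "toy-model", cases)  # passes 1 of 2 cases
```

Swapping models then becomes a one-line change to the `model` argument (or the `complete` function), with the score telling you whether the swap regressed your specific use case.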

sophiabits commented on Anti-patterns in event-driven architecture   codeopinion.com/beware-an... · Posted by u/indentit
onetimeuse92304 · 2 years ago
Well, technically, you can construct microservices while preserving type safety. You can have an interface with two implementations:

- on the service provider, the implementation provides the actual functionality,

- on the client, the implementation of the interface is just a stub connecting to the actual service provider.

Thus you can sort of provide separation of services as an implementation detail.

However in practice very few projects elect to do this.
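A minimal Python sketch of the pattern described above, using `typing.Protocol` as the shared interface; `PriceService` and both implementations are hypothetical examples:

```python
from typing import Protocol

class PriceService(Protocol):
    """The shared interface both sides depend on."""
    def price_for(self, sku: str) -> int: ...

class LocalPriceService:
    """Provider side: the real implementation of the functionality."""
    def price_for(self, sku: str) -> int:
        return {"apple": 100, "pear": 120}.get(sku, 0)

class RemotePriceService:
    """Client side: a stub that would call the provider over the network."""
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def price_for(self, sku: str) -> int:
        raise NotImplementedError("would issue an HTTP/RPC call here")

def checkout_total(svc: PriceService, skus: list[str]) -> int:
    """Caller code depends only on the Protocol, not on which side it runs."""
    return sum(svc.price_for(s) for s in skus)
```

Because `checkout_total` only sees `PriceService`, whether the implementation is in-process or a remote stub is, as the comment says, an implementation detail.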

sophiabits · 2 years ago
Even with this setup in place you need a heightened level of caution relative to a monolith. In a monolith I can refactor function signatures however I desire, because the whole service is an atomically deployed unit. Once you have two independently deployed components, that goes out the window and you need to be a lot more mindful when introducing breaking changes to an endpoint’s types.
sophiabits commented on Breaking up is hard to do: Chunking in RAG applications   stackoverflow.blog/2024/0... · Posted by u/meysamazad
emporas · 2 years ago
Why not use langchain-rust and make your own client? If you don't know about langchain, I think you are missing out. I took a look at other langchain implementations in JS and Python; in each one people have done some serious work. Langchain-rust also uses tree-sitter to chunk code, and it worked very well in some quick tests I tried.

>The problem with most chunking schemes I’ve seen is they’re naive. They don’t care about content; only token count.

I think controlling different inputs depending on context is used in agents. For the moment I haven't seen anything really impressive coming out of agents. Maybe Perplexity-style web search, but nothing more.

sophiabits · 2 years ago
Not sure what the situation is like now, but we stopped using LangChain last year because the rate of change in the library was huge. Whenever we needed to upgrade for a new feature or bug fix we’d be ~20 versions behind and need to work through breaking changes. Eventually we decided that it was easier to just write everything ourselves.

This is from the first half of 2023 or so; maybe things are more stable now, but looks like the Python implementation is still pre-v1.

sophiabits commented on Garbage collect your technical debt (2021)   ieeexplore.ieee.org/docum... · Posted by u/gfairbanks
caseyohara · 2 years ago
> Building software iteratively leads inevitably to tech debt because we choose to deliver systems before we have looked at all the requirements. Not knowing what’s next distorts our designs, and that distortion is the tech debt.

This article frames technical debt as something that happens passively because you can't know future requirements. That's sometimes true, of course, but in my experience the majority of technical debt is accrued deliberately in a much more active process.

When developing a new feature that doesn't neatly fit into the existing system, you must choose between two compromises:

1. Build it the "fast way", shoehorning the feature into the system just enough to work, compromising quality for velocity and accruing technical debt; or

2. Build it the "right way", adapting the system to accommodate the new feature, compromising velocity for quality to avoid technical debt.

This is usually a deliberate decision, so choosing to accrue technical debt is an active process. The only way it could be passive like the article describes is if the developers don't know or otherwise don't consider the "right way" and go straight for the "fast way". I hope to never work on a team that operates like that.

sophiabits · 2 years ago
The other possibility (which is common in startups) is that often the “right way” is different depending on the scale of the system you need to design for. In cases like this you end up with technical debt a year down the line, but at the time the feature was shipped the engineering decisions made were extremely reasonable.

I’ve seen a few colleagues jump to writing off all technical debt as being inherently bad, but in cases like this it’s a sign of success and something that’s largely impossible to avoid (the EV of building for 10-100x current scale is generally negative, factoring in the risk of the business going bust). There’s a kind of entropy at play here.

Big fan of tidying things up incrementally as you go [1], because it enables teams to at least mitigate this natural degradation over time.

[1] https://sophiabits.com/blog/be-a-tidy-kiwi
