The pattern I've converged on: spend the first 30 minutes writing detailed markdown specs (inputs, outputs, edge cases, integration points), then let Claude Code chew through the implementation while I review, test, and iterate. For a typical automation project — say a WhatsApp bot that handles booking flows and integrates with a client's CRM — this cuts delivery time roughly in half compared to writing everything manually.
The biggest practical lesson: spec quality is everything. A vague spec produces code you'll spend more time debugging than you saved writing. A good spec with explicit error-handling expectations, API response formats, and state transitions produces code that's 80-90% production-ready on the first pass.
Where I disagree slightly with the parallel agent approach: for client-facing work where correctness matters more than speed, I've found 2-3 focused agents (one on backend, one on frontend, one on tests) more reliable than 6-8 competing agents that create merge conflicts. The overhead of resolving conflicts and ensuring consistency across parallel outputs eats into the productivity gains fast.
Patterns that have helped in production:
1. Multi-provider fallback. For conversational systems, route to Claude by default, fall back to GPT-4 on 5xx errors. The response quality difference is usually acceptable for the 2-3% of requests that hit the fallback. This turns a hard outage into a slight quality degradation.
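A minimal sketch of that routing logic. The provider callables here (`call_claude`, `call_gpt4`) are hypothetical stand-ins — wrap your actual SDK clients in functions with this shape:

```python
class ProviderError(Exception):
    """Raised when a provider call fails with an HTTP-style status."""
    def __init__(self, status):
        super().__init__(f"provider returned {status}")
        self.status = status

def complete_with_fallback(prompt, primary, fallback):
    """Try the primary provider; on a server-side (5xx) error, fall back.

    `primary` and `fallback` are callables taking a prompt and returning
    a response string -- wrap your real SDK calls in these.
    """
    try:
        return primary(prompt)
    except ProviderError as e:
        if 500 <= e.status < 600:
            return fallback(prompt)  # slight quality dip, not an outage
        raise  # 4xx errors are our bug -- don't mask them with a fallback

# Hypothetical usage:
# reply = complete_with_fallback(user_msg, call_claude, call_gpt4)
```

Note the 4xx re-raise: falling back on client errors just replays your own bug against a second provider.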
2. Async queuing for non-real-time workflows. If you're processing documents, generating reports, or running batch analysis — don't call the API synchronously. Queue the work, retry with exponential backoff, and let the system self-heal when the API recovers. Most of our automation pipelines run with a 15-minute SLA, not a 500ms one.
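The retry core of that pattern can be sketched like this (in production you'd put this behind a real queue like Celery or SQS; the `sleep` parameter is injectable so the backoff is testable):

```python
import time

def process_with_retry(task, handler, max_attempts=5, base_delay=1.0,
                       sleep=time.sleep):
    """Run `handler(task)`, retrying with exponential backoff on failure.

    With a 15-minute SLA there is room for minutes of retries, so transient
    API outages heal themselves without anyone paging.
    """
    for attempt in range(max_attempts):
        try:
            return handler(task)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retry budget: dead-letter queue / alert
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, 8s, ...
```

Adding jitter to the delay (e.g. `random.uniform(0, delay)`) helps avoid thundering-herd retries when the API recovers.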
3. Graceful degradation in real-time systems. For chatbots and voice agents, have a scripted fallback path. "I'm having trouble processing that right now — let me transfer you to a human" is infinitely better than a hung connection or error message.
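One way to enforce that in a chatbot handler — a hard deadline around the model call, with the scripted line as the catch-all. `generate` is a hypothetical placeholder for whatever produces the model reply:

```python
from concurrent.futures import ThreadPoolExecutor

FALLBACK_REPLY = (
    "I'm having trouble processing that right now -- "
    "let me transfer you to a human."
)

def reply_or_degrade(generate, user_msg, timeout_s=5.0):
    """Run the model call with a deadline; degrade to a script on failure.

    Any exception or a blown deadline yields the scripted handoff instead
    of a hung connection or a raw error surfacing to the caller.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(generate, user_msg)
        return future.result(timeout=timeout_s)
    except Exception:
        return FALLBACK_REPLY
    finally:
        pool.shutdown(wait=False)  # don't block the caller on a stuck thread
```

The same shape works for voice agents; only the fallback payload (a prerecorded prompt plus a transfer) changes.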
The broader issue: we're all building on infrastructure where "four nines" isn't even on the roadmap yet. That's fine if you architect for it — treat LLM APIs like any other unreliable external dependency, not like a database query.