I'm curious about the solutions the OP has tried so far here.
In general, a more generic eval setup is needed, with minimal requirements from AI engineers, if we want to move past vibe-based reliability engineering as a sector.
After analyzing hundreds of production agent workflows, we discovered something: 40-70% of agent tool calls and text prompts don't need expensive flagship models. Yet most implementations route everything through their selected flagship model.
Here's what that looks like in practice:
A customer support agent handling 1,000 queries/day:
- Current cost: ~$225/month
- Actual need: 60% could use smaller or domain-specific models (faster, cheaper)
- Wasted spend: $135/month per agent
A data analysis agent making 5,000 tool calls/day:
- Current cost: ~$1,125/month
- Actual need: 70% are simple operations
- Wasted spend: $787/month
Multiply this across multiple agents, and you're looking at hundreds of dollars in unnecessary costs per month.
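For anyone who wants to check the arithmetic behind the figures above, here is a minimal sketch. The function name `wasted_spend` and its parameters are illustrative, not part of any library. By default it assumes, as the quoted figures implicitly do, that the reroutable share of calls would cost nothing on a cheaper model; the hypothetical `cheap_cost_ratio` parameter lets you relax that assumption.

```python
def wasted_spend(monthly_cost: float, reroutable_pct: int,
                 cheap_cost_ratio: float = 0.0) -> float:
    """Estimate the monthly spend avoidable by rerouting simple calls.

    monthly_cost:     current flagship-only spend in USD per month
    reroutable_pct:   percentage of calls a smaller model could handle
    cheap_cost_ratio: cheaper model's cost relative to the flagship
                      (assumption; 0.0 reproduces the figures quoted above)
    """
    reroutable_cost = monthly_cost * reroutable_pct / 100
    return reroutable_cost * (1.0 - cheap_cost_ratio)


print(wasted_spend(225, 60))    # support agent: 135.0
print(wasted_spend(1125, 70))   # data analysis agent: 787.5
```

The second figure comes out at $787.50, which the post rounds down to $787. With a non-zero `cheap_cost_ratio` the savings shrink accordingly, so the quoted numbers are best read as an upper bound.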
The root cause? Agent frameworks don't differentiate between "check database status" and "analyze complex business logic" - they treat every call the same.
The Solution: Intelligent Model Cascading
We built CascadeFlow's LangChain integration as a drop-in replacement that:
1. Tries fast, cheap models first
2. Validates response quality automatically
3. Escalates to flagship models only when needed
4. Tracks costs per query in real-time
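The four steps above can be sketched generically as follows. This is not CascadeFlow's actual API or implementation: `Tier`, `CascadeResult`, `cascade`, the stub models, the flat per-call costs, and the trivial validator are all hypothetical stand-ins meant only to illustrate the control flow.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tier:
    name: str
    call: Callable[[str], str]   # hypothetical model client: prompt -> text
    cost_per_call: float         # illustrative flat cost in USD


@dataclass
class CascadeResult:
    text: str
    model: str    # the tier whose answer was accepted
    cost: float   # total spent across every tier that was tried


def cascade(prompt: str, tiers: List[Tier],
            is_good_enough: Callable[[str], bool]) -> CascadeResult:
    """Try tiers from cheapest to most expensive, escalating on failed validation."""
    if not tiers:
        raise ValueError("at least one tier is required")
    total = 0.0
    for i, tier in enumerate(tiers):
        text = tier.call(prompt)           # step 1: try the cheaper model first
        total += tier.cost_per_call        # step 4: track cost per query
        is_last = i == len(tiers) - 1
        if is_last or is_good_enough(text):  # step 2: validate the response
            return CascadeResult(text, tier.name, total)
        # step 3: validation failed, so fall through to the next tier
    raise AssertionError("unreachable")


# Stub models standing in for real clients (assumptions for the demo).
def small_model(prompt: str) -> str:
    return "OK" if "status" in prompt else ""


def flagship_model(prompt: str) -> str:
    return f"detailed answer to: {prompt}"


tiers = [Tier("small", small_model, 0.001), Tier("flagship", flagship_model, 0.03)]
non_empty = lambda text: len(text.strip()) > 0

print(cascade("check database status", tiers, non_empty).model)
print(cascade("analyze complex business logic", tiers, non_empty).model)
```

Note that an escalated query pays for both the cheap attempt and the flagship call, so the savings depend on how often the cheap tier passes validation and on how reliable the validator is.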
The integration is dead simple - it works exactly like any LangChain chat model. No architecture changes. Just swap your chat model for CascadeFlow.
What you get:
- Full LCEL chain support
- Streaming and tool calling
- LangSmith tracing out of the box
- 40-85% cost reduction
- 2-10x faster responses for simple queries
- Zero quality loss
Real production results from teams already using it.
Open source, MIT licensed. Takes 5 minutes to integrate.
About the company:
• Technical founding team with relevant industry experience (part of Nvidia Inception Program)
• Backed by well-known European VCs (SpeedInvest + Galion.exe)
• Building infrastructure-less agentic evals and agent-as-a-judge monitoring
Ideal candidate:
• Demonstrated shipping ability with past projects & roles
• “Young and hungry” mindset, prioritising ability to learn with agency over experience
• Familiar with fine-tuning algorithms and frameworks, transformers/trl, ART, verl and unsloth
• Bonus points for experience in contributing to open-source projects, startups, AI agents, & similar technologies
Reach out to the founder directly: https://www.linkedin.com/in/rhommes/ or visit our website https://moyai.ai
- many, if not most, MLEs who got started after LLMs generally don't know anything about machine learning. For lack of clearer industry titles, they are really AI developers or AI devops
- machine learning as a trade is moving toward the same fate as data engineering and analytics. Big companies only want people using platform tools. Some AI products, even in cloud platforms like Azure, don’t even give you the evaluation metrics that would be required to properly build ML solutions. Few people seem to have an issue with it.
- fine tuning, especially RL, is packed with nuance and details… lots to monitor, a lot of training signals that need interpretation and data refinement. It’s a much bigger gap than training simpler ML models, which people are also not doing/learning very often.
- The limited number of good use cases means people are not learning those skills from more senior engineers.
- companies have gotten stingy with SME time and labeling
What confidence do companies have in supporting these solutions in the future? How long will you be around and who will take up the mantle after you leave?
AutoML never really panned out, so I’m less confident that platforming RL will go any better. The unfortunate reality is that companies are almost always willing to pay more for inferior products because it scales. Industry “skills” are mostly experience with proprietary platform products. Sure, they might list “pytorch” as a required skill, but 99% of the time, there is hardly anyone at the company who has spent any meaningful time with it. Worse, you can’t use it, because it would be too hard to support.
In true hacker spirit, I don't think trying to train a model on a wonky GPU is something that needs an ROI for the individual engineer. It's something they do because they yearn to acquire knowledge.
About the company:
• Technical founding team with relevant industry experience (part of Nvidia Inception Program)
• Backed by well-known European VCs (SpeedInvest + Galion.exe)
• Building agentic advanced analytics and fine-tuning analytical reasoning models
Ideal candidate:
• Demonstrated shipping ability with past projects & roles
• “Young and hungry” mindset, prioritising ability to learn with agency over experience
• Familiar with fine-tuning algorithms and frameworks, transformers/trl, ART, verl and unsloth
• Bonus points for experience in contributing to open-source projects, startups, AI agents, & similar technologies
Reach out to the founder directly: https://www.linkedin.com/in/rhommes/
About the company:
• Technical founding team with relevant industry experience
• Backed by well-known European VCs (SpeedInvest + Galion.exe)
• Cloud native & freedom to shape our tech stack (TypeScript + Python)
Ideal candidate:
• Previous experience at a start-up building tech at scale
• Thinks in terms of product functionality and customer demands, not just features
• Familiar with API-first practices and frameworks
• Bonus points if you are an ex-founder or have been a first hire before
Moyai is an AI-powered agent monitoring tool for AI engineers looking to catch agent failures in production. Reach out to the founder directly: https://www.linkedin.com/in/rhommes/ or visit our website https://moyai.ai
*No agencies or recruiters, and we are unable to provide visa sponsorship