Readit News
ultmaster commented on Tune self-correct SQL agent with RL: AgentLightning+verl+vLLM+AgentOps+LangGraph   medium.com/@yugez/trainin... · Posted by u/ultmaster
ultmaster · 16 days ago
I trained the agent itself to write SQL, run it, check the results, then rewrite until correct. The write and rewrite policies are optimized with RL, using a client–server setup in Agent Lightning and a LangGraph state machine. On a 500-sample Spider eval subset, Qwen2.5-Coder-3B with a 4096-token context reaches 80.4% with three write-and-rewrite turns and 80.2% with one turn. After training, Qwen2.5-Coder-1.5B can even outperform the untrained Qwen2.5-Coder-3B. I compared multiple models and settings, hoping to shed light on tuning AI agents.

Full article: https://medium.com/@yugez/training-ai-agents-to-write-and-se...

Related projects:

- Agent Lightning as the glue: https://github.com/microsoft/agent-lightning

- verl for RL algorithms: https://github.com/volcengine/verl

- vLLM for efficient rollouts: https://github.com/vllm-project/vllm

- AgentOps for collecting training data (telemetry): https://github.com/AgentOps-AI/agentops

- LangGraph for agent orchestration: https://www.langchain.com/langgraph
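The write–run–check–rewrite loop described above can be sketched minimally. This is not the author's Agent Lightning / LangGraph code; it is an illustrative stand-in where the RL-tuned rewrite policy is replaced by a list of candidate queries tried in order, and the SQL executor's error message is the feedback a real agent would pass back to the model:

```python
import sqlite3

def self_correct_sql(candidates, conn, max_turns=3):
    """Run each candidate query in turn; on a SQL error, 'rewrite' by
    moving to the next candidate (a stand-in for an RL-tuned policy)."""
    last_error = None
    for turn, sql in enumerate(candidates[:max_turns], start=1):
        try:
            rows = conn.execute(sql).fetchall()
            return {"sql": sql, "rows": rows, "turns": turn, "error": None}
        except sqlite3.Error as exc:
            # In the real agent, this error would be fed back to the
            # model as context for the rewrite.
            last_error = str(exc)
    return {"sql": None, "rows": None, "turns": max_turns, "error": last_error}

# Tiny in-memory database standing in for a Spider-style schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])

# The first candidate has a typo; the 'rewrite' succeeds on turn two.
result = self_correct_sql(
    ["SELECT nme FROM users", "SELECT name FROM users ORDER BY id"],
    conn,
)
print(result["turns"], result["rows"])  # 2 [('ada',), ('bob',)]
```

The multi-turn setup pays off exactly when the first attempt errors out, which matches the small gap the comment reports between one-turn and three-turn accuracy.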

ultmaster commented on POML: Prompt Orchestration Markup Language   github.com/microsoft/poml... · Posted by u/avestura
ultmaster · 16 days ago
I'm the sole code contributor to POML, except maybe for Codex and cc. I think I've found where all those GitHub stars suddenly came from. :)

I'm from a small group under Microsoft Research. POML originally came from a research idea that prompts should have a view layer, like the view in the traditional MVC architecture of frontend systems. The view layer should take care of the data, the styles, and the rendering logic, so that the user no longer needs to care how a table gets rendered, how to present few-shot examples, or how to reformat the whole prompt in another syntax (e.g., from markdown to XML).
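To make the "view layer" idea concrete: the caller hands over data, and the view decides how to serialize it for the model. This is a toy Python sketch of that separation, not POML's actual renderer; the function name and style options are hypothetical:

```python
def render_table(rows, headers, style="markdown"):
    """A toy prompt 'view layer': the caller supplies data, the view
    decides the serialization (markdown table vs. XML)."""
    if style == "markdown":
        lines = ["| " + " | ".join(headers) + " |",
                 "| " + " | ".join("---" for _ in headers) + " |"]
        lines += ["| " + " | ".join(str(c) for c in row) + " |"
                  for row in rows]
        return "\n".join(lines)
    if style == "xml":
        items = []
        for row in rows:
            cells = "".join(f"<{h}>{c}</{h}>" for h, c in zip(headers, row))
            items.append(f"<row>{cells}</row>")
        return "<table>" + "".join(items) + "</table>"
    raise ValueError(f"unknown style: {style}")

data = [(1, "ada"), (2, "bob")]
print(render_table(data, ["id", "name"], style="markdown"))
print(render_table(data, ["id", "name"], style="xml"))
```

Switching the whole prompt from markdown to XML then becomes a one-line style change rather than a rewrite, which is the point of keeping rendering out of the prompt author's hands.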

I have to admit I spent a lot of time making POML work well with VSCode, building all the auto-completion, preview, and hover features. Long enough that the codebase is almost becoming a monster for an individual developer to handle. The outside environment is also changing drastically: the rise of agentic AI, tool calls, response formats. Today's models are no longer as sensitive to small changes in prompt format as they used to be. AI-aided programming can simply give you code to read in PDFs and Excel files and render them in any style you want. With all that in mind, I used to feel hopeless about POML.

Nevertheless, after several months of working on other projects, I recently noticed that the view layer can be more than just a view layer. With a proper user interface (e.g., a VSCode live preview), it can deliver a very smooth prompt-debugging experience, especially in a multi-prompt agent workflow. I also noticed that the "orchestration" idea can go beyond XML-like code. I'll share more details when I have a tutorial or screenshots to share.

Going through this thread, I saw a lot of thoughts that once went through my own mind. We love markdown. We love template engines like Jinja. We need those response formats. I keep wondering what the missing piece is here. I've spent so much time writing prompts and building agents in the past few months. What are my biggest pain points?

I'm quite surprised that the news hit me before I was ready to hit the news. If you have tried POML, please send me feedback. I'll see what I can do; or maybe we'll end up not needing a prompt language at all.
