anndvision commented on Supervised fine tuning on curated data is reinforcement learning   arxiv.org/abs/2507.12856... · Posted by u/GabrielBianconi
chongliqin · a month ago
Cool! If you are interested, we have open sourced our code: https://github.com/emmyqin/iw_sft
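The gist, very roughly: instead of treating every curated sequence equally, the SFT loss on each sequence gets scaled by a per-sequence importance weight. A minimal sketch below, for illustration only — the function name, the unshifted labels, and the idea that the weights arrive precomputed (e.g. from rewards or a policy ratio) are simplifications, not the actual code in the repo; see the repo and paper for the real formulation:

    import torch
    import torch.nn.functional as F

    def iw_sft_loss(logits, labels, seq_weights, pad_id=-100):
        # logits: [batch, seq_len, vocab]; labels: [batch, seq_len], assumed
        # already aligned to logits (as a causal-LM collator would do).
        # seq_weights: [batch] importance weights for each curated sequence.
        # How the weights are computed is the part the paper specifies;
        # here they are simply given.
        vocab = logits.size(-1)
        # token-level cross-entropy, kept per token so it can be reweighted
        nll = F.cross_entropy(
            logits.reshape(-1, vocab), labels.reshape(-1),
            ignore_index=pad_id, reduction="none",
        ).reshape(labels.shape)
        mask = (labels != pad_id).float()
        # per-sequence average NLL, scaled by that sequence's weight
        seq_nll = (nll * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        return (seq_weights * seq_nll).mean()

    # toy usage: two sequences, the higher-weighted one dominates the update
    logits = torch.randn(2, 5, 100, requires_grad=True)
    labels = torch.randint(0, 100, (2, 5))
    loss = iw_sft_loss(logits, labels, torch.tensor([1.3, 0.7]))
    loss.backward()

With all weights equal to 1 this reduces to plain SFT, which is the connection to RL on curated data that the paper draws.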
anndvision · a month ago
thanks
anndvision commented on Supervised fine tuning on curated data is reinforcement learning   arxiv.org/abs/2507.12856... · Posted by u/GabrielBianconi
anndvision · a month ago
We recently ran similar experiments and saw that fine-tuning small models on automatically curated high-quality outputs from a large model can beat large-model performance while reducing inference costs by up to 30x and inference time by up to 4x.
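In a nutshell: roll out the large model on the task, keep only the episodes an automatic check scores as successful, and fine-tune the small model on those transcripts with ordinary SFT. A toy sketch of the curation step — the example episodes, the reward threshold, and the output filename are placeholders, not our actual pipeline:

    import json

    # stand-in for large-model rollouts: each episode carries the dialogue
    # messages plus an automatic success signal (task reward, judge score, ...)
    episodes = [
        {"messages": [{"role": "user", "content": "find the red ball"},
                      {"role": "assistant", "content": "go left, pick up ball"}],
         "reward": 1.0},
        {"messages": [{"role": "user", "content": "find the red ball"},
                      {"role": "assistant", "content": "wander aimlessly"}],
         "reward": 0.0},
    ]

    # curation: keep only trajectories the automatic check marks as successful
    REWARD_THRESHOLD = 1.0  # placeholder; in practice tuned per environment
    curated = [ep for ep in episodes if ep["reward"] >= REWARD_THRESHOLD]

    # write the curated set as chat-format JSONL that any off-the-shelf SFT
    # trainer can consume to fine-tune the small model
    with open("curated_sft_data.jsonl", "w") as f:
        for ep in curated:
            f.write(json.dumps({"messages": ep["messages"]}) + "\n")

    print(f"kept {len(curated)}/{len(episodes)} episodes for fine-tuning")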

We benchmarked closed-source (OpenAI, Google) and open-source (Qwen) models on multi-turn maze navigation (BabyAI), agentic RAG (Multi-Hop), and agentic tool use (τ-bench).

We're still running a few experiments and plan to update the post with additional results in a few days.

Looking forward to trying out importance weighting soon!

Curated Behavior Cloning: Small LLMs Can Beat Large Ones at 5-30x Lower Cost: https://www.tensorzero.com/blog/curated-behavior-cloning-sma...
