GabrielBianconi commented on Fine-tuned small LLMs can beat large ones with programmatic data curation   tensorzero.com/blog/fine-... · Posted by u/GabrielBianconi
simianwords · 19 days ago
I think it's a good idea, but how do you not accidentally benchmark-hack here?
GabrielBianconi · 19 days ago
We set up dataset splits and followed the usual best practices. Of course, if you overdo things, you can still hack benchmarks; our goal isn't to publish SOTA numbers but rather to illustrate results from our methodology. We didn't even tune hyperparameters; we just used the defaults. Definitely a valid concern for teams chasing SOTA, though.
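
For anyone curious, one concrete version of that practice (an illustrative sketch, not our exact setup) is to assign each example to a split by hashing a stable ID, so a datapoint can never drift between train and eval across runs:

```python
import hashlib

def split_of(example_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically assign an example to train/val/test by hashing its ID,
    so the same datapoint always lands in the same split (no eval leakage)."""
    bucket = int(hashlib.sha256(example_id.encode()).hexdigest(), 16) % 1000
    if bucket < test_frac * 1000:
        return "test"
    if bucket < (test_frac + val_frac) * 1000:
        return "val"
    return "train"

# e.g. split_of("example-00042") -> "train" / "val" / "test"
```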

Thanks!

GabrielBianconi commented on Ask HN: How does the Postgres ecosystem compare to Vitess at 1PB+?    · Posted by u/GabrielBianconi
samlambert · 19 days ago
There is nothing as mature as Vitess in the Postgres world. But PlanetScale, the company behind Vitess, is building a Postgres sharding project.
GabrielBianconi · 19 days ago
Thanks, Sam! I'm excited to see what you guys come up with.
GabrielBianconi commented on Fine-tuned small LLMs can beat large ones with programmatic data curation   tensorzero.com/blog/fine-... · Posted by u/GabrielBianconi
6510 · 19 days ago
Noob question: Would it be possible to train a small model for a single prompt?
GabrielBianconi · 19 days ago
With supervised fine-tuning (SFT), you'll often see good results with 100-1000+ datapoints (they can be variations of the same prompt template). If you have more limited data, reinforcement fine-tuning (RFT) can work well in the 10-100 range.
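
For illustration only (the template and examples below are made up), variations of a single prompt template can be written out in the JSONL chat format that OpenAI-style fine-tuning APIs accept:

```python
import json

# Hypothetical prompt template with per-example variations.
template = "Extract the named entities from: {text}"
examples = [
    {"text": "Apple acquired Beats in 2014.", "output": '["Apple", "Beats"]'},
    {"text": "Sam flew to Berlin on Tuesday.", "output": '["Sam", "Berlin"]'},
    # ... ideally 100-1000+ of these for SFT
]

with open("sft_dataset.jsonl", "w") as f:
    for ex in examples:
        record = {"messages": [
            {"role": "user", "content": template.format(text=ex["text"])},
            {"role": "assistant", "content": ex["output"]},
        ]}
        f.write(json.dumps(record) + "\n")
```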

Good luck!

GabrielBianconi commented on Fine-tuned small LLMs can beat large ones with programmatic data curation   tensorzero.com/blog/fine-... · Posted by u/GabrielBianconi
mwigdahl · 19 days ago
Is this just distillation but with a step to filter out low-quality responses first?
GabrielBianconi · 19 days ago
AFAIK, distillation typically refers to tuning on the logits of the larger model, so you wouldn't be able to do that with fine-tuning APIs (OpenAI + Google in our blog post). We fine-tune on the outputs themselves.

But broadly speaking, yes, we generate data using a large model, curate the best samples using metrics from the environment, and fine-tune on that data. This isn't a novel technique from an academic perspective; our focus is on applying it to different use cases (e.g. agentic RAG, agentic tool use) and models (OpenAI, Google, Qwen).
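
As a rough sketch of that loop (the callables are stand-ins for your own teacher-model inference and environment metric, and the threshold is arbitrary):

```python
from typing import Callable

def curate(
    prompts: list[str],
    generate: Callable[[str], str],      # large "teacher" model inference
    score: Callable[[str, str], float],  # metric from the environment
    threshold: float = 0.9,
) -> list[dict]:
    """Generate with the large model, keep only samples whose environment
    score clears the threshold, and return them as fine-tuning data."""
    curated = []
    for prompt in prompts:
        output = generate(prompt)
        if score(prompt, output) >= threshold:
            curated.append({"prompt": prompt, "completion": output})
    return curated
```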

Thanks!

GabrielBianconi commented on Fine-tuned small LLMs can beat large ones with programmatic data curation   tensorzero.com/blog/fine-... · Posted by u/GabrielBianconi
k8si · 19 days ago
Maybe this is a nitpick, but CoNLL NER is not a "challenging task". Even pre-LLM systems were getting >90 F1 on it as far back as 2016.

Also, just in case people want to lit review further on this topic: they call their method "programmatic data curation" but I believe this approach is also called model distillation and/or student-teacher training.

GabrielBianconi · 19 days ago
Thanks for the feedback!

We chose a set of tasks with different levels of complexity to see how this approach would scale. For LLMs, the "challenge" with NER is not the task itself but the arbitrariness of the labels in the dataset. I agree it's still much simpler than the other tasks we present (agentic RAG, agentic tool use, maze navigation).

There are definitely strong parallels to model distillation and student-teacher training, with the primary difference being that we don't simply take all the data from the larger model but rather filter the dataset based on metrics from the environment. In the "Does curation even matter?" section, we show that this generally improves the result by a good margin.

We link to Vicuna, which might be the closest reference as prior art: https://lmsys.org/blog/2023-03-30-vicuna/

Thanks!

GabrielBianconi commented on Supervised fine tuning on curated data is reinforcement learning   arxiv.org/abs/2507.12856... · Posted by u/GabrielBianconi
TheTaytay · 25 days ago
Thanks for this - I've spent the last hour reading your docs and blog. I like the primitives you've exposed in your API, and I particularly like the decision to separate the structured inputs from the prompt when you record an LLM call, so I can finally perform optimizations and evals on past calls.

Quick question: you mentioned Unsloth in the blog post. Which of the fine-tuning providers mentioned is using Unsloth under the hood?

GabrielBianconi · 25 days ago
[I'm his coworker.] We ran Unsloth ourselves on a GPU-by-the-hour server. We have a notebook in the repository showing how to query historical data and use it with Unsloth.

It's a WIP PR that we plan to merge soon: https://github.com/tensorzero/tensorzero/pull/2273
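
In the meantime, the usual Unsloth QLoRA recipe looks roughly like this (illustrative model name and hyperparameters, not the notebook's exact code; trl's SFTTrainer signature varies a bit by version):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load a 4-bit base model and wrap it with LoRA adapters.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative; pick your base
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumes a curated JSONL file with a "text" field per example.
dataset = load_dataset("json", data_files="curated.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```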

GabrielBianconi commented on Supervised fine tuning on curated data is reinforcement learning   arxiv.org/abs/2507.12856... · Posted by u/GabrielBianconi
mandevil · 25 days ago
Interesting to see two independent researchers on this. Makes me curious what the backstory is. Side project?
GabrielBianconi · 25 days ago
Yeah, I hadn't noticed!

u/GabrielBianconi

Karma: 64 · Cake day: August 9, 2013
About
co-founder @ tensorzero – open-source LLM infra

https://github.com/tensorzero/tensorzero
