It's absolutely fantastic that they're releasing a genuinely OSS model, but isn't "the best fully open" a bit of a low bar? I'm not aware of any other fully open models.
We are competitive with open-weights models in general, just a couple of points behind the best Qwen.
Fully open models are important for the research community; a lot of fundamental discoveries are made when you have access to the training data. We call out that we are the best fully open model because researchers would want to know that.
I then had it show the "OlmoTrace" for its response, which seems to find exact matches in its training data for text strings that end up in the response. Some of the matched sources were related (pages about Go, Rust, Python, etc.), while others were completely unrelated but just happened to use the same turn of phrase (e.g. "steeper learning curve").
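To illustrate the kind of matching it seems to be doing, here's a toy sketch: take word n-grams from the response and check whether they appear verbatim in training documents. To be clear, this is just my illustration, not their implementation; the corpus, tokenization, and minimum span length are made up, and a real system would need an indexed lookup (suffix arrays or similar) over the actual training data rather than a brute-force scan.

```python
# Toy illustration of exact-match span lookup: find word n-grams from a
# model response that also occur verbatim in some "training" document.
# Hypothetical corpus and parameters; not how OlmoTrace is actually built.

def verbatim_spans(response: str, corpus: list[str], min_words: int = 4):
    """Yield (span, doc_index) for response n-grams found verbatim in a document,
    longest spans first."""
    words = response.split()
    for n in range(len(words), min_words - 1, -1):   # longest spans first
        for start in range(len(words) - n + 1):
            span = " ".join(words[start:start + n])
            for i, doc in enumerate(corpus):
                if span in doc:
                    yield span, i

corpus = [
    "Rust has a steeper learning curve than Go but stronger safety guarantees.",
    "Python is popular for scripting and data analysis.",
]
response = "Rust offers a steeper learning curve than Go for newcomers."

for span, doc_idx in verbatim_spans(response, corpus):
    print(f"matched {span!r} in doc {doc_idx}")
    break  # report only the longest match
```

Even in this toy form you can see why a match doesn't verify anything: the shared span is just a common phrase, not evidence for the claim around it.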
It was interesting, but is it useful? It was impossible for me to actually fact-check any of the claims in the response based on the matched training data. At this stage, it felt about as helpful as linking every word to that word's entry in a dictionary. "Yep, that's a word alright." I don't think it's really tracing the "thought."
What could be interesting is if the user could dynamically exclude certain training sources before the response is generated. Like, I want to ask a question about climate change, but I want to exclude all newspapers and focus on academic journals.
Transparency is a good first step, but I think we're missing the "Step 2."
> It was impossible for me to actually fact-check any of the claims in the response based on the matched training data.
this is true! the point of OlmoTrace is to show that even the smallest phrases generated by a language model are a product of its training data. It’s not verification; a search system doing post hoc checks would be much more effective