comp_raccoon commented on Olmo 3: Charting a path through the model flow to lead open-source AI   allenai.org/blog/olmo3... · Posted by u/mseri
turnsout · a month ago
I used the Ai2 Playground and Olmo 3 32B Think, and asked it to recommend a language for a green-field web app based on a list of criteria. It gave me a very good and well-reasoned answer (Go, with Rust as a backup), formatted like a high-quality ChatGPT or Claude response.

I then had it show the "OlmoTrace" for its response, which appears to find exact text-string matches between the response and the training data. Some of the matched sources were related (pages about Go, Rust, Python, etc.), while others were completely unrelated and just happened to contain the same turn of phrase (e.g. "Steeper learning curve").

It was interesting, but is it useful? It was impossible for me to actually fact-check any of the claims in the response based on the matched training data. At this stage, it felt about as helpful as linking every word to that word's entry in a dictionary. "Yep, that's a word alright." I don't think it's really tracing the "thought."

What could be interesting is if the user could dynamically exclude certain training sources before the response is generated. Like, I want to ask a question about climate change, but I want to exclude all newspapers and focus on academic journals.

Transparency is a good first step, but I think we're missing the "Step 2."

comp_raccoon · a month ago
Olmo author here! You're absolutely spot on:

> It was impossible for me to actually fact-check any of the claims in the response based on the matched training data.

This is true! The point of OlmoTrace is to show that even the smallest phrases generated by a language model are a product of its training data. It's not verification; a search system doing post-hoc checks would be much more effective.

comp_raccoon commented on Olmo 3: Charting a path through the model flow to lead open-source AI   allenai.org/blog/olmo3... · Posted by u/mseri
stavros · a month ago
> the best fully open 32B-scale thinking model

It's absolutely fantastic that they're releasing an actually OSS model, but isn't "the best fully open" a bit of a low bar? I'm not aware of any other fully open models.

comp_raccoon · a month ago
Olmo author here… it would be nice to have some more competition!! I don't like that we're so lonely either.

We are competitive with open-weights models in general, just a couple of points behind the best Qwen.

Fully open models are important for the research community; a lot of fundamental discoveries are made when you have access to the training data. We call out that we're the best fully open model because researchers would want to know that.

comp_raccoon commented on Olmo 3: Charting a path through the model flow to lead open-source AI   allenai.org/blog/olmo3... · Posted by u/mseri
andai · a month ago
I'm out of the loop... so Qwen3-30B-VL is smart and Qwen3-30B is dumb... and that has to do not with the size but architecture?
comp_raccoon · a month ago
Olmo author here, but I can help! The first release of Qwen 3 left a lot of performance on the table because they had some challenges balancing thinking and non-thinking modes. The VL series has a refreshed post-train, so those models are much better!
comp_raccoon commented on Olmo 3: Charting a path through the model flow to lead open-source AI   allenai.org/blog/olmo3... · Posted by u/mseri
thot_experiment · a month ago
Qwen3-30B-VL is going to be fucking hard to beat as a daily driver. It's so good for the base 80% of tasks I want an AI for, and holy fuck is it fast: 90 tok/s on my machine, and I pretty much keep it in VRAM permanently. I think this sort of work is important and I'm really glad it's being done, but in terms of something I want to use every day, there's no way a dense model can compete unless it's smart as fuck. Even dumb models like Qwen3-30B get a lot of stuff right, and not having to wait is amazing.
comp_raccoon · a month ago
Olmo author here! Qwen models are in general amazing, but the 30B is very fast because it's an MoE. MoEs are very much on the roadmap for the next Olmo.
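
To make "MoE is fast" concrete, here's a toy sketch of top-k expert routing in PyTorch (hypothetical sizes, not Qwen's or Olmo's actual architecture): all experts sit in VRAM, but each token only runs through a couple of them, so decode cost tracks active parameters rather than total parameters.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: all num_experts MLPs are stored,
    but each token is routed through only the top_k of them."""
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (num_tokens, dim)
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = idx[:, slot] == e       # tokens that picked expert e
                if mask.any():                 # unpicked experts do no compute
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

With 8 experts and top-2 routing, each token touches roughly a quarter of the FFN weights, which is why a 30B-total MoE can decode like a much smaller dense model.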
comp_raccoon commented on Olmo 3: Charting a path through the model flow to lead open-source AI   allenai.org/blog/olmo3... · Posted by u/mseri
weregiraffe · a month ago
Is the training data open-source? And can you validate that the model was trained on the claimed training data alone? Without this, all benchmarks are useless.
comp_raccoon · a month ago
Olmo author here! We release all training data and all our training scripts, plus intermediate checkpoints, so you could take a checkpoint, reproduce a few steps on the training data, and check whether the loss matches.

It's no cryptographic proof, and you can't get perfect determinism on NVIDIA GPUs, but it's pretty close.
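
A rough sketch of that spot-check, assuming Hugging Face-style checkpoints (the checkpoint id and data file below are hypothetical placeholders; our released training scripts are the real reference):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical identifiers for illustration; substitute a real intermediate
# checkpoint and the matching slice of the released training data.
CHECKPOINT = "allenai/olmo-intermediate-step-XXXX"
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)

batch = tokenizer(open("training_slice.txt").read(), return_tensors="pt",
                  truncation=True, max_length=2048)

# Evaluate the loss on the data the logs say this step trained on; it should
# land close to the recorded training loss. Compare within a tolerance, not
# with ==, since NVIDIA GPU kernels are not bitwise deterministic.
with torch.no_grad():
    out = model(**batch, labels=batch["input_ids"])
print("loss at checkpoint:", out.loss.item())
```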

comp_raccoon commented on Olmo 3: Charting a path through the model flow to lead open-source AI   allenai.org/blog/olmo3... · Posted by u/mseri
silviot · a month ago
I tried the playground at https://playground.allenai.org/ and clicked the "Show OlmoTrace" button.

Above the response it says

> Documents from the training data that have exact text matches with the model response. Powered by infini-gram

so, if I understand correctly, it searches the training data for matches in the LLM output. This is not traceability in my opinion. This is an attempt at guessing.

Checking individual sources, I got texts completely unrelated to the question/answer that happen to share an N-gram [1] (I saw sequences of up to 6 words) with the LLM answer.
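
Schematically, the matching amounts to something like this (a toy sketch; the real infini-gram uses a suffix-array index over the full training corpus, not a per-document scan):

```python
def ngrams(tokens, n):
    """All n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def longest_exact_matches(response, corpus_doc, max_n=8):
    """Longest word sequences shared verbatim between an LLM response and
    one training document -- it matches surface form, not meaning."""
    resp, doc = response.lower().split(), corpus_doc.lower().split()
    for n in range(min(max_n, len(resp)), 1, -1):
        shared = ngrams(resp, n) & ngrams(doc, n)
        if shared:
            return [" ".join(g) for g in shared]
    return []

# A document about hiking still "matches" an answer about programming:
print(longest_exact_matches(
    "Rust has a steeper learning curve than Go",
    "The trail has a steeper learning curve than most beginners expect"))
# -> ['has a steeper learning curve than']
```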

I think they're being dishonest in their presentation of what Olmo can and can't do.

[1] https://en.wikipedia.org/wiki/N-gram

comp_raccoon · a month ago
Olmo researcher here. The point of OlmoTrace is not to attribute the entire response to one document in the training data; that's not how language models "acquire" knowledge, and finding a single document or a handful of documents as support for an answer is impossible.

The point of OlmoTrace is to show that fragments of the model's response are influenced by its training data. Sometimes it's how specific adjectives are used together in ways that seem unnatural to us but are combinations drawn from the training data (ask for a movie review!).

A favorite example of mine is asking for a joke or a random number, because, strangely, all LLMs return the same joke or number. With OlmoTrace, you can see which docs in the training data contain the super-common response!

Hope this helps!

comp_raccoon commented on Molmo: a family of open multimodal AI models   molmo.allenai.org/blog... · Posted by u/jasondavies
jszymborski · a year ago
I think the "Molmo-7B-O" and "MolmoE-1B" models are using Olmo, judging by the fact that their LLM backbones are the only ones listed as having open data.

EDIT: From the post "For the LLM, we have trained models on a variety of choices at different scales and degrees of openness including: the fully open-weight and data OLMo-7B-1024 (using the October, 2024 pre-released weights, which will be public at a later date), the efficient fully open-weight and data OLMoE-1B-7B-0924, open-weight Qwen2 7B, open-weight Qwen2 72B, open-weight Mistral 7B, open-weight Gemma2 9B, and Phi 3 Medium). Today we are releasing 4 samples from this family."

comp_raccoon · a year ago
This is correct! We wanted to show that you can use the PixMo dataset and our training code to improve any open model, not just ours!
comp_raccoon commented on Molmo: a family of open multimodal AI models   molmo.allenai.org/blog... · Posted by u/jasondavies
naiv · a year ago
The image was flagged as inappropriate by the Google Vision API?
comp_raccoon · a year ago
Google's image APIs are not great, yeah. That's only for the demo, though; the checkpoints on Hugging Face are uncensored.
comp_raccoon commented on Molmo: a family of open multimodal AI models   molmo.allenai.org/blog... · Posted by u/jasondavies
imjonse · a year ago
Apart from results on benchmarks, what sets Allenai models apart - Olmo/OLMoE/Molmo - is that they are fully open, not just open-weight/free to use. The datasets used, a crucial ingredient, are also disclosed and open. UPDATE: they say the datasets will be made available, but they aren't yet.
comp_raccoon · a year ago
It's coming! It just takes a bit more time to release it properly.
