ColNomic and NVIDIA models are great for embedding images, and MUVERA can transform those multi-vector embeddings into single fixed-dimensional vectors.
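For anyone wondering what that transformation looks like, here is a rough, simplified sketch of the fixed-dimensional-encoding idea as I understand it: hash each token/patch vector into a bucket with random hyperplanes, sum the query vectors per bucket, average the document vectors per bucket, and flatten. The function names, the single repetition, the lack of a dimension-reducing projection, and leaving empty buckets as zeros are all my simplifications for illustration, not the actual MUVERA implementation.

```python
import random

def make_hyperplanes(dim, num_planes, seed=0):
    # Random Gaussian hyperplanes shared by queries and documents (assumed setup).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def bucket_id(vec, planes):
    # SimHash-style bucket: one sign bit per hyperplane.
    bits = 0
    for plane in planes:
        proj = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if proj > 0 else 0)
    return bits

def fixed_dim_encoding(vectors, planes, average):
    # Collapse a set of vectors into one flat vector of size 2**num_planes * dim.
    # average=False sums per bucket (query side); average=True takes the
    # per-bucket mean (document side). Empty buckets stay zero in this sketch.
    dim = len(planes[0])
    num_buckets = 2 ** len(planes)
    sums = [[0.0] * dim for _ in range(num_buckets)]
    counts = [0] * num_buckets
    for vec in vectors:
        b = bucket_id(vec, planes)
        counts[b] += 1
        for i, v in enumerate(vec):
            sums[b][i] += v
    flat = []
    for b in range(num_buckets):
        scale = 1.0 / counts[b] if (average and counts[b]) else 1.0
        flat.extend(x * scale for x in sums[b])
    return flat

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

planes = make_hyperplanes(dim=4, num_planes=3)
query_fde = fixed_dim_encoding([[1.0, 0.0, 0.0, 0.0]], planes, average=False)
doc_fde = fixed_dim_encoding([[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]], planes, average=True)
score = dot(query_fde, doc_fde)  # identical vectors share a bucket, so this is 1.0
```

The point is that a single dot product between the two flat vectors stands in for the multi-vector similarity, so an ordinary single-vector index can be used for the first retrieval pass.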
- additional modalities
- Faster FPS (inferences per second)
- Reaction time tuning (latency vs quality tradeoff) for visual and audio inputs/outputs
- built-in planning modules in the architecture (think premotor frontal lobe)
- time awareness during inference (towards an always-inferring / always-learning architecture)
I think that's especially true when you look at how well GPT-4o worked out of the box -- it makes clear what you get from the battle-hardening that's done to the big commercial models. For the numbers we did include, the thought was that the most meaningful signal was that going from 8B to 70B with Llama 3 actually gives you a lot in terms of mitigating that brittleness. That goes a step towards explaining the story of what we're seeing, more so than showing a bunch of comparison LLMs fall over out of the box.
In the end, we presented those models that did best with light tuning and optimization (say a week's worth of iteration or so). I anticipate that we'll have to expand these results to include OpenBio as we work through the conference reviewer gauntlet. Any others you think we definitely should work to include? Would definitely be helpful!
Have you checked out dataset building with Nemotron? The Nemotron synthetic data builder is quite powerful.
Also, check out model merging. It's possible that if you merge your model with the Llama 3.1 base, it may perform much better.
Check out Maxime Labonne's work on Hugging Face.
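To make the merging suggestion concrete, here is a toy sketch of the simplest kind of merge, a linear interpolation of two models' weights. Plain Python lists stand in for tensors, and `linear_merge` and `alpha` are names I made up for illustration. Real merges run over full checkpoints with dedicated tooling and often use more elaborate methods, so treat this as the basic idea only.

```python
def linear_merge(weights_a, weights_b, alpha=0.5):
    # Blend two models parameter by parameter: alpha * A + (1 - alpha) * B.
    # Both inputs map parameter names to flat lists of floats (a stand-in
    # for real tensors) and must have identical names and sizes.
    if weights_a.keys() != weights_b.keys():
        raise ValueError("models must share the same parameter names")
    merged = {}
    for name, a in weights_a.items():
        b = weights_b[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
    return merged

# Hypothetical two-parameter "models" for illustration only.
finetuned = {"layer.weight": [1.0, 3.0], "layer.bias": [0.0]}
base = {"layer.weight": [3.0, 5.0], "layer.bias": [2.0]}
merged = linear_merge(finetuned, base, alpha=0.5)
```

This only works when both models share the same architecture and parameter names, which is why merging a fine-tune back against the base it came from is the usual starting point.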
On the roadmap: video export, digital twin presentations, and real-time presentations. We don't wrap a public LLM, so we don't share any data.
[1] -- https://www.biorxiv.org/content/10.1101/2022.11.18.517004v3
1. First, models will predict pollution. The outcomes will help shape urban policy. But these won't solve crime or stop people from driving.
2. Second, models will predict individual behavior and track person-level emissions. The outcomes will force behavior changes, mostly freedom-limiting ones.
3. Third, and finally, models will predict thoughts. The thought of driving instead of walking might trigger a response.
It's a slippery slope and we need to draw a line between prediction and policy.