This is so fun and creative. Congrats on launching!
Having run many red teams recently as I build out promptfoo's red teaming feature set [0], I've noticed the Llama models punch above their weight on safety accuracy. People hate excessive guardrails, and Llama seems to thread the needle.
Very bullish on open source.
prompts:
  - 'Answer this coding problem in Python: {{ask}}'
providers:
  - ollama:chat:gemma2:9b
  - ollama:chat:llama3:8b
tests:
  - vars:
      ask: function to find the nth fibonacci number
  - vars:
      ask: calculate pi to the nth digit
  # ...
One small thing I've always appreciated about Gemma is that it doesn't include a "Sure, I can help you" preamble. It just gets right into the code and follows it with an explanation. The training seems to emphasize response structure and ease of comprehension.

Also, it's best to run evals that don't rely on rote memorization of public code... so please substitute your own tests :)
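If you want to check the no-preamble observation against your own problems, something like the config below might work. It's only a sketch: it assumes promptfoo's `defaultTest` block and its `not-icontains` and `javascript` assertion types, and the `ask` text is a hypothetical placeholder for one of your private test problems.

```yaml
# Sketch only: assumes promptfoo's defaultTest, not-icontains and javascript assertions.
prompts:
  - 'Answer this coding problem in Python: {{ask}}'
providers:
  - ollama:chat:gemma2:9b
  - ollama:chat:llama3:8b
defaultTest:
  assert:
    # Fails any response that opens with a "Sure, I can help" style preamble
    - type: not-icontains
      value: 'Sure, I can help'
    # Passes only if the response starts directly with a fenced code block
    - type: javascript
      value: "output.trimStart().startsWith('```')"
tests:
  - vars:
      # Hypothetical private problem; replace with your own
      ask: merge overlapping time ranges from an internal booking export
```

Putting the checks in `defaultTest` applies them to every test case, so adding more private problems only means adding more `vars` entries.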
This is insane to me.
[0] https://dinosaurpictures.org/ancient-earth#240
[1] https://www.gplates.org/