7B:
- https://huggingface.co/tloen/alpaca-lora-7b
- https://huggingface.co/ozcur/alpaca-native-4bit

13B:
- https://huggingface.co/samwit/alpaca13B-lora
- https://huggingface.co/Dogge/alpaca-13b

30B:
- https://huggingface.co/baseten/alpaca-30b
- https://huggingface.co/Pi3141/alpaca-30B-ggml
Fine-tuning can get you similar results on smaller / faster models. The downside is you have to craft the dataset in the right way. There are trade-offs to both approaches, but fwiw, I don't think Alpaca-7B can do few-shot learning.
One way you can do this is to pass your documentation to a larger model (like GPT-3.5 or an OSS equivalent) and have it generate question/answer pairs. You can then use that dataset to fine-tune something like LLaMA to get conversational / relevant answers.
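A minimal sketch of that first step, assuming the OpenAI Python client and a placeholder chunker / prompt (the file names, prompt wording, and 3-pairs-per-chunk choice are all illustrative, not a fixed recipe). The output is an Alpaca-style JSONL you could feed to a LLaMA fine-tune:

```python
# Sketch: generate an instruction-tuning dataset from your own docs with a larger
# model, then fine-tune a smaller Llama/Alpaca-style model on the result.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Write 3 question/answer pairs a user might realistically ask about the "
    'documentation below. Return a JSON list of objects with "instruction" '
    'and "output" keys.\n\n{chunk}'
)

def chunk_docs(path, max_chars=3000):
    """Naive chunker (placeholder): split the docs into ~max_chars pieces."""
    text = open(path, encoding="utf-8").read()
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

with open("docs_qa.jsonl", "w", encoding="utf-8") as out:
    for chunk in chunk_docs("docs.txt"):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": PROMPT.format(chunk=chunk)}],
        )
        for pair in json.loads(resp.choices[0].message.content):
            # Alpaca-style record: instruction / input / output
            record = {"instruction": pair["instruction"], "input": "", "output": pair["output"]}
            out.write(json.dumps(record) + "\n")
```

In practice you'd also want to validate/deduplicate the generated pairs before training, since the larger model will occasionally return malformed JSON or redundant questions.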
Wonder if it's the 4bit quantization.
We also fine-tuned and OSS'd a 30B version (on the cleaned 52k Alpaca dataset) that you can check out here: https://huggingface.co/baseten/alpaca-30b