Did Meta add scalable RoPE to the official implementation?
We changed RoPE's theta from 10k to 1M and fine-tuned on 16k-token sequences.
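For anyone wondering what the theta change does in practice, here is a rough sketch using the standard RoPE frequency formula, theta^(-2i/d). The function names and the head dimension of 128 are my own assumptions for illustration, not taken from Meta's code.

```python
import math

def rope_inv_freq(head_dim, theta):
    """Rotation frequency of each dimension pair: theta ** (-2i / head_dim)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def longest_wavelength(head_dim, theta):
    """Wavelength, in tokens, of the slowest-rotating dimension pair."""
    slowest = rope_inv_freq(head_dim, theta)[-1]
    return 2 * math.pi / slowest

HEAD_DIM = 128  # assumed value, for illustration only

for theta in (10_000.0, 1_000_000.0):
    # A larger theta stretches the slow rotations, so positions that are
    # far apart still map to distinguishable angles.
    print(f"theta={theta:>9.0f}  longest wavelength ~ "
          f"{longest_wavelength(HEAD_DIM, theta):,.0f} tokens")
```

The fastest pair always rotates at frequency 1 regardless of theta; only the slower pairs are stretched, which is presumably why the change helps with longer sequences without disturbing short-range position information much.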
http://bl.ocks.org/syntagmatic/3150059
It’s a shame Kai isn’t credited in the README, LICENSE, or announcement.
I think I remember a similar thing happening with previous wav2letter releases.
I would love a simple tutorial on just using a pretrained model, but that feels unlikely to ever happen.