None of these modern recurrent architectures have a way to do this.
It sounds like what they call "Bamba-9B" is actually an 18B model quantised to 8 bits.
I thought we generally named models "nB" by their number of parameters and treated quantisation as a separate concern. Are there any other models that instead treat the name as an indication of memory requirements?
Is this an attempt to hide that it fares poorly vs other ~18B parameter models?
EDIT: no, I just misunderstood
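For anyone following the (mistaken) arithmetic behind that reading, here is a minimal sketch of the back-of-the-envelope weight-memory math in Python; the parameter counts and bit widths are illustrative, not taken from the paper:

    def weight_memory_gb(num_params, bits_per_param):
        # Approximate weight storage only; ignores activations, KV/state cache, and overhead.
        return num_params * bits_per_param / 8 / 1e9

    print(weight_memory_gb(18e9, 8))   # 18B params at int8 -> ~18 GB
    print(weight_memory_gb(9e9, 16))   # 9B params at fp16  -> ~18 GB (same footprint)
    print(weight_memory_gb(9e9, 32))   # 9B params at fp32  -> ~36 GB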
For example, you could never fill in the last chapter of any good book without having knowledge of every previous chapter. Not highly detailed knowledge, but still knowledge.
1. the laws of nature (i.e. how accurately the laws of physics permit measuring the system, and how strongly future states are determined by current states)
2. one's present understanding of the laws of nature
3. one's ability to measure the state of a system accurately and compute the predictions in practice
It strikes me as odd to include 2 and 3 in a definition of "entropy."
I can only hope at the end of the day their data doesn't end up in the wrong hands. It is their most valuable asset, and this is a way bigger deal than it seems.