Witty comment aside, the human brain is pretty energy-efficient considering how much data it's taking in while it's conscious: two streams each of audio and video, plus olfactory, gustatory, touch, vestibular, and all the interoception. Inference and training in real time. All for the low price of 125 watts, or a quarter of that if you're measuring just the brain and not the whole body.
The paper was published last year. https://www.frontiersin.org/journals/science/articles/10.338...
I'm not convinced this field will outpace silicon or whatever succeeds it, considering how big the semiconductor industry is.
Quantization is black magic of the software variety that seems to be able to significantly reduce that without a commensurate loss in quality, though the results are a little subjective. Some well-reviewed quantizations of 7B models can get them below 9 GB.
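
For a rough sense of the numbers, here is a back-of-the-envelope sketch of how weight storage scales with bits per weight. The function name `weight_footprint_gb` is made up for illustration, and the estimate assumes weights dominate the size; it ignores per-block scales, activations, and the KV cache, so real quantized files come out somewhat larger.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: size = parameters * bits / 8. Quantization metadata
    (scales, zero points) and runtime memory are not counted.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # A nominal 7B-parameter model at a few common precisions.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {weight_footprint_gb(7e9, bits):.1f} GB")
```

By this estimate a 7B model goes from about 14 GB at 16-bit down to about 7 GB at 8-bit and 3.5 GB at 4-bit, before overhead.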