A fast Trusted Execution Environment protocol built on the H100's confidential-computing mode. Prompts are decrypted and processed inside the GPU enclave. The key result is speed: on models with ≥10B parameters, the latency overhead is under 1%.
As with CPU confidential computing in the cloud, this opens a channel to cloud GenAI models that even the provider cannot intercept.
Wonder whether something like that could boost trust in all the AI neoclouds out there.
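The end-to-end shape of such a protocol can be sketched in a few lines. This is a toy illustration only, not the actual protocol and not secure: it assumes the client has already verified the GPU's attestation report and established a shared session key, and it stands in a SHA-256 counter-mode keystream with HMAC where a real deployment would use a standard AEAD like AES-GCM. All function names here are hypothetical.

```python
import hashlib
import hmac
import secrets

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Derive n pseudorandom bytes from (key, nonce) via SHA-256 in counter mode."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def seal(key: bytes, nonce: bytes, plaintext: bytes) -> tuple[bytes, bytes]:
    """Client side: encrypt-then-MAC a prompt under the enclave session key."""
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, "sha256").digest()
    return ct, tag

def open_sealed(key: bytes, nonce: bytes, ct: bytes, tag: bytes) -> bytes:
    """Enclave side: verify the tag, then decrypt the prompt."""
    expected = hmac.new(key, nonce + ct, "sha256").digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return bytes(c ^ k for c, k in zip(ct, keystream(key, nonce, len(ct))))

# Round trip: the provider relaying (ct, tag) sees only ciphertext.
session_key = secrets.token_bytes(32)  # assumed established after attestation
nonce = secrets.token_bytes(16)
ct, tag = seal(session_key, nonce, b"summarize my medical records")
assert open_sealed(session_key, nonce, ct, tag) == b"summarize my medical records"
```

The point of the sketch is the trust boundary: decryption happens only where `session_key` lives, inside the attested enclave, so the cloud operator handling the traffic cannot read the prompt.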
>Intelligence too cheap to meter is well within grasp
And also:
>cost of intelligence should eventually converge to near the cost of electricity.
Which is a meter-worthy resource. So the cost of intelligence in a person's daily life would be on the order of one second of toaster use each day, in present value. Which raises the question: what could you do with a toaster-second, say, 5 years from today?
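The back-of-envelope behind the toaster-second is easy to check. The wattage and electricity price below are my assumptions, not from the source: a ~1 kW toaster and $0.15/kWh retail power.

```python
# Back-of-envelope: what does one toaster-second of electricity cost?
TOASTER_WATTS = 1000      # assumption: a typical toaster draws roughly 1 kW
PRICE_PER_KWH = 0.15      # assumption: ~US retail electricity rate, USD

joules = TOASTER_WATTS * 1           # 1 second of use -> 1000 J
kwh = joules / 3.6e6                 # 1 kWh = 3.6 MJ
cost_per_day = kwh * PRICE_PER_KWH   # one toaster-second per day
cost_per_year = cost_per_day * 365

print(f"{kwh:.2e} kWh, ${cost_per_day:.7f}/day, ${cost_per_year:.4f}/year")
# About $0.00004 per day, i.e. roughly 1.5 cents per year.
```

Under those assumptions, a daily toaster-second is around a cent and a half per year, which is what "too cheap to meter" looks like in practice.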