The macOS Tahoe 26.2 release will let us do fast tensor parallelism in MLX, where each layer of the model is sharded across all machines. With this type of parallelism you can get close to an N-times speedup on N machines in the ideal case. The main challenge is latency, since the machines have to communicate much more frequently: the partial results of each sharded layer must be combined before the next layer can run.
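To make the sharding-vs-latency tradeoff concrete, here is a toy sketch in plain Python. It is not MLX's API, and all function names are hypothetical. It splits a linear layer's weight along its input dimension across N simulated machines. Each machine does roughly 1/N of the multiply work, but the partial outputs must be summed (an all-reduce) before the next layer can start, so communication happens once per sharded layer.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product: y[i] = sum_j w[i][j] * x[j]."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def shard_columns(w: Matrix, n: int) -> List[Matrix]:
    """Split W along its input (column) dimension into n contiguous shards."""
    cols = len(w[0])
    assert cols % n == 0, "input dim must be divisible by the number of machines"
    k = cols // n
    return [[row[r * k:(r + 1) * k] for row in w] for r in range(n)]


def shard_vector(x: Vector, n: int) -> List[Vector]:
    """Split the input vector to match the column shards of W."""
    k = len(x) // n
    return [x[r * k:(r + 1) * k] for r in range(n)]


def all_reduce_sum(partials: List[Vector]) -> Vector:
    """Simulated all-reduce: every machine ends up with the elementwise sum.
    On a real cluster this is the network round trip that adds latency."""
    return [sum(vals) for vals in zip(*partials)]


def tensor_parallel_matvec(w: Matrix, x: Vector, n: int) -> Vector:
    """Compute W @ x as if W were sharded across n machines."""
    w_shards = shard_columns(w, n)
    x_shards = shard_vector(x, n)
    # Each hypothetical "machine" r only multiplies its own slice (~1/n of the work)...
    partials = [matvec(w_r, x_r) for w_r, x_r in zip(w_shards, x_shards)]
    # ...but the partial outputs must be combined before the next layer can run.
    return all_reduce_sum(partials)


def forward(layers: List[Matrix], x: Vector, n: int) -> Tuple[Vector, int]:
    """Run a stack of sharded linear layers and count the all-reduces needed."""
    comms = 0
    for w in layers:
        x = tensor_parallel_matvec(w, x, n)
        comms += 1  # one synchronization point per sharded layer in this toy model
    return x, comms


if __name__ == "__main__":
    w = [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(tensor_parallel_matvec(w, [1, 1, 1, 1], 2))  # same result as matvec(w, x)
```

The sharded result matches the unsharded `matvec`, which is the point: the math is unchanged and only the work is split. Real frameworks typically pair column-sharded and row-sharded layers (Megatron-style) to reduce this to a few all-reduces per transformer block, but the number of synchronization points still grows with model depth, which is why interconnect latency matters so much.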
Exo-Labs: https://github.com/exo-explore/exo
Needless to say, I prefer open access, since people outside institutions can then read the research, but the incentive model is badly broken, and I'm not sure that's a price worth paying for the benefit.