There is very little value that a company like Cursor, which has to support multiple different providers, can offer on top of the tailored agents (and "unlimited" subscription models) from the LLM providers themselves.
Why: LLMs are increasingly becoming multimodal, so an image "token" or video "token" is not as simple as a text token. It's also difficult to compare prices across competitors because each model tokenizes differently.
Eventually prices will just be in $/MB of data processed, just like bandwidth. I'm surprised this hasn't already happened.
Update: Here is what o3 thinks about this topic: https://chatgpt.com/share/688030a9-8700-800b-8104-cca4cb1d0f...
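To make the tokenization point concrete, here is a rough sketch (using two of OpenAI's tiktoken encodings as stand-ins, since other providers' tokenizers aren't all public) showing that the same text, with a fixed byte size, encodes to different token counts under different tokenizers:

    # Rough illustration: same text, fixed byte count, different token counts
    # depending on the tokenizer. Uses two OpenAI encodings as stand-ins; a
    # real cross-provider comparison would need each vendor's own tokenizer.
    import tiktoken

    text = "Large language models are priced per token, not per byte."
    byte_count = len(text.encode("utf-8"))

    for name in ("cl100k_base", "o200k_base"):  # GPT-4 vs GPT-4o encodings
        enc = tiktoken.get_encoding(name)
        print(f"{name}: {len(enc.encode(text))} tokens for {byte_count} bytes")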
Yes, you can. The community creates quantized variants of these that can run on consumer hardware. A 4-bit quantization of Llama 70B works pretty well on MacBook Pros; the neural engine with unified CPU memory is quite solid for these. GPUs are a bit tougher because consumer GPU RAM is still kinda small.
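As a rough sketch of what running one of these locally looks like (assuming llama-cpp-python and a community GGUF file you've already downloaded; the path and settings are placeholders, not a recommendation):

    # Minimal sketch: load a community 4-bit GGUF quantization locally with
    # llama-cpp-python (pip install llama-cpp-python). The model path is a
    # placeholder; grab a GGUF file from Hugging Face first.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./llama-70b-q4_k_m.gguf",  # hypothetical local file
        n_ctx=4096,       # context window
        n_gpu_layers=-1,  # offload all layers to Metal/GPU if they fit
    )

    out = llm("Q: Why does unified memory help local inference? A:", max_tokens=64)
    print(out["choices"][0]["text"])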
You can also fine-tune them. There are a lot of frameworks, like unsloth, that make this easier: https://github.com/unslothai/unsloth . Fine-tuning can be pretty tricky to get right; you need to be aware of things like learning rates, but there are good resources on the internet where a lot of hobbyists have gotten things working. You do not need a PhD in ML to accomplish this. You will, however, need data that you can represent textually.
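For a feel of what that looks like, here is a rough unsloth + TRL sketch (the base model, dataset file, and hyperparameters are placeholder assumptions, and the exact SFTTrainer arguments shift a bit between trl versions):

    # Rough LoRA fine-tuning sketch with unsloth + TRL. The dataset file,
    # model name, and hyperparameters (learning rate, LoRA rank, steps) are
    # placeholders; tune them for your own data.
    from unsloth import FastLanguageModel
    from datasets import load_dataset
    from trl import SFTTrainer
    from transformers import TrainingArguments

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized base model
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,             # LoRA rank
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )

    # Your data, represented textually: one {"text": "..."} record per example.
    dataset = load_dataset("json", data_files="my_data.jsonl")["train"]

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=2048,
        args=TrainingArguments(
            output_dir="outputs",
            per_device_train_batch_size=2,
            learning_rate=2e-4,  # one of the knobs that matters most
            max_steps=100,
        ),
    )
    trainer.train()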
Source: Director of Engineering for model serving at Databricks.