holomorphiclabs commented on Ask HN: Most efficient way to fine-tune an LLM in 2024?    · Posted by u/holomorphiclabs
dhouston · a year ago
QLoRA + axolotl + a good foundation model (Llama/Mistral/etc., usually instruction fine-tuned) + RunPod works great.

A single A100 or H100 with 80GB of VRAM can fine-tune 70B open models (and obviously scaling out to many nodes/GPUs is faster, or you can use much cheaper GPUs for fine-tuning smaller models).
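
An illustrative sketch of this kind of QLoRA setup, using the Hugging Face transformers + peft + bitsandbytes stack directly (axolotl drives a similar stack from a YAML config); the model id and hyperparameters below are placeholders, not recommendations:

    # Hedged sketch: 4-bit QLoRA fine-tuning setup; only the small LoRA adapters are trained.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_id = "meta-llama/Llama-2-70b-hf"  # placeholder base model; swap for Mistral etc.

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # quantize the frozen base weights to 4-bit NF4
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only a small fraction of parameters are trainable

The memory saving comes from holding the frozen base weights in 4-bit and training only the adapters, which is what makes a single 80GB card plausible for a 70B model.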

The localllama Reddit sub at https://www.reddit.com/r/LocalLLaMA/ is also an awesome community for the GPU poor :)

holomorphiclabs · a year ago
Thank you! And yes, huge fan of r/localllama :)
holomorphiclabs commented on Ask HN: Most efficient way to fine-tune an LLM in 2024?    · Posted by u/holomorphiclabs
blissfulresup · a year ago
holomorphiclabs · a year ago
Thank you, we have been exploring this.
holomorphiclabs commented on Ask HN: Most efficient way to fine-tune an LLM in 2024?    · Posted by u/holomorphiclabs
dvt · a year ago
I think you may be misunderstanding what fine-tuning does. It does not teach the model new knowledge. In fact, Meta has a paper out arguing that you only need a data set of about 1,000 examples[1] to achieve pretty good alignment (fine-tuning) results. (100M is way overkill.) For knowledge retrieval, you need RAG (usually using the context window).

[1] https://arxiv.org/pdf/2305.11206.pdf
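
The retrieval approach mentioned above usually boils down to embedding a corpus, pulling the top-k most similar chunks for each query, and placing them in the prompt. A minimal sketch under assumed choices (sentence-transformers for embeddings, a toy in-memory corpus, a hypothetical prompt template):

    # Hedged sketch of a bare-bones RAG loop; embedding model, corpus, and prompt are illustrative.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

    corpus = [
        "Document chunk one ...",
        "Document chunk two ...",
    ]
    corpus_emb = embedder.encode(corpus, normalize_embeddings=True)

    def retrieve(query: str, k: int = 2) -> list[str]:
        # With normalized embeddings, cosine similarity is just a dot product.
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = corpus_emb @ q
        return [corpus[i] for i in np.argsort(-scores)[:k]]

    def build_prompt(query: str) -> str:
        context = "\n\n".join(retrieve(query))
        return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"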

holomorphiclabs · a year ago
Our findings are that RAG does not generalize well when critical understanding is spread across a large corpus of information. We do not think it is a question of either context length or retrieval. In our case, it is very clearly about capturing understanding within the model itself.
holomorphiclabs commented on Ask HN: Most efficient way to fine-tune an LLM in 2024?    · Posted by u/holomorphiclabs
Redster · a year ago
What LLM are you hoping to use? Have you considered using HelixML? If I am reading you right, the primary concern is compute costs, not human-time costs?
holomorphiclabs · a year ago
We are finding there is a trade-off between model performance and hosting costs post-training. The optimal outcome is a model that performs well on next-token prediction (and some other in-house tasks we've defined) and that we can host on the lowest-cost provider rather than be locked in. I think we'd only go the proprietary-model route if the model really was that much better. We're just trying to save ourselves weeks/months of benchmarking time and cost if there is already an established option in this space.
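
For the next-token-prediction side of that evaluation, held-out perplexity is a common proxy; a rough sketch of measuring it is below, with a placeholder model and toy evaluation texts standing in for the in-house benchmark:

    # Hedged sketch: held-out perplexity as a next-token-prediction metric.
    # Model id and evaluation texts are placeholders, not the poster's actual benchmark.
    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mistral-7B-v0.1"  # assumed candidate model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
    model.eval()

    eval_texts = ["Held-out document one ...", "Held-out document two ..."]

    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in eval_texts:
            enc = tokenizer(text, return_tensors="pt").to(model.device)
            # With labels == input_ids, the model returns mean cross-entropy over the
            # n-1 next-token predictions (labels are shifted internally).
            out = model(**enc, labels=enc["input_ids"])
            n = enc["input_ids"].shape[1] - 1
            total_nll += out.loss.item() * n
            total_tokens += n

    print("perplexity:", math.exp(total_nll / total_tokens))
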
holomorphiclabs commented on Ask HN: Most efficient way to fine-tune an LLM in 2024?    · Posted by u/holomorphiclabs
tdba · a year ago
What's your measure of performance?

There's no one-size-fits-all answer yet, but if you just want to test it out, there are many commercial offerings on which you should be able to get some results for under $10k.

holomorphiclabs · a year ago
Are there any that you'd recommend? Honestly, we would rather not share data with any third-party vendors. It's been a painstaking process to curate it.

u/holomorphiclabs

Karma: 29 · Cake day: April 4, 2024
About
Currently building out Holomorphic Labs.

We are a research lab interested in next-generation model and agent architectures.

Hiring for Engineers and Applied Scientists: careers [at] holomorphic [dot] ai

General Info: info [at] holomorphic [dot] ai
