roosgit (u/roosgit) - Readit News

To Infinity but Not Beyond meyerweb.com/eric/thought...

roosgit commented on Ask HN: What's in your crontab? · Posted by u/greyface-

roosgit · 20 days ago

# Runs the DB backup script on Thu at 22:00 -- I download the database backup for a few websites that get new data every week. I do this in case my host bans my account.

# Runs the IP change check on Mon - Sun at 09:00, 10:30, 12:00, 20:00 -- If the power goes out or the router reboots I get a new IP. On the server I use fail2ban and if I log into the admin panel I might get banned for making too many requests. So my IP needs to be "blessed".

# Runs Letsencrypt certificate expiry check on Sundays at 11:00 and 18:00 -- I still have a server where I update the certificates by hand.

# Runs the "daily" backup -- Just rsync

# Download Godaddy auction data every day at 19:00 -- I don't actively do this anymore but I used to check, based on certain criteria, for domains that were about to expire.

# Download the sellers.json on the 1st of every month at 19:00 -- I use this to collect data on websites that appear and disappear from the Mediavine and Adthrive sellers.json

roosgit commented on Reality Check on Deep Research by Ben Evans ben-evans.com/benedicteva... · Posted by u/alexdong

roosgit · 6 months ago

I've known about this issue since Lllama 1. Tried it with Llama 2 and Mistral when those models were released. LLMs are not databases.

The test I ran was to ask the LLM about an expired domain of a doctor (obstetrician). I no longer remember the exact domain, but it was similar to annasmithmd.com. One LLM would tell me it used to belong to a doctor named Megan Smith. Another got the name right, Anna Smith, but when I asked it what kind of a doctor, which specialty, it answered pediatrician.

So the LLM had no clue, but from the name of the domain it could infer (I guess that's why they call it inference) that the "md" part was associated with doctors.

By the way, newer LLMs are very good at making domains more human readable by splitting them into words.

roosgit commented on Ask HN: If building a computer, what will be good for possible local GenAI use? · Posted by u/chrismorgan

roosgit · 7 months ago

I can answer question 3. Prompt processing (how fast your input is parsed) is highly correlated with computing speed. Inference (how fast the LLM answers) is highly correlated with memory bandwidth. So a good CPU might read your question faster, but it will answer pretty much as slow as a cheap CPU with the same RAM.

I have a Ryzen 3 4100. Just tested Qwen2.5-Coder-32B-Instruct-Q3_K_S.gguf with llama.cpp.

CPU-only:

54.08 t/s prompt eval

2.69 t/s inference

---

CPU + 52/65 layers offloaded to GPU (RTX 3060 12GB):

166.79 t/s prompt eval

6.62 t/s inference

roosgit commented on Ask HN: Anyone running AI locally? Mind to share your experience? · Posted by u/skwee357

roosgit · 7 months ago

Renting could be a good choice to get started. I used to rent a g4dn.xlarge instance from AWS (for Stable Diffusion, not LLMs). More affordable options are Runpod and Vast.ai.

I started with a local system using llama.cpp on CPU alone and for short questions and answers it was OK for me. Because (in 2023) I didn't know if LLMs would be any good, I chose cheap components https://news.ycombinator.com/item?id=40267208.

Since AWS was getting pretty expensive, I also bought an RTX 3060(16GB), an extra 16GB RAM (for a total of 32GB) and a superfast 1TB M.2 SSD. The total cost of the components was around €620.

Here are some basic LLM performance numbers for my system:

https://news.ycombinator.com/item?id=41845936

https://news.ycombinator.com/item?id=42843313

roosgit commented on Ask HN: Building a PC for AI Tasks · Posted by u/mrbonner

roosgit · 7 months ago

Start with r/LocalLLama and r/StableDiffusion. Look for benchmarks for various GPUs.

I have an RTX 3060(12GB) and 32GB RAM. Just ran Qwen2.5-14B-Instruct-Q4_K_M.gguf in llama.cpp with flash attention enabled and 8K context. I get get 845t/s for prompt processing and 25t/s for generation.

For a while I even ran llama.cpp without a GPU (don't recommend it for diffusion) and with the same model (Qwen2.5 14B) I would get 11t/s for processing and 4t/s for generation. Acceptable for chats with short questions/instructions and answers.

roosgit commented on · Posted by u/roschdal

roosgit · 8 months ago

How rich?

You can get some inspiration from businesses for sale on Empire Flippers https://empireflippers.com/marketplace/.

As a rule of thumb for choosing the niche, pick from one of these https://support.google.com/admob/answer/3150953?hl=en

roosgit commented on Ask HN: What is your local LLM setup? · Posted by u/anditherobot

roosgit · 10 months ago

I have a separate PC that I access through SSH. I recently bought a GPU for it, before that I was running it on CPU alone.

- B550MH motherboard

- Ryzen 3 4100 CPU

- 32GB (2x16) RAM cranked up to 3200MHz (prompt generation in memory bound)

- 256GB M.2 NVMe (helps with loading models faster)

- Nvidia 3060 12GB

Software-wise, I use llamafile because on the CPU it's faster by 10-20% for prompt processing than llama.cpp.

Performance "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf":

CPU-only: 23.47 t/s (processing), 8.73 t/s (generation)

GPU: 941.5 t/s (processing), 29.4 t/s (generation)

Posted by u/roosgit a year ago

Sanding UI blog.jim-nielsen.com/2024...

roosgit commented on Ask HN: How can I experiment with LLMs with a old machine? · Posted by u/hedgehog0

roosgit · a year ago

I've never used it, but I think Google Colab has a free plan.

As another option, you can rent a machine with a decent GPU on vast.ai. An Nvidia 3090 can be rented for about $0.20/hr.

u/roosgit

KarmaCake day863October 28, 2019View Original