Readit News
sammyd56 commented on Andrej Karpathy – It will take a decade to work through the issues with agents   dwarkesh.com/p/andrej-kar... · Posted by u/ctoth
reenorap · 5 months ago
Are "agents" just programs that call into an LLM and, based on the response, do something?
sammyd56 · 5 months ago
An agent is just an LLM calling tools in a loop. If you're a "show me the code" type person like me, here's a worked example: https://samdobson.uk/posts/how-to-build-an-agent/
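The "LLM calling tools in a loop" idea can be sketched in a few lines. This is a self-contained illustration, not the code from the linked post: `call_llm` is a scripted stand-in for a real model API, and the `add` tool is made up.

```python
# Minimal agent loop: an LLM picks tools, the loop executes them, and the
# results are fed back until the model produces a final answer.

def add(a, b):
    return a + b

TOOLS = {"add": add}  # the tools the "model" is allowed to call

def call_llm(history):
    # Stand-in for a real model endpoint. It requests one tool call,
    # then answers once it sees the tool result in the history.
    if not any(msg["role"] == "tool" for msg in history):
        return {"tool": "add", "args": (2, 3)}
    result = next(m["content"] for m in history if m["role"] == "tool")
    return {"answer": f"The sum is {result}"}

def run_agent(prompt):
    history = [{"role": "user", "content": prompt}]
    while True:
        reply = call_llm(history)
        if "answer" in reply:          # model is done: no more tool calls
            return reply["answer"]
        result = TOOLS[reply["tool"]](*reply["args"])  # execute the tool
        history.append({"role": "tool", "content": result})

print(run_agent("What is 2 + 3?"))  # → The sum is 5
```

Swap `call_llm` for a real chat-completions call with tool definitions and the loop structure stays the same.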
sammyd56 commented on NanoChat – The best ChatGPT that $100 can buy   github.com/karpathy/nanoc... · Posted by u/huseyinkeles
bravura · 5 months ago
For the measures that drop exponentially, like val/bpb and train/loss, you should put the x-axis in log scale. That will better show you whether it's converged.
sammyd56 · 5 months ago
Great call, thank you - I've switched to log scale for those metrics; agreed, it's much clearer.
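The effect is easy to see side by side: on a linear x-axis a fast-decaying loss curve is a wall followed by an apparently flat line, while a log x-axis spreads the early steps out so convergence (or the lack of it) is visible. A minimal matplotlib sketch with a made-up power-law loss curve:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt

steps = list(range(1, 10001))
loss = [2.0 * s ** -0.3 for s in steps]  # made-up power-law loss curve

fig, (lin, log) = plt.subplots(1, 2, figsize=(8, 3))
lin.plot(steps, loss)
lin.set_title("linear x")          # early drop crammed against the axis
log.plot(steps, loss)
log.set_xscale("log")              # the one-line change that matters
log.set_title("log x")             # decay now spread evenly across the plot
fig.savefig("loss.png")
```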
sammyd56 commented on NanoChat – The best ChatGPT that $100 can buy   github.com/karpathy/nanoc... · Posted by u/huseyinkeles
simonw · 5 months ago
I got your model working on CPU on macOS by having Claude Code hack away furiously for a while. Here's a script that should work for anyone: https://gist.github.com/simonw/912623bf00d6c13cc0211508969a1...

You can run it like this:

  cd /tmp
  git clone https://huggingface.co/sdobson/nanochat
  uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0211508969a100a/raw/80f79c6a6f1e1b5d4485368ef3ddafa5ce853131/generate_cpu.py \
    --model-dir /tmp/nanochat \
    --prompt "Tell me about dogs."

sammyd56 · 5 months ago
This is a much easier way to run the model. I'm going to update the huggingface README to point to this. The one thing that could be improved is the turn-taking between user and assistant, which it sometimes gets confused about. I fixed that in my fork of your gist here: https://gist.github.com/samdobson/975c8b095a71bbdf1488987eac...
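The fix in the linked gist isn't reproduced here, but the general idea of enforcing strict turn-taking when assembling a chat prompt can be sketched like this. The `<|user|>`/`<|assistant|>` delimiters are illustrative, not nanochat's actual special tokens:

```python
# Sketch: build a chat prompt while enforcing strictly alternating roles,
# so the model never sees two consecutive turns from the same speaker.

def build_prompt(turns):
    """turns: list of (role, text); roles must alternate, starting with 'user'."""
    prompt = []
    expected = "user"
    for role, text in turns:
        if role != expected:
            raise ValueError(f"expected a {expected} turn, got {role}")
        prompt.append(f"<|{role}|>{text}")
        expected = "assistant" if role == "user" else "user"
    # End with an open assistant tag so generation continues as the assistant.
    prompt.append("<|assistant|>")
    return "".join(prompt)

print(build_prompt([("user", "Hi"), ("assistant", "Hello!"), ("user", "Tell me about dogs.")]))
```

Validating the turn order up front is cheaper than debugging a model that has drifted into answering itself.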
sammyd56 commented on NanoChat – The best ChatGPT that $100 can buy   github.com/karpathy/nanoc... · Posted by u/huseyinkeles
sammyd56 · 5 months ago
I'm doing a training run right now (started 20min ago). You can follow it at https://api.wandb.ai/links/sjd333-none/dsv4zkij

Will share the resulting model once ready (4 hours from now) for anyone to test inference.

sammyd56 · 5 months ago
I've uploaded the model here: https://huggingface.co/sdobson/nanochat

I didn't get results as good as Karpathy's (unlucky seed?)

It's fun to play with though...

User: How many legs does a dog have? Assistant: That's a great question that has been debated by dog enthusiasts for centuries. There's no one "right" answer (...)

sammyd56 commented on NanoChat – The best ChatGPT that $100 can buy   github.com/karpathy/nanoc... · Posted by u/huseyinkeles
royosherove · 5 months ago
Cool. Is there a simple "howto" on running this repo with training on W&B for a programmer like me who has never done model training flows? Maybe you could share the steps you took?
sammyd56 · 5 months ago
There's not much to it... it took longer to spin up the cloud machine than it did to kick off the training run. I'll be writing up a blog post with a step-by-step guide when I get a free moment, but in the meantime, here are the commands I ran: https://pastebin.com/sdKVy0NR
sammyd56 commented on GPT3 Get answers to technical questions from your documentation site   jointwig.com/... · Posted by u/chandan_maruthi
visarga · 3 years ago
It seems to do open-domain question answering without restricting itself to the topic.

> Is the word cat made of 4 or 5 letters?

>> The word cat is made of 4 letters, 3 of which are in the stem.

sammyd56 · 3 years ago
That's probably because fine-tuning is not the right approach for this use-case. A better approach might look more like this: https://github.com/openai/openai-cookbook/blob/main/examples...
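The cookbook approach is roughly: embed the documentation, retrieve the most relevant chunk for each question, and have the model answer only from that context. A toy sketch of the retrieval step, where a bag-of-words vector stands in for a real embedding model and the two docs are made up:

```python
import math
import re
from collections import Counter

DOCS = [
    "To deploy, run `twig deploy` from the project root.",
    "API keys are configured in settings.yaml under the auth section.",
]

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question):
    # Return the doc chunk most similar to the question.
    q = embed(question)
    return max(DOCS, key=lambda d: cosine(q, embed(d)))

context = retrieve("Where do I configure API keys?")
# A real pipeline would now prompt the LLM: "Answer using only this context."
print(context)
```

Because the model only ever sees retrieved documentation, off-topic questions like the cat one get no supporting context and can be refused instead of answered from general knowledge.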

sammyd56 commented on Poetry meets journalism, with LLMs and diffusion models   rhymingreporter.art/... · Posted by u/sammyd56
sammyd56 · 3 years ago
Hi HN,

This one is mine. It's a light-hearted digital newspaper of sorts, covering news from local British communities through the medium of verse (generated by LLMs).

Until now I've been using ChatGPT for the generation, with a fairly generic prompt asking for a poem about the article that follows. ChatGPT's ability to summarise is incredible. It's really not great, though, at rhyme and meter. That means a decent amount of curation and heavy editing is needed to get something passable. Prompt engineering hasn't seemed to have a meaningful impact. I'm looking to fine-tune a davinci model, which I think will deliver higher quality with less effort.

Some examples from the current process:

Poem: https://rhymingreporter.art/farewell-little-red/ | Original article: https://www.cornwalllive.com/whats-on/food-drink/little-red-...

Poem: https://rhymingreporter.art/flowing-frocks-icy-blue/ | Original article: https://www.cornwalllive.com/whats-on/whats-on-news/gallery/...

The quality can mostly be blamed on me, rather than GPT-3. I haven't written a poem since school :)

The accompanying illustrations are created with Stable Diffusion using DiffusionBee. Images take around 30s to generate on my MacBook Air M1. I'm looking to switch to MochiDiffusion to cut generation time a bit.

The blog is running Ghost on a small DigitalOcean VPS, with emails delivered by Mailgun.

The process right now is somewhat labour-intensive: between researching news stories, iterating on the content, and publishing, each piece takes a decent amount of time. I'm confident I can automate a large part of it, in time.

One fun fact I learned when planning the virtual road-trip for this project: in average traffic conditions, it's possible to visit every city in England in less than 48 hours. The near-optimal solution to this formulation of the Travelling Salesman Problem (starting in the South West), a route taking 47:00:10, was calculated in less than 5 seconds with a Guided Local Search algorithm. [1]
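The linked route was computed with OR-Tools' Guided Local Search over real driving times; a much simpler stand-in shows the shape of the approach: build a greedy tour, then improve it with a local search (here, plain 2-opt on made-up coordinates rather than guided local search on actual cities).

```python
import math

# Made-up city coordinates, standing in for English cities and driving times.
CITIES = [(0, 0), (1, 5), (2, 2), (5, 1), (6, 6), (3, 4)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tour_length(tour):
    # Open tour (a road-trip, not a round trip back to the start).
    return sum(dist(CITIES[tour[i]], CITIES[tour[i + 1]]) for i in range(len(tour) - 1))

def nearest_neighbour(start=0):
    # Greedy construction: always drive to the closest unvisited city.
    unvisited = set(range(len(CITIES))) - {start}
    tour = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda c: dist(CITIES[tour[-1]], CITIES[c]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def two_opt(tour):
    # Local search: reverse segments while doing so shortens the tour.
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(candidate) < tour_length(tour):
                    tour, improved = candidate, True
    return tour

tour = two_opt(nearest_neighbour())
print(tour, round(tour_length(tour), 2))
```

Guided Local Search adds a penalty term on top of exactly this kind of local search so it can escape local optima, which is why it finds near-optimal routes so quickly.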

Technology means that I can travel virtually, learn about new places, write creatively, and publish regularly, all whilst having a family and a full-time job. What a time to be alive!

Very open to your thoughts, and to feedback on the concept or the execution.

[1] https://developers.google.com/optimization/routing/tsp

u/sammyd56

Karma: 87 · Cake day: December 12, 2012