ajhai commented on π0.5: A VLA with open-world generalization   pi.website/blog/pi05... · Posted by u/lachyg
desertmonad · 8 months ago
Finally, machines doing the work we don't want to do
ajhai · 8 months ago
Something I have been working on for a few months now: https://x.com/ajhai/status/1899528923303809217
ajhai commented on π0.5: A VLA with open-world generalization   pi.website/blog/pi05... · Posted by u/lachyg
djoldman · 8 months ago
I'm genuinely asking (not trying to be snarky)... Why are these robots so slow?

Is it a throughput constraint given too much data from the environment sensors?

Is it processing the data?

I'm curious about where the bottleneck is.

ajhai · 8 months ago
It is inference latency most of the time. These VLA models take in an image + state + text and spit out a set of joint angle deltas.

Depending on the model being used, we may get just one set of joint angle deltas or a series of them. To complete a task, the robot needs to capture images from its cameras and the current joint angles, send them to the model along with the task text, and get back the joint angle changes to apply. Once the joint angles are updated, we need to check whether the task is complete (this signal can come from the model too). We run this loop until the task is complete.

Combine this with the motion planning that has to happen to make sure the joint angles we get are safe and do not result in collisions with the surroundings, and you end up with the overall slowness.
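Roughly, the control loop looks like the sketch below. This is a minimal illustration in Python; the camera, arm, and policy objects are hypothetical stand-ins, not any specific VLA library.

    import time

    def run_task(task_text, camera, arm, policy, control_hz=5.0):
        """Hypothetical observe -> infer -> act loop for a VLA-driven arm."""
        while True:
            # 1. Capture the current observation: camera image + joint state.
            image = camera.capture()
            state = arm.get_joint_angles()

            # 2. Query the VLA model; this inference call dominates the cycle time.
            result = policy.predict(image=image, state=state, text=task_text)

            # 3. Apply one or more joint angle deltas, checking safety first
            #    (motion planning / collision checks add further latency).
            for delta in result.joint_angle_deltas:
                target = [a + d for a, d in zip(arm.get_joint_angles(), delta)]
                if arm.is_safe(target):
                    arm.move_to(target)

            # 4. The task-completion signal can come from the model as well.
            if result.task_done:
                return

            time.sleep(1.0 / control_hz)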

ajhai commented on Ask HN: What are you working on? (February 2025)    · Posted by u/david927
ajhai · 10 months ago
Building a wheeled robot with arms to help automate household chores - https://x.com/ajhai/status/1891933005729747096

I have been working with LLMs and VLMs to automate browser-based workflows, among other things, for the last couple of years. Given how good the vision models have gotten lately, the perception problem is solved to a level where it opens up a lot of possibilities. Manipulation is not generally solved yet, but there is a lot of activity in the field and there are promising approaches (OpenVLA, π0). Given this, I'm trying to build an affordable robot that can help with household chores using language and vision models. The idea is to ship hardware capable enough to do a few things really well with the currently available models and keep upgrading the AI stack as manipulation models get better over time.

ajhai commented on Llama 3.1   llama.meta.com/... · Posted by u/luiscosio
jxy · a year ago
You are a maintainer of software that depends on Ollama, so you should know that Ollama depends on llama.cpp. And as of now, llama.cpp doesn't support the new RoPE scaling: https://github.com/ggerganov/llama.cpp/issues/8650, and all Ollama can do is wait for llama.cpp: https://github.com/ollama/ollama/issues/5881
ajhai · a year ago
I've tested Q4 on an M1 and it works, though the quality likely isn't what you'd expect, as others have pointed out on the issue.
ajhai commented on Llama 3.1   llama.meta.com/... · Posted by u/luiscosio
ajhai · a year ago
You can already run these models locally with Ollama (ollama run llama3.1:latest), and they are also available at places like Hugging Face, Groq, etc.
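
If you want to call the model from code once it's running, Ollama also serves a local HTTP API (port 11434 by default). A minimal sketch in Python, assuming the model has already been pulled:

    import requests

    # Assumes `ollama run llama3.1` (or `ollama pull llama3.1`) has already
    # downloaded the model and the Ollama server is running locally.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1",
            "prompt": "Summarize the Llama 3.1 release in one sentence.",
            "stream": False,  # return a single JSON object instead of a stream
        },
    )
    print(resp.json()["response"])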

If you want a playground to test this model locally or want to quickly build some applications with it, you can try LLMStack (https://github.com/trypromptly/LLMStack). I wrote last week about how to configure and use Ollama with LLMStack at https://docs.trypromptly.com/guides/using-llama3-with-ollama.

Disclaimer: I'm the maintainer of LLMStack

ajhai commented on Txtai: Open-source vector search and RAG for minimalists   neuml.github.io/txtai/... · Posted by u/dmezzetti
ipsi · a year ago
So here's something I've been wanting to do for a while, but have kinda been struggling to figure out _how_ to do it. txtai looks like it has all the tools necessary to do the job, I'm just not sure which tool(s), and how I'd use them.

Basically, I'd like to be able to take PDFs of, say, D&D books, extract that data (this step is, at least, something I can already do), and load it into an LLM to be able to ask questions like:

* What does the feat "Sentinel" do?

* Who is Elminster?

* Which God(s) do Elves worship in Faerûn?

* Where can I find the spell "Crusader's Mantle"?

And so on. Given this data is all under copyright, I'd probably have to stick to using a local LLM to avoid problems. And, while I wouldn't expect it to have good answers to all (or possibly any!) of those questions, I'd nevertheless love to be able to give it a try.

I'm just not sure where to start - I think I'd want to fine-tune an existing model since this is all natural language content, but I get a bit lost after that. Do I need to pre-process the content to add extra information that I can't fetch relatively automatically? E.g., page numbers are simple to add in, but would I need to mark out things like chapter/section headings, or in-character vs out-of-character text? Do I need to add all the content in as a series of questions and answers, like "What information is on page 52 of the Player's Handbook? => <text of page>"?

ajhai · a year ago
You can actually do this with LLMStack (https://github.com/trypromptly/LLMStack) quite easily in a no-code way. I put together a guide last week on using LLMStack with Ollama for local models: https://docs.trypromptly.com/guides/using-llama3-with-ollama. It lets you load all your files as a datasource and then build a RAG app over them.

For now it still uses OpenAI for embedding generation by default; we are updating that in the next couple of releases so that a local model can generate embeddings before writing to a vector DB.
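
The underlying flow is the usual embed -> retrieve -> generate pipeline. A rough sketch of that idea, assuming sentence-transformers for local embeddings and Ollama for generation (a generic illustration, not LLMStack's internals):

    import numpy as np
    import requests
    from sentence_transformers import SentenceTransformer

    # Local embedding model (a stand-in for whatever the datasource pipeline uses).
    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    # Chunks extracted from your PDFs (placeholder text here).
    chunks = ["<text chunk 1>", "<text chunk 2>", "<text chunk 3>"]
    chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

    def answer(question, top_k=3):
        # Retrieve the most similar chunks via cosine similarity.
        q_vec = embedder.encode([question], normalize_embeddings=True)[0]
        top = np.argsort(chunk_vecs @ q_vec)[::-1][:top_k]
        context = "\n\n".join(chunks[i] for i in top)

        # Generate an answer with a local model served by Ollama.
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        r = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "llama3", "prompt": prompt, "stream": False},
        )
        return r.json()["response"]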

Disclosure: I'm the maintainer of the LLMStack project

u/ajhai

Karma: 980 · Cake day: July 15, 2009
About
Hi, I'm Ajay

email: ajay [at] trypromptly [dot] com

twitter: @ajhai

Building https://trypromptly.com and https://makerdojo.io
