pplonski86 · 2 years ago
How does RestGPT differ from ToolLLM or Gorilla?

papers:

1. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs https://arxiv.org/abs/2307.16789

2. Gorilla: Large Language Model Connected with Massive APIs https://arxiv.org/abs/2305.15334

mellosouls · 2 years ago
Did you open the linked paper in the OP?

Gorilla is compared;

ToolLLM seems to postdate this project.

two_in_one · 2 years ago
The problem here is where these actions come from. A generic LLM cannot generate correct actions in many (if not most) real-life cases. So it will have to learn, and LLMs aren't good at learning. For example: "I'm tired, play my favorite". The action depends on _who_ is speaking and on what's going on right now. There may be someone sleeping, or watching TV. I'm afraid an acceptable solution is much more complicated.
liampulles · 2 years ago
I have investigated the use of agents for real support-agent-type work, and the failure rate made it unacceptable for my use case. This is even after giving it very explicit and finely tuned context.

I suspect that as engineering of LLM solutions makes more use of unseen testing data, it's going to become apparent that LLMs really do not have sufficiently reliable "cognitive" ability to do any practical agent-type work.

pplonski86 · 2 years ago
Have you used any benchmark to test agents? I'm currently looking for a REST API usage benchmark for LLMs.
pmx · 2 years ago
Do we have to expect _that_ level of understanding from the agent, though? If my wife said that to me, I'd have a good chance of queuing up the song she has in mind, but anyone else? No chance. I don't expect tools like this to understand cryptic requests and always come to the right answer. I'm happy if I can request a song, or an action, or anything else the same way I might ask another human who doesn't know me intimately.
swexbe · 2 years ago
If not, how is this more useful than something like Siri?
eternityforest · 2 years ago
Why would we want this at all if it doesn't know you that well? Current voice assistants without AI can already handle songs and actions like that. Seems like it's largely solved.
pplonski86 · 2 years ago
I think this can be easily fixed if the LLM can take notes on what's going on. If it has additional context before the prompt:

```
You are a home assistant. Here is information on what's going on in the house:

It's 4PM. Bob likes Chopin Fantaisie-Impromptu. Alice likes Mozart Rondo in D. Bob is in the house. Alice will be back from the office at 5PM.

You get a prompt: I'm tired, play my favorite
```

For the above input, any LLM will play the Chopin.
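A minimal sketch of what "taking notes" could look like in code: house-state facts are rendered into the system prompt before the user's request reaches the LLM. The function name and state fields here are hypothetical, not from any real assistant framework.

```python
# Sketch: prepend house-state "notes" to the prompt before the LLM sees it.
# The helper name and state keys are invented for illustration.

def build_prompt(house_state: dict, user_utterance: str) -> str:
    """Render house notes plus the user's request into a single prompt."""
    notes = "\n".join(f"- {key}: {value}" for key, value in house_state.items())
    return (
        "You are a home assistant. Here is what's going on in the house:\n"
        f"{notes}\n\n"
        f"You get a prompt: {user_utterance}"
    )

state = {
    "time": "4PM",
    "Bob's favorite": "Chopin Fantaisie-Impromptu",
    "Alice's favorite": "Mozart Rondo in D",
    "presence": "Bob is in the house; Alice returns at 5PM",
}
print(build_prompt(state, "I'm tired, play my favorite"))
```

The LLM then only has to resolve "my favorite" against the notes it was handed, rather than guess.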

troupo · 2 years ago
Where is that input coming from?
regularfry · 2 years ago
I'm genuinely not seeing a problem there that the Planner part of the paper couldn't cover. "Who said that" and "what's going on right now" are just API calls. Besides which, if one person says "play my favourite" while another person is watching TV, that's not the LLM's job to unpack.

The point is that the ability to call APIs gives them the ability to learn so that the actions that are eventually taken are correct in context. It's like a more generic version of https://code-as-policies.github.io/.
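A toy sketch of that idea: "who said that" and "what's going on right now" are resolved by API calls before the planner sees the request. Every function here is a stand-in; the real APIs (speaker identification, home state) would be whatever the deployment exposes.

```python
# Toy sketch: contextual questions resolved as API calls ahead of planning.
# All functions are invented stand-ins for real service endpoints.

def who_spoke() -> str:
    # Stand-in for a speaker-identification API call.
    return "Bob"

def room_activity() -> str:
    # Stand-in for a home-state API call (TV state, sleep sensors, etc.).
    return "Alice is watching TV in the living room"

def plan(utterance: str) -> dict:
    # A real planner would hand this resolved context plus the utterance
    # to the LLM; here we just show the inputs it would receive.
    return {
        "utterance": utterance,
        "speaker": who_spoke(),
        "activity": room_activity(),
    }

print(plan("play my favourite"))
```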

powerapple · 2 years ago
Hopefully it can be solved with the target API: the target API knows who is calling it, and the service has the user's information. Or this will be translated into "Play the most played playlist", and that action will be enough.

I agree with you in general though: the more useful AI is, the more data it will need to see. I strongly believe companies like Microsoft, Google, or Apple will bring the best experience because they own operating systems. It is going to be very hard for a third party to build a general AI assistant.

jsemrau · 2 years ago
The whole notion of "memory" in LLM research solves this problem.
selalipop · 2 years ago
> So, it will have to learn, and LLMs aren't good at learning

LLMs are bad at human-like learning, but their zero-shot performance plus semantic search more than makes up for it.

If you give an LLM access to your Spotify account via the API, it has access to your playlists and to details about each song like `BPM`, `vocality`, even `energy`:

https://developer.spotify.com/documentation/web-api/referenc...

https://developer.spotify.com/documentation/web-api/referenc...

An LLM with no prior explanation of either endpoint can figure out that it should look at your favorites playlists and find which songs in your favorites list are most suitable for a tired person.
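The selection step being described can be approximated offline. Spotify's audio-features data really does include fields like `energy` and `tempo`, but the tracks and the "calmness" heuristic below are made up for illustration; the LLM would arrive at a ranking like this from the raw feature values.

```python
# Offline approximation of the selection the LLM performs: rank saved
# tracks by Spotify-style audio features (energy, tempo) and pick the
# calmest for a "tired" listener. Track data here is invented.

favorites = [
    {"name": "Track A", "energy": 0.91, "tempo": 148},
    {"name": "Track B", "energy": 0.22, "tempo": 74},
    {"name": "Track C", "energy": 0.55, "tempo": 110},
]

def calmest(tracks):
    # Lower energy, then slower tempo -> better fit for winding down.
    return min(tracks, key=lambda t: (t["energy"], t["tempo"]))

print(calmest(favorites)["name"])  # -> Track B
```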

-

But it can go even further and identify its own sorting criteria for different situations with chain of thought:

Bedroom at night: https://chat.openai.com/share/6b1787ef-cd84-4834-b582-5024f8...

Kitchen at 5pm: https://chat.openai.com/share/7ddaa047-0855-48c1-bcea-308083...

Rather than blindly selecting the most relaxing songs, it understands nuance like:

> Room State: "lights on" and "garage door open" can imply either returning home from work or engaging in some evening activity. The environment is probably not yet set for relaxation completely.

And it genuinely comes up with an intelligently adapted strategy based on the situation.

And say it gets your favorite wrong and you correct it: an LLM with no specialized training can classify your follow-up as a correction vs. an unrelated command. It can even use chain of thought to posit why it may have been wrong.

You can then store all the messages it classified as corrections and fetch them using semantic similarity.

That addresses both the customization and determinism issues: you don't need to rely on zero-shot performance getting it right every time, and the model can use the same chain of thought to translate past corrections into future guidance without further training.
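A toy sketch of that correction memory: past corrections are stored as text and the most relevant one is retrieved for the current situation. A real system would use embedding vectors from a model; bag-of-words cosine similarity stands in for that here, and the stored corrections are invented.

```python
# Toy correction memory: retrieve the past correction most similar to
# the current situation. Real systems would use embeddings; this uses
# bag-of-words cosine similarity as a stand-in. Corrections are invented.
from collections import Counter
from math import sqrt

def cosine(a: str, b: str) -> float:
    """Cosine similarity between two strings' word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = sqrt(sum(v * v for v in va.values()))
    nb = sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

corrections = [
    "after work play hard metal not classical",
    "in the bedroom at night keep the volume low",
]

def recall(situation: str) -> str:
    """Fetch the stored correction most relevant to the situation."""
    return max(corrections, key=lambda c: cosine(situation, c))

print(recall("user just got back from work and asked to play music"))
```

The retrieved correction is then fed into the prompt as guidance, so the model adapts without any fine-tuning.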

For example, if your last correction was from classical music to hard metal when you got back from work, it's able to understand that you prefer higher-energy songs then, while also understanding that this doesn't mean it should play hard metal every time in perpetuity.

Kitchen w/ memory: https://chat.openai.com/share/43635427-55d5-4394-b282-46acae...

Bedroom w/ memory: https://chat.openai.com/share/8c146dd5-2233-4aba-8f6a-b97b7a...

-

I experimented heavily with things like this when GPT came out; part of me wants to go back to it since I've seen shockingly few projects do what I assumed everyone would do.

LLMs + well-thought-out memory access can do some incredible things as general assistants right now, but that seemed so obvious that I moved on from the idea almost immediately.

In retrospect, there's an interesting irony at play: LLMs make simple products very attractive. But if you embed them in more thoroughly engineered solutions, you can do some incredible things that are far above what they otherwise seem capable of.

Yet a large number of the people most experienced in creating thoroughly engineered solutions view LLMs very cynically because of the simple (and shallow) solutions that are being churned out.

Eventually LLMs may just advance far enough that they bridge the gap in implementation, but I think there's a lot of opportunity left on the table because of that catch-22.

troupo · 2 years ago
> Yet a large number of the people most experienced in creating thoroughly engineered solutions view LLMs very cynically because of the simple (and shallow) solutions that are being churned out.

Maybe, just maybe, it's because even the simple solutions are invariably an incomplete, brittle, complicated, unpredictable mess that you can't build anything complex with?

As eloquently demonstrated by your "simple" solutions.

westurner · 2 years ago
"Gorilla: Large Language Model Connected with Massive APIs" (2023) https://gorilla.cs.berkeley.edu/ :

> Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically- and syntactically- correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to be trained on! Join us, as we try to expand the largest API store and teach LLMs how to write them!

eval/: https://github.com/ShishirPatil/gorilla/tree/main/eval

- "Gorilla: Large Language Model connected with massive APIs" (2023-05) https://news.ycombinator.com/item?id=36073241

- "Gorilla: Large Language Model Connected with APIs" (2023-06) https://news.ycombinator.com/item?id=36333290

- "Gorilla-CLI: LLMs for CLI including K8s/AWS/GCP/Azure/sed and 1500 APIs (github.com/gorilla-llm)" (2023-06) https://news.ycombinator.com/item?id=36524078

behnamoh · 2 years ago
It seems, 1-2 years on, that the true power of LLMs is in DevOps. I got pretty excited when I tried GPT-3 (a completion model), but as time went by and OpenAI shifted to chat models, we lost control over the LLM part and found new meaning in taking whatever model OpenAI made available as a black box and "chaining" it to tools we already had: databases, APIs, function calls/tools, etc. I'd say DevOps is exactly where open source is seriously behind; there are decent open source models, but it costs so much to self-host them, despite the full power and control we have over them (via text-generation-webui and the like).

OpenAI is playing the DevOps game (starting maybe with the introduction of ChatML). The open source community plays the LLM-and-benchmarks game. Ironically, the two are converging: OpenAI's models are getting dumber (not the API) thanks to censorship and RLHF, to the point that open source models are even better than some OpenAI models in some aspects. On the other hand, open source models are getting better tooling and DevOps thanks to oobabooga, llama.cpp, etc.

I'm seriously waiting for competitors to break Nvidia's monopoly in this space. Maybe Apple?

antupis · 2 years ago
I think the M2 Max is currently the best bang for the buck for running inference on open source models. But the use case is so niche that Apple probably won't actively start supporting open source models. In the long run I hope some smaller company gets its act together and starts competing with NVIDIA.
rankun203 · 2 years ago
The GPU support in ML frameworks, however, is really not impressive. I have a MacBook with an M1 Max and 64GB RAM; I can load a 7B model for fine-tuning (Hugging Face Trainer, PyTorch, MPS), but the speed is just too slow, reaching only about 50% of the speed of an i5-12500 CPU in my tests.
behnamoh · 2 years ago
At $6,000, how is the M2 Max the best bang for the buck?!

One could get two used 3090s and set up a decent PC at a lower price.

sam_goody · 2 years ago
> I'm seriously waiting for competitors to change nVidia's monopoly in this space. Maybe Apple?

I would have thought AMD is the obvious contender. They are #2 in GPUs, they have formidable programming talent (judging by their advances with Ryzen vs Intel), and they have targeted AI as their goal.

Am I missing something?

Philpax · 2 years ago
AMD have repeatedly dropped the ball when it comes to software support for compute and AI. Their hardware is quite capable, but very few people can actually make it work, which means most of the existing models have poor AMD support.

This is getting better with ROCm and such, but that's Linux-only and only works for a subset of tasks.

Both Intel and Apple have better "out of the box" support for ML and the ability to invest more in making these things work (e.g. Apple has implemented Stable Diffusion against Core ML themselves).

stevage · 2 years ago
> WARNING: this will remove all your data from spotify!

That is quite the caveat.

seanthemon · 2 years ago
I feel like that script needs a few "Are you completely sure?" prompts.
Towaway69 · 2 years ago
Thanks for pointing that out!
salamo · 2 years ago
It's actually really interesting to see GOFAI techniques like planning used in conjunction with LLMs.
cscurmudgeon · 2 years ago
I see a GOFAI resurgence thanks to LLMs.

Deleted Comment

impulser_ · 2 years ago
The examples are pretty lame, since you can do what they do way faster without using an LLM and paying OpenAI.
albert_e · 2 years ago
ChatGPT + Noteable is already powerful enough to get some work done via API calls (after installing and importing the libraries, writing Python code, managing secrets for authentication, etc.)

There is surely scope to streamline this much further

I am very intently watching this space

thelittleone · 2 years ago
Interested to learn more (big fan of data stories). Do you have any particular use cases you would recommend to look into?
pplonski86 · 2 years ago
I've seen a Noteable+ChatGPT demo where a user chats with ChatGPT and the responses are executed in a Noteable-hosted Python notebook. It was cool!

It would also be cool to have such a plugin for Google Colab.

I hope someone will come up with a new way to interact with LLMs other than a chat UI. It would make code writing even faster.