Readit News
verdverm · 3 years ago
I don't see the value add here.

Here's the core of the message sent to the LLM: https://github.com/microsoft/TypeChat/blob/main/src/typechat...

You are basically getting a fixed prompt to return structured data, with a small amount of automation and vendor lock-in. All these LLM libraries are just crappy APIs to the underlying API. It is trivial to write a script that does the same thing and will be much more flexible as models and user needs evolve.

As an example, think about how you would change the prompt or use Python classes instead. How much work would that be with a library like this, versus something that lifts the API calls and text templating to the user, like: https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/llm...

bwestergard · 3 years ago
The value is in:

1. Running the typescript type checker against what is returned by the LLM.

2. If there are type errors, combining those into a "repair prompt" that will (it is assumed) have a higher likelihood of eliciting an LLM output that type checks.

3. Gracefully handling the cases where the heuristic in #2 fails.

https://github.com/microsoft/TypeChat/blob/main/src/typechat...

In my experience experimenting with the same basic idea, the heuristic in #2 works surprisingly well for relatively simple types (i.e. records and arrays not nested too deeply, limited use of type variables). It turns out that prompting LLMs to return values inhabiting relatively simple types can be used to create useful applications. Since that is valuable, this library is valuable inasmuch as it eliminates the need to hand roll this request pattern, and provides a standardized integration with the typescript codebase.

BoorishBears · 3 years ago
Here's a project that does that better imo:

https://github.com/dzhng/zod-gpt

And by better I mean it doesn't tie you to OpenAI for no good reason

verdverm · 3 years ago
these are trivial steps you can add in any script, as your link demonstrates.

Why would I want to add all this extra stuff just for that? The opaque retry-until-it-returns-valid-JSON? That sounds like it will make for many pleasant support cases and issues

Personally, I have found that investing more effort in the actual prompt engineering improves success rates and reduces the need to retry with an appended error message. Especially helpful are input/output pairs (i.e. few-shot examples), and while we haven't tried it yet, I imagine fine-tuning and distillation would improve the situation even more.

politelemon · 3 years ago
Pretty much all the LLM libraries I'm seeing are like this. They boil down to a request to the LLM to do something in a certain way. I've noticed under complex conditions, they stop listening and start reverting to their 'default' behavior.

But that said it still feels like using a library is the right thing to do... so I'm still watching this space to see what matures and emerges as a good-enough approach.

TechBro8615 · 3 years ago
Where's the vendor lock-in? This is an open source library and the file you linked to even includes configs for two vendors: ChatGPT and Bard.
verdverm · 3 years ago
vendor lock-in to a library and the design choices they make

basically, since it reduces the user input space, you are giving up flexibility and control for some questionably valuable abstractions, such as a predefined prompt, no ability to prompt engineer, no CoT/ToT, etc...

if anything, choose a broader framework like langchain and have something like this as an extension or plugin to the framework; no need for a library for this one little thing

parentheses · 3 years ago
The value is turning unstructured data into structured data and ensuring it satisfies schema constraints.

For example: you have 1000 free-text survey responses about your product; building a schema and for-each `TypeChat`ing them would get you a structured dataset from that free text. It's mind-bogglingly useful.
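A hypothetical shape for that survey example (the schema fields and the `translate` stand-in are made up for illustration; in TypeChat, `translate` would be the library's translator call against this type):

```typescript
type SurveyAnalysis = {
  sentiment: "positive" | "neutral" | "negative";
  topics: string[]; // e.g. "pricing", "performance"
};

// For-each over the free-text responses, yielding a structured dataset.
function analyzeAll(
  responses: string[],
  translate: (text: string) => SurveyAnalysis,
): SurveyAnalysis[] {
  return responses.map(translate);
}
```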

verdverm · 3 years ago
yes, turning unstructured data into structured data is one of the most useful ways to use an LLM right now. It has been done before using schemas, and it can be done without all the extra cruft.

There was a similar example a few months back using XML instead, but I haven't heard much about it since, because again, the library did not add value on top of doing these things in a more open or scripted setting.

MSFT has another project in a similar vein, guardrails: an interesting idea, but made worse by wrapping it in a library. Most of these LLM ideas are better as a function than as a library; make them transform the i/o rather than have every library write its own wrappers around the LLM APIs as well

There are several more making use of OpenAPI / JSONSchema rather than TS.

We use a subset of CUE, essentially JSON without as many quotes or commas. The LLMs are quite flexible with few-shot learning. They can be made more reliable with fine-tuning. They can be made faster and cheaper with distillation.

whimsicalism · 3 years ago
Yes as the abstractions gets better it becomes easier to code useful things.
verdverm · 3 years ago
the debate is about whether the abstraction here is valuable enough to warrant a library, and about the fact that it predefines the prompt and API call flow, so you cannot prompt engineer or use something like CoT/ToT
ofslidingfeet · 3 years ago
Getting these models to reliably return a consistent structure without frequent human intervention and/or having to account for the personal moral opinions of big tech CEOs is not trivial, no.
verdverm · 3 years ago
There are multiple ways to get structured output, and what this library is doing is not really that interesting. The concept is interesting and has had multiple implementations already; the code (and abstraction) here is not interesting and creates more issues than it solves
nfw2 · 3 years ago
It’s essentially prompt engineering as a service with some basic quality-control features thrown in.

Sure, your engineers could implement it themselves, but don’t they have better things to do?

verdverm · 3 years ago
the quality of the prompt does not look that good, based on my experience achieving flexible structured output from a schema

There are other questionable decisions, and a valuable use of engineering time is indeed evaluating candidate abstractions and thinking about the long-term cost of adopting them. In this case, it does not seem to save that much effort, and in the long run it means a lot of important LLM knobs are out of your control. Not a good tradeoff

quickthrower2 · 3 years ago
You can probably define the python language grammar as a typescript type though!
andy_xor_andrew · 3 years ago
Here's one thing I don't get.

Why all the rigamarole of hoping you get a valid response, adding last-mile validators to detect invalid responses, trying to beg the model to pretty please give me the syntax I'm asking for...

...when you can guarantee a valid JSON syntax by only sampling tokens that are valid? Instead of greedily picking the highest-scoring token every time, you select the highest-scoring token that conforms to the requested format.

This is what Guidance does already, also from Microsoft: https://github.com/microsoft/guidance

But OpenAI apparently does not expose the full scores of all tokens, it only exposes the highest-scoring token. Which is so odd, because if you run models locally, using Guidance is trivial, and you can guarantee your json is correct every time. It's faster to generate, too!
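The sampling idea above can be illustrated with a toy sketch: instead of a greedy argmax over every token, take the argmax over only the tokens the format allows next. The token strings and scores here are made up; a real implementation (like Guidance's) works over the model's full logit vector.

```typescript
function constrainedArgmax(
  logits: Record<string, number>,
  allowed: Set<string>,
): string {
  // Pick the highest-scoring token among those the grammar permits.
  let best = "";
  let bestScore = -Infinity;
  for (const tok of allowed) {
    const score = logits[tok] ?? -Infinity;
    if (score > bestScore) {
      best = tok;
      bestScore = score;
    }
  }
  return best;
}

const logits = { Sure: 3.5, "{": 1.7, '"': 2.1, "}": 0.2 };
// Suppose the JSON grammar says the output must start with an object:
console.log(constrainedArgmax(logits, new Set(["{"]))); // "{" despite "Sure" scoring higher
```

This is why the approach needs the full score vector: with only the single top token exposed, there is nothing to fall back to when that token violates the grammar.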

zarzavat · 3 years ago
It’s like the story of the brown M&Ms[0]. If the model is returning semantically correct data, you would hope that it can at least get the syntax correct. And if it can’t then you ought to throw the response away anyway.

Also I believe that such a method cannot capture the full complexity of TypeScript types.

[0] https://www.snopes.com/fact-check/brown-out/

tonyonodi · 3 years ago
That's a great analogy! I'd been wondering for a while whether that's a problem with this approach; to be honest I still don't know whether it is, so it would be good to see someone test it empirically.
rolisz · 3 years ago
> when you can guarantee a valid JSON syntax by only sampling tokens that are valid? Instead of greedily picking the highest-scoring token every time, you select the highest-scoring token that conforms to the requested format.

Yes, you can guarantee a syntactically correct JSON that way, but will it be a semantically correct? If the model really really really wanted to put another token there, but you are forcing it to put a {, maybe the following generated text won't be as good.

I'm not sure, I'm just wondering out loud.

geysersam · 3 years ago
Well, if the output doesn't conform to the format it's useless. If the model can't produce good and correct output then it's simply not up to the task.
donfotto · 3 years ago
I agree that sampling only valid tokens is a very promising approach.

I experimented a bit with finetuning open source LLMs for JSON parsing (without guided token sampling). Depending on one's use case, 70B parameters might be an overkill. I've seen promising results with much much smaller models. Finetuning a small model combined with guided token sampling would be interesting.

Then again, finetuning is perhaps not perfect for very general applications. When you get input that you didn't anticipate in your training dataset, you're in trouble.

csomar · 3 years ago
The LLM will be able to handle more complex scenarios. I could imagine a use-case: If you are ordering from a self-vending machine, instead of having to go through the whole process you just say your order out loud. You can say, for example, a couple chocolate bars and the LLM tries to guess from inventory.

Of course, if you are on the web, it makes no sense. It is much easier to use the mouse to click on a couple of items.

Scaevolus · 3 years ago
Llama.cpp recently added grammar-based sampling, which constrains token selection to follow a rigid format like you describe.

https://github.com/ggerganov/llama.cpp/pull/1773

CGamesPlay · 3 years ago
OpenAI doesn’t expose this information because it makes it vastly easier to train your model off theirs.
paxys · 3 years ago
I swear I think of something and Anders Hejlsberg builds it.

Structured requests and responses are 100% the next evolution of LLMs. People are already getting tired of chatbots. Being able to plug in any backend without worrying about text parsing and prompts will be amazing.

unshavedyak · 3 years ago
> Structured requests and responses are 100% the next evolution of LLMs. People are already getting tired of chatbots. Being able to plug in any backend without worrying about text parsing and prompts will be amazing.

Yup, a general desire of mine is to locally run an LLM which has actionable interfaces that I provide. Things like "check time", "check calendar", "send message to user", etc.

TypeChat seems to be in the right area. I can imagine an extra layer of "fit this JSON input to a possible action, if any".
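That extra layer could be sketched as a discriminated union of actions plus an exhaustive switch; the action names and handlers here are made up for illustration, not from TypeChat:

```typescript
type Action =
  | { kind: "checkTime" }
  | { kind: "checkCalendar"; date: string }
  | { kind: "sendMessage"; to: string; body: string };

// The LLM is asked for JSON matching one of the Action shapes; a plain
// switch then dispatches it to real code.
function dispatch(action: Action): string {
  switch (action.kind) {
    case "checkTime":
      return new Date().toISOString();
    case "checkCalendar":
      return `events on ${action.date}`;
    case "sendMessage":
      return `sent to ${action.to}: ${action.body}`;
  }
}
```

The nice property is that the type checker enforces exhaustiveness: add a new action variant and every dispatcher that misses it fails to compile.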

I see a neat hybrid future where a bot (LLM/etc) works to glue layers of real code together. Sometimes part of ingestion, tagging, etc - sometimes part of responding to input, etc.

All around this is a super interesting area to me, but frankly, everything is moving so fast I haven't concerned myself with diving too deep into it yet. Lots of smart people are working on it, so I feel the need to let the dust settle a bit. But I think we're already there to have my "dream home interface" working.

psyphy · 3 years ago
I just published CopilotKit, which lets you implement this exact functionality for any web app via react hooks.

`useMakeCopilotActionable` = you pass the type of the input, and an arbitrary typescript function implementation.

https://github.com/RecursivelyAI/CopilotKit

Feedback welcome

sdwr · 3 years ago
I was thinking about this yesterday. ChatGPT really is good enough to act as a proper virtual assistant / home manager, with enough toggles exposed.
paragraft · 3 years ago
Tell me about it - I implemented this just yesterday except with a focus on functions rather than objects.
_the_inflator · 3 years ago
This as a dynamic mapper in a backend layer can be huge.

For example, try to keep up with (frequent) API payload changes around a consumer in Java. We implemented a NodeJS layer just to stay sane. (Banking, huge JSON payloads, backends in Java)

Mapping is really something where LLMs could shine.

tylerrobinson · 3 years ago
It could shine, or it could be an absolute disaster.

Code/functionality archeology is already insanely hard in orgs with old codebases. Imagine the facepalming that Future You will have when you see that the way the system works is some sort of nondeterministic translation layer that magically connects two APIs where versions are allowed to fluctuate.

sidnb13 · 3 years ago
dvt · 3 years ago
This is my hot take: we're slowly entering the "tooling" phase of AI, where people realize there's no real value generation here, but people are so heavily invested in AI that money is still being pumped into building stuff (and of course, it's one of the best ways to guarantee your academic paper gets published). I mean, LangChain is kind of a joke and they raised $10M seed lol.

DeFi/crypto went through this phase 2 years ago. Mark my words, it's going to end up being this weird limbo for a few years where people will slowly realize that AI is a feature, not a product. And that its applicability is limited and that it won't save the world. It won't be able to self-drive cars due to all the edge cases, it won't be able to perform surgeries because it might kill people, etc.

I keep mentioning that even the most useful AI tools (Copilot, etc.) are marginally useful at best. At the very best it saves me a few clicks on Google, but the agents are not "intelligent" in the least. We went through a similar bubble a few years ago with chatbots[1]. These days, no one cares about them. "The metaverse" was much more short-lived, but the same herd mentality applies. "It's the next big thing" until it isn't.

[1] https://venturebeat.com/business/facebook-opens-its-messenge...

JSavageOne · 3 years ago
Hard disagree on AI being just a bubble with limited applicability.

> It won't be able to self-drive cars due to all the edge cases, it won't be able to perform surgeries because it might kill people, etc.

You literally just cherry-picked the most difficult applications of AI. The vast majority of peoples' jobs don't involve life or death, and thus are ripe for automation. And even if the life or death jobs retain a human element, they will most certainly be augmented by AI agents. For example a surgery might still be handled by a human, but it will probably become mandatory for a doctor or nurse to diagnose a patient in conjunction with an AI.

> We went through a similar bubble a few years ago with chatbots

Are you honestly comparing that to now? ChatGPT got to 100 million users in a few months and everyone and their grandma has used it. I wasn't even aware of any chatbot bubble a few years ago, it certainly wasn't that significant.

> even the most useful AI tools (Copilot, etc.) are marginally useful at best

Sure, but you're literally seeing them in their worst versions. ChatGPT has been a life-changer for me, and it doesn't even execute code yet (Code Interpreter does though, which I haven't tested yet)

By 2030 humans probably won't be typing code anymore, it'll just be prompting machines and directing AI agents. By then most peoples' jobs will also be automated.

AI isn't just some fad, it's going to change literally every industry, and way faster than people think. The cynicism here trying to dismiss the implications of AI by comparing it to the metaverse are just absurd and utterly lacking in imagination. Yes there is still a lot of work that needs to be done, specifically in the AI agent side of things, but we will get there, probably way faster than people realize, and the implications are enormous.

hnlmorg · 3 years ago
> By 2030 humans probably won't be typing code anymore, it'll just be prompting machines and directing AI agents. By then most peoples' jobs will also be automated.

Eventually, perhaps. But by 2023? Definitely not.

I think both you and the GP are at opposite ends of the extreme, and the reality is somewhere in the gulf in between.

Deleted Comment

coffeemug · 3 years ago
When I use ChatGPT I feel like I'm looking at a different technology than other people. It's supposed to be able to answer every question and teach me anything, but in practice it turns out to be a content-farm-as-a-service (CFaaS?). Copilot is similar: it's usually easier for me to write the code myself than to iterate through its suggestions to find the least bad example and then fix the bugs.

That said, AlphaGo went from "hallucinating" bad moves to the best player in the world in a fairly short period of time. If this is at all doable for language models, GPT-x may blow all this out of the water.

dvt · 3 years ago
> That said, AlphaGo went from "hallucinating" bad moves to the best player in the world in a fairly short period of time. If this is at all doable for language models, GPT-x may blow all this out of the water.

I think the state space when looking at something like Go v. natural language (or even formal languages like programming languages or first/second order logic) is not even remotely comparable. The number of states in Go is 3^361. The number of possible sentences in English, while technically infinite, has some sensible estimates (Googling shows the relatively tame 10^570 figure).

dwaltrip · 3 years ago
> we're slowly entering the "tooling" phase of AI, where people realize there's no real value generation here

Hard disagree. A very clear counterexample from my usage:

Gpt-4 is phenomenal at helping a skilled person work on tangential tasks where their skills generally translate but they don’t have strong domain knowledge.

I’ve been writing code for a decade, and recently I’ve been learning some ML for the first time. I’m using gpt-4 everyday and it’s been a delight.

To be fair, I can see one might find the rough edges annoying on occasion. For me, it’s quite manageable and not much of a bother. I’ve gotten better at ignoring or working around them. There is definitely an art to using these tools.

I expect the value provided to continue growing. We haven’t plucked all of the low-hanging or mid-hanging fruit yet.

I can share chat transcripts if you are interested.

phillipcarter · 3 years ago
> DeFi/crypto went through this phase 2 years ago.

A key difference is that these things, no matter how impressive their technical merits, required people to completely reshape whatever they were doing to get the first bit of benefit.

Modern AI (and really, usually LLMs) has immediate and broad applicability across nearly every economic sector, and that's why so many of us are already building and releasing features with it. There's incredible value in this stuff. Completely world-changing? No. But enough to create new product categories and fundamentally improve large swaths of existing product capabilities? Absolutely.

notRobot · 3 years ago
I feel like this is actually a very sensible take. AI has many uses, and it can be really good at some things, but it's not the hail mary it's being treated as.
ploppyploppy · 3 years ago
Your analysis is based on what's possible now. This is the worst it'll ever be.
bottlepalm · 3 years ago
How does no voice assistant (Apple, Google, Amazon, Microsoft) integrate LLMs into their service yet, and how has OpenAI not released their own voice assistant?

Also, like RSS, if there were some standard URL websites exposed for AI interaction, using TypeChat to expose the interfaces, we'd be well on our way.

dbish · 3 years ago
OpenAI is pretty likely working on their own (see Karpathy's "Building a kind of JARVIS @ OpenAI"), and Microsoft of course is doing an integration or reinterpretation of Cortana with OpenAI's LLMs (since they seem incapable of building their own models nowadays - "Why do we have Microsoft Research at all?" -S.N.), but there's a lot less value in a voice-driven LLM than there is in actually being able to perform actions. Take Alexa, for example: you need a system that can handle smart home control in a predictable, debuggable way, or people will get annoyed. I definitely think you can do this, but the current system as built (and others like Siri and, to a lesser extent, Cortana) all have a bunch of hooks and APIs being used by years and years of rules and software built atop less powerful models. They need to both maintain the current quality and improve on it while swapping out major parts of their system in order to make this work, which takes time.

Not to mention that none of these assistants actually make any money, they all lose money really, and are only worthwhile to big companies with other ways to make cash or drive other parts of their business (phones, shopping, whatever), so there's less incentive for a startup to do it.

I worked on both Cortana and Alexa in the past and thought a lot about trying to build a new version of them from the ground up with the LLM advancements. While the tech was all straightforward, and I even had some new ideas for use cases that are enabled now, I could not figure out a business model that would work (and hence am working on something completely different now).

bottlepalm · 3 years ago
It's July, they just needed to put a voice interface on ChatGPT, it'd easily help them sell more pro licenses as well. I'm not a conspiracy person, but this just seems so obvious it feels like there's something else going on here.
nonethewiser · 3 years ago
> How does no voice assistant (Apple, Google, Amazon, Microsoft) integrate LLMs into their service yet

When I first learned what ChatGPT was my thought was "oh so like what Siri is supposed to be."

perryizgr8 · 3 years ago
Talking to Alexa is laughable now, after having interacted with ChatGPT and Bing. It's so frustrating to see capable hardware being let down by crappy software for years upon years.
zitterbewegung · 3 years ago
Microsoft is doing that to replace Cortana in windows 11
nathan_f77 · 3 years ago
I'm really looking forward to something that I can use to control Home Assistant. I'm just really nervous about using any cloud-based API for this, so I would like to get something running on a server in my own house. But I would also want the voice recognition and response times to be extremely fast so I don't feel like I'm ever waiting for anything. I've seen a few DIY attempts at a personal assistant but there's always a significant delay that would become very annoying if I used it regularly.
9dev · 3 years ago
Seriously, it feels like there’s some collusion going on behind the scenes. This is the most obvious use case for the technology, but none of the big vendors have explored it.
jomohke · 3 years ago
It takes a while to develop a product, and the world only woke up to them mere months ago
mavamaarten · 3 years ago
I think it's because it turns out that taming a generative language model is really difficult. That's what we need in order to support more than some hardcoded simple questions, but companies like Google, who are known for search, want to keep their image of "use us to find what you're looking for". In the current state, their models (especially Bard, in my experience) simply return bullshit while sounding confident. They need to get beyond that stage.

But I feel you. My Google Assistant doesn't even seem to look for answers to questions anymore. All I get, even for simple queries, is a "sorry, I don't understand".

COGlory · 3 years ago
Willow, and the Willow Inference Server, have the option to use Vicuna with speech input and TTS.
joefreeman · 3 years ago
> It's unfortunately easy to get a response that includes { "name": "grande latte" }

    type Item = {
        name: string;
        ...
        size?: string;
    };
I'm not really following how this would avoid `name: "grande latte"`?

But then the example response:

    "size": 16
> This is pretty great!

Is it? It's not even returning the type being asked for?

I'm guessing this is more of a typo in the example, because otherwise this seems cool.

DanRosenwasser · 3 years ago
Whoops - thanks for catching this. Earlier iterations of this blog post used a different schema where `size` had been accidentally specified as a `number`. While we changed the schema, we hadn't re-run the prompt. It should be fixed now!
graypegg · 3 years ago
Their example here is really weak overall, IMO - more than just that typo. You also probably wouldn't want a "name" string field anyway. There's nothing stopping you from receiving

    {
        name: "the brown one",
        size: "the espresso cup",
    … }
That's just as bad as parsing the original string. You probably want big string union types for each of those fields, representing whatever known values you accept, so the LLM can try to match them.

But now why would you want that to be locked into the type syntax? You probably want something more like Zod where you can use some runtime data to build up those union types.

You also want restrictions on the types, like quantity should be a positive, non-fractional integer. Of course you can just validate the JSON values afterwards, but then the user gets two kinds of errors: one from the LLM, which is fluent and human-sounding, and another which is a weird, technical "oops! You provided a value that is too large for quantity" error.

The type syntax seems like the wrong place to describe this stuff.
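The union-type idea can be sketched in plain TypeScript: constrain fields to known values and validate ranges at runtime, so "the espresso cup" can't slip through as a size. The value lists and the quantity cap here are illustrative, not from TypeChat's example.

```typescript
type Size = "short" | "tall" | "grande" | "venti";

const SIZES: readonly Size[] = ["short", "tall", "grande", "venti"];

// Reject any size the LLM invents; only known literals survive.
function parseSize(raw: string): Size | undefined {
  return SIZES.find((s) => s === raw.trim().toLowerCase());
}

// Positive, non-fractional, and within a sane range.
function parseQuantity(raw: number): number | undefined {
  return Number.isInteger(raw) && raw > 0 && raw <= 20 ? raw : undefined;
}
```

The range check is exactly the kind of constraint that the type syntax alone can't express, which is the argument for a runtime schema library like Zod.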

mynameisvlad · 3 years ago
I feel like that's just a documentation bug. I'm guessing they changed from number of ounces to canonical size late in the drafting of the announcement and forgot to change the output value to match.

There would be no way for a system to map "grande" to 16 based on the code provided, and 16 does not seem to be used anywhere else.

hirsin · 3 years ago
The rest of the paragraph discusses "what happens when it ignores type?", so I think that's where they were going with that?
33a · 3 years ago
Looks like it just runs the LLM in a loop until it spits out something that type checks, prompting with the error message.

This is a cute idea and it looks like it should work, but I could see this getting expensive with larger models and input prompts. Probably not a fix for all scenarios.

babyshake · 3 years ago
At least with OpenAI, wouldn't it be better if under the hood it was using the new function call feature?
akavi · 3 years ago
Typescript's type system is much more expressive than the one the function call feature makes available.

I imagine closing the loop (using the TS compiler to restrict token output weights) is in the works, though it's probably not totally trivial. You'd need:

* An incremental TS compiler that could report "valid" or "valid prefix" (ie, valid as long as the next token is not EOF)

* The ability to backtrack the model

Idk how hard either one piece is.

osaariki · 3 years ago
I'm not familiar with how TypeChat works, but Guidance [1] is another similar project that can actually integrate into the token sampling to enforce formats.

[1]: https://github.com/microsoft/guidance

J_Shelby_J · 3 years ago
It’s logit bias. You don’t even need another library to do this. You can do it with three lines of python.

Here’s an example of one of my implementations of logit bias.

https://github.com/ShelbyJenkins/shelby-as-a-service/blob/74...
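The linked implementation is Python, but the trick itself is tiny in any language: add per-token offsets to the raw scores before picking the next token. OpenAI's API exposes this as the `logit_bias` request parameter, keyed by token id; plain strings are used in this toy sketch for readability.

```typescript
function applyLogitBias(
  logits: Record<string, number>,
  bias: Record<string, number>,
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [tok, score] of Object.entries(logits)) {
    out[tok] = score + (bias[tok] ?? 0);
  }
  return out;
}

function pick(logits: Record<string, number>): string {
  // Greedy argmax over the (biased) scores.
  return Object.entries(logits).reduce((a, b) => (b[1] > a[1] ? b : a))[0];
}

const logits = { "{": 1.0, Sure: 3.0, As: 2.5 };
const bias = { Sure: -100, As: -100 }; // suppress chatty preambles
console.log(pick(applyLogitBias(logits, bias))); // "{"
```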

behnamoh · 3 years ago
except that guidance is defunct and is not maintained anymore.
SkyPuncher · 3 years ago
I suspect most products are more concerned about product-market fit; they can wrangle costs down later.

There's also a reasonable assumption that models will improve at structured output, since the market is demanding it.

garrett_makes · 3 years ago
I built and released something really similar to this (but smaller scope) for Laravel PHP this week: https://github.com/adrenallen/ai-agents-laravel

My take on this is, it should be easy for an engineer to spin up a new "bot" with a given LLM. There's a lot of boring work around translating your functions into something ChatGPT understands, then dealing with the response and parsing it back again.

With systems like these you can just focus on writing the actual PHP code, adding a few clear comments, and then the bot can immediately use your code like a tool in whatever task you give it.

Another benefit to things like this, is that it makes it much easier for code to be shared. If someone writes a function, you could pull it into a new bot and immediately use it. It eliminates the layer of "converting this for the LLM to use and understand", which I think is pretty cool and makes building so much quicker!

None of this is perfect yet, but I think this is the direction everything will go so that we can start to leverage each others code better. Think about how we use package managers in coding today, I want a package manager for AI specific tooling. Just install the "get the weather" library, add it to my bot, and now it can get the weather.

jasongill · 3 years ago
Starred this, as I've been working on a similar but broader-scoped approach. I think some of your ideas are really slick!