Fructose
Fructose is a Python package for calling LLMs as strongly typed functions. It uses function type signatures to guide generation and guarantee correctly typed output, in whatever basic or complex Python datatype is requested.
By guaranteeing output structure, we believe this will enable more complex applications to be built, interweaving LLM calls with ordinary code. For now, we've shipped Fructose as a client-only library that simply calls gpt-4 (by default) with JSON mode. That's pretty simple, and not unlike other packages such as marvin and instructor, but we're also working on our own lightweight formatting model that we'll host and/or distribute to the client, to help reduce token burn and increase accuracy.
We figure, no time like the present to show y’all what we’re working on! Questions, compliments, and roasts welcomed.
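To make the typed-function idea concrete, here's a minimal sketch (not Fructose's actual implementation, just an illustration of the mechanism) of using a signature's return annotation to validate an LLM's JSON reply. It handles only simple built-in and list[...] annotations:

```python
import json
from typing import get_type_hints

def validate_output(fn, raw_json: str):
    """Parse an LLM's JSON reply and check it against fn's declared return type."""
    hint = get_type_hints(fn)["return"]
    value = json.loads(raw_json)
    origin = getattr(hint, "__origin__", hint)  # list[str] -> list, str -> str
    if origin is list:
        (item_type,) = hint.__args__
        if not (isinstance(value, list) and all(isinstance(v, item_type) for v in value)):
            raise TypeError(f"expected {hint}, got {value!r}")
    elif not isinstance(value, origin):
        raise TypeError(f"expected {hint}, got {value!r}")
    return value

def describe(animals: list[str]) -> str:
    """Given a list of animals, use one word that'd describe them all."""

# A JSON string satisfies the declared -> str return type.
result = validate_output(describe, '"pets"')
```

A real implementation would also retry the call with an error message when validation fails, which is roughly what the "guarantee" in packages like this amounts to.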
This strikes a happy medium, where machines assist programmers and make them much more productive. Yet the resulting code is understandable, since a human has decomposed everything into functions, and also robust, since it is formally verified.
I am working on an F# proof-of-concept system like this; there are other alternatives around, implemented in Haskell and other languages, with varying levels of automation. It is potentially an interesting niche for a startup.
https://www.microsoft.com/en-us/research/publication/program...
I’ve wanted to see the traditional techniques combined with modern ML to sort of drive the search and generation process. Then, we’d still have the advantages of both formal specifications and classic AI (esp traceability). While looking for a synthesis link, I stumbled onto one paper trying to mix the two approaches:
https://ojs.aaai.org/index.php/AAAI/article/download/5048/49...
This is how I use copilot currently, so I might not be following on what part of this is 'future' facing or relevant to this Fructose project?
Not being contrarian, I thought this was an interesting point but as I thought about it more I realized, "wait, they're describing what I already do".
Being able to sometimes answer a given question is perhaps a first step to writing code that can answer that question reliably, but it's a long way from an LLM that does the former to one that does the latter.
"Type-Driven Program Synthesis" by Nadia Polikarpova https://www.youtube.com/watch?v=HnOix9TFy1A
Links to more projects and papers by Prof. Polikarpova: https://cseweb.ucsd.edu/~npolikarpova/
I think this is one of the main projects she discusses in the talk: https://github.com/nadia-polikarpova/synquid
EDIT: meant to mention this too, which I think has been around a bit longer, not that I've ever used it in production: https://ucsd-progsys.github.io/liquidhaskell/
Looking at the prompt templates (https://github.com/bananaml/fructose/tree/main/src/fructose/... ), they use a LangChain-esque "just try to make the output valid JSON" approach, when APIs such as GPT-4 Turbo (which this package uses by default) now support function calling/structured data natively and do a very good job of it (https://news.ycombinator.com/item?id=38782678), and libraries such as outlines (https://github.com/outlines-dev/outlines), while more complex, can better ensure a dictionary output for local LLMs.
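Native function calling works by sending the API a JSON Schema describing the expected arguments. A sketch of deriving that schema from a Python signature (to_tool_schema is a hypothetical helper, not part of Fructose or any library mentioned; the type mapping is simplified):

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def to_tool_schema(fn) -> dict:
    """Build an OpenAI-style function-calling tool definition from fn's signature."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # only the parameters go into the schema
    properties = {name: {"type": _JSON_TYPES[t]} for name, t in hints.items()}
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
            },
        },
    }

def describe(animal: str, count: int) -> str:
    """Describe a group of animals in one word."""

schema = to_tool_schema(describe)
```

This dict is what you'd pass in the `tools` list of a chat-completions request, letting the API itself enforce the argument structure rather than a prompt asking nicely for JSON.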
The future here really lies in compiling down context-free grammars. They let you model JSON, YAML, CSV, and other programming languages as finite state machines that can force LLM transitions. They end up being pretty magical: you can force value typing, enums, and syntax validation of multivariate payloads. For use in data pipelines they can't be beat.
I did some experiments a few weeks ago on training models to generate these formats explicitly with jsonformers/outlines. Finetuning in the right format is still important to maximize output quality: you can see a 7% lift if you finetune explicitly for your desired format. [^1] At inference time, the CFGs constrain the model to do what it's actually intended to.
[^1]: https://freeman.vc/notes/constraining-llm-outputs
https://platform.openai.com/docs/guides/text-generation/json...
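A toy illustration of the constrained-decoding idea (not any particular library's implementation): a character-level "model" proposes characters, and a finite-state machine built from the grammar `value := "true" | "false"` masks out every illegal choice, so the output is valid by construction:

```python
# The two literals our tiny grammar accepts.
LITERALS = ["true", "false"]

def allowed_next(prefix: str) -> set:
    """Characters the FSM permits after `prefix` (empty set means a literal is complete)."""
    chars = set()
    for lit in LITERALS:
        if lit.startswith(prefix) and len(prefix) < len(lit):
            chars.add(lit[len(prefix)])
    return chars

def generate(propose) -> str:
    """Decode greedily, intersecting the model's preferences with the FSM mask.

    `propose` scores a candidate character; only legal characters are considered,
    so even a badly behaved model cannot produce an invalid token sequence.
    """
    out = ""
    while True:
        mask = allowed_next(out)
        if not mask:  # no legal continuation left: a literal is complete
            return out
        out += max(mask, key=propose)  # the model's favorite *legal* character

# A "model" that loves the letter "f" is funneled into "false".
result = generate(lambda ch: 1.0 if ch == "f" else 0.0)
```

Real systems do the same thing over the tokenizer's vocabulary and with logit masks instead of a character set, but the principle is identical: transitions that would leave the grammar are given zero probability.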
This feels pretty much identical to Marvin? Like the entire API?
From a genuine place of curiosity: I get that your prompts are different, but like why in the name of open source would you just not contribute to these libraries instead of starting your own from scratch?
If you run your own models as a part of it, surely you could hook up your models as a backend to whatever abstractions you’re copying here.
Instead of this:
    @ai()
    def describe(animals: list[str]) -> str:
        """
        Given a list of animals, use one word that'd describe them all.
        """
it would seem a lot more intuitive to do this:
    def describe(animals: list[str]) -> str:
        return ai("""Given a list of animals, use one word that'd describe them all.""", animals)
Of course, this doesn't really matter at all, and I get that it feels strange. I've just been thinking about grammars and syntax lately, and it's been interesting to now have the vocabulary and mental model to understand these unintuitive things :)
For your suggestion, the decorator would still be required to overload the function execution with the remote call; otherwise you'd just be calling the function body. We have considered special wrapper return types to help play better with pyright (and also give programmatic access to debug details of the call), but that would add bloat to the package and subtract from the more native Python feel we're aiming for.
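A minimal sketch of why the decorator is needed (with a stand-in fake_llm instead of a real model call): the decorator replaces the function's execution entirely, using only its declaration, so the empty body is never run:

```python
import functools
import inspect

def fake_llm(prompt: str, call_args) -> str:
    """Stand-in for a remote model call; a real implementation would hit an API."""
    return "pets"

def ai(fn):
    """Overload fn's execution with a 'remote' call built from its signature and docstring."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        prompt = inspect.getdoc(fn) or ""
        # The original body is never executed; only the declaration matters.
        return fake_llm(prompt, (args, kwargs))
    return wrapper

@ai
def describe(animals: list[str]) -> str:
    """Given a list of animals, use one word that'd describe them all."""

result = describe(["dog", "cat", "hamster"])
```

Without the decorator, calling `describe` would just execute its (empty) body and return None, which is the point being made above.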
Python has an existing convention for this (so it's not a "trick"): the use of the special value Ellipsis (literal: ...).
https://mypy.readthedocs.io/en/stable/stubs.html
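For example, a stub body consisting only of `...` is valid Python; the Ellipsis expression is evaluated and discarded, and if the stub is actually called it simply returns None:

```python
def summarize(text: str) -> str:
    """Summarize text in one sentence."""
    ...  # stub body: no real logic, just the Ellipsis expression

# Calling the stub runs nothing meaningful and falls through to None.
stub_result = summarize("hello")
```

This is the same convention mypy uses for .pyi stub files, which is why it reads naturally as "the body is intentionally elsewhere".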
https://jxnl.github.io/instructor/
Another project I'm excited about in this area is GPTScript, which launched last week: http://github.com/gptscript-ai/gptscript.