It takes all the core relational operators and provides an easy semantic version of each as a Python dataframe library extension. Each call ends up being a 'model' point, in case you also want to do fancier things later like more learning-based approaches. AFAICT, Snowflake and friends are moving in this direction for their cloud SQLs as well.
We ended up doing something similar for louie.ai, where you use AI notebooks/dashboards/APIs (e.g., MCP) to talk to your data (Splunk, Databricks, graph DBs, whatever), and it'll figure out symbolic + semantic operators based on the context. Super helpful in practice.
My 80% case here is (a rough sketch follows the list):
- semantic map: "get all the alerts from Splunk index xyz, add a column flagging anything suspicious and another explaining why" <--- generates an enriched dataframe
- semantic map => semantic reduce: "... then summarize what you found" <--- then tells you about it in natural text
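For illustration, here's roughly what that pattern looks like in code. This is a hypothetical sketch: `semantic_map`, `semantic_reduce`, and the `llm` placeholder are not louie.ai's (or any particular library's) actual API.

```python
import pandas as pd

# Placeholder LLM client: any callable that takes a prompt and returns text.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own completion call here")

def semantic_map(df: pd.DataFrame, instruction: str) -> pd.DataFrame:
    """Add LLM-derived columns to each row (e.g. a 'suspicious' flag plus a reason)."""
    out = df.copy()
    flags, reasons = [], []
    for _, row in df.iterrows():
        answer = llm(f"{instruction}\nRow: {row.to_dict()}\n"
                     "Reply as: <yes|no>; <one-sentence reason>")
        flag, _, reason = answer.partition(";")
        flags.append(flag.strip().lower() == "yes")
        reasons.append(reason.strip())
    out["suspicious"] = flags
    out["why"] = reasons
    return out

def semantic_reduce(df: pd.DataFrame, instruction: str) -> str:
    """Collapse an enriched dataframe into a natural-language summary."""
    return llm(f"{instruction}\n\nData:\n{df.to_csv(index=False)}")

# alerts = load_alerts_from_splunk("xyz")   # however you pull the index
# enriched = semantic_map(alerts, "Flag anything suspicious and explain why")
# print(semantic_reduce(enriched, "Summarize what you found"))
```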
I think it's interpreting the command as "replace each fruit with a vegetable", and it might intuit "make the resulting vegetables unique from one another" but otherwise it's not trying to find the "most similar" vegetable to every fruit or anything like that.
That's gonna be a very, very long answer. What's funny is that not much has changed since 2022 (EOY), when the project started; the models just got better, but we had a good chunk of the primitives since GPT-3.
What's more recent is the DbC (Design by Contract) contribution, which I think is unique. It has literally solved everything agent-related I've thrown at it -- especially because I can chain contracts together and the guardrails propagate nicely.
I've built most of the custom tools myself. For instance, not only was Perplexity rendered useless by OpenAI's web search, but OpenAI's web search itself isn't good enough compared to what you can customize yourself. To this end, I built my own deep research agent. Here's a thread with some results from the first day it was working: https://x.com/futurisold/status/1931751644233945216
I'm also running a company, and we've built e2e document generation purely from contracts (three contracts chained together in this case). Here's an output (sorry about the PDF rendering, that's not what we serve, it's just what I quickly hacked together for local dev): https://drive.google.com/file/d/1Va7ALq_N-fTYeumKhH4jSxsTrWD...
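To illustrate the chaining idea, here's a generic Design-by-Contract sketch in plain Python. It is not SymbolicAI's actual contract API; the validators and the `draft`/`polish` steps in the usage comment are made up.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Contract:
    pre: Callable[[str], bool]    # validate the input before calling the model
    post: Callable[[str], bool]   # validate the output before passing it on
    run: Callable[[str], str]     # the LLM (or rule-based) step itself

    def __call__(self, x: str) -> str:
        assert self.pre(x), "precondition failed"
        y = self.run(x)
        assert self.post(y), "postcondition failed"
        return y

def chain(*contracts: Contract) -> Callable[[str], str]:
    """Each intermediate value must satisfy the previous step's postcondition
    and the next step's precondition, so guardrails propagate end to end."""
    def pipeline(x: str) -> str:
        for c in contracts:
            x = c(x)
        return x
    return pipeline

# draft  = Contract(pre=lambda s: len(s) > 0,        post=lambda s: "## Outline" in s, run=draft_step)
# polish = Contract(pre=lambda s: "## Outline" in s, post=lambda s: len(s) > 500,      run=polish_step)
# generate_doc = chain(draft, polish)
```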
This was the input:
---
Prompt:
> I want the files to be analyzed and I am interested in finding patterns; feel free to make suggestions as well. I want to understand how different providers use their system prompts, therefore things like: what kind of tags do they use – are they XML, markdown, etc, are they prone toward sycophancy or trying to manipulate the user, are they using tools and if so how, etc. I want the tech report to deconstruct and synthesize and compare the information, find interesting patterns that would be hard to spot.
Generated instructions:
(a) Query:
Conduct a comparative analysis of system prompts across major AI providers (OpenAI, Google, Anthropic, xAI, etc.) to identify structural patterns, linguistic frameworks, and operational constraints that shape AI behavior and responses.
(b) Specific Questions:
1. What syntactic structures and formatting conventions (XML, markdown, JSON, etc.) are employed across different AI system prompts, and how do these technical choices reflect different approaches to model instruction?
2. To what extent do system prompts encode instructions for deference, agreeability, or user manipulation, and how do these psychological frameworks vary between commercial and research-focused models?
3. How do AI providers implement and constrain tool usage in their system prompts, and what patterns emerge in permission structures, capability boundaries, and function calling conventions?
4. What ethical guardrails and content moderation approaches appear consistently across system prompts, and how do implementation details reveal different risk tolerance levels between major AI labs?
5. What unique architectural elements in specific providers' system prompts reveal distinctive engineering approaches to model alignment, and how might these design choices influence downstream user experiences?
---
Contracts were introduced in March in this post: https://futurisold.github.io/2025-03-01-dbc/
They evolved a lot since then, but the foundation and motivation didn't change.
One last comment here on contracts: an excerpt from that post that I think is extremely relevant for LLMs; maybe it triggers an interesting discussion here:
"The scope of contracts extends beyond basic validation. One key observation is that a contract is considered fulfilled if both the LLM’s input and output are successfully validated against their specifications. This leads to a deep implication: if two different agents satisfy the same contract, they are functionally equivalent, at least with respect to that specific contract.
This concept of functional equivalence through contracts opens up promising opportunities. In principle, you could replace one LLM with another, or even substitute an LLM with a rule-based system, and as long as both satisfy the same contract, your application should continue functioning correctly. This creates a level of abstraction that shields higher-level components from the implementation details of underlying models."
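To make that functional-equivalence point concrete, here is a minimal sketch: one made-up contract ("non-empty input in, a summary of at most 50 words out") that either an LLM-backed or a rule-based implementation can satisfy. The `llm` function is a placeholder, not a real client.

```python
from typing import Callable

def llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real completion call")

def run_under_contract(fn: Callable[[str], str], text: str) -> str:
    """The contract: non-empty input, output is a summary of at most 50 words."""
    assert text.strip(), "precondition: input must be non-empty"
    out = fn(text)
    assert 0 < len(out.split()) <= 50, "postcondition: output must be a short summary"
    return out

def llm_summarizer(text: str) -> str:
    return llm(f"Summarize in under 50 words:\n{text}")

def rule_based_summarizer(text: str) -> str:
    return " ".join(text.split()[:50])  # crude, but it satisfies the contract

# Both satisfy the same contract, so callers can swap one for the other:
# run_under_contract(llm_summarizer, report)
# run_under_contract(rule_based_summarizer, report)
```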
Very cool. Being able to use semantic (as opposed to syntactic) operators like `==`, `+`, etc. feels like fertilizer for some novel ideas. Sort of like when word embeddings first came out and there was a loose concept algebra introduced with it ("King - Man + Woman = Queen").
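Mechanically, a semantic `==` is just operator overloading where the comparison is delegated to a model instead of string equality. A toy sketch of the idea (not SymbolicAI's actual implementation; `llm` is again a placeholder):

```python
class Sym:
    """Toy semantic symbol: `==` asks a model instead of comparing strings."""
    def __init__(self, value: str):
        self.value = value

    def __eq__(self, other) -> bool:
        other_value = other.value if isinstance(other, Sym) else str(other)
        verdict = llm("Do these two texts mean the same thing? Answer yes or no.\n"
                      f"A: {self.value}\nB: {other_value}")
        return verdict.strip().lower().startswith("yes")

def llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real model call")

# Sym('Hello, good morning!') == 'Hi there, good day!'  # -> True (semantically)
```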
That said the neuro + symbolic integration here is, like most systems, pretty shallow/firewalled (taxonomically, Type 3 / Neuro;Symbolic — https://harshakokel.com/posts/neurosymbolic-systems). I think the real magic is going to come when we start heading toward a much more fundamental integration. We're actually working on this at my company (https://onton.com). How do we create a post-LLM system that:
1) features an integrated representation (neither purely symbolic nor dense floating point matrix); 2) can learn incrementally from small amounts of noisy data, without being subject to catastrophic forgetting; 3) can perform mathematical and other symbolic operations with bulletproof reliability; and 4) is hallucination-free?
The cobbling together of existing systems hot-glue style is certainly useful, but I think a unified architecture is going to change everything.
I didn't expect this -- I was supposed to be sleeping now, but I guess I'll chat with whoever jumps in! Good thing I've got some white nights experience.
We hear you. We might end up renaming it. In the paper we have a footnote about the name choice -- it's meant to credit the foundational work of Newell and Simon that inspired this project.
I'd appreciate it! It's cool and I wish you success. Just hope that when someone says "We're using Symbolic AI" a year from now, it won't be even more ambiguous than today :D
One question, OP: how does the cost of this work? Do you pay the LLM inference cost (quite literally, if using an external API) every time you run a line that involves natural-language computation? E.g., what happens if you call a "symbolic" function in a loop?
Yes, that's correct. If you're using, say, OpenAI, then every semantic op is an API call to OpenAI. If you're hosting a local LLM via llama.cpp, then obviously there's no inference cost other than that of hosting the model.
Since code is generated by LLMs these days, how do specific syntactic constructs like a Symbol, which essentially carries the context and can be manipulated with Python operators, help compared to normal Python code generated by an LLM with all the checks and balances instructed by a human?
For example, I can write in this syntax to convert all fruits to vegetables, or I can simply prompt an LLM to construct a program that takes a list of fruits and calls an LLM in the background to return the vegetable equivalents.
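So a semantic op inside a loop means one inference call per iteration unless you batch or cache. A simple way to blunt that is plain memoization; this is a generic sketch, not a library feature, and `llm` is a placeholder:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_semantic_op(prompt: str) -> str:
    # each *distinct* prompt hits the API once; repeated prompts come from the cache
    return llm(prompt)

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your provider client here")

# for item in items:                            # N distinct items -> N API calls,
#     cached_semantic_op(f"classify: {item}")   # duplicates are free
```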
I am trying to understand the difference.
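To make the comparison concrete, the hand-rolled route described above might look like this, using the OpenAI Python client directly; the model name and prompt wording are placeholders, and you own the prompt construction and output parsing yourself:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def fruits_to_vegetables(items: list[str]) -> list[str]:
    """The 'normal Python code' route: one explicit API call, hand-written parsing."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{
            "role": "user",
            "content": "Replace every fruit in this list with a vegetable, keep "
                       f"non-fruits unchanged, return a comma-separated list: {items}",
        }],
    )
    return [s.strip() for s in response.choices[0].message.content.split(",")]

# print(fruits_to_vegetables(['apple', 'banana', 'cherry', 'cat', 'dog']))
```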
Yes. That seems to be the case.
While it may not save any time compared to having an LLM generate general Python code instead of this specific symbolic code, the real value could be that the library has an engine to enforce contracts on LLM responses, and that it performs the LLM calls through a common piece of code, making things less error-prone and bringing consistency to the interactions with the LLM.
Examples I found interesting (sketched in code below):
- Semantic map lambdas
- Comparison parameterized by context
- Bitwise ops
- `interpret()` seems powerful
OP, what inspired you to make this? Where are you applying it? What has been your favorite use case so far?
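For concreteness, here's roughly what those examples look like, reconstructed from the library's README/examples folder as I remember them. Treat the exact method signatures (`map`, `equals(..., context=...)`, the `semantic=True` flag, `interpret()`) and the sample outputs as approximations; check the linked examples for the real versions.

```python
from symai import Symbol

# Semantic map lambda: the instruction is applied element-wise by the model
S = Symbol(['apple', 'banana', 'cherry', 'cat', 'dog'])
print(S.map('convert all fruits to vegetables'))
# e.g. ['carrot', 'broccoli', 'spinach', 'cat', 'dog']

# Comparison parameterized by context
greeting = Symbol('Hello, good morning!')
print(greeting.equals('Hi there, good day!', context='greeting'))  # e.g. True

# Bitwise ops read as semantic logic (& as conjunction/inference)
rule = Symbol('The horn only sounds on Sundays.', semantic=True)
fact = Symbol('I hear the horn.')
print(rule & fact)  # e.g. an inference along the lines of "Today is Sunday."

# Open-ended interpretation
print(Symbol('Interpret: time flies like an arrow; fruit flies like a banana.').interpret())
```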
"The scope of contracts extends beyond basic validation. One key observation is that a contract is considered fulfilled if both the LLM’s input and output are successfully validated against their specifications. This leads to a deep implication: if two different agents satisfy the same contract, they are functionally equivalent, at least with respect to that specific contract.
This concept of functional equivalence through contracts opens up promising opportunities. In principle, you could replace one LLM with another, or even substitute an LLM with a rule-based system, and as long as both satisfy the same contract, your application should continue functioning correctly. This creates a level of abstraction that shields higher-level components from the implementation details of underlying models."
https://github.com/ExtensityAI/symbolicai/blob/main/examples...
https://arxiv.org/pdf/2402.00854
`valid_sizes` is undefined.