Readit News
alandu commented on Show HN: DaLMatian – Text2sql that works   dalmatian.ai/download... · Posted by u/alandu
HanClinto · 2 years ago
Can you give me an example of the sort of thing you're talking about? I've been using Defog's sql-eval a little bit, but I'd be interested in knowing more about its shortcomings when evaluating these systems.

https://github.com/defog-ai/sql-eval

alandu · 2 years ago
An example question in that eval set is "How many publications were published between 2019 and 2021?". GPT can work out how to answer that from the schema alone, with no extra context (I assume the schema has a column called publications). An example question I'd get in my previous role at an ecommerce fraud detection company could be something like "what's the chargeback rate on the ATO segment?". Neither chargeback rate nor the ATO segment is defined in the database schema. Not only did they have different definitions depending on the context (e.g. which customer), but the definitions also changed over time within the same context.
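To make the contrast concrete, here is a minimal Python sketch of why a schema alone is not enough: the same business metric can map to different SQL depending on the customer and on the date the question refers to. The `MetricDefinition` class, the `resolve` helper, the customer names, the column names and the SQL fragments are all hypothetical illustrations, not DaLMatian's implementation or any real company's definitions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MetricDefinition:
    """One version of a business metric's SQL definition (hypothetical structure)."""
    customer: str          # context in which this definition applies
    effective_from: date   # first day this version is in force
    sql_expression: str    # illustrative SQL fragment computing the metric


# Hypothetical registry: "chargeback rate" means different things per customer,
# and one customer's definition changes partway through the timeline.
CHARGEBACK_RATE = [
    MetricDefinition("acme", date(2021, 1, 1),
                     "SUM(is_chargeback)::float / COUNT(*)"),
    MetricDefinition("acme", date(2022, 7, 1),
                     "SUM(is_chargeback)::float / NULLIF(SUM(is_settled), 0)"),
    MetricDefinition("globex", date(2021, 1, 1),
                     "SUM(chargeback_amount) / NULLIF(SUM(order_amount), 0)"),
]


def resolve(definitions: list[MetricDefinition], customer: str,
            as_of: date) -> Optional[MetricDefinition]:
    """Return the latest definition for `customer` in force on `as_of`, or None."""
    candidates = [d for d in definitions
                  if d.customer == customer and d.effective_from <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.effective_from)


if __name__ == "__main__":
    # The same question resolves to different SQL depending on the date asked about.
    print(resolve(CHARGEBACK_RATE, "acme", date(2022, 1, 15)).sql_expression)
    print(resolve(CHARGEBACK_RATE, "acme", date(2023, 1, 1)).sql_expression)
```

None of this information lives in the table definitions, which is the gap the comment describes: a text-to-SQL system has to be given (or learn) these context-dependent, versioned definitions from somewhere other than the schema.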
activatedgeek · 2 years ago
I see. Following up on this, for the sake of being explicit: was the bottleneck here getting all the data sources in place (for instance access permissions, legal, etc.), writing the SQL query, both, or something else?
alandu · 2 years ago
The bottleneck was mostly writing the SQL query, which took a lot of time because of how messy and complex the data was.
bdcravens · 2 years ago
Recommendation: your HN post shouldn't tell us more about the company and product than your website does.
alandu · 2 years ago
Thanks, we will rethink how the product is presented on our website!
CharlesW · 2 years ago
It's not, and the "Talk to Sales" and the ToS (https://www.dalmatian.ai/terms) strongly suggest they're targeting enterprise customers with bespoke enterprise-y pricing.
alandu · 2 years ago
Right, we are not open source, but the IDE is free to use. The Slack integration and other product offerings that are in the works will be in the 'premium' version of the product that's sold to enterprises.
pelagicAustral · 2 years ago
Why make it a full VSCode download instead of a plugin?
alandu · 2 years ago
There are other product additions in the works, like hooking it up to your locally opened Slack. A plugin would be too limiting for that.
moqca · 2 years ago
Can't get this to work. Instructions are very unclear. Was unable to open a Snowflake connection. Uploaded the schema in a CSV file. No indication of what needs to be done next. I assume "manage context queries" is where it pulls info from, so I added a query and provided a description. Tried Q&A; nothing happened.
alandu · 2 years ago
If you open a .sql file in the workspace, the queries in that file will be automatically parsed and used as context for Q&A. If you're willing, we'd love to help debug; could you email support@dalmatian.ai?
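The reply says queries in an opened .sql file are parsed automatically and used as context. DaLMatian's parser isn't described here, so what follows is only a hypothetical Python sketch of the first step such a feature might need: splitting a file into individual statements without breaking on semicolons that appear inside string literals or comments. The function name `split_sql_statements` is an illustration, and dialect-specific syntax (for example PostgreSQL dollar-quoted strings or procedural blocks) is deliberately not handled.

```python
def split_sql_statements(sql_text: str) -> list[str]:
    """Split SQL source into statements on top-level semicolons (hypothetical helper).

    Semicolons inside single-quoted strings, double-quoted identifiers,
    -- line comments and /* */ block comments do not split statements.
    Comments are kept in the statement text, since they often describe
    what a query is for. Dialect-specific syntax is NOT handled.
    """
    statements: list[str] = []
    current: list[str] = []
    i, n = 0, len(sql_text)
    while i < n:
        ch = sql_text[i]
        nxt = sql_text[i + 1] if i + 1 < n else ""
        if ch == "-" and nxt == "-":
            # Line comment: copy it verbatim up to (not including) the newline.
            end = sql_text.find("\n", i)
            end = n if end == -1 else end
            current.append(sql_text[i:end])
            i = end
            continue
        if ch == "/" and nxt == "*":
            # Block comment: copy it verbatim, including the closing */.
            end = sql_text.find("*/", i + 2)
            end = n if end == -1 else end + 2
            current.append(sql_text[i:end])
            i = end
            continue
        if ch in ("'", '"'):
            # Quoted string or identifier; a doubled quote ('' or "") is an escape.
            j = i + 1
            while j < n:
                if sql_text[j] == ch:
                    if j + 1 < n and sql_text[j + 1] == ch:
                        j += 2
                        continue
                    break
                j += 1
            current.append(sql_text[i:j + 1])
            i = j + 1
            continue
        if ch == ";":
            # Top-level semicolon ends the current statement.
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

Keeping comments attached to the following statement is a design choice for the context use case: an analyst's note such as `-- monthly revenue, excludes refunds` is exactly the kind of business definition the schema lacks.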
HanClinto · 2 years ago
Have you run this against UNITE? I'm curious to see how it benchmarks against other text2sql tools:

https://github.com/awslabs/unified-text2sql-benchmark

alandu · 2 years ago
We have not come across any benchmark dataset that's actually worth evaluating on because the questions are not representative of real world enterprise problems. They don't reflect the degree of context needed to answer domain/business-specific questions accurately.
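To illustrate what "the degree of context needed" can mean in practice: a question like the chargeback-rate example earlier on this page is only answerable if the model sees business definitions and prior analyst queries alongside the schema. The following is a minimal, hypothetical Python sketch of assembling that kind of context into a prompt; `QuestionContext`, `build_prompt`, the `max_examples` cap and the prompt layout are assumptions for illustration, not how DaLMatian or any benchmark actually structures its inputs.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionContext:
    """Hypothetical bundle of context a text-to-SQL prompt could carry."""
    schema_ddl: list[str]                                   # CREATE TABLE statements
    glossary: dict[str, str] = field(default_factory=dict)  # business term -> definition
    example_queries: list[str] = field(default_factory=list)


def build_prompt(question: str, ctx: QuestionContext, max_examples: int = 3) -> str:
    """Assemble a plain-text prompt: schema, business glossary, examples, question."""
    parts = ["-- Schema", *ctx.schema_ddl]
    if ctx.glossary:
        parts.append("-- Business definitions")
        parts.extend(f"-- {term}: {definition}"
                     for term, definition in sorted(ctx.glossary.items()))
    if ctx.example_queries:
        parts.append("-- Example queries written by analysts")
        parts.extend(ctx.example_queries[:max_examples])
    parts.append(f"-- Question: {question}")
    return "\n".join(parts)
```

A benchmark question like the publications example needs only the first and last sections of such a prompt; the enterprise questions described in this thread depend on the middle two, which is the part public benchmarks don't exercise.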
activatedgeek · 2 years ago
In AI/ML research, text to SQL always seemed to me to be of merely academic interest, in the sense that the outputs are easily verifiable and make for a good proof of concept of a language model's (or a translation model's) capabilities.

But looks like there are plenty of products coming out in this area, and it has me wondering: what is the actual big picture for enterprises here?

I would assume enterprises employ enough people to write yet another query for whatever use case.

- Is the expectation that in the future, we can bring the flexibility of SQL-like languages to people unfamiliar with SQL?

- Perhaps a salesperson unfamiliar with SQL would like to conduct an analysis. Is the volume and variety of such queries high enough that shortening the turnaround, from a data analyst designing the SQL query to the salesperson consuming the results, is worth optimizing?

Perhaps I am underestimating the scale of the problem, but I would love some insider perspective here.

alandu · 2 years ago
I used to get slammed with so many requests that my boss had to tell the sales team to ask fewer questions and only the highest-priority ones. Analytics teams serve a lot of different teams in an org, and the requests can really pile up. I was basically a bottleneck, which was lose-lose: bad for me, since I was slammed with work, and bad for business stakeholders, since they either had to wait a long time for responses or were limited in what they could even ask.
laidoffamazon · 2 years ago
Two notes:

1) I appreciate that it's said to be local-first, but the fact that it depends on the OpenAI API is...kinda a big hole in that? The organization I work in wouldn't really approve this, and from the title I was hoping that this would be a local-first fine-tuned (or fine-tunable) LLM.

2) The about page stating that you met at Princeton is a huge bear signal for me. I don't think tools should be adopted based on how elite (cognitively, financially, socially, athletically or whatever) their creators are, and given the use of the OpenAI APIs I question why the "top ML conferences" bit is here at all.

alandu · 2 years ago
1 - Yes, our current solution does require you to be allowed to use ChatGPT/OpenAI. Unfortunately, accuracy with smaller models (even GPT-3.5) is poor. We don't see a local model (which would be much worse than GPT-3.5) being anywhere close to good enough, even with fine-tuning (which would also require a really large number of queries). So we are relying on GPT-4 for now.

2 - Agreed, the background isn't why anyone should adopt a tool; we just wanted to share our story. I would add that creating a good wrapper can actually be quite challenging: you need to synthesize many pieces under constraints like memory, compute, speed and accuracy.

jddj · 2 years ago
The trend of these apps (admittedly, there are worse offenders than these guys) which stress how your data is completely safe, encrypted in transit, not stored on our servers, yours forever...by the way, everything is piped straight into OpenAI is a bit tiring.
alandu · 2 years ago
Just want to clarify that OpenAI does not train on the query code and schema info we send via the API. It's equivalent to using https://chat.openai.com/ with the "Improve the model for everyone" setting (previously "Chat history & training") turned off in Data Controls.

u/alandu

Karma: 17 · Cake day: July 1, 2020