Posted by u/jarulraj 2 years ago
Show HN: EVA – AI-Relational Database System (github.com/georgia-tech-d...)
Hi friends,

We are building EVA, an AI-Relational database system with first-class support for deep learning models. Our goal with EVA is to create a platform for AI-powered multi-modal database applications that operate on both structured data (tables, feature vectors, etc.) and unstructured data (videos, podcasts, PDFs, etc.) using deep learning models. EVA comes with a wide range of models for analyzing unstructured data, including models for object detection, OCR, text summarization, automatic speech recognition, and more.

The key feature of EVA is its AI-centric query optimizer. This optimizer is designed to speed up AI-powered applications using a collection of optimizations inspired by relational database systems. Two of the most important optimizations are:

+ Caching: EVA automatically reuses previous query results (e.g., inference results), eliminating redundant computation and saving you money on inference.

+ Predicate Reordering: EVA optimizes the order in which query predicates are evaluated (e.g., running faster, more selective deep learning models first), leading to faster queries.
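As a rough sketch, here is the kind of query these optimizations target. The table and UDF names (and the exact predicate syntax) are illustrative placeholders, not taken from the EVA documentation:

  -- Illustrative query: TrafficVideo and ObjectDetector are placeholder names.
  -- Predicate reordering: the cheap `id < 1000` filter is evaluated before the
  -- expensive ObjectDetector predicate, so the model only runs on the frames
  -- that survive the cheap filter.
  -- Caching: the detector's results on those frames can be reused by later
  -- queries over the same video instead of re-running inference.
  SELECT id
  FROM TrafficVideo
  WHERE id < 1000 AND ['car'] <@ ObjectDetector(data).labels;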

Besides saving money spent on inference, EVA also makes it easier to write SQL queries to set up multi-modal AI pipelines. With EVA, you can quickly integrate your AI models into the database system and seamlessly query structured and unstructured data.

We are constantly working on improving EVA and would love to hear your feedback!

shrikrishna · 2 years ago
From an initial reading, it seems it should be possible to support pure NLP tasks with this, but there weren't any examples of these in the documentation, so I'm not sure. Does it support NLP models?

Ex: Could I have a store of articles and run NLP tasks against it?

jarulraj · 2 years ago
Great question! Yes, EVA supports NLP pipelines, thanks to our integration of Hugging Face pipelines last month. Here is an illustrative text classification application:

  -- Text classification application in EVA
  CREATE TABLE IF NOT EXISTS MyCSV (id INTEGER UNIQUE, comment TEXT(30));

  LOAD CSV 'csv_file_path' INTO MyCSV;

  CREATE UDF HFTextClassifier
  TYPE HuggingFace
  'task' 'text-classification';

  SELECT HFTextClassifier(comment) FROM MyCSV;

EVA supports many other NLP pipelines, including summarization and text2text generation.
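For instance, a summarization pipeline would look roughly like this (a sketch following the same pattern as the classifier above; the UDF name is illustrative):

  -- Sketch: a Hugging Face summarization UDF (HFSummarizer is an illustrative name)
  CREATE UDF HFSummarizer
  TYPE HuggingFace
  'task' 'summarization';

  SELECT HFSummarizer(comment) FROM MyCSV;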

[2] is an illustrative notebook that presents an HF-based object segmentation pipeline (not NLP-based though). We would love to jointly explore how to best support your NLP pipeline. Please consider opening an issue with more details on your use case.

[1] https://github.com/georgia-tech-db/eva/blob/4fa52f893e7661d4...

[2] https://evadb.readthedocs.io/en/latest/source/tutorials/07-o...

ericlewis777 · 2 years ago
Very cool. Also, love seeing rambling wrecks from Georgia Tech here!

While this is a very cool project, a really obvious demo that people can try out would make it stand out in the current ecosystem of similar tools.

jarulraj · 2 years ago
Thanks! Likewise :)

Thanks for the suggestion! I just added links to the demo applications near the top of the README. All of the applications are Jupyter notebooks that you can open in Google Colab.

* Examining the emotion palette of actors in a movie: https://evadb.readthedocs.io/en/stable/source/tutorials/03-e...

* Analysing traffic flow at an intersection: https://evadb.readthedocs.io/en/stable/source/tutorials/02-o...

* Classifying images based on their content: https://evadb.readthedocs.io/en/stable/source/tutorials/01-m...

* Recognizing license plates: https://github.com/georgia-tech-db/license-plate-recognition

* Analysing toxicity of social media memes: https://github.com/georgia-tech-db/toxicity-classification

dmix · 2 years ago
I personally wouldn't put the Emotion one first in the GitHub README. That was the only one I opened before clicking the license plate one, where I a) saw it was a whole separate GitHub repo and b) opened two files and saw both doing parsing/model loading without any SQL, before getting bored and closing the project.

Maybe I'm not the target market, but the 2nd and 3rd examples in your list here, which actually have SQL query examples, were much more interesting and relevant IMO.

startupsfail · 2 years ago
Could you turn this into a psql extension? If this is integrated into an actual database that can be used in production, this may have a future. Otherwise no one will touch this, and it'd be yet another useless and cute experiment from academia.

edit: thank you for clarifying, it looks like this is not a new database engine and is a cache/query layer.

jarulraj · 2 years ago
Thanks for the helpful suggestion! EVA uses a SQL database system (via SQLAlchemy) to manage structured data. It runs on PostgreSQL out of the box; you only need to provide the database connection URL in the EVA configuration file.

Thanks for your candid comment. We take it very seriously. EVA is already being used in production by some collaborators and we would love to support more early adopters :) Please let me know if I can DM you to get more feedback.

startupsfail · 2 years ago
Nice.

I've skimmed over the documentation and it wasn't clear. It looked like the database was designed from scratch. If this is caching/syntactic sugar over a mix of DB and inference queries, that's interesting and feels a lot less risky.

potatoman22 · 2 years ago
I'm having trouble understanding what this does. Does it let you compose models via a SQL-like syntax?

jarulraj · 2 years ago
That’s correct! You can compose multiple models in a single query to set up useful AI pipelines.

Here is an illustrative query that chains together multiple models:

   -- Analyse emotions of faces in a video
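   -- FaceDetector returns (bbox, conf) pairs per frame; LATERAL UNNEST emits
   -- one row per detected face, and EmotionDetector runs on each cropped face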
   SELECT id, bbox, EmotionDetector(Crop(data, bbox)) 
   FROM MovieVideo JOIN LATERAL UNNEST(FaceDetector(data)) AS Face(bbox, conf)  
   WHERE id < 15;

unixhero · 2 years ago
Global Defence Initiative selected

v3ss0n · 2 years ago
Welcome back commander

amunutep · 2 years ago
This looks very interesting. I am thinking of testing it out to see its accuracy for text detection and extraction across multiple PDFs. This will sound like an amateur question, but what is the policy on the files used? Do you store them for model training? I am asking because, in the long term, I might use this on some more private files.

jarulraj · 2 years ago
We are in the process of supporting a native `LOAD PDF` command. Meanwhile, you could convert each PDF into a series of images and load them using the `LOAD IMAGE` command. You could then run any text extraction user-defined function (e.g., `textract` [1]) over the loaded documents with additional filters based on your constraints (like PDF author or creation date). As EVA is designed for local usage, you can run it on local private files. We would love to jointly explore how to best support your text extraction pipeline. Please consider opening an issue with more details on your use case.
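Roughly, that workaround would look like the following sketch. The table name, the glob over the converted page images, and the TextExtractor UDF (e.g., a wrapper around textract) are all illustrative:

  -- Sketch of the workaround described above (names are illustrative).
  -- Each PDF page has been converted to an image offline.
  LOAD IMAGE 'pdf_pages/*.jpg' INTO PDFPages;

  SELECT TextExtractor(data) FROM PDFPages;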

[1] https://textract.readthedocs.io/en/stable/python_package.htm...

catiopatio · 2 years ago
How are you guarding against prompt injection attacks, e.g. either in the queried data, or in untrusted query parameters?

jarulraj · 2 years ago
Honestly, we have not extensively thought about prompt injection attacks -- the equivalent of SQL injection attacks in AI-Relational database systems :)

If you have any thoughts on addressing this, please do share! We will incorporate that in the LLM-based functions in EVA.

simonw · 2 years ago
I've written a whole series of posts about prompt injection that you might find useful: https://simonwillison.net/series/prompt-injection/

la64710 · 2 years ago
Very nice… any plans for supporting self-hosted LLMs like BERT, LLaMA, etc.?

jarulraj · 2 years ago
Great question! We will be adding a ChatGPT-based user-defined function this week (https://github.com/georgia-tech-db/eva/pull/655/).

With LLM-based functions, EVA will support more interesting queries like this:

  SELECT ChatGPT(TextSummarizer(SpeechRecognizer(audio)),
         "Is this video related to the Russia-Ukraine war?")
  FROM VIDEO_CLIPS;

Here, EVA sends the audio of each video clip to a speech recognition model on Hugging Face. It then sends the recognized text to a text summarization model. EVA executes both models on local GPUs. Lastly, EVA sends the text summary to ChatGPT as part of the prompt. The ChatGPT UDF is executed remotely.

The critical feature of EVA is that the query optimizer factors in the dollar cost of running models for a given AI task (like a question-answering LLM). It picks the model pipeline with the lowest cost that satisfies the user's accuracy requirement.

la64710 · 2 years ago
Great, but personally I am interested in locally runnable LLMs instead of sending data to a cloud service like ChatGPT.
