We are building EVA, an AI-Relational database system with first-class support for deep learning models. Our goal with EVA is to create a platform that supports AI-powered multi-modal database applications operating on structured data (tables, feature vectors, etc.) and unstructured data (videos, podcasts, PDFs, etc.) with deep learning models. EVA comes with a wide range of models for analyzing unstructured data, including models for object detection, OCR, text summarization, speech recognition, and more.
The key feature of EVA is its AI-centric query optimizer. This optimizer is designed to speed up AI-powered applications using a collection of optimizations inspired by relational database systems. Two of the most important optimizations are:
+ Caching: EVA automatically reuses previous query results (e.g., inference results), eliminating redundant computation and saving you money on inference.
+ Predicate Reordering: EVA optimizes the order in which query predicates are evaluated (e.g., running faster, more selective deep learning models first), leading to faster queries.
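To make the two optimizations above concrete, here is a minimal Python sketch. It is not EVA's implementation: the `Predicate` and `CachedPredicate` classes, the cost and selectivity numbers, and the ranking formula (cost divided by the fraction of rows filtered out, a textbook heuristic for ordering expensive predicates) are all assumptions made for illustration, and the lambdas stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Predicate:
    name: str
    fn: Callable[[int], bool]  # stand-in for a model-backed filter
    cost: float                # assumed per-row cost (arbitrary units)
    selectivity: float         # assumed fraction of rows that pass


class CachedPredicate:
    """Evaluates the wrapped predicate at most once per row."""

    def __init__(self, pred: Predicate) -> None:
        self.pred = pred
        self.cache: Dict[int, bool] = {}
        self.calls = 0  # number of real (uncached) evaluations

    def __call__(self, row: int) -> bool:
        if row not in self.cache:
            self.calls += 1
            self.cache[row] = self.pred.fn(row)
        return self.cache[row]


def reorder(preds: List[Predicate]) -> List[Predicate]:
    # Cheap predicates that drop many rows go first:
    # rank = cost / (1 - selectivity), ascending.
    return sorted(preds, key=lambda p: p.cost / max(1e-9, 1.0 - p.selectivity))


def run_filter(rows: List[int], preds: List[CachedPredicate]) -> List[int]:
    # all() short-circuits, so later predicates only see surviving rows.
    return [r for r in rows if all(p(r) for p in preds)]


# Hypothetical predicates; the numbers are made up for the example.
expensive = Predicate("expensive_model", lambda r: r > 7, cost=10.0, selectivity=0.2)
cheap = Predicate("cheap_model", lambda r: r % 2 == 0, cost=1.0, selectivity=0.5)

ordered = [CachedPredicate(p) for p in reorder([expensive, cheap])]
rows = list(range(10))

first = run_filter(rows, ordered)
calls_after_first = [p.calls for p in ordered]
second = run_filter(rows, ordered)  # served entirely from the cache
calls_after_second = [p.calls for p in ordered]

print([p.pred.name for p in ordered], first, calls_after_first, calls_after_second)
```

In this toy run the cheap predicate is evaluated on all ten rows, the expensive one only on the five rows that survive it, and repeating the query triggers no new evaluations.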
Besides saving money spent on inference, EVA also makes it easier to write SQL queries to set up multi-modal AI pipelines. With EVA, you can quickly integrate your AI models into the database system and seamlessly query structured and unstructured data.
We are constantly working on improving EVA and would love to hear your feedback!
Ex: Could I have a store of articles and run NLP tasks against it?
[2] is an illustrative notebook that presents an HF-based object segmentation pipeline (not NLP-based though). We would love to jointly explore how to best support your NLP pipeline. Please consider opening an issue with more details on your use case.
[1] https://github.com/georgia-tech-db/eva/blob/4fa52f893e7661d4...
[2] https://evadb.readthedocs.io/en/latest/source/tutorials/07-o...
While this is a very cool project, making a very obvious demo that people can use to leverage it would make this stand out in the current ecosystem of tools like this.
Thanks for the suggestion! I just added links to the demo applications earlier in the README. All applications are Jupyter notebooks that you can open in Google Colab.
* Examining the emotion palette of actors in a movie: https://evadb.readthedocs.io/en/stable/source/tutorials/03-e...
* Analysing traffic flow at an intersection: https://evadb.readthedocs.io/en/stable/source/tutorials/02-o...
* Classifying images based on their content: https://evadb.readthedocs.io/en/stable/source/tutorials/01-m...
* Recognizing license plates: https://github.com/georgia-tech-db/license-plate-recognition
* Analysing toxicity of social media memes: https://github.com/georgia-tech-db/toxicity-classification
Maybe I’m not the target market, but the 2nd and 3rd examples in your list here, which actually have SQL query examples, were much more interesting and relevant IMO.
edit: thank you for clarifying, it looks like this is not a new database engine and is a cache/query layer.
Thanks for your candid comment. We take it very seriously. EVA is already being used in production by some collaborators and we would love to support more early adopters :) Please let me know if I can DM you to get more feedback.
I’ve skimmed over the documentation and it wasn’t clear. It looked like the database was designed from scratch. If this is a caching/syntactic sugar over a mix of DB and inference queries, this is interesting and feels a lot less risky.
Here is an illustrative query that chains together multiple models:
[1] https://textract.readthedocs.io/en/stable/python_package.htm...
If you have any thoughts on addressing this, please do share! We will incorporate that in the LLM-based functions in EVA.
With LLM-based functions, EVA will support more interesting queries like this:
Here, EVA sends the audio of each video clip to a speech recognition model on Hugging Face. It then sends the recognized text to a text summarizer model. EVA executes both models on local GPUs. Lastly, EVA sends the text summary to ChatGPT as part of the prompt. The ChatGPT UDF is executed remotely.

The critical feature of EVA is that the query optimizer factors in the dollar cost of running models for a given AI task (like a question-answering LLM). It picks the model pipeline with the lowest price that satisfies the user's accuracy requirement.
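As a rough illustration of that selection rule, here is a small Python sketch. It is only a sketch under stated assumptions, not EVA's optimizer: the `Pipeline` class, the `pick_cheapest` helper, the pipeline names, and every price and accuracy figure are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pipeline:
    name: str
    dollars_per_1k_calls: float  # hypothetical price
    accuracy: float              # hypothetical accuracy on the task, in [0, 1]


def pick_cheapest(pipelines: List[Pipeline], min_accuracy: float) -> Optional[Pipeline]:
    """Return the lowest-cost pipeline that meets the accuracy floor, or None."""
    eligible = [p for p in pipelines if p.accuracy >= min_accuracy]
    return min(eligible, key=lambda p: p.dollars_per_1k_calls, default=None)


# Placeholder candidates; none of these numbers are measurements.
CANDIDATES = [
    Pipeline("small_local_model", dollars_per_1k_calls=0.10, accuracy=0.70),
    Pipeline("medium_local_model", dollars_per_1k_calls=0.50, accuracy=0.85),
    Pipeline("large_remote_llm", dollars_per_1k_calls=5.00, accuracy=0.95),
]

choice = pick_cheapest(CANDIDATES, min_accuracy=0.80)
print(choice.name if choice else "no pipeline meets the requirement")
```

With these placeholder numbers, an accuracy floor of 0.80 rules out the smallest model and the cheaper of the two remaining pipelines is chosen; raising the floor above every candidate's accuracy yields `None`.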