Show HN: Customizable, embeddable ChatGPT based on your own documents

On the website it states:

“Step 1

Import or sync documents into Libraria, or add API integrations like Google and Shopify (beta). Bring the docs - let GPT-3.5 do the heavy lifting.”

I might be reading it wrong, or might have missed it on the website, but is it actually GPT-3.5 running over those imported documents? (That is, are you using OpenAI or another third-party provider in the background?)

If you’re running a local LLM then the privacy implications are clearly pretty different than if you’re essentially sending people’s documents verbatim into an external LLM.

monkeydust · 2 years ago

Also curious before I use.

Tbh, I've managed to build this, minus the nicer interface, using LangChain. It was surprisingly easy for someone who doesn't dev daily.

https://blog.langchain.dev/retrieval/
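For anyone wondering what "build this with LangChain" boils down to: the retrieval approach splits documents into chunks, scores each chunk against the question, and pastes the best ones into the prompt. Below is a minimal, dependency-free sketch of that flow. It assumes a toy word-count "embedding" (`embed`) in place of a real embedding model, and all function names are hypothetical; this is not LangChain's API or Libraria's implementation.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Paste the retrieved chunks into the prompt sent to the chat model."""
    joined = "\n---\n".join(context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}"
    )
```

In a real setup, `embed` would call an embedding model and the chunks would live in a vector store; the string returned by `build_prompt` is what would be sent to the chat model.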

bsenftner · 2 years ago

This is the first "real" app people make after getting familiar with the OpenAI API. I don't see how this can be sustained, let alone pay for more than one programmer, without an expensive feature race against the horde of similar services that are appearing.

WastingMyTime89 · 2 years ago

> If you’re running a local LLM then the privacy implications are clearly pretty different than if you’re essentially sending people’s documents verbatim into an external LLM.

Is it?

You are basically entering a contractual relationship with them regarding the propagation of your document and they are themselves entering a contractual relationship with their suppliers. It's not different from hosting in the cloud.

Do you expect every web app you use to tell you if it uses Azure, AWS or GCP?

ceejayoz · 2 years ago

> Do you expect every web app you use to tell you if it uses Azure, AWS or GCP?

In their privacy policies, yes. And I expect them to have signed a DPA with those and other vendors.

I wonder if you’ve considered academics as a target market. We have a lot of PDFs and might like help in “thinking” about them.

golergka · 2 years ago

There's http://chatpdf.com already

dash2 · 2 years ago

Interesting and cool, but here's why serious academics should avoid it:

> When you post Contributions, you grant us a license (including use of your name, trademarks, and logos): By posting any Contributions, you grant us an unrestricted, unlimited, irrevocable, perpetual, non-exclusive, transferable, royalty-free, fully-paid, worldwide right, and license to: use, copy, reproduce, distribute, sell, resell, publish, broadcast, retitle, store, publicly perform, publicly display, reformat, translate, excerpt (in whole or in part), and exploit your Contributions (including, without limitation, your image, name, and voice) for any purpose, commercial, advertising, or otherwise, to prepare derivative works of, or incorporate into other works, your Contributions, and to sublicense the licenses granted in this section. Our use and distribution may occur in any media formats and through any media channels.

bealuga · 2 years ago

That would be really, really cool if I were able to serve that space. I'd be curious to know what kind of features you'd want to have, what would be deal-breakers, etc.!

dash2 · 2 years ago

Basic setup: point it at a folder of PDFs, have it recurse in and read them all, then ask it questions like:

* Summarize these papers on chimpanzee cooperation in the wild. What other papers should I be reading?

* Suggest an interesting master's thesis topic on the early modern economy.

* How good are polygenic scores at predicting educational attainment, and how has this developed over time?

Bonus: integrate it with, e.g., Google Scholar, so it can go and find and read new papers.
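The "point it at a folder of PDFs and recurse in" part of that setup is the easy bit. A minimal sketch using only Python's standard library, assuming a hypothetical `find_pdfs` helper; extracting the text would need a separate PDF library (pypdf is one option) and is deliberately left out here.

```python
from pathlib import Path


def find_pdfs(root: str) -> list[Path]:
    """Recursively collect every PDF under `root`.

    The extension check is case-insensitive, so both "paper.pdf" and
    "SCAN.PDF" are picked up; "~" in `root` is expanded.
    """
    base = Path(root).expanduser()
    return sorted(
        p for p in base.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Usage would look like `for pdf in find_pdfs("~/papers"): ...`, with each file then handed to a text extractor and chunked before indexing.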

Pricing: it is probably easier to start selling this to individual academics. Then when you've got a compelling product and the word is out, you could sell it to the whole lab (at a much higher price because people can put it in their grant budgets).

Gotchas: privacy. Nobody wants their hot unpublished paper to be scooped by a large language model.

MaxikCZ · 2 years ago

I'm not the person who asked the original question, but they're asking for the same feature I would like to see.

I am not sure how the documents are handled in your product, since ChatGPT has a context limit that probably won't be able to hold longer papers in memory.

For me, I have a PDF[0] describing a system that can be programmed, along with bits of pseudo-code and a lot of clarifications. It's something ChatGPT could use to spit out an actual implementation, if it were able to "think" about the PDF as a whole. I would love to see if your product is capable of such a feat.

[0]: https://arrow.tudublin.ie/cgi/viewcontent.cgi?article=1177&c...

Drive-Based Utility-Maximizing Computer Game Non-Player Characters by Colm Sloan (note that basically only chapter 3 is needed in this case, but it's still over 40 pages long)
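A common workaround for the context limit is to split a long document into overlapping windows and process each window separately (for example, summarize each one, then combine the summaries). A minimal sketch, assuming word counts as a rough stand-in for tokens; real limits are counted in tokens, so a production version would use the model's tokenizer. `chunk_words` and its defaults are hypothetical.

```python
def chunk_words(text: str, max_words: int = 300, overlap: int = 50) -> list[str]:
    """Split `text` into windows of at most `max_words` words.

    Consecutive windows share `overlap` words, so a passage that straddles
    a boundary appears intact in at least one window.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Note that this still never lets the model see all 40-plus pages at once; each window is handled in isolation, which is exactly the limitation raised above.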

malborodog · 2 years ago

Yep. This is such an obvious use case. Have you seen the best thing out there that does this? Where can I load 100 GB of PDFs and ask questions about what's in them??

ssdspoimdsjvv · 2 years ago

Yes, this is the next step I'm looking forward to, and what would probably make LLMs really take off: let me dump my own knowledge base or source code into ChatGPT and have it use that as its source of knowledge. I imagine the cost and resources required to train and run these individual models at scale must still be prohibitive.

Zak · 2 years ago

I said "Avada kedavra!" to the Dumbledore bot and got an internal server error. Is this intended behavior?

gcanyon · 2 years ago

Well, you killed it, obviously. :-)

pavelstoev · 2 years ago

Hi Bea, and thank you for sharing your creation. Looking good! A couple of questions:

1) Could you please describe your data privacy considerations? For example, what happens to my documents after they are uploaded? Are they stored somewhere (encrypted or not) or deleted?

2) Could you please share more details on how this works “under the hood”? Specifically, how do you ingest and digest the knowledge contained in my documents?

Thanks!

highwaylights · 2 years ago

yipbub · 2 years ago

I asked Dumbledore something from the books[0] and got something somewhat innovative, somewhat wrong: "How can I turn a staircase into a slide?"

> To turn a staircase into a slide, you can use the spell 'Tergeo' to remove any dirt or grime from the stairs, and then use the spell 'Lubricus' to make the stairs slippery. Lubricus is an incantation, which can be cast with a wand and by reciting the word 'Lubricus' while pointing the wand at the object in question. You can also use the spell 'Flagrante Curse', which can make a surface slippery by causing the target object to become extremely hot, in order to render the staircase slippery. However, this spell is considered one of the Unforgivable Curses and is strictly forbidden. It takes years of study and practice to become a proficient wizard, and performing spells can be dangerous, so it's important to learn and practice magic under the guidance of a skilled wizard or witch.

[0] https://harrypotter.fandom.com/wiki/Glisseo

userbinator · 2 years ago

> and got something somewhat innovative, somewhat wrong

That's not surprising, it's just par for the course with LLMs.

:D LOL. The Dumbledore example is interesting. I left "GPT" enabled, so it's not just pulling from the documents. I think I'll add settings so users can change the temperature, etc. for the assistants in the future as well.
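For context on what a temperature setting does: the model's next-token scores (logits) are divided by the temperature T before the softmax, so a low T concentrates probability on the top-scoring token (more deterministic output) and a high T flattens the distribution (more varied output). A minimal sketch of that rule with made-up logits; hosted APIs combine it with other sampling settings, so treat this as an illustration, not any provider's implementation. The function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

With logits `[2.0, 1.0, 0.0]`, T = 0.1 puts nearly all the probability on index 0, while T = 10 spreads it almost evenly across all three.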

afro88 · 2 years ago

The first thing I did was try to get it to hallucinate or talk about things that aren't in the Harry Potter books.

This is the main concern for me with ChatGPT bots: they make things up and can be prompt-hacked to get outside their bounds.

So just some feedback that your demos should show how you tackle this too.

tirpen · 2 years ago

That's a pretty cool service. But as usual with these services today, it's totally trivial to make it do something completely different than what's intended:

https://i.imgur.com/VKrfWYm.png

which would make me very wary of hooking it up to an API key that I'm paying for, since I'd basically be paying for free GPT access for anyone who visits my site, while I would probably only be interested in paying when they are asking questions related to my topics.

Hi, try the same with Zappy! As mentioned in another comment, Dumbledore was set up to "enable" GPT: https://ibb.co/8sQh1X6 . I give you the option to "only" use your own documents.
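One way an "only use your own documents" mode could also address the cost worry above is to refuse before calling the model at all when no document chunk looks relevant to the question. A minimal sketch, assuming word-overlap (Jaccard) similarity in place of embeddings, a made-up threshold of 0.2, and a hypothetical `call_llm` function; this is not how Libraria implements the option.

```python
import re
from typing import Callable

REFUSAL = "Sorry, I can only answer questions about this site's documents."


def words(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real embedding."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by total distinct words (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def answer_or_refuse(
    question: str,
    chunks: list[str],
    call_llm: Callable[[str], str],  # hypothetical paid-model call
    threshold: float = 0.2,          # made-up cut-off; would need tuning
) -> str:
    """Only call the model when some chunk is similar enough to the question."""
    q = words(question)
    best_score, best_chunk = max(
        ((jaccard(q, words(c)), c) for c in chunks), default=(0.0, "")
    )
    if best_score < threshold:
        return REFUSAL  # off-topic: no API call, so it costs nothing
    prompt = (
        "Answer only from this excerpt. If it does not contain the answer, say so.\n"
        f"Excerpt: {best_chunk}\n"
        f"Question: {question}"
    )
    return call_llm(prompt)
```

A gate like this filters obviously off-topic requests, but it does not stop prompt injection hidden inside an on-topic question, so it works as a cost control rather than a security boundary.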

osigurdson · 2 years ago

This definitely seems like a huge untapped space: ChatGPT for my stuff that answers questions correctly. Sure there are privacy concerns for some things but unless you are going to train your own LLM on premise (yeah right) this will always be an issue.

I broke it trying to upload a PDF but that is ok, I'll try it again at some point.

zirgs · 2 years ago

Can I train a LoRA on premise instead?

You can do anything on premise if you have the necessary skills and hardware.
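For context on why LoRA is the lighter on-premise option: low-rank adaptation keeps the pretrained weight matrix $W_0$ frozen and trains only a low-rank update $BA$, scaled by a hyperparameter $\alpha$. The dimensions below ($d = k = 4096$, rank $r = 8$) are illustrative, not taken from any particular model:

```latex
\begin{aligned}
W &= W_0 + \Delta W = W_0 + \tfrac{\alpha}{r}\, B A,
  \qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k) \\
\#\text{trainable (LoRA)} &= r(d + k) = 8 \cdot (4096 + 4096) = 65{,}536 \\
\#\text{trainable (full)} &= dk = 4096^2 = 16{,}777{,}216 \quad (\text{LoRA} \approx 0.39\%)
\end{aligned}
```

The savings are in trainable parameters per adapted matrix; the frozen base model still has to fit in memory for the forward pass, so the hardware caveat above still applies.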

mb_72 · 2 years ago

A few things:

1) I went through Stripe checkout to upgrade to the $10/month plan, but it's still showing me as on the free plan on the billing page.

2) I guess related to 1), but I want to show my business partner the results of a quick dump of a PDF plus scrape of our website; it's not clear how to supply him with the public chat/bot URL.

3) 'Last scraped' always shows 'invalid date'.

Feel free to reach out to me directly (email address is ***72@gmail.com for my account). Thanks, good luck with the product!

Thank you! Looking you up right now!

jimmySixDOF · 2 years ago

Love the approach, and I'm amazed by your packaging: it looks slick and mature, even for such a novel tech stack. But my understanding is that all the OpenAI models (3.5-Turbo et al.) have a non-compete clause in their terms of use, so I'm wondering how you approach this. I can see the case that you are a complementary service increasing paid usage of OpenAI, but I can also see them saying, hey, we want to be the ones doing that, with enhanced features you'd be competing against. Perhaps it's just a long-term risk to live with if you get past the short-term market traction test? Asking because I face a similar set of questions for a different set of reasons.