My name is Bea. I built a site called Libraria that uses GPT to do a few things:
1. Let you spin up multiple assistants based on your own documents. You can make it public, private, or protected. It has its own subdomain and landing page.
2. Respond in full markdown always, so it can output images, links, code, and more
3. Let you upload articles on the fly within the Chat, so you can ask it questions
4. Make it embeddable in your site with one line of code
5. Let you update it for fun / with your branding
6. Enable syncing for any URLs you let us scrape, so that you can make sure it's always up to date
7. Let you upload multiple file types
I've been working on this for about a month now by myself and you can keep track of my feature updates here: https://libraria.dev/feature-updates
I would LOVE your feedback on anything, and if you're willing to try it out, I'm looking for a few beta users who can give me more continuous feedback. I would gladly waive the fee for them!
1) Could you please describe your data privacy considerations? For example, what happens to my documents after they are uploaded? Are they stored somewhere (encrypted or not) or deleted?
2) Could you please share more details on how this works “under the hood”? Specifically, how do you ingest and digest the knowledge contained in my documents?
Thanks!
“Step 1
Import or sync documents into Libraria, or add API integrations like Google and Shopify (beta). Bring the docs - let GPT-3.5 do the heavy lifting.”
I might be reading it wrong, or might have missed it on the website, but is it actually GPT-3.5 running over those imported documents? (As in, are you using OpenAI or another third-party provider in the background?)
If you’re running a local LLM then the privacy implications are clearly pretty different than if you’re essentially sending people’s documents verbatim into an external LLM.
Tbh I have managed to build this, minus the nicer interface, using LangChain. It was surprisingly easy for someone who doesn't dev daily.
https://blog.langchain.dev/retrieval/
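For anyone wondering what "retrieval" means in practice, here is a minimal sketch in plain Python. It is not Libraria's pipeline and it does not use LangChain's API; the function names (`retrieve`, `build_prompt`) are made up for illustration, and a bag-of-words cosine score stands in for the learned embeddings a real setup would probably use. The idea is only this: score each document chunk against the question, keep the best few, and paste them into the prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a stand-in for a real tokenizer or embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    # Hypothetical helper: return the k chunks most similar to the question.
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    # Hypothetical helper: only the retrieved chunks are sent to the model,
    # not the whole document set.
    context = "\n---\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Note that whatever chunks are retrieved still get sent to the model provider as part of the prompt, which is why the privacy questions in this thread matter.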
Is it?
You are basically entering a contractual relationship with them regarding the propagation of your document and they are themselves entering a contractual relationship with their suppliers. It's not different from hosting in the cloud.
Do you expect every web app you use to tell you if they use Azure, AWS or GCP?
In their privacy policies, yes. And I expect them to have signed a DPA with those and other vendors.
> To turn a staircase into a slide, you can use the spell 'Tergeo' to remove any dirt or grime from the stairs, and then use the spell 'Lubricus' to make the stairs slippery. Lubricus is an incantation, which can be cast with a wand and by reciting the word 'Lubricus' while pointing the wand at the object in question. You can also use the spell 'Flagrante Curse', which can make a surface slippery by causing the target object to become extremely hot, in order to render the staircase slippery. However, this spell is considered one of the Unforgivable Curses and is strictly forbidden. It takes years of study and practice to become a proficient wizard, and performing spells can be dangerous, so it's important to learn and practice magic under the guidance of a skilled wizard or witch.
[0] https://harrypotter.fandom.com/wiki/Glisseo
That's not surprising, it's just par for the course with LLMs.
This is the main concern for me with ChatGPT bots: they make things up and can be prompt-hacked to get outside their bounds.
So just some feedback that your demos should show how you tackle this too.
https://i.imgur.com/VKrfWYm.png
which would make me very wary of hooking it up with an API key that I'm paying for, since I'd basically be paying for free GPT access for anyone who visits my site, while I would probably only be interested in paying when they are asking questions related to my topics.
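One way to limit that cost, sketched here under loud assumptions: check whether a question overlaps with your own documents before spending a paid call on it. Nothing below is Libraria's actual behavior; `is_on_topic`, `answer`, the `call_llm` callback, and the `min_overlap` threshold are all hypothetical, and plain word overlap is a crude proxy that a determined user can get around.

```python
import re


def tokenize(text):
    # Set of lowercase word tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_on_topic(question, documents, min_overlap=0.2):
    # Fraction of question words that also appear somewhere in the documents.
    q = tokenize(question)
    if not q:
        return False
    vocab = set().union(*(tokenize(d) for d in documents))
    return len(q & vocab) / len(q) >= min_overlap


def answer(question, documents, call_llm):
    # call_llm is a hypothetical callback wrapping the paid API.
    # Off-topic questions get a canned reply and never reach it.
    if not is_on_topic(question, documents):
        return "Sorry, I can only answer questions about this site's documents."
    return call_llm(question)
```

This only reduces casual off-topic use. Common words like "the" or "you" count toward the overlap, so a real gate would likely need stopword filtering, embedding similarity, or per-visitor rate limits on top.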
When you post Contributions, you grant us a license (including use of your name, trademarks, and logos): By posting any Contributions, you grant us an unrestricted, unlimited, irrevocable, perpetual, non-exclusive, transferable, royalty-free, fully-paid, worldwide right, and license to: use, copy, reproduce, distribute, sell, resell, publish, broadcast, retitle, store, publicly perform, publicly display, reformat, translate, excerpt (in whole or in part), and exploit your Contributions (including, without limitation, your image, name, and voice) for any purpose, commercial, advertising, or otherwise, to prepare derivative works of, or incorporate into other works, your Contributions, and to sublicense the licenses granted in this section. Our use and distribution may occur in any media formats and through any media channels.
* Summarize these papers on chimpanzee cooperation in the wild. What other papers should I be reading?
* Suggest an interesting master's thesis topic on the early modern economy.
* How good are polygenic scores at predicting educational attainment, and how has this developed over time?
Bonus: integrate it with e.g. Google Scholar, so it can go and find and read new papers.
Pricing: it is probably easier to start selling this to individual academics. Then when you've got a compelling product and the word is out, you could sell it to the whole lab (at a much higher price because people can put it in their grant budgets).
Gotchas: privacy. Nobody wants their hot unpublished paper to be scooped by a large language model.
I am not sure how the documents are handled in your product, since ChatGPT has a context limit that probably won't be able to hold longer papers in memory.
For me, I have a pdf[0] depicting a system that can be programmed, along with bits of pseudo-code and a lot of clarifications. Something that ChatGPT could use to spit out an actual implementation, if it were able to "think" about the pdf as a whole. I would love to see if your product is capable of such a feat.
[0]: https://arrow.tudublin.ie/cgi/viewcontent.cgi?article=1177&c...
Drive-Based Utility-Maximizing Computer Game Non-Player Characters by Colm Sloan (note that basically only chapter 3 is needed in this case, but it's still over 40 pages long)
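On the context limit: the usual workaround, as far as I know, is to split a long document into overlapping chunks so that each piece fits in the model's window, then feed in only the relevant pieces. Below is a minimal sketch of that splitting step. It is an assumption about how such tools tend to work, not a description of Libraria; `chunk_words` is a hypothetical name, and word counts are used as a rough stand-in for real token counts.

```python
def chunk_words(text, max_words=200, overlap=20):
    # Split text into chunks of at most max_words words, where each chunk
    # repeats the last `overlap` words of the previous one so that sentences
    # cut at a boundary still appear whole somewhere.
    if max_words <= 0 or overlap < 0 or overlap >= max_words:
        raise ValueError("need 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The catch for a case like a 40-page chapter is that the model only ever sees a few chunks at a time, so it cannot truly reason about the PDF as a whole; anything that depends on connecting distant sections may come out wrong.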
I broke it trying to upload a PDF but that is ok, I'll try it again at some point.
1) I went through Stripe checkout to upgrade to the $10/month plan, but it's still showing me as on the free plan on the billing page.
2) I guess related to 1), but I want to show my business partner the results of a quick dump of a PDF plus scrape of our website; it's not clear how to supply him with the public chat/bot URL.
3) 'Last scraped' always shows 'invalid date'.
Feel free to reach out to me directly (email address is ***72@gmail.com for my account). Thanks, good luck with the product!