That's kind of ironic; I made my current website layout by asking ChatGPT.
(But then, my current design isn't too different from the one I've been using for over a decade, and it's gone from fashionable to retro in that time).
This is the future of self-service support. Instead of those shitty chatbot state machines that only offer the same FAQ you just searched through, it can now draw on all your company documentation and surface it to your users (external-facing, of course) so they can find exactly what they're looking for. MDN docs would be easily searchable (I mean, they already are really accessible). Your company's fizzbuzz wizbang-SNAPSHOT-bim.bam.boom.jar docs would actually make sense to humans, and your engineers would no longer have to sit in customer meetings!
Probably also no more frequent than your average sales guy (at least the ones who overpromise _every_ feature to their leads/accounts and casually ask you to whip it up, and of course deploy it on a Friday afternoon, so the promise they made to the strategically and overall super-important client isn't revealed as utter lies).
If I have fairly fixed documentation and documents (won't be updated in months), what's the benefit of using a vector database (e.g. pinecone or supabase w/ vectors) rather than just saving the pickle (pkl) file and looking it up every time?
Shouldn't using the pickle file be much faster/more efficient?
If you have a small number of fixed documents e.g. <100k or so, then I agree that pickling the vectors or storing them as bytearrays would work better.
Once you reach a certain scale, it's helpful to potentially use distributed querying and/or different index types, even if you have a fairly static dataset. You can check out a billion-scale search benchmark we recently did here: https://zilliz.com/resources/milvus-performance-benchmark (you'll need to supply your email unfortunately). Here's the framework we used as well: https://github.com/zilliztech/vectordb-benchmark
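For the small, static-corpus case, the pickle approach really is just a few lines. Here's a minimal sketch of embed-once, pickle, and brute-force cosine lookup — the toy documents, random stand-in vectors, and the `embeddings.pkl` file name are made up for illustration (in practice the vectors would come from an embeddings API):

```python
import pickle
import numpy as np

# Toy corpus: in practice these would be embedding vectors computed
# once per doc section via an embeddings API (hypothetical data here).
docs = ["install guide", "api reference", "changelog"]
rng = np.random.default_rng(0)
vectors = rng.normal(size=(len(docs), 8)).astype(np.float32)

# One-time step: persist the embedding matrix alongside the doc texts.
with open("embeddings.pkl", "wb") as f:
    pickle.dump({"docs": docs, "vectors": vectors}, f)

def search(query_vec, k=1):
    """Brute-force cosine similarity over the pickled vectors."""
    with open("embeddings.pkl", "rb") as f:
        store = pickle.load(f)
    v = store["vectors"]
    sims = (v @ query_vec) / (
        np.linalg.norm(v, axis=1) * np.linalg.norm(query_vec)
    )
    top = np.argsort(-sims)[:k]
    return [store["docs"][i] for i in top]

# Querying with a stored vector should return its own doc first.
print(search(vectors[1]))  # → ['api reference']
```

This is O(n) per query, which is perfectly fine at a few thousand documents; the distributed querying and ANN index types mentioned above only start paying off at much larger scales.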
This looks very nice, a great improvement over existing search engines for docs. It'd be great if it could also scan reStructuredText and AsciiDoc docs repos.
The way it went is: we built this as part of Motif over the past month, and our users loved it. Many asked for a way to add the feature to their existing sites, so we made a standalone platform that streamlines the process, and open-sourced it :)
The usual knowledge about evaluating products applies.
If you go to the Markprompt website, the people who made it already appear to be accomplished entrepreneurs, having worked on something called Motif, which I hadn't heard of but which appears to be legit.
They also have a nice website and everything I read makes it sound like they know what they're doing.
These things don't count for much, but they count for something. I haven't investigated them deeply; I think this should be assessed like any other startup product.
Plugins are coming, and wrappers still have to pay OpenAI and then charge their own slice of the pie on top. Since wrappers aren't really cultivating the information themselves, nothing is stopping you from making your own either; the OpenAI API isn't some big, difficult secret. You can even ask OpenAI to write your integration for you!
You can probably also ask OpenAI about the Next.js/Tailwind starter repo every one of these wrappers keeps relying on, too.
Embeddings are created using OpenAI's ada model. They are stored in Supabase with the vector extension, which offers a simple way to compute vector similarities. Then the associated sections are added to the prompt context.
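That pipeline (embed the query, find similar sections, add them to the prompt context) can be sketched end to end. The function names, toy character-hash "embedding", and prompt wording below are illustrative assumptions, not Markprompt's actual code — the real version would call OpenAI's embeddings endpoint and run the similarity query server-side via Supabase's pgvector extension:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for OpenAI's ada embeddings endpoint (hypothetical:
    here we just hash characters into a small unit-normalized vector)."""
    v = np.zeros(16)
    for i, ch in enumerate(text.lower()):
        v[i % 16] += ord(ch)
    n = np.linalg.norm(v)
    return v / n if n else v

# Doc sections that, in the real system, would live in Supabase rows
# alongside a pgvector column holding their embeddings.
sections = [
    "To install, run `npm install markprompt`.",
    "Rate limits: 100 requests per minute per project.",
]
section_vecs = np.array([embed(s) for s in sections])

def build_prompt(question: str, k: int = 1) -> str:
    """Retrieve the top-k most similar sections by cosine similarity
    (what pgvector computes server-side) and add them to the prompt
    context ahead of the user's question."""
    q = embed(question)
    sims = section_vecs @ q  # vectors are unit-normalized
    top = np.argsort(-sims)[:k]
    context = "\n".join(sections[i] for i in top)
    return f"Answer using only this context:\n{context}\n\nQ: {question}"
```

The resulting string is what gets sent to the completion model, so the answer is grounded in the retrieved sections rather than the model's general training data.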
- https://github.com/microsoft/semantic-kernel/tree/main/sampl...
- https://www.producthunt.com/posts/gitterbot-io-conversationa...
- https://github.com/neuml/txtai/blob/master/examples/03_Build...
- https://github.com/openai/openai-cookbook/blob/main/examples...
Not a frontend developer, but curious what prompt(s) you used?
I would really like to see it in action on a docbase I'm familiar with, though.
Edit: Just tried again, and it hangs on doc 55 out of 421.
Here's the site if anyone else wants to give it a go: https://github.com/fusionauth/fusionauth-site/