Readit News
tompec commented on Show HN: Turn any website into a knowledge base for LLMs   embedding.io/... · Posted by u/tompec
boredemployee · a year ago
Does it embed images as well? If not, do you plan to?
tompec · a year ago
It doesn't embed images, no. But that's a great idea for the roadmap!
kaycebasques · a year ago
The more detail, the better. If `<section>` elements are found, do you chunk those? Do you do it recursively, or do you stop after a certain level? And when section elements don't exist, do you use `<h1>`, `<h2>`, etc. to infer logical chunks?
tompec · a year ago
Having looked at a lot of HTML, I noticed that `<section>` elements are not really the default, so I rely on headings (h1, h2, ...) to chunk each page. Each chunk has its heading hierarchy attached to it. There are a lot of optimizations that could be done at that level.
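For readers wondering what heading-based chunking with an attached heading hierarchy can look like, here is a minimal sketch using Python's standard `html.parser`. It is an assumption-laden illustration, not embedding.io's implementation: `HeadingChunker`, `chunk_html`, and the `{"headings": [...], "text": ...}` chunk layout are hypothetical, and it ignores `<section>` elements and does not skip `<script>`/`<style>` content.

```python
from html.parser import HTMLParser

# Map heading tags to their nesting level.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingChunker(HTMLParser):
    """Hypothetical chunker: starts a new chunk at every h1-h6 and records the heading path."""

    def __init__(self):
        super().__init__()
        self.path = []           # stack of (level, heading text) for the current position
        self.chunks = []         # collected chunks: {"headings": [...], "text": "..."}
        self._text = []          # body text gathered for the current chunk
        self._in_heading = None  # level of the heading currently being read, if any
        self._heading_text = []

    def _flush(self):
        # Normalize whitespace and emit the current chunk if it has any text.
        text = " ".join(" ".join(self._text).split())
        if text:
            self.chunks.append({"headings": [h for _, h in self.path], "text": text})
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._flush()  # a new heading closes the previous chunk
            self._in_heading = HEADINGS[tag]
            self._heading_text = []

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._in_heading is not None:
            level = self._in_heading
            title = " ".join(" ".join(self._heading_text).split())
            # Pop headings at the same or a deeper level, then push this one.
            while self.path and self.path[-1][0] >= level:
                self.path.pop()
            self.path.append((level, title))
            self._in_heading = None

    def handle_data(self, data):
        if self._in_heading is not None:
            self._heading_text.append(data)
        else:
            self._text.append(data)

    def close(self):
        super().close()  # process any buffered text first
        self._flush()    # then emit the final chunk


def chunk_html(html):
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Attaching the full heading path (e.g. `["Guide", "Usage", "Options"]`) to each chunk means a short section still carries its surrounding context when it is embedded on its own.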
kaycebasques · a year ago
I spent a lot of time thinking about how to manage embeddings for docs sites. This is basically the same solution that I landed on but never got around to shipping as a general-purpose product.

A key question that the docs should answer (and perhaps the "How it works" page too): chunking. Do you generate an embedding for the entire page, or for sections? And what's the size limit per page? Some of our docs pages have thousands of words per page. I'm doubtful you can ingest all that, and even if you could, I'm not sure the embedding would be that useful in practice.

tompec · a year ago
I chunk pages and generate an embedding for each chunk, so there's no real size limit per page.
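Why per-chunk embeddings remove the page size limit can be sketched in a few lines: however long a page is, it is cut into pieces of bounded size, and each piece is embedded separately. The reply doesn't say how chunk size is bounded, so this assumes fixed word windows with a small overlap; `split_words`, `max_words`, and `overlap` are hypothetical names and values, and the actual embedding call is left out.

```python
def split_words(text, max_words=200, overlap=20):
    """Hypothetical splitter: windows of at most max_words words, overlapping by `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

A 500-word section with the defaults yields three chunks (words 0-199, 180-379, 360-499); each would then be sent to the embedding model on its own, so the model's input limit applies per chunk rather than per page.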
rvz · a year ago
> Enterprise: Contact Us

If there's no certification or compliance information, then I don't think there's anything to discuss about an enterprise plan.

tompec · a year ago
Gotta start somewhere :)
michaelmior · a year ago
This looks interesting, but I get a 404 on the iframe when I try to go into the chat.
tompec · a year ago
Sorry about that, there's a bit too much load at the moment.
dmje · a year ago
Agree with this. I also think the emphasis here (to OP) should be "I'd be willing to happily pay for it", i.e. I'd rather pay a reasonable amount each month for something that is going to remain active than have the large (current) disparity between "free" and "enterprise". I'd say make some middle tiers of (I don't know?) $5 / $10 / $20 a month for reasonable numbers of queries or whatever. Keep the "enterprise" offering there for the biggies, but offer us small players some hope that this will be sufficiently funded and supported.

Brilliant idea, btw, I like it :-)

tompec · a year ago
Thanks! I'm still figuring things out about pricing, but there will be small plans available.
pryelluw · a year ago
Does this respect robots.txt?
tompec · a year ago
It does respect robots.txt when crawling. I'll add more details about this in the docs.
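The reply doesn't say how the crawler checks robots.txt. As one illustration, Python's standard `urllib.robotparser` can answer "may this user agent fetch this URL?" The rules and the `ExampleCrawler` user agent below are hypothetical; a live crawler would instead call `set_url(".../robots.txt")` and `read()` to fetch the site's real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def allowed(url, robots_txt=ROBOTS_TXT, user_agent="ExampleCrawler"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `allowed("https://example.com/docs/intro")` is `True` and `allowed("https://example.com/private/notes")` is `False`, since Python's parser applies the first matching rule in order.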
khanan · a year ago
Can this be deployed on-prem, or is it a cloud-toy?
tompec · a year ago
Currently just a cloud-toy.
23B1 · a year ago
I tried it out. This would be extremely useful to me to the point I'd be willing to happily pay for it, as it's something I would have otherwise had to spend a long time hacking together.

1) The returned output from a query seems pretty limited in length and breadth.

2) No apparent way to adjust my prompts to improve or adjust the output, e.g. it's not really 'conversational' (not sure if that is your intent).

Otherwise keep developing and be sure to push update notifications to your new mailing list! ;-)

tompec · a year ago
Thanks! The chat demo is actually just a small thing I put together as a preview of what can be done, but the main product is the API. But seeing that most users seem to like that, there's probably something there... If you want to email me at support at embedding.io with some requirements, I can see how to make that work for you.
danirogerc · a year ago
Can I query multiple vectorized websites at once? Can I export vectorized websites and host them myself? Any chance to export them to a no-code format, like PDF?
tompec · a year ago
You can group as many websites as you want into a collection, then query that collection. Not sure what you mean by exporting: would you like to export the vectors themselves, or just the chunks of text from the websites?
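The reply doesn't describe how collections are stored or ranked. As a rough illustration only, here is a hypothetical in-memory `Collection` that pools chunks from several sites and ranks them together for a single query. `toy_embed` is a bag-of-words stand-in for a real embedding model, and every name here is made up for the sketch.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Collection:
    """Hypothetical collection: chunks from several websites searched as one pool."""
    chunks: list = field(default_factory=list)  # (site, text, vector) tuples

    def add_site(self, site, texts):
        for text in texts:
            self.chunks.append((site, text, toy_embed(text)))

    def query(self, question, k=3):
        # Score every chunk from every site against the question, best first.
        q = toy_embed(question)
        scored = [(cosine(q, vec), site, text) for site, text, vec in self.chunks]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(site, text) for _, site, text in scored[:k]]
```

Because all sites share one pool, a single query can return the best chunk regardless of which website it came from, and each result keeps its source site for attribution.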

u/tompec

Karma: 213 · Cake day: May 6, 2017
About: thomas.io