https://news.ycombinator.com/showhn.html
"Off topic: blog posts, sign-up pages, newsletters, lists, and other reading material."
We made it, you can play with it :)
Moreover, I am not a fan of stock photos in articles, and I find the generated illustrations even worse. Both the quality and the relevance to the accompanying text are very low.
I understand the value proposition of this but I can't get past the feeling of "this is just more auto-generated garbage". I am aware that a lot of online outlets do pretty much the same - they summarize a press release and make it their own, adding some images - and this tool is automating that process.
I just can't see myself ever reading an article like this, especially because it's obvious to me that it's AI generated (paragraphs starting with "Moreover," or "In Summary,"). I'm a sample size of 1 though, and many other people might still read articles like this.
Another doubt I have concerns Google: do they like this kind of content? From what I understand they're not against AI, but the content has to provide valuable information. This seems to regurgitate existing information without extra commentary, so is it even helping SEO?
Apologies if this comes across as negative. I appreciate the work involved; I'm just trying to give some honest feedback.
I linked to the shared report so that it's obvious what is actually being produced, rather than just a description of it.
This is just an example summary of a single web page. You could imagine it producing much more compelling content, for example by combining multiple pieces of information and adding a twist, just as many humans do when publishing on the web. FlowChai could search hundreds of documents to produce a report (using RAG), so in that sense it can go beyond what a person could do in a reasonable time frame.
Speaking more generally, there's always room for multiple players, especially in specific niches.
People tend to "chunk" larger documents - the best chunking strategy depends heavily on what you are using the embeddings for. I've found it frustratingly hard to find really good guidance on chunking strategies.
I've had good results for Q&A by splitting my blog content into paragraph-sized chunks, as described here: https://simonwillison.net/2023/Oct/23/embeddings/#answering-... - but I'm not ready to say that's a universally good practice.
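For anyone wanting to try the paragraph-sized approach, here is a minimal sketch of what it could look like. It is not taken from the linked post: the function name `chunk_paragraphs` and the `min_chars` parameter are made up for illustration, and it assumes paragraphs are separated by blank lines.

```python
import re


def chunk_paragraphs(text: str, min_chars: int = 0) -> list[str]:
    """Split text into paragraph-sized chunks on blank lines.

    Hypothetical helper: paragraphs shorter than min_chars are merged
    with the paragraphs that follow until the chunk is long enough.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    buffer = ""
    for paragraph in paragraphs:
        buffer = f"{buffer}\n\n{paragraph}" if buffer else paragraph
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        # Leftover short text: attach it to the previous chunk if there is one.
        if chunks:
            chunks[-1] = f"{chunks[-1]}\n\n{buffer}"
        else:
            chunks.append(buffer)
    return chunks


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\n\nThird paragraph."
    for chunk in chunk_paragraphs(sample):
        print(repr(chunk))
```

Each chunk would then be embedded separately. Whether merging short paragraphs helps is exactly the kind of thing that depends on the use case.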
How to generate embeddings from the input query well is where one's focus should be IMO. An example: "don't mention x" being turned into filtering out / de-emphasizing chunks that align with the embedding for x.
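One possible reading of the "don't mention x" idea, as a rough sketch only: score each chunk by its similarity to the query and subtract a penalty for similarity to the embedding of x, optionally dropping chunks that are too close to x. The names `rank_chunks`, `penalty` and `drop_above` are invented here, and the vectors are toy 2D values rather than real model embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_chunks(chunks, query_vec, avoid_vec=None, penalty=1.0, drop_above=None):
    """Rank (text, vector) chunks for a query, de-emphasizing an unwanted topic.

    Assumption for illustration: avoid_vec is the embedding of the thing
    the user asked not to mention.
    """
    ranked = []
    for text, vec in chunks:
        score = cosine(vec, query_vec)
        if avoid_vec is not None:
            avoid_sim = cosine(vec, avoid_vec)
            if drop_above is not None and avoid_sim > drop_above:
                continue  # filter out chunks that align closely with x
            score -= penalty * max(avoid_sim, 0.0)  # de-emphasize the rest
        ranked.append((score, text))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


if __name__ == "__main__":
    chunks = [("about x", [0.0, 1.0]), ("about topic", [1.0, 0.0]), ("mixed", [1.0, 1.0])]
    for score, text in rank_chunks(chunks, [1.0, 1.0], avoid_vec=[0.0, 1.0]):
        print(f"{score:+.3f}  {text}")
```

Splitting the user's input into a "want" part and an "avoid" part is the hard bit and is not shown here.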
I've been using these techniques along with pgvector and OpenAI's embeddings for https://flowch.ai and it works really well. A user uploads a document or uses the Chrome Extension on a webpage and FlowChai chunks up the content, generates embeddings, builds up a RAG context and then produces a report based on the user's prompt.
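To make the flow concrete, here is a toy in-memory sketch of the chunk, embed, retrieve, build-context steps. It is not FlowChai's code: `toy_embed` is a bag-of-words stand-in for a real embedding model, a plain list stands in for pgvector, and every function name is hypothetical. With pgvector the retrieval step would typically be a SQL query ordered by a distance operator such as `<=>` (cosine distance).

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Embed every chunk once; a vector store would hold these rows."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = toy_embed(query)
    scored = sorted(index, key=lambda item: cosine(item[1], query_vec), reverse=True)
    return [chunk for chunk, _ in scored[:k]]


def build_prompt(index, user_prompt, k=2):
    """Assemble the RAG context plus the user's prompt for the language model."""
    context = "\n\n".join(retrieve(index, user_prompt, k))
    return f"Use only the context below.\n\nContext:\n{context}\n\nTask: {user_prompt}"


if __name__ == "__main__":
    index = build_index([
        "pgvector stores embeddings in Postgres.",
        "Bananas are yellow.",
        "Chunking splits a document into paragraphs.",
    ])
    print(build_prompt(index, "How does pgvector store embeddings?", k=1))
```

The resulting prompt string is what would be sent to the language model to produce the report.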
I hope that helps show a real-world example. You're welcome to play with FlowChai for free to see how it works in practice at the application level.
https://travel.stackexchange.com/questions/122289/why-don-t-...