Readit News
llmllmllm commented on Husband and wife outed as GRU spies aiding bombings and poisonings across Europe   theins.ru/en/politics/271... · Posted by u/dralley
FabHK · 2 years ago
Examples/source? According to this Travel Stack Exchange question, the US is (fairly) unique in not having immigration exit checks.

https://travel.stackexchange.com/questions/122289/why-don-t-...

llmllmllm · 2 years ago
The UK doesn't have exit checks.
llmllmllm commented on Mkcontext – Generate ChatGPT prompts from files   github.com/matthewrobertb... · Posted by u/llmllmllm
llmllmllm · 2 years ago
I just made this after realising that ChatGPT 4 now has a 32k input context, so it's useful to be able to inject multiple files into my prompts easily.
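
As a rough illustration of the idea of injecting several files into one prompt, here is a minimal Python sketch. It is not mkcontext's actual implementation; the function names `read_files` and `build_context` and the `--- name ---` separator format are assumptions made up for this example.

```python
from pathlib import Path


def read_files(paths):
    """Read each path into a {name: text} mapping (hypothetical helper)."""
    return {str(p): Path(p).read_text(encoding="utf-8") for p in paths}


def build_context(files, question):
    """Concatenate named file contents, then append the user's question.

    The header format below is an assumption, chosen only so the model
    can tell where one file ends and the next begins.
    """
    parts = []
    for name, content in files.items():
        parts.append(f"--- {name} ---\n{content.rstrip()}\n")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


if __name__ == "__main__":
    # In-memory example; read_files() would supply the same mapping from disk.
    demo = {"a.py": "print(1)\n", "notes.txt": "remember to add tests"}
    print(build_context(demo, "What does this project do?"))
```

The total length still has to fit the model's context window, so a real tool would likely count tokens before sending the prompt.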
llmllmllm commented on     · Posted by u/llmllmllm
TekMol · 2 years ago
This is a blog post with a "Create Free Account" button, not a Show HN:

https://news.ycombinator.com/showhn.html

"Off topic: blog posts, sign-up pages, newsletters, lists, and other reading material."

llmllmllm · 2 years ago
"Show HN is for something you've made that other people can play with. HN users can try it out, give you feedback, and ask questions in the thread."

We made it, you can play with it :)

llmllmllm commented on     · Posted by u/llmllmllm
vletal · 2 years ago
Please change the title, it is misleading. I actually read through the text first trying to find information about how GPT-4 is integrated with DALL-E3.

Moreover, I am not a fan of stock photos in articles and I find the generated illustrations even worse. Both the quality and the relevance to the accompanying text are very low.

llmllmllm · 2 years ago
This is an interface that has GPT-4 producing content while also producing images using DALL-E 3; we have integrated them.
llmllmllm commented on     · Posted by u/llmllmllm
topoftheforts · 2 years ago
I think the main link of this post should be the homepage and not the generated report; I was very confused about what I was looking at.

I understand the value proposition of this but I can't get past the feeling of "this is just more auto-generated garbage". I am aware that a lot of online outlets do pretty much the same - they summarize a press release and make it their own, adding some images - and this tool is automating that process.

I just can't see myself ever reading an article like this, especially because it's obvious to me that it's AI generated (paragraphs starting with "Moreover,", "In Summary,"). I'm a sample size of 1 though, and many other people might still read articles like this.

Another doubt I have is in regards to Google - do they like this kind of content? From what I understand they're not against AI but it has to provide valuable information. This seems to regurgitate existing information without extra commentary, so is this even helping SEO?

Apologies if this comes across as negative, I appreciate the work involved, just trying to give some honest feedback.

llmllmllm · 2 years ago
Feedback is great, thank you.

I linked to the shared report so that it's obvious what is actually being produced, rather than just a description of it.

This is just an example summary of a single web page. You could imagine it producing much more compelling content, for example by combining multiple pieces of information along with a twist, just as many humans do when publishing on the web. FlowChai could search hundreds of documents to produce a report (using RAG), so in that sense it can go beyond what a person could do in a reasonable time frame.

llmllmllm commented on New models and developer products   openai.com/blog/new-model... · Posted by u/kevin_hu
llmllmllm · 2 years ago
This makes some of what my startup https://flowch.ai does a commodity. File uploads and embeddings-based queries are an example, but we'll see how well they do it: chunking and querying with RAG isn't easy to do well. On the other hand, the lower model prices make my overall platform much better value, so I'd say overall it's a big positive.

Speaking more generally, there's always room for multiple players, especially in specific niches.

llmllmllm commented on Embeddings: What they are and why they matter   simonwillison.net/2023/Oc... · Posted by u/simonw
simonw · 2 years ago
OpenAI's embedding model works up to 8,000 tokens. The sentence-transformers ones are mostly smaller than that, though there's a new model that just came out that can handle 8,000: https://huggingface.co/jinaai/jina-embeddings-v2-base-en

People tend to "chunk" larger documents - the chunking strategy that's best is very dependent on what you are using the embeddings for. I've found it frustratingly hard to find really good guidance as to chunking strategies.

I've had good results for Q&A chunking my blog content up into paragraph sized chunks, as described here: https://simonwillison.net/2023/Oct/23/embeddings/#answering-... - but I'm not ready to say that's a universally good practice.

llmllmllm · 2 years ago
I have found success with chunks of 100 tokens, each preceded by the last 10 tokens of the previous chunk and followed by the first 10 tokens of the next chunk, for 120 tokens total. I generate an embedding for each chunk, then compare it to embedding(s) derived from the input query.
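
The overlapping-chunk scheme described above can be sketched in a few lines of Python. This is a minimal sketch, not FlowChai's code: the function name `chunk_with_overlap` is hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
def chunk_with_overlap(tokens, size=100, overlap=10):
    """Split tokens into chunks of `size`, each padded with up to `overlap`
    tokens from the end of the previous chunk and the start of the next."""
    if size <= 0 or overlap < 0:
        raise ValueError("size must be positive and overlap non-negative")
    chunks = []
    for start in range(0, len(tokens), size):
        lo = max(0, start - overlap)                   # borrow from previous chunk
        hi = min(len(tokens), start + size + overlap)  # borrow from next chunk
        chunks.append(tokens[lo:hi])
    return chunks


if __name__ == "__main__":
    # Assumption: whitespace-separated words stand in for model tokens.
    text = " ".join(f"w{i}" for i in range(250))
    chunks = chunk_with_overlap(text.split())
    print([len(c) for c in chunks])  # prints [110, 120, 60]
```

Only interior chunks reach the full 120 tokens; the first and last chunks are shorter because they have no neighbour on one side.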

How to generate embeddings from the input query well is where one's focus should be IMO. An example: "don't mention x" being turned into filtering out / de-emphasizing chunks that align with the embedding for x.
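
One way to read the "don't mention x" example is as a penalty term in the ranking score: similarity to the query, minus similarity to the thing to avoid. The Python sketch below is an assumption about how that could work, not the method actually used; `rank_chunks`, `avoid_vec`, and `penalty` are hypothetical names, and the 2-dimensional vectors are toy stand-ins for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(chunk_vecs, query_vec, avoid_vec=None, penalty=1.0):
    """Return chunk indices, best first.

    Score = similarity to the query, minus `penalty` times similarity to
    the embedding of the topic to avoid (if one is given).
    """
    def score(vec):
        s = cosine(vec, query_vec)
        if avoid_vec is not None:
            s -= penalty * cosine(vec, avoid_vec)
        return s

    return sorted(range(len(chunk_vecs)),
                  key=lambda i: score(chunk_vecs[i]),
                  reverse=True)


if __name__ == "__main__":
    chunk_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]  # toy embeddings
    query = [1.0, 1.0]
    print(rank_chunks(chunk_vecs, query))                        # chunk 1 ranks first
    print(rank_chunks(chunk_vecs, query, avoid_vec=[0.0, 1.0]))  # prints [0, 1, 2]
```

A hard filter (dropping chunks whose similarity to the avoided topic exceeds a threshold) is the other option mentioned; the penalty form simply degrades more gradually.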

I've been using these techniques along with pgvector and OpenAI's embeddings for https://flowch.ai and it works really well. A user uploads a document or uses the Chrome Extension on a webpage and FlowChai chunks up the content, generates embeddings, builds up a RAG context and then produces a report based on the user's prompt.

I hope that helps show a real world example. You're welcome to play with FlowChai for free to see how it works in practice at the application level.
