I wanted to share a pet project of mine. I built HackYourNews [1] to scratch a personal itch: knowing which stories to focus on while browsing aimlessly (though there is a certain joy in that as well!).
HackYourNews uses OpenAI's gpt-3.5-turbo to summarize the destination article as well as the comments section. Article summaries are always cached, while comment summaries are regenerated when the comment count changes by more than 10% (or by more than 10 comments).
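For the curious, the regeneration check is roughly this (a minimal sketch; the function and variable names are illustrative, not the actual code):

    def should_regenerate(cached_count, current_count):
        # Regenerate the comments summary when the thread has changed by
        # more than 10 comments, or by more than 10% of the comment count
        # seen at the time the cached summary was generated.
        delta = abs(current_count - cached_count)
        return delta > 10 or delta > 0.10 * cached_count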
While I styled the homepage to welcome HNers, my preferred view is the Mobile view, accessed from the navbar. This no-frills view honors OS-level dark mode and is easy to skim on any device.
Tried to keep the site minimal. The only JS is Cloudflare's privacy-preserving analytics [2], just to gauge interest.
This is the first time I'm releasing something to the wild.
Hope you find this useful!
The frontend is pure HTML+CSS.
The backend is NodeJS (Puppeteer) + Python, with the excellent Microsoft Guidance [3] library to interface with OpenAI's API.
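To give a flavour, a summarization call through Guidance looks roughly like this (a sketch against the 0.0.x handlebars-style API; the prompt wording is illustrative, not what the site actually uses):

    import guidance

    # Point Guidance at the chat model (guidance 0.0.x style API).
    guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

    # Chat-style program: system/user/assistant blocks, with the
    # completion captured into the 'summary' variable.
    summarize = guidance("""
    {{#system~}}
    Summarize the following article in a few plain sentences.
    {{~/system}}
    {{#user~}}
    {{article_text}}
    {{~/user}}
    {{#assistant~}}
    {{gen 'summary' max_tokens=256}}
    {{~/assistant}}
    """)

    result = summarize(article_text=article_text)
    print(result["summary"])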
Slight feedback:
- Many "comments" summary start with boilerplate such as "This content discusses" which a bit annoying.
- It would be good to have a sense of "controversy" in the comments summary. Like some kind of general "mood".
Will work on making the summaries more concise and also on surfacing the mood of the discussion.
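Concretely, I'm thinking of appending something like this to the comments prompt (draft wording, not the final prompt):

    # Draft instructions to append to the comments-summary prompt
    # (illustrative; the exact wording is still being tuned).
    COMMENTS_STYLE = (
        "Do not open with boilerplate such as 'This content discusses'; "
        "start with the main point directly. "
        "End with a single line labelled 'Mood:' giving the overall tone "
        "of the discussion, e.g. supportive, mixed, or heated."
    )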
In my mind, the gold standard for engaging summaries is Seeking Alpha. As a random example, see https://seekingalpha.com/article/4633758-sell-amazon-before-...
If you could train the model to come up with well-structured bullet points, the summaries would be amenable to scanning before committing to fully engage. This is just an idea; I am not sure what fraction of your readers would prefer bullet points.
> The article discusses the limitations of using LLMs (Large Language Models) and RAG (Retrieval-Augmented Generation) in AI systems due to the missing storage layer. The author points out two unstated assumptions: that similar vectors are relevant documents and that the vector index can accurately identify the top K vectors by cosine similarity. However, these assumptions are not always true, leading to the need for re-ranking and measuring the index's precision and recall. The comment section further explores the relationship between cosine similarity and relevance, as well as the use of different embeddings like Word2Vec and DistilBERT. Some commenters also discuss the benefits of using vector DBs for specific cases like customer chatbots. Overall, the article highlights the challenges and considerations in implementing LLMs and RAG in AI systems.
This makes me want to read the comments more, because there's some useful stuff in there; without the summary I would often have gotten tripped up by the scale of the comment section and missed some of the more useful comments.
I think you're onto something. Congrats.
Small comment: the "dehyped" title does not seem very useful. For most articles it is almost the same as the original title, just rephrased. It should summarize the conclusion, or whatever the meat of the article actually is; repeating the same thing in slightly different words is just a waste of time and space.
It also seems broken on the site itself, though that could be the AI getting confused about what the actual page title should be: it picked the top story as the dehyped title, which I guess is understandable, but when generating a title for a news feed it's wrong.
The prompt for dehyping was not specific enough, but that (and a related bug) has been fixed. The dehyped titles make (more) sense now and are no longer the same as the HN article titles.
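The idea of the fix was along these lines (a paraphrase of the approach, not the exact prompt):

    # Sketch of a more constrained dehype prompt (hypothetical wording;
    # the actual prompt differs). The key change: rewrite the submitted
    # title only, rather than letting the model pick a headline from
    # the page content.
    DEHYPE_PROMPT = (
        "Rewrite only the following submitted title to remove hype and "
        "unsupported claims. Do not substitute a headline taken from "
        "the page content.\n"
        "Title: {title}"
    )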