What I'd really like is a service that edits down YouTube videos by removing all the stock footage and talking head crap, then speeding up the audio to fit over the remaining novel information—whether that's new battlefield footage, electron micrographs, demonstrations of machining techniques, or just elephant toothpaste. The talking head filler seems like it would be easy to recognize, but stock footage recognition presumably would have a significant false negative rate, which is okay.
This would reduce some videos to just a transcription, which would be the ideal outcome, I think. The less of my limited time on Earth I waste watching some dumbshit reading a script at a camera, the better. Summarizing the transcript further like this site does might be occasionally useful, of course.
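The "speed up the audio to fit over the remaining footage" part is just a ratio. A rough sketch, assuming some upstream detector has already produced the list of kept segments (the segment values and function names here are made up for illustration):

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds) of footage worth keeping


def kept_duration(segments: List[Segment]) -> float:
    """Total length of the footage that survives the cut."""
    return sum(end - start for start, end in segments)


def audio_speedup(total_seconds: float, kept: List[Segment]) -> float:
    """Factor by which the full narration must be sped up to fit the kept footage."""
    remaining = kept_duration(kept)
    if remaining <= 0:
        # Nothing novel to show: this is the "just a transcription" case.
        raise ValueError("no footage left; fall back to the transcript")
    return total_seconds / remaining


# Hypothetical 10-minute video with 150 seconds of novel footage.
kept = [(30.0, 90.0), (200.0, 260.0), (500.0, 530.0)]
factor = audio_speedup(600.0, kept)
print(factor)  # 4.0
```

A factor of 4x is probably past the point of intelligibility, so in practice the narration would likely need trimming or summarizing as well as speeding up.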
I use SponsorBlock (be sure to enable all categories of blocking such as filler content, not just sponsors), DeArrow (de-clickbaits thumbnails and titles) and a video speed changer extension to enable much of your stated functionality, though of course not all. I've saved likely years of watching due to this combination.
One of the best things in SponsorBlock is Highlight segments. If the video is 10 minutes of filler building up to a single interesting/exciting moment you can often see exactly on the timeline where to jump to.
How did your viewing experience change after using DeArrow? I'm curious because DeArrow has a great reputation, yet my gut feeling tells me that I should avoid watching clickbait videos altogether.
I don't watch YouTube, but I would if I could cut out anything with faces or speech and use an LLM to summarize what's technically relevant from the transcript, at a length that fits what remains.
Pipeline such content, but use weighted random videos, with low weights for types of content with clickbait headings, and perhaps a blacklist for words like "meme" or "lol" in the transcript to cut out things with stock footage. I'm not sure of the exact best way to remove it, actually, other than "using the transcript for some computational technique of probabilistic stock footage prediction", which I bet would be most effective.
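Something like the following is how I'd read the weighted-random-plus-blacklist idea. It's only a sketch: the weight values, the `clickbait` flag, and the dict layout are assumptions, and whether a blacklist actually catches stock footage is untested.

```python
import random
from typing import Dict, List, Optional

BLACKLIST = {"meme", "lol"}   # the two words named in the comment
CLICKBAIT_WEIGHT = 0.1        # assumed "low weight"; tune to taste
DEFAULT_WEIGHT = 1.0


def is_blacklisted(transcript: str) -> bool:
    """True if any whole word of the transcript is on the blacklist."""
    words = {w.strip(".,!?\"'").lower() for w in transcript.split()}
    return bool(words & BLACKLIST)


def pick_videos(videos: List[Dict], k: int, seed: Optional[int] = None) -> List[Dict]:
    """Drop blacklisted videos, then sample k of the rest with clickbait down-weighted."""
    candidates = [v for v in videos if not is_blacklisted(v["transcript"])]
    if not candidates:
        return []
    weights = [CLICKBAIT_WEIGHT if v["clickbait"] else DEFAULT_WEIGHT for v in candidates]
    rng = random.Random(seed)
    # random.choices samples with replacement, so a video can appear more than once.
    return rng.choices(candidates, weights=weights, k=k)


videos = [
    {"id": "a", "clickbait": False, "transcript": "Today we mill a steel block."},
    {"id": "b", "clickbait": True, "transcript": "You won't believe this lathe trick."},
    {"id": "c", "clickbait": False, "transcript": "LOL, what a meme!"},
]
picked = pick_videos(videos, k=5, seed=0)
print([v["id"] for v in picked])
```

Video "c" never shows up in the result, and "b" shows up far less often than "a" over many draws.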
If that's what you want, then why would you want a service like this? Surely there would be non-video news sources for electron micrographs or elephant toothpaste, for which there would probably be hundreds or thousands of LLM TL;DR things.
I think you don't have a very clear idea of what I am talking about. You cannot present electron micrographs as text. You can present individual electron micrographs as still images, but not animation. Similarly, video of elephant toothpaste can only be presented as video, even if still images can be arresting. There is no sense in which a textual description of machining techniques or a Ukrainian battlefield is a substitute for video footage of them. In 5 seconds they can convey information that no amount of linguistic description can. Sometimes that information is even true.
What I hate is when I'm trying to find such irreplaceable information and instead my search results are full of vapid stock footage and hubristic talking heads overconfidently reading a script out loud as they gaze at a video camera. It's like AI slop without the creativity.
Wow, I'm really surprised that my comment describing 95% of YouTubers as "some dumbshit" got voted up to +19. I guess I'm not the only old man shaking his fist at the surveillance capitalism incompetent confident shouty bullshitter cloud?
What if we summarize all the information in the world into a few hundred volumes of human knowledge, then summarize those into a 10,000-page book, then that into 10 long-form essays, then those into a 100,000-character blog post, then that into a pamphlet, and finally summarize one more time into a single tweet?
It would say something like, "This text attempts to summarize the entirety of human knowledge".
Still, IMO summarizing videos is useful. Even if the summary is not accurate or a 1:1 representation of the content, you can mostly get the gist of what is being said without being baited into watching advertisements.
Although, this site doesn't seem to do a great job at summaries. Kagi's universal summarizer has much better results, https://kagi.com/summarizer/index.html . However, it requires transcripts to be available for videos.
I think a lot of people are sort of missing the benefit of something like this.
How do you read a book effectively? You skim the table of contents. You skim the contents of each chapter and mark interesting paragraphs. Then you go through the book another 1-2 times, each time getting deeper into the text and cross-referencing information between different parts of the book.
What tools like this will do is allow us to apply this same workflow to videos, which can greatly enhance our understanding of videos we're interested in and help us contextualise it with the rest of our knowledge.
I've already been doing this and it's helped me expand my knowledge and understanding in ways that wouldn't have been possible without an unreasonable investment of time and effort.
Tried asking Claude to do that, ended up with something pretty beautiful:
Everything is made of atoms & energy, life evolves, math describes reality, knowledge builds on itself, humans need each other & Earth to survive – test ideas, learn from mistakes, be kind, stay curious.
This reminds me of the famous Library of Babel story, where the entire corpus of a language is imagined to live in a library. Like, every permutation of the characters of an alphabet for pages of a certain number of characters in books of a certain number of pages.
The reductio ad absurdum of this library is an alphabet of 0 and 1, a page size of 2 characters and a page count per book of 2.
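That tiny library is small enough to enumerate outright. A quick sketch (the function name is mine): with a 2-character alphabet, 2 characters per page and 2 pages per book there are 2^(2*2) = 16 books.

```python
from itertools import product
from typing import List, Tuple


def library(alphabet: str, page_size: int, pages_per_book: int) -> List[Tuple[str, ...]]:
    """Every possible book, each book being a tuple of page strings."""
    pages = ["".join(chars) for chars in product(alphabet, repeat=page_size)]
    return list(product(pages, repeat=pages_per_book))


books = library("01", page_size=2, pages_per_book=2)
print(len(books))   # 16
print(books[0])     # ('00', '00')
print(books[-1])    # ('11', '11')
```

The count grows as len(alphabet) ** (page_size * pages_per_book), which is why anything beyond the toy case cannot be listed in practice.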
I know you’re making a joke, but more seriously I think most yt videos have atrocious signal/noise ratios so information compression is likely very useful. Less so for many academic papers (although they have some pretty awful filler sometimes).
I was on YouTube a few weeks ago and saw a 20 minute video with a title that looked interesting. Under it was an AI summary that saved me 20 minutes, and had me skip the video completely. I wish that was under every video.
This week I got a notification about the AI added to YouTube to allow users to ask questions about a video. I haven’t had a chance to use it yet, but I can see that also being useful to get the main points from a long video. Up until now, I mainly use the popularity indicator on the progress bar. Since I watch most videos on my TV, it’s harder to use the AI, as I would need to pull out my phone, open the same video, and ask… that’s a bad workflow.
I do find it a little ridiculous that we need AI to summarize long videos full of fluff, when the only reason they are full of that fluff in the first place is YouTube’s own monetization policies which pushed the average video from 2-4 minutes to 10 minutes.
This is exactly the problem. There are so many 20 minute videos that should have been 2 minutes.
In a way, it's much easier to make the 20 minute video. Just hit record, rant and rave, stop recording and publish.
There are indeed justified long videos stuffed full with knowledge, insight and witty comments to make it fun.
Then there are videos that are "slow" but magical. Paul Sellers has a 30 min video on how to make mortise and tenon joints with hand tools. Just you and him in real time. You get a (recorded) private lesson from a master craftsman. It's magic. Every minute of it is knowledge transfer.
Some people inflate their video durations intentionally, but I think the majority of people truly think they're using the time wisely. Have you ever tried making a quick travel vlog of a vacation and ended up with a 15 minute short film? That B-roll at the airport was definitely critical to include!
I think the reality is that there are a lot of amateur video creators. Elevating the few talented creators through social engagement metrics isn't perfect, but I think it works well enough. Or at least more so than what these anodyne summarizations would give us.
Hi HN! I'm the author of this service. Thank you for your support.
There may have been some temporary downtime due to residential proxy running out of bandwidth. I have purchased additional bandwidth. (I run this service for free.)
There may also be some errors with particular videos because they are not accessible in certain regions. For now all requests to YouTube originate from the United States, but I'm open to changing that in the future to some kind of round-robin or fallback system.
I know it's not perfect. I developed the tool originally for my own use. It's open source and I'm open to any patches or pull requests.
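For the region problem, the fallback variant could look roughly like this. This is a sketch, not the service's actual code: `RegionBlocked`, the `fetch` callable and the region codes are all hypothetical stand-ins for whatever the proxy layer really exposes.

```python
from typing import Callable, Optional, Sequence


class RegionBlocked(Exception):
    """Raised by the (hypothetical) fetcher when a video is unavailable in a region."""


def fetch_with_fallback(
    video_id: str,
    regions: Sequence[str],
    fetch: Callable[[str, str], str],
) -> str:
    """Try each region in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for region in regions:
        try:
            return fetch(video_id, region)
        except RegionBlocked as err:
            last_error = err  # remember why this region failed, then try the next one
    raise RegionBlocked(f"{video_id}: unavailable in every region tried") from last_error


def fake_fetch(video_id: str, region: str) -> str:
    """Stand-in fetcher for the demo: pretends the video is blocked in the US."""
    if region == "US":
        raise RegionBlocked(region)
    return f"transcript of {video_id} via {region}"


result = fetch_with_fallback("abc123", ["US", "DE", "JP"], fake_fetch)
print(result)  # transcript of abc123 via DE
```

A round-robin version would rotate the starting region per request instead of always starting from the first entry.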
Hey this is really cool, I literally had the same idea about a month ago but ultimately decided to not pursue it. Glad someone else did.
A few quick Qs -
1) Do you use the available auto-generated transcripts from youtube? Or do you do any audio parsing? I know transcripts aren't always available.
2) Do you have any plans to monetize in some way, do you think it would be possible? It's definitely a neat product but a tad generic and replicable, so I'm curious.
1.) We do no speech-to-text of our own. We either use the original transcripts uploaded manually by the YouTuber or we use the auto-generated ones supplied by Google.
2.) No, I plan to keep it free as the operational costs are relatively minimal.
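The transcript preference described in 1.) (uploader-supplied first, auto-generated second) amounts to a simple ordering. A sketch with a made-up track layout, since the real caption data will look different:

```python
from typing import Dict, List, Optional


def choose_transcript(tracks: List[Dict]) -> Optional[Dict]:
    """Prefer a manually uploaded track; otherwise take an auto-generated one."""
    manual = [t for t in tracks if not t["auto_generated"]]
    if manual:
        return manual[0]
    return tracks[0] if tracks else None  # None means no transcript at all


tracks = [
    {"lang": "en", "auto_generated": True, "text": "auto captions"},
    {"lang": "en", "auto_generated": False, "text": "uploader captions"},
]
chosen = choose_transcript(tracks)
print(chosen["text"])  # uploader captions
```

When the function returns None there is nothing to summarize, which matches the "transcripts aren't always available" caveat in the question.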
Tried it on 3 random videos I watched, and the results were... mostly good, albeit mixed.
On the one hand, it got my video about a Mario & Luigi: Brothership glitch dead right, immediately listing where you'd need to die to get an item early and what you'd get out of it.
It also did an okay job summarising a Zelda dungeon analysis video by someone I'm subscribed to, with some info on why that dungeon was well-designed that clearly came from the video.
Unfortunately, it did a poor job at summarising a video about plagiarism in the YouTube speedrunning essay space, associating the problem with smaller creators rather than the person the video was about and leaving out far too many details to be useful.
This seems to confirm my assumptions about how an AI summariser would work in general; if the original media is a straightforward piece about one easily understandable topic, then it'll do fine and work about as well as a human would. If it's a longer piece with multiple points backed by various examples, then it'll struggle to summarise it in a way that makes sense.
It's not that, it's "too long with low information density so ingest it more efficiently." That way we can spend our time on things that are more productive or enjoyable.
The idea has been hackneyed since LLMs appeared. It's cool that the implementation is open source, though YouTube's automatic captions are sometimes completely off, especially when the people talking in the video don't have the diction of a TV show host.
I wonder if the idea found its niche after all? Do you guys summarise your videos into short texts, and does that leave you satisfied? For me video is video, I can relax, sit and watch/listen to it. With text it is different, it is a mental exercise to read and process it, so turning video into text feels like an essential downgrade. I would prefer watching at 1.5/2x speed instead of a text summary if I want to finish it faster.
> For me video is video, I can relax, sit and watch/listen to it. With text it is different, it is a mental exercise to read and process it, so turning video into text feels like an essential downgrade.
Exact opposite for me. Reading goes at my pace, in silence. Video is much more invasive, so I avoid it except for the highest quality stuff.
Interesting, I always thought that audio/visual information is naturally much easier to consume. For instance, I can watch a video and count to 10 in my head at the same time and still take in everything that was in the video. With text that's much harder, since my head is fully occupied with "narrating" the text I'm reading, so reading ends up as a podcast inside my head before it actually gets consumed.
I tried with a few Thunderf00t videos. He has good analysis, but the guy repeats everything too many times. Many are about silly impossible "inventions" / scams, but this is an experiment that he published in Nature Chemistry:
https://tldw.tube/?v=LmlAYnFF_s8 "High speed camera reveals why sodium explodes!" --> "Coulombic explosion. (Sodium and water reaction)"
What's stopping you from doing it?
1. The universe is vast, mostly empty, and runs on fundamental laws that we barely understand but exploit well.
2. Life is a self-replicating, entropy-defying phenomenon that emerged through chemistry, evolved through selection, and adapts through intelligence.
3. Humans are social primates who dominate the planet through cooperation, tool-making, storytelling, and an insatiable drive for meaning.
4. Societies form through shared beliefs, laws, and trade, but oscillate between progress and collapse due to power, greed, and ignorance.
5. Technology is humanity’s amplifier, accelerating knowledge, comfort, and destruction in equal measure, with unintended consequences at every turn.
6. Economies are trust-based systems of resource distribution, prone to cycles of boom, bust, innovation, and inequality.
7. Morality is a human construct, evolving with culture, often conflicting between collective well-being and individual freedom.
8. Knowledge is a fractal—deeper the dive, more there is to know—yet most wisdom is rediscovery of old truths in new contexts.
9. The future is uncertain but shaped by the tension between human ingenuity and our own worst tendencies.
10. The meaning of life? Whatever gets you up in the morning and lets you sleep at night.
entropy-exploiting phenomenon
is a much better description as life does not defy any fundamental laws.
* Rule of law is a good idea
* Dictatorship is a bad idea
* Allowing Germany to occupy the Sudetenland in the 1938 Munich appeasement was a bad idea. [1]
* ...
[1] https://snyder.substack.com/p/appeasement-at-munich?triedRed...
But that said! If this service works I think I could use it. I can handle long articles, but have no time to watch YouTube clips.
Our understanding of reality is fundamentally shaped by the power of stories and narratives.
Humanity constantly seeks to impose order and structure on the world through systems and frameworks.
The inherent human drive to create and innovate defines our art, technology, and design.
We are bound by the complex interplay of connection, conflict, and cooperation in our relationships.
Time's relentless flow drives change, progress, and the unfolding narrative of history.
The vastness of the unknown perpetually challenges and defines the limits of human knowledge.
The search for purpose, values, and meaning is a central and ongoing human endeavor.
Abstract concepts and models are powerful tools for understanding and navigating reality.
All living things are interconnected within a complex web of life and ecological relationships.
The future of humanity presents both boundless potential and significant challenges to overcome.
https://m.youtube.com/watch?v=aBodzmUGtdw
Enjoy!
> OPENAI_API_KEY
Choose one.
> OPENAI_API_HOST
I've found the same problem with humans too, so it's not like an improvement over humans.
I agree that this mentality of works being “too long so don’t ingest it” is not a healthy way to go about life and thinking in general.
But most of the time I don't want to watch a video, I just want to get information. A text summary then would be strictly superior.
It’s an idea that’s been around long before LLMs. Check out Yahoo under Marissa Mayer acquiring a news summary app. Though it is still hackneyed.
https://finance.yahoo.com/news/yahoo-acquires-summly-app-150...