It would be really neat if after pulling the captions, an LLM was used to reword the content into an idiomatic "blogpost" (since speech is typically different than writing). Using LLMs, we could even choose the level of summarization and the output tone!
As someone who strongly prefers reading to watching instructional videos, I'd pay for this service :)
I made something sorta like that specific to recipe videos. Basically converts recipe into an idiomatic format (inlines ingredients, detects and renders timers) and links each step in the recipe to its timestamp in the video for easy indexing while you're busy in the kitchen. (I spent too much time trying to scrub to that one spot where "how it's supposed to look" is shown while busy making it look that way)
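To give a rough idea of the "detects timers" part: this is not the site's actual code, just a minimal sketch that assumes durations are written as a number followed by a spelled-out unit. The pattern and the `find_timers` name are made up for illustration, and abbreviations like "min" are not handled.

```python
import re

# Hypothetical pattern: a number followed by a time unit, e.g. "10 minutes".
DURATION = re.compile(r"(\d+)\s*(second|minute|hour)s?\b", re.IGNORECASE)
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}

def find_timers(step: str) -> list[int]:
    """Return every duration mentioned in a recipe step, in seconds."""
    return [int(n) * UNIT_SECONDS[unit.lower()] for n, unit in DURATION.findall(step)]

print(find_timers("Simmer for 10 minutes, then rest 1 hour."))
```

Each detected duration could then be rendered as a clickable timer next to the step.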
Cool website. Much better than the SEO spam I came across earlier this week when I did a websearch for "pear qwerty horse" after seeing it in the tags under a binging with babish video.
Love the timers and jumping to sections of the video. Though, the second video I tried viewing didn't have linked steps.
The hyperlink "Food Wishes" at the top of the page is broken. It'd be nice too if there was a way on that page to request a new recipe (via video ID or whatever).
Except for the complaining GPT will do, and some censorship based on the whims of its programming group. No thanks; I'll stick to scripts, where the video dictates the content.
It seems that many "my script can do [something] with [information in a different form]" projects can be superseded by LLMs, either already or in the near future, and the quality is way better than what the scripts are capable of.
I just wonder what the price of this is. I can run most of these scripts on an old laptop. But for the LLM I need a pricey and beefy computer or (even worse) a paid subscription to a big tech company's service.
Step right up, folks! Gather around and feast your eyes on the magnificent creatures before us – the elephants! Now, what makes these majestic beings so fascinating, you ask? Well, let me tell you – it's all about their incredibly, unbelievably long... um, trunks! Yes, you heard that right. These gentle giants sport trunks that seem to stretch on for ages, and let me tell you, it's nothing short of impressive. So, as we stand here in awe of these marvelous creatures, remember, it's the little (or should I say, not-so-little) things like their remarkable trunks that make them truly stand out. And that, my friends, wraps up the lowdown on our pachyderm pals – fascinating trunks and all!
Is a scripted video significantly different to a written blogpost? It might be a symptom of the type of YT videos I watch, but most of them seem to be essay-style "intro/thesis/points 1, 2, 3/counterpoint/conclusion", and the only thing that hints at speech is the umming-and-arring of the presenter.
"Former chief-of-staff, Mark Meadows asking a federal judge to put his surrender on hold, while deciding whether to move his trial to federal court, and former DOJ official Jeffrey Clark, seeking the same, making a pretty remarkable argument in his filing."
That's someone doing a sort of play-by-play explanation of what viewers are seeing in a video. Compare to a purposefully written story:
"A federal judge in Georgia rejected a request by former White House chief of staff Mark Meadows to postpone his surrender and arrest in Fulton County, Georgia, as an attempt to move the case to federal court is litigated, according to a court order issued Wednesday."
It seems like there could be some value in an LLM that would rewrite the first into something more like the second.
Can I self-promote here? We are not doing exactly the same thing, but we transcribe videos ourselves (no auto YT captions). If you want to read a high-quality transcript and summarize videos, you can do that at https://alphy.app
On the YouTube website, if you click the "・・・" button next to share/clip/save, there is a "Show transcript" option, and you can use your browser's in-page search to search in it.
There’s a website designed for language learning from watching YouTube captions with inline translations and dictionary lookup. It also has support for searching videos by subtitle content. But it has a limited index and isn’t free for all features. I thought its source was available but I can’t find it now…
https://languageplayer.io/
Searching transcripts is really something YouTube itself should be doing as part of just regular search (and fed into Google search too). I have a feeling the regular search already does it to some extent, as the system presumably is tagging videos based on its caption extraction. However, that would only apply to somewhat broad topics, not specific combinations of words and the matching text is not surfaced in the UI.
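For the "specific combinations of words" case, a minimal sketch of what searching a single transcript could look like, assuming the captions are already available as timestamped cues. The `Cue` type, `search_cues` function, and sample data are hypothetical, and the matching is plain substring matching, so "rice" would also match "price".

```python
from typing import NamedTuple

class Cue(NamedTuple):
    start: float  # seconds from the start of the video
    text: str

def search_cues(cues: list[Cue], phrase: str) -> list[Cue]:
    """Return cues whose text contains every word of the phrase (case-insensitive)."""
    words = phrase.lower().split()
    return [c for c in cues if all(w in c.text.lower() for w in words)]

# Made-up sample cues for illustration.
cues = [
    Cue(0.0, "Welcome back to the kitchen"),
    Cue(12.5, "Rinse the basmati rice until the water runs clear"),
    Cue(40.0, "Now toast the rice in butter"),
]
for hit in search_cues(cues, "basmati rice"):
    print(f"{hit.start:>6.1f}s  {hit.text}")
```

Surfacing the matched cue and its start time is the part the regular search UI does not do today.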
I wanted to try it, but when I press ENTER, it asks me to register... I click cancel and notice the PRICING page. I click on it and again it asks for a login. That is NOT how one onboards users.
The death of creative thinking is management and processes. Assembly line work doesn't permit creativity. Google is heading the same route as IBM and other once-great corporations made by creative people. When the discussion shifts from ideas to processes and trivialities, it's game over. Long live whatever replaces Google. It's over.
There is a term in business for this I can’t remember. Where basically your money comes from business process X and so you protect X at all costs. Which makes it very difficult to innovate away from supporting X. It’s almost impossible for a company to pivot to making their money from Y. You see this classically in things like Sears, Woolworths etc. unable to keep up with the times.
The solution to this seems to be to start a new company inside your company, more or less completely separate from the core business. Meta seems to have initially done this very well with their pivot from Facebook to Instagram. Perhaps less well with their metaverse pivot, but we’ll see. Google set themselves up for success at this with Alphabet, but I don’t think we’ve seen them come up with something they feel they can really pivot into yet, so business X continues to be the focus.
Hidden away yes, but it’s amazing. It creates a totally new way of consuming informational videos and everyone here should try it. You can scroll through the transcription vertically and tap on any bit and the video instantly jumps to that point - basically the transcription is the new seek bar. And about 1000x better at the job. No more skipping back to somewhere roughly where you stopped paying attention and just letting it play for a bit until you get back into the thread - now you can jump around with surgical precision. It’s like how you can easily skim back and forth in a text article, but with a video. It’s a total game changer and I find it bizarre that it’s so hidden away so most people won’t find it. It also works particularly well when casting from your phone to a TV, using the phone as the navigator. Oh and the transcriptions are about 99% accurate, which is good enough for me.
That’s really interesting - on my own website I’ve extracted the captions to my videos and I was thinking of wiring it up so you could navigate the videos. I may actually get round to doing this now.
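Wiring captions up for navigation mostly comes down to turning each cue's start time into a link. A minimal sketch, assuming cues are `(start_seconds, text)` pairs and using YouTube's `t=<seconds>s` URL parameter; the function names, sample cues, and the `VIDEO_ID` placeholder are hypothetical.

```python
def timestamp_link(video_id: str, start_seconds: float) -> str:
    """Build a YouTube watch URL that starts playback at the given offset."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_seconds)}s"

def format_clock(start_seconds: float) -> str:
    """Render seconds as M:SS for display next to each caption line."""
    minutes, seconds = divmod(int(start_seconds), 60)
    return f"{minutes}:{seconds:02d}"

# Hypothetical caption cues: (start time in seconds, text).
cues = [(0.0, "Intro"), (75.4, "Mixing the dough"), (312.0, "Baking")]
for start, text in cues:
    print(format_clock(start), timestamp_link("VIDEO_ID", start), text)
```

On a self-hosted page with an embedded player, the same start times could drive the player's seek call instead of a link.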
My friend and I made something similar a few years ago as a college hackathon project - it features automatic scene transition detection and a rough editor before publishing the final results.
(The demo site is down, but you can clone the repo and run the code locally)
This is actually potentially helpful for me as a lawyer for generating a paper record, and something I was talking about (and meaning to write up a script for) the other day. Sometimes I want to use a Youtube video in a court filing (for example, as prior art in a patent case), and submitting a rough paper record of the video like this is helpful along with the actual video.
This is interesting. I think it should only be used for videos whose message is not subtle; it struggles with things like sarcasm. I gave it a try with KRAZAM's video and the answer is hilarious when you consider that the video intended exactly the opposite.
> In "The Hustle," the narrator shares their jam-packed daily routine that exemplifies their dedication to productivity. From an early morning workout to late-night preparations for the next day, their schedule is filled with various activities. They efficiently manage their time, incorporating work, social media updates, and even a well-deserved happy hour. The narrator's commitment to self-improvement is also evident through their habit of reading before bed and tweeting inspiring quotes. Overall, this video highlights the narrator's hustle and structured approach to maximizing their day.
I just had to write a research report for a funding agency with many subprojects and could not get one input, so I took a short video from a pitch presentation, converted it to captions using speech-to-text, and then turned those into a research summary. It was really impressive.
See example: https://rexipie.com/watch?v=JiJXdoTjw8M
Just s/youtube/rexipie/ in any recipe video URL.
(full disclosure the step/transcript linking is paid-only as it requires a GPT-4 call, everything else is available to demo on free tier)
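The `s/youtube/rexipie/` substitution above can be sketched in a few lines. This is only an illustration: the `to_rexipie` name is made up, and it assumes the site accepts the same path and query string as the original YouTube URL.

```python
from urllib.parse import urlsplit, urlunsplit

def to_rexipie(url: str) -> str:
    """Swap 'youtube' for 'rexipie' in the host, leaving path and query untouched."""
    parts = urlsplit(url)
    netloc = parts.netloc.replace("youtube", "rexipie", 1)
    return urlunsplit(parts._replace(netloc=netloc))

print(to_rexipie("https://www.youtube.com/watch?v=JiJXdoTjw8M"))
```

Restricting the replacement to the host avoids mangling a video ID or query parameter that happens to contain the word "youtube".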
I've gotta say, your website might be easier to use during cooking, since it provides the information in-line (especially serving sizes etc.)!
Not exactly a blog post format, but it must've saved me a hundred hours, no joke!
https://chat.openai.com/share/229e3ac8-3924-48e4-abd5-35bcb2...
[1] https://github.com/the-crypt-keeper/tldw
Chunking on the example webpage[1] is poor.
[1] https://obra.github.io/Youtube2Webpage/example/
Ok, this may be an answer... but is there an online service that, given a YT URL, would spit the captions out for me? Or maybe a browser extension?
Maybe YouTube even has a hidden link somewhere where I could see all the text?
This submission prompted me to do some research, and I found this gem: https://filmot.com/
The guy who created it: https://www.reddit.com/r/linguistics/comments/oo8xbd/search_...
https://news.ycombinator.com/item?id=36009774
I encountered the "need" for this functionality a few years ago to find the video of a YouTuber specifically saying something.
Back then I used a website that's actually specifically dedicated to the YouTuber (Northernlion): https://babypig.men/nlss-search?q=Basmati
I'm surprised the website is still live!
They’ve had transcriptions for ages - but it’s hidden away and practically useless.
The things they could do with a bit of creative thinking…
(The demo site is down, but you can clone the repo and run the code locally)
https://gitlab.com/chocological00/bitcamp-2021
https://www.summarize.tech/
https://www.summarize.tech/www.youtube.com/watch?v=_o7qjN3KF...