Came here to point videogrep out. It is a great tool!
I don't remember if this works out of the box in videogrep, but it is possible to generate "fine-grained" subtitle information using speech-to-text and some massaging. In other words, subtitles timed to individual words rather than to whole lines.
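To make "subtitles that match a specific word" concrete, here is a minimal sketch. It assumes a speech-to-text pass has already produced one entry per word with start and end times; the `Word` shape, the `find_phrase` helper and the sample timings are all made up for illustration and are not videogrep's API.

```python
from typing import List, Optional, Tuple, TypedDict


class Word(TypedDict):
    # Hypothetical shape of one speech-to-text result entry.
    text: str
    start: float  # seconds
    end: float    # seconds


def find_phrase(words: List[Word], phrase: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first occurrence of `phrase`, or None."""
    target = phrase.lower().split()
    if not target:
        return None
    # Normalise tokens so "you!" still matches "you".
    tokens = [w["text"].lower().strip(".,!?") for w in words]
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            return words[i]["start"], words[i + n - 1]["end"]
    return None


# Illustrative data only; the timings are invented.
sample: List[Word] = [
    {"text": "No", "start": 0.0, "end": 0.3},
    {"text": "soup", "start": 0.35, "end": 0.7},
    {"text": "for", "start": 0.7, "end": 0.85},
    {"text": "you!", "start": 0.9, "end": 1.4},
]
span = find_phrase(sample, "soup for you")
```

The returned span is what you would hand to a cutting tool to extract exactly those words instead of the whole subtitle line.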
I worked on something similar using videogrep and "fine-grained" subtitle information from Seinfeld clips. My short experiment took a string as input, looked for the longest matching subtitles, and stitched them into a clip of characters saying the contents of that string. I couldn't figure out how to get diarization to work reliably back then. If anyone knows, please let me know!
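For anyone curious what "longest matching subtitle" could look like in code, here is one possible reading as a greedy search. This is a sketch of the idea, not the code from the experiment; `Sub`, `greedy_cover` and the sample data are hypothetical names and values.

```python
from typing import Dict, List, NamedTuple, Tuple


class Sub(NamedTuple):
    # Hypothetical subtitle entry: text plus start/end times in seconds.
    text: str
    start: float
    end: float


def greedy_cover(query: str, subtitles: List[Sub]) -> List[Sub]:
    """Cover `query` left to right, always taking the longest subtitle
    whose words exactly match the next words of the query."""
    words = query.lower().split()
    # Index subtitles by their word tuple; the first occurrence wins.
    index: Dict[Tuple[str, ...], Sub] = {}
    for sub in subtitles:
        index.setdefault(tuple(sub.text.lower().split()), sub)

    plan: List[Sub] = []
    i = 0
    while i < len(words):
        # Try the longest remaining span first, then shrink it.
        for j in range(len(words), i, -1):
            hit = index.get(tuple(words[i:j]))
            if hit is not None:
                plan.append(hit)
                i = j
                break
        else:
            raise ValueError(f"no subtitle matches {words[i]!r}")
    return plan


# Illustrative data only.
subs = [
    Sub("hello", 1.0, 1.5),
    Sub("hello there", 4.0, 5.0),
    Sub("my friend", 9.0, 10.0),
    Sub("friend", 12.0, 12.5),
]
plan = greedy_cover("Hello there my friend", subs)
```

Each entry in `plan` is a time range to cut and concatenate. Note that a greedy pass can fail on inputs that a smarter search would cover, since taking the longest match first may leave a remainder that nothing matches.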
One of the Daily Show's greatest strengths was their ability to quickly compile relevant clips. I always wondered if they had a service like this. I assumed that they were scraping CC text themselves. Maybe they just had a phenomenal research dept.
I often wonder this too about a radio show I listen to with several decades of archives. I wonder what the interface looks like, and how hard it really is to assemble clips. Having a database of thousands of hours of video and audio to search sounds cool!
Nice implementation, I like how it plays continuously. Similar concept to my company https://getyarn.io, where you can search inside movies and TV shows :)
Allows you to search for a specific phrase or word and you get video results from movies (with timestamp), where this phrase/word is being used. Has around 2M phrases and the first 5 results are free, afterwards you need to become a sponsor.
Nope, no affiliation or connection at all. Just me being the average HN user posting something interesting I found :-) I have questions myself, especially about how they haven't been shut down over copyright issues.
Ooo this is so cool [0]. I need to know the implementation details. How are the clips stored? Are they dynamically generated with ffmpeg or something or is every line of dialogue clipped out ready to serve? How many films and what are the storage costs?
[1]: http://antiboredom.github.io/videogrep/
Crucially this is not a cloud product, I suspect for copyright reasons.
How has this not been immediately shut down due to copyright?
https://www.getyarn.io/yarn-clip/2d30e3f1-6bc1-4aa0-bdc4-6d5...
[0] https://www.playphrase.me/#/search?q=this+is+so+cool
I wonder how long that would run for if I was a sponsor.