ThisNameIsTaken · 4 years ago
For those interested in playing around with something similar: it looks like a large-scale implementation of Sam Lavigne's Videogrep[1].

[1]: http://antiboredom.github.io/videogrep/
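
For anyone curious about the core idea, here is a minimal sketch. It assumes subtitles in SRT format: parse the cues, then return the ones whose text matches a regular expression. This is not Videogrep's actual code; `parse_srt` and `grep_cues` are hypothetical names for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str

# SRT timestamps look like 00:01:02,345 (some files use a dot instead of a comma)
TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def to_seconds(stamp: str) -> float:
    h, m, s, ms = map(int, TIME.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000

def parse_srt(srt: str) -> list[Cue]:
    """Split an SRT file into blank-line-separated blocks: index, timing line, text lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        cues.append(Cue(to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues

def grep_cues(cues: list[Cue], pattern: str) -> list[Cue]:
    """Return the cues whose text matches `pattern`, case-insensitively."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [c for c in cues if rx.search(c.text)]
```

The (start, end) pairs of the matched cues are what you would hand to a video cutter to assemble a supercut.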

efferifick · 4 years ago
Came here to point videogrep out. It is a great tool!

I don't remember whether videogrep does this out of the box, but it is possible to generate "fine-grained" subtitle information using speech-to-text and some massaging; in other words, subtitles that match individual words.
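
A rough sketch of the word-level lookup, assuming the speech-to-text engine emits a list of `(word, start_sec, end_sec)` tuples. That format is a hypothetical stand-in; real engines differ. Once you have it, finding an exact phrase is a sliding comparison over normalized tokens.

```python
import re

def normalize(word: str) -> str:
    # Lowercase and drop punctuation so "Newman." matches "newman"
    return re.sub(r"[^\w']", "", word.lower())

def find_phrase(words, phrase):
    """words: list of (word, start_sec, end_sec). Returns (start, end) spans where phrase occurs."""
    target = [normalize(w) for w in phrase.split()]
    if not target:
        return []
    tokens = [normalize(w) for w, _, _ in words]
    n = len(target)
    spans = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Span runs from the first word's start to the last word's end
            spans.append((words[i][1], words[i + n - 1][2]))
    return spans
```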

I worked on something similar using videogrep, "fine-grained" subtitle information, and Seinfeld clips. My short experiment took a string as input, looked up the longest matching subtitles, and stitched them into a clip of characters saying the contents of the input string. I couldn't figure out how to get diarization (identifying which speaker is talking) to work reliably back then; if anyone knows how, please let me know!
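
One possible reading of that "longest match" approach: greedily take the longest prefix of the remaining input that exists in a word-level index, emit its clip, and continue. This sketch assumes a hypothetical `index` dict mapping lowercase phrases to clip ids and ignores punctuation handling; note that greedy longest-match does not always produce the fewest clips.

```python
def cover_sentence(sentence, index, max_len=8):
    """Greedy longest-match: split `sentence` into the longest phrases found in `index`.

    index: dict mapping a lowercase phrase to a clip identifier (hypothetical format).
    Returns a list of (phrase, clip_id); raises ValueError if some word has no clip.
    """
    words = sentence.lower().split()
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, shrinking one word at a time
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in index:
                result.append((phrase, index[phrase]))
                i += n
                break
        else:
            # No phrase starting at word i (not even the single word) is indexed
            raise ValueError(f"no clip for word: {words[i]!r}")
    return result
```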

ortusdux · 4 years ago
One of the Daily Show's greatest strengths was their ability to quickly compile relevant clips. I always wondered if they had a service like this. I assumed that they were scraping CC text themselves. Maybe they just had a phenomenal research dept.
realityking · 4 years ago
The Daily Show, and I believe most similar shows, use a product called Snapstream: https://www.snapstream.com/entertainment

Crucially, this is not a cloud product; I suspect that is for copyright reasons.

josefresco · 4 years ago
I often wonder about this too, regarding a radio show I listen to that has several decades of archives. I wonder what the interface looks like, and how hard it really is to assemble clips. Having a database of thousands of hours of video and audio to search sounds cool!
epaga · 4 years ago
This is incredibly impressive, especially the ability to immediately download the snippet as an mp4.

How has this not been immediately shut down due to copyright?

pkulak · 4 years ago
This looks like bog-standard fair use to me.
NKosmatos · 4 years ago
I had the same question; perhaps it has to do with the length of the video clips.
geofree · 4 years ago
Nice implementation, I like how it continuously plays. Similar concept to my company https://getyarn.io, where you can search into movies and tv shows :)
treejanitor · 4 years ago
I wonder how many clips they have compared to the 10 million+ in Yarn...

https://www.getyarn.io/yarn-clip/2d30e3f1-6bc1-4aa0-bdc4-6d5...

NKosmatos · 4 years ago
It allows you to search for a specific phrase or word and get video results from movies (with timestamps) where that phrase or word is used. It has around 2M phrases; the first 5 results are free, and after that you need to become a sponsor.
WalterGR · 4 years ago
Are you affiliated? Others and I have questions about implementation...
NKosmatos · 4 years ago
Nope, no affiliation or connection at all. Just me being the average HN user posting something interesting I found :-) I have questions myself, especially why they haven't been shut down over copyright issues.
etcet · 4 years ago
Ooo this is so cool [0]. I need to know the implementation details. How are the clips stored? Are they generated dynamically with ffmpeg or something similar, or is every line of dialogue pre-clipped and ready to serve? How many films are there, and what are the storage costs?

[0] https://www.playphrase.me/#/search?q=this+is+so+cool
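
For the "generated on demand" option (the thread doesn't say which approach playphrase.me actually uses), a server could shell out to ffmpeg with a seek and a duration. This sketch only builds the argument list; `ffmpeg_clip_cmd`, the file names, and the padding value are hypothetical. Putting `-ss` before `-i` makes ffmpeg seek in the input, which is fast; re-encoding instead of `-c copy` avoids the clip snapping to a keyframe away from the requested start.

```python
def ffmpeg_clip_cmd(src, start, end, out, pad=0.25):
    """Build an ffmpeg argv that cuts [start - pad, end + pad] seconds of src into out."""
    if end <= start:
        raise ValueError("end must be after start")
    begin = max(0.0, start - pad)          # small lead-in so the first word isn't clipped
    duration = (end + pad) - begin
    return [
        "ffmpeg", "-y",                    # overwrite output if it exists
        "-ss", f"{begin:.3f}",             # input-side seek
        "-i", src,
        "-t", f"{duration:.3f}",           # length of the output clip
        "-c:v", "libx264", "-c:a", "aac",  # re-encode for an accurate cut
        "-movflags", "+faststart",         # put the moov atom first so browsers can start playing early
        out,
    ]
```

Running it would be something like `subprocess.run(ffmpeg_clip_cmd("movie.mp4", 10.0, 12.0, "clip.mp4"), check=True)`. The tradeoff versus pre-clipping every line is CPU time per request against storage for millions of small files.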

WalterGR · 4 years ago
And how did they even get their source material?
jonheller · 4 years ago
This is one of the most impressive projects I've seen on HN in a long time. Nice work!
pkulak · 4 years ago
https://www.playphrase.me/#/search?q=you+son+of+a+bitch

I wonder how long that would run for if I were a sponsor.