Terminator:
"nothing clean right" - no results
"fuck you, asshole" - one result for Terminator, but the phrase occurs twice in the film.
Predator:
"If it bleeds, we can kill it" - Predator came up, and a few others (interesting).
Total Recall:
"Sue me, dickhead" - it got that one.
Commando:
"You're a funny guy Sully, I like you. That's why I'm going to kill you last." - no results.
I've often wondered how a database such as this could be used in other fields of programming, say a text-to-speech engine[1], where the algorithm could use the subtitles to guess the context of the conversation and produce better results.
I actually worked on this exact problem during an internship at our university. We used a huge corpus of communication (for example, we had access to all the emails ever sent internally at Enron).
We used this as the basis to train a speech-to-text engine by automatically correcting likely-wrong interpretations. "I go loo school" would be corrected to "I go to school", for example. It worked remarkably well.
A collection of subtitles like this could be used the same way, but there are far bigger (and better?) collections of data for training these machine learning engines.
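For anyone curious what "automatically correcting likely-wrong interpretations" might look like, here is a minimal sketch, not the system described above. It assumes a simple add-one-smoothed bigram model; the tiny CORPUS, the CONFUSIONS table of sound-alike words, and the function names are all made up for illustration.

```python
from collections import Counter

# Toy stand-in corpus (hypothetical); a real system would train on far more text.
CORPUS = [
    "i go to school every day",
    "we go to work by train",
    "they go to school by bus",
    "i want to go to the loo",
]

# Hypothetical table of words the recognizer tends to confuse.
CONFUSIONS = {"loo": ["to", "too", "two"]}


def train_bigrams(sentences):
    """Count unigrams and bigrams, with a start-of-sentence marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def score(words, unigrams, bigrams):
    """Product of add-one-smoothed bigram probabilities for a word list."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + words
    prob = 1.0
    for prev, curr in zip(padded, padded[1:]):
        prob *= (bigrams[(prev, curr)] + 1) / (unigrams[prev] + vocab_size)
    return prob


def correct(sentence, confusions, unigrams, bigrams):
    """Try swapping one confusable word at a time; keep the best-scoring variant."""
    words = sentence.split()
    best, best_score = words, score(words, unigrams, bigrams)
    for i, word in enumerate(words):
        for alt in confusions.get(word, []):
            candidate = words[:i] + [alt] + words[i + 1:]
            candidate_score = score(candidate, unigrams, bigrams)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return " ".join(best)


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
print(correct("i go loo school", CONFUSIONS, UNIGRAMS, BIGRAMS))
```

The idea is only that "go to" and "to school" are common in the corpus while "go loo" and "loo school" are not, so the swapped sentence scores higher. A bigger corpus just makes those counts more trustworthy.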
Very neat. I queried for a line I remembered as "a story Englishmen tell when they're down in the mouth", and it corrected this to "Englishmen tell it [etc.]", identifying the movie as Beat the Devil.
I typed in "finality" as the search term. There's a scene in which Nick Nolte uses this word while giving a speech to the Hulk. It only came up with results containing "finaLLy"(?)
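One guess at why "finality" turns into "finally": if the search does some kind of fuzzy matching (an assumption, I don't know how the site actually works), the two words are only two edits apart. A small sketch of the standard Levenshtein edit distance, with a hypothetical function name:

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        curr = [i]
        for j, char_b in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                        # delete char_a
                curr[j - 1] + 1,                    # insert char_b
                prev[j - 1] + (char_a != char_b),   # substitute or match
            ))
        prev = curr
    return prev[-1]


print(levenshtein("finality", "finally"))  # 2: substitute i -> l, delete t
```

A matcher that tolerates a couple of edits and ranks by how common the word is would plausibly prefer "finally" every time.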
http://www.quodb.com/#search/i'll%20be%20back
EDIT: reformatted.
[1] http://www.slate.com/articles/technology/technology/2009/03/...
http://www.quodb.com/#search/you%20look%20like%20shit
531 titles. Wow.
http://www.quodb.com/#search/we've%20got%20company
"Incoming!" is the moral equivalent (and is much more popular), but it's less impressive since it's only one word.
http://www.quodb.com/#search/incoming!
http://www.quodb.com/#search/i%20want%20you%20to%20hit%20me%...