Show HN: YouTube Full Text Search – Search all of a channel from the commandline

I love that a third party is stepping up here, but it's incomprehensible to me that Google doesn't do this themselves. They're a search company, and they own YouTube. The YouTube data — including the subtitle files — is already sitting there on their servers; they don't have to scrape it, they just have to index it. What are they even doing?

Fun thing to try: do a Google search with "site:youtube.com" in it. You get basically nothing, no matter what keywords you use. It seems that Google actually entirely ignores/excludes YouTube from their regular HTML indexing, and instead only relies on the YouTube backend to actively push content into (a special, separate part of) the search index. Which gets you "results from YouTube" and "video search", but doesn't get you the ability to search YouTube video pages qua web pages. (Consider: you can find a post in a Reddit comment thread on Google. Can you find a post in a YouTube video's comments section on Google?)

Heck, when I first heard about YouTube's autogenerated captions, my first thought was "oh, so this is Google building deep indexing of video through audio transcription, because they can't trust externally-provided subtitles, right?" But it's been 10 years, and I couldn't have been more wrong.

hysan · 2 years ago

I would posit that google has determined that it’s more profitable to keep viewers on YouTube via controlling the viewing experience vs whatever additional ad revenue they’d gain by making videos more easily indexable. I’m basing this hypothesis on YouTube’s consistent trend of removing features related to controlling your own viewing experience. For example, removing subscription collections.

HardlyCurious · 2 years ago

Actually, letting people search video text would enable less watching and that is probably the reason they aren't interested in it.

derefr · 2 years ago

How would making videos more indexable move people off of YouTube? Once you're there, you'd stay there, with all the current recommender algorithms still in effect; all the external indexability would do is give Google (and Bing, and everyone else) more reasons to lead you into that labyrinth.

roncesvalles · 2 years ago

Whenever I notice an "obvious" potential feature that could improve user experience in a Google product (there are MANY), I automatically assume it's because of one of two things:

a. Doing it won't get anybody a promo.

b. They've considered it and determined that it will lose revenue to a degree that is not justified by the usability improvement.

This is one of the pitfalls of having an ad-based product instead of a fee-based product. User experience is just no longer the top priority.

reaperducer · 2 years ago

This is exactly it. It's the same business logic that is employed in the regular Google Search.

Google doesn't make money from giving you correct search results. It makes money by keeping you searching for the results you want.

hackernewds · 2 years ago

I despise YouTube purely for inflicting mandatory Shorts on the user.

j33zusjuice · 2 years ago

I’m willing to bet that the features you’ve seen removed are taken out because their utilization is low, and there’s some associated maintenance cost. I’d also like to know what other features were removed. I had never heard of subscription collections, for example, and a search returned [this video](https://youtu.be/qGSHPhR8k8g) (I wish markdown worked on here) that says collections were a test feature (I’m guessing it didn’t do well enough to make it to prod).

StrangeATractor · 2 years ago

Have you tried searching through your call history on a Google phone? It's awful. You'd think they'd have a solid search built in, but it's nearly useless. Especially considering you're usually searching for that number you only dialed once, which isn't in your contacts and is for some reason excluded from your recents, so you go into your call history (strangely hidden behind a menu) and... there's no search function. WTF? You'd think you could filter by area code, date, or the general time at which the call took place. Google is a search company after all, right? Or am I subtly being nudged to use some of their more profitable products to try and find it again?

FOSS dialer recommendations are welcome btw.

zo1 · 2 years ago

As much as I like to generally, I wouldn't blame Google for this, rather I blame the entire field of "UX".

bob_theslob646 · 2 years ago

I have this same exact problem. I thought I was crazy. The call history on Android/Pixel is absolutely terrible.

whitemary · 2 years ago

It works as well as Windows search, no better

gniv · 2 years ago

Beyond conspiracy theories it's interesting to speculate why Google is not providing native search-in-subtitles and search-in-comments. The easiest explanation is that they don't trust the quality. They probably tried it and reached the conclusion that it doesn't improve search in any meaningful direction.

I know from experience that search in user reviews is very hard. Unless you really understand the review (which was tried via sentiment analysis) you cannot rank results well. But now with the new LLM models I think it would work better.

ramraj07 · 2 years ago

I’m pretty sure google uses the captions in search. As long as you search from within YouTube. I regularly search for keywords and find hourlong videos where it’s mentioned somewhere in the middle and nowhere in the description.

davidy123 · 2 years ago

I think Google is now thoroughly infected with Big-Company-itis. A couple departments would like to and know how to comprehensively use AI across many services, which the consumers would love (though some would be confused by it). But legal, marketing, and some guy in a department called "Annex B?" are preventing it. So then the people in those departments get bored and go somewhere else and their perspectives and skills are lost.

yosito · 2 years ago

At one point Google claimed their mission was to "organize the world's information and make it accessible". That's proven to be just as much of a joke as "Do no evil".

MildlySerious · 2 years ago

For all I care, they lost that claim with [Deleted video] - and by that I don't mean that they remove videos, for which there are countless valid reasons, but that there is no way to see what it was you liked, and that the lists you curate just deteriorate. There are many other, maybe more valid reasons they fail this mission, but that's the one that has been plaguing me the longest.

zip1 · 2 years ago

It's quite intriguing that Google doesn't offer full-text search capabilities for YouTube, considering its position as a leading search company. However, I think there are several reasons for this, some of which may not be immediately apparent.

Firstly, if Google did offer this feature, it would likely be targeted by Search Engine Optimization (SEO) exploits. In essence, any time a new search parameter is introduced, there's a risk of it being manipulated to prioritize certain content—especially by those interested in gaming the system for increased visibility or monetary gain. If YouTube's search feature were to be plagued by such spamming, it could severely degrade the user experience and lead to Google having to strip it away. While not a guarantee, it's a probable outcome given the history of SEO misuse.

Secondly, YouTube's primary focus is on its recommendation algorithm rather than search functionality. With billions of videos hosted, the key goal is to keep users engaged by serving up content they're likely to enjoy, thereby increasing view times and ad revenue. The search feature, while useful, is not as integral to this objective. Further, offering full-text search could provide yet another avenue to manipulate the algorithm, which YouTube surely wants to avoid.

Finally, implementing and maintaining such a feature would require substantial resources. It would necessitate hiring teams of high-salaried employees to moderate and ensure fair use of the feature, adding considerable operational costs. Considering these factors, it seems that Google has made a strategic decision to avoid this feature for now.

That said, the fact that third-party solutions are emerging, such as the one shared here, shows that there's a demand for full-text search capabilities. It also underscores the potential that these solutions have when unencumbered by the constraints faced by a tech giant like Google. This provides a fascinating insight into the dynamic relationship between third-party developers and tech corporations and the way they can complement each other.

dingledork69 · 2 years ago

> With billions of videos hosted, the key goal is to keep users engaged by serving up content they're likely to enjoy, thereby increasing view times and ad revenue

Maybe for some users. I just use youtube to find a specific video I need (because people have stopped writing useful how-to's now that they can just make a 10 minute video covering about 1 minute worth of text), and a full text search would be so, so useful.

2020aj · 2 years ago

Regarding your second point... I think it's still important because recommendation algorithms work better when users can find content they enjoy outside of the recommended content. If they can't then the recommendations will become stale.

crazygringo · 2 years ago

Google already does this themselves. If you search for rare words (e.g. try "indubitably") it will absolutely pull up videos that have the word in the auto-generated transcript and nowhere else (not in descriptions, not in comments).

Also, using "site:youtube.com" on Google works perfectly for me. If I look up "site:youtube.com david letterman" it gives me the David Letterman channel, followed by a seemingly infinite number of Letterman clips. Precisely what I'd expect.

The only thing I can reproduce that you're complaining about is that Google (and YouTube) search don't seem to index YouTube user comments, in contrast to Reddit. But Google doesn't seem to index comments-attached-to-content anywhere on the internet -- not even comments on articles at mainstream publications like the New York Times. Which is probably more of a feature than a bug -- comments on both YouTube videos and news articles tend to be a lot of emotional reactions and repeated opinions which aren't worth searching at all. In contrast, many (not all) Reddit threads are often very informative and the "main content", so it makes sense Google indexes them.

So I don't really see anything to complain about here, from my perspective.

harshreality · 2 years ago

What I'd expect is that

"having a distribution that's both radially symmetric" site:youtube.com

would return 3b1b's "Why π is in the normal distribution" video, which has that in subtitles at 22:28.

Even without the site: term, all I get is an allreadable.com page that's scraped the subtitles for that video. Allreadable appears at first glance to be a site owned by someone in China and hosted on liquidweb.
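To make that expectation concrete, here is a minimal sketch of a phrase search over a caption file that maps a hit back to a timestamp. It is not how Google, YouTube, or the tool in this Show HN works. The WebVTT sample is hand-written, the cue text around the quoted phrase is invented, and `parse_cues`, `find_phrase`, and `fmt` are hypothetical helper names. The timing regex assumes the `hh:mm:ss.mmm` form, although WebVTT also allows the hours to be omitted.

```python
import re
from bisect import bisect_right

# Hand-written sample in WebVTT form. Only the middle phrase comes from the
# thread; the surrounding cue text and the exact cue timings are invented.
SAMPLE_VTT = """WEBVTT

00:22:26.000 --> 00:22:28.000
what we want is a function

00:22:28.000 --> 00:22:31.500
having a distribution that's both
radially symmetric

00:22:31.500 --> 00:22:34.000
and independent along each axis
"""

# Simplification: assumes timestamps always include the hours field.
CUE_TIMING = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.\d{3} --> ")


def parse_cues(vtt):
    """Return a list of (start_seconds, text) for each cue in a simple WebVTT string."""
    cues = []
    for block in vtt.strip().split("\n\n"):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line)
            if match:
                hours, minutes, seconds = (int(g) for g in match.groups())
                text = " ".join(lines[i + 1:])
                cues.append((hours * 3600 + minutes * 60 + seconds, text))
                break
    return cues


def find_phrase(cues, phrase):
    """Return the start time of the cue in which each match of `phrase` begins.

    Cue texts are joined into one string so a phrase can span cue boundaries.
    """
    offsets, parts, pos = [], [], 0
    for _, text in cues:
        lowered = text.lower()
        offsets.append(pos)
        parts.append(lowered)
        pos += len(lowered) + 1  # +1 for the joining space
    haystack = " ".join(parts)
    needle = " ".join(phrase.lower().split())
    hits = []
    start = haystack.find(needle)
    while start != -1:
        cue_index = bisect_right(offsets, start) - 1
        hits.append(cues[cue_index][0])
        start = haystack.find(needle, start + 1)
    return hits


def fmt(seconds):
    """Format seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


cues = parse_cues(SAMPLE_VTT)
hits = find_phrase(cues, "having a distribution that's both radially symmetric")
print([fmt(t) for t in hits])
```

A real index would tokenize and rank rather than do a linear substring scan, but the mapping from a text hit back to a cue start time is the part that makes a subtitle result useful.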

dizhn · 2 years ago

Google doesn't even do a lot simpler things like searching by language or location. And the search is garbage. I am trying to learn Italian so that's what I am interested in but even when I enter a search term spelled correctly with its accents and everything I get anything from Brazilian Portuguese to French. They do a very helpful translation of the term and return results that are unfortunately useless to me. (I would have loved to speak every language but I don't)

Jakob · 2 years ago

The default behaviour for multiple languages is bad, but the settings for both region and search-results language work well, in case you haven’t seen the settings page yet.

gniv · 2 years ago

On web search, you can append &lr=lang_it to the URL. Maybe make it into a Chrome extension.
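A minimal sketch of that URL trick, for anyone who wants to script it rather than edit the address bar by hand. The `lr=lang_it` parameter is the one mentioned above; `google_search_url` is a hypothetical helper name, and the assumption that other languages follow the same `lang_xx` pattern is mine.

```python
from typing import Optional
from urllib.parse import urlencode


def google_search_url(query: str, lang: Optional[str] = None,
                      site: Optional[str] = None) -> str:
    """Build a Google web-search URL, optionally restricted by language and site."""
    if site:
        # The site: operator is part of the query text itself.
        query = f"{query} site:{site}"
    params = {"q": query}
    if lang:
        # lr=lang_xx restricts results by language, e.g. lang_it for Italian.
        params["lr"] = f"lang_{lang}"
    return "https://www.google.com/search?" + urlencode(params)


print(google_search_url("ricetta carbonara", lang="it"))
```

Passing `site="youtube.com"` as well combines the language restriction with the site: operator discussed elsewhere in the thread.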

pmoriarty · 2 years ago

YouTube doesn't have to be good. It just needs to do the minimum it needs to keep users from switching to a different platform... which, because of the network effect of so many people and videos being on it, is not much at all.

If they had serious competition, they'd have to do more to keep users, but no such competition exists.

lettergram · 2 years ago

There’s Twitch, and I think Rumble is going to give it a run for its money.

I grant Rumble is only 1% the size of YouTube by viewership, but I think that’ll shift fairly rapidly and we can see 10% on Rumble in 3 years.

My analysis https://austingwalters.com/an-analysis-on-rumble-nasdaq-rum/

stingraycharles · 2 years ago

There’s a company founded by a friend of mine called MediaDistillery, who are doing awesome stuff in this area. Real-time searchable massive video archives, with contextual understanding (e.g. “a WW2 fragment with a Jewish mother holding her baby”). Super useful for so many purposes.

And then there’s YouTube where you can’t even search subtitles. It makes me shake my head. Google seems to be at the forefront of AI, but doesn’t seem to be able to turn all that expertise into relevant products. Maybe the recent disruptions will shake them awake?

methou · 2 years ago

I mustn't be searching it right, I can only get this company: https://mediadistillery.com/ , the feature you mentioned is not accessible, and has that Ad-tech smell.

sharess · 2 years ago

Social media companies are in a constant tug-of-war against the end users when it comes to controlling what the users see. The ideal is that the user has absolutely no control over what they see and the social media company can fully dictate content. That is what makes the money.

Allowing users to freely query content in their own websites is completely antithetical to what they are trying to do. YouTube is also very aggressive in preventing scraping and limiting the usage of the official API. Which is quite ironic considering the history of the company.

flenserboy · 2 years ago

Google: appeared open to start things off, then went whole hog on the MS embrace-and-extend philosophy, aiming to crush the life out of the entire web.

Pazzaz · 2 years ago

I think you're exaggerating a little when you say "site:youtube.com" doesn't work. If I search 'site:youtube.com apple watch' I get 143 000 000 results, and if I search something more specific like 'site:youtube.com "Featuring Dr James Grime"' I'll find exactly what I'm looking for. But you're correct that it doesn't seem to search video comments, only titles and descriptions.

derefr · 2 years ago

The problem I ran into is that, like I said, YouTube doesn't seem to get indexed by hypertext inward-edges from other sites like a regular website does — and so you can't search by how you recall a video being described in pages that link to it; instead, you have to remember how the video describes itself. Which it may not always do well.

dredmorbius · 2 years ago

Some of us recall that when Google+ launched it lacked any search whatsoever.

That this was the case with a company whose name is synonymous with online search was ... simply mind-boggling.

The platform eventually did get search (actually a few different implementations), which varied from mostly useless to actually reasonably functional, though I'll note that HN's Algolia-based search is vastly more useful on an ongoing basis.

G+'s content, to the extent it survives at all, is largely on the Internet Archive's Wayback Machine which ... lacks search.

ranting-moth · 2 years ago

Conflict of interests. They want you faffing around provisioning all those valuable clicks for them to sell.

rasz · 2 years ago

Google does it, Youtube does not.

Google will find a video when you search for a phrase that was said in it (as long as its bad speech recognition got it right), it will find a video with a text that shows up on an object with enough clarity to OCR (for example electronic component name on a pcb in the foreground). There is one plot twist - Google will not always do it when you search for VIDEO specifically :) but will gladly give you videos when searching text/images :)

Youtube search on the other hand will try

- suggest something you liked that has nothing to do with the search term entered

- popular videos at this moment

- videos mildly related to proper results. One of them had a horse in it? clearly you want more horses!

- videos with title mildly related to search term

- to ignore upload date filter when they panic (Christchurch mosque massacre).

For example YT search for "Si5351A" limited to this month will give 11 somewhat proper results mostly with "Si5351" (no A postfix) in title/description AND some dude DXing in Indonesian "Menerima Modif Radio Yaesu FTC 1540A Ke DDS System" because "Si5351A" is a "DDS" so it's the same thing, right? It's like when I'm looking for "NSR Ro80" you should show me plenty of other cars because Ro80 is a car :). Searching for Si5351A without quote marks will show one additional video with Si5351B in the title.

Gets better, searching Google Video for Si5351A last month also gives ~11 results, but only 4 of those are direct YT links :]

mrazomor · 2 years ago

Probably because YouTube != Google Search, while YouTube is still a subset of Google. So, going the extra mile for YouTube and not for others might put Google Search in anti-competition issues.

Then again, I also find it absurd. YouTube is one of the most valuable parts of the Internet. And its lack of searchability is criminal. At least the YT search itself should make up for it. It's a shame it doesn't.

derefr · 2 years ago

Google doesn't necessarily have to do anything special for YouTube, though. Google could "just" index YouTube videos as if they were any other web pages, in a standard way. It would then be YouTube's job to make the data inside those video pages legible to Google's indexer. Google could enable this by pushing for web standards that increase the machine-legibility of video in HTML, e.g. standardized ARIA-accessible caption sources for the <video> element.

If they got it set up such that in theory any web spider could come along and index a YouTube video — then there would be no anti-trust reason that Google couldn't just directly ingest the subtitle files off their own servers; it'd just be a bandwidth-saving optimization over the scraping process that they could otherwise do.

choppaface · 2 years ago

They do have search in video but the launch was kinda miffed

https://www.socialmediatoday.com/news/google-tests-text-sear...

jameshart · 2 years ago

How do you think monopoly regulators would like it if YouTube videos were indexed with higher accuracy and detail in google searches than Vimeo video?

So sure, google can say ‘here’s a standard way to provide subtitles for a video which we’ll index’, but then that becomes a complete SEO side channel - google needs to validate that the subtitles actually match the content. And that means their bots downloading the video itself. And google really doesn’t want to go out there and argue that video needs to be downloadable by bots, because that’s the whole youtube-dl case right there.

ttctciyf · 2 years ago

> Google search with "site:youtube.com" in it. You get basically nothing

That is not my experience! I regularly resort to this when the crappy inbuilt youtube search, which prefers to throw out algorithmic recommendations over returning actual search results, fails to come up with the goods.

Do you really get no results for, say: https://www.google.com/search?&q=intitle%3A%22thomas+brinkma... ?

Deleted Comment

reaperducer · 2 years ago

> Heck, when I first heard about YouTube's autogenerated captions

Off-topic, but since I don't use YouTube and you do, in your experience, how are the auto-generated captions? Are they accurate?

I've been unimpressed by speech-to-text engines in the past, so I'm interested to hear if this is a problem that Google's managed to solve.

andai · 2 years ago

As far as I can tell, Google (but not YouTube?) does search YouTube transcripts.

I have successfully Googled text in a video's transcript and found that video.

The transcripts themselves are pretty bad though (Google's using old tech).

They're usually good enough for auto-summarization though.

guerrilla · 2 years ago

> but it's incomprehensible to me that Google doesn't do this themselves.

How is it incomprehensible that they don't give a shit about what you want to see and only care about what's profitable to them for you to see?

fatneckbeard · 2 years ago

i used to imagine similar opportunities with google books. but they have done basically nothing with it. and that's been like 20 years.

if anyone could have disrupted the corrupt and unfair academic publishing world, it was Google. they just found it an uninteresting task. they preferred to work on G+, Stadia, Google Code, etc, https://killedbygoogle.com/

jimmySixDOF · 2 years ago

Rule #50 The better the Catalog, the Worse the Interface

Spotify and YouTube are the leading examples but there are definitely others.

whitemary · 2 years ago

YouTube profits from people scrubbing videos. Why on Earth would they want to offer full text search instead?

drumhead · 2 years ago

It's Google, the obvious seems to elude them even when it's sitting in front of them.

freedomben · 2 years ago

I believe Bard has the capability of searching youtube transcripts