VLC tops 6B downloads, previews AI-generated subtitles

There's an art to subtitling that goes beyond mere speech-to-text processing. Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read. Sometimes you need to name a voice as unknown, to avoid spoilers. Sometimes the positioning on the screen matters. I hope the model can be made to understand all this.

diggan · a year ago

> Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read

Please no. Some subtitle companies do think like this, and it's really weird, like when they try to "convert" cultural jokes, and then add in a bunch of more assumptions regarding what cultures you're aware of depending on the subtitle language, making it even harder to understand...

Just because I want my subtitles in English, doesn't mean I want all typical discussed Spanish food names to be replaced by "kind of the same" British names, yet something like that is something I've come across before. Horrible.

flippyhead · a year ago

I totally get this. When I'm watching videos for the purpose of learning a language, I want all the actual words in the subtitles. But if I'm watching just ot enjoy, say in a language I don't care to learn, I don't mind someone creatively changing the dialog to how it probably would have been written in English. This happens with translations of novels all the time. People even seek out specific translators who they feel are especially talented at this kind of thing.

scarface_74 · a year ago

I know a little Spanish and even I get annoyed when the English subtitles don’t match what they said in Spanish. Of course I expect grammatically correct Spanish to be translated into grammatically correct English.

bdndndndbve · a year ago

It depends on the context! Trying to Americanize Godzilla, for instance, has largely failed because Godzilla is an allegory for the unique horror of nuclear bombing which Japan experienced. Making him just a lizard that walks through New York is kind of stupid.

Jokes are an example of something translators can do really well - things like puns don't work 1:1 across languages. A good translator will find a corresponding, appropriate line of dialogue and basically keep the intent without literally translating the words.

Food is kind of silly because it's tied to place - if a setting is clearly Spanish, or a character is Spanish, why wouldn't they talk about Spanish food? Their nationality ostensibly informs something about their character (like Godzilla) and can't just be fine/replaced.

lifthrasiir · a year ago

More precisely speaking, there are two kinds of subtly different subtitles with different audiences: those with auditory imparements and those with less understanding of given language. The former will benefit from paraphrasing while the latter will be actively disadvantaged due to the mismatch.

xattt · a year ago

There was an eminent Russian voiceover artist (goblin?) that translated pirated Western movies with his own interpretation.

His translations were nowhere near what the movie was about, but they were hilarious and fit the plot perfectly.

close04 · a year ago

> Spanish food names to be replaced by "kind of the same" British names

The purpose of a translation is after all to convey the meaning of what was said. So for example you'd want the English "so so" to be translated in Spanish as "más o menos" instead of repeating the translation of "so" twice. You don't want to just translate word for word, venir infierno o alta agua.

A lot of dialog needs language specific context, many expressions don't lend themselves to literal translation, or the translation in that language is long and cumbersome so paraphrasing is an improvement.

Like with anything else, the secret is using it sparingly, only when it adds value.

llm_nerd · a year ago

There is the art of subtitling, and then there is the technical reality that sometimes you have some content with no subtitles and just want a solution now, but the content didn't come with an SRT or better yet VTT and OpenSubtitles has no match.

They're using Whisper for speech to text, and some other small model for basic translation where necessary. It will not do speaker identification (diarization), and certainly isn't going to probe into narrative plot points to figure out if naming a character is a reveal. It isn't going to place text on the screen according to the speaker's frame place, nor for least intrusion. It's just going to have a fixed area where a best effort at speech to text is performed, as a last resort where the alternative is nothing.

Obviously it would be preferred to have carefully crafted subtitles from the content creator, translating if the desired language isn't available but still using all the cues and positions. Secondly to have some carefully crafted community subtitles from opensubtitles or the like, maybe where someone used "AI" and then hand positioned/corrected/updated. Failing all that, you fall to this.

eviks · a year ago

> better to paraphrase dialog to reduce the amount of text that needs to be read.

That's just bad destructive art, especially for a foreign language that you partially know.

> Sometimes you need to name a voice as unknown, to avoid spoilers.

Don't name any, that's what your own eye-ear voice recognition/matching and positioning are for (also reduces the amount of text)

> Sometimes the positioning on the screen matters.

This is rather valuable art indeed! Though unlikely fit to be modelled well

lxgr · a year ago

> Don't name any, that's what your own eye-ear voice recognition/matching and positioning are for

That’s tricky when one or more speakers aren’t visible.

MindSpunk · a year ago

Subtitles aren't just for foreign viewers though, they're also for native speakers who are now hearing impaired.

Aurornis · a year ago

AI subtitles are just text representation of the sound track.

There is no need for artistic interpretation, substituting words, or hiding information. If it’s in the audio, there’s no reason to keep it out of the subtitle.

An AI subtitle generator that takes artistic license with the conversion is not what anyone wants.

Hard_Space · a year ago

That doesn't work for idioms, certainly in Italian, which has multiple colorful metaphors which would be mystifying if translated directly.

entropie · a year ago

> Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read.

Questionable. It drives me crazy to have subtitles that are paraphrase in a way that changes the meaning of statements.

raincole · a year ago

> Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read

I really hope people to stop doing that.

mohamez · a year ago

This is horrible for people who learn languages using TV Shows and Movies. One of the most frustrating things I've encountered while learning German is the "paraphrase" thing, it makes practicing listening very hard, because my purpose wasn't to understand what was being said, but rather familiarizing my ear with spoken German.

So, knowing exactly the words being said is of utter importance.

thiht · a year ago

> Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read

NO!

I speak and understand 90% of English but I still use subtitles because sometimes I don't understand a word, or the sound sucks, or the actor thought speaking in a very low voice was a good idea. When the subtitles don't match what's being said, it's a terrible experience.

mouse_ · a year ago

> Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read.

Pretty sure this is a violation of the Americans with Disabilities Act, so illegal in the U.S. at least. Being Deaf doesn't mean you need "reduced" dialogue.

lm28469 · a year ago

As long as they're synced properly I don't care much, some movies/shows have really bad sound mix and it's not always possible to find good subs in the first place

lidavidm · a year ago

I suppose this feature should have been termed closed captioning and not subtitling. It seems you're not going to get much sympathy for human translation here.

latexr · a year ago

> There's an art to subtitling that goes beyond mere speech-to-text processing.

Agreed.

> Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read.

Hard no. If it’s the same language, the text you read should match the text you listen to. Having those not match makes parsing confusing and slow.

> Sometimes you need to name a voice as unknown, to avoid spoilers.

Subtitles don’t usually mention who’s talking, because you can see that. Taking the source of a voice is uncommon and not something I expect these system to get right anyway.

the_clarence · a year ago

The best subtitles I've had were fantrads

latexr · a year ago

Amazon agrees.

https://www.reddit.com/r/amazonprime/comments/h922rg/primevi...

Very excited for this, but a waste of energy if everyone is needing to process their video in real time.

huijzer · a year ago

Why are we still talking about this? Computers are INCREDIBLY efficient and still become orders of magnitude more efficient. Computation is really negligible in the grand scheme of things. In the 80s some people also said that our whole world energy would go to computations in the future. And look today. It’s less than 1%. We do orders of magnitude more computations, but the computers have become orders of magnitude more efficient too.

As another way to look at this, where does this questioning of energy use end? Should I turn off my laptop when I go to the supermarket? When I go to the toilet? Should I turn off my lights when I go to the toilet?

My point is, we do a lot of inefficient things and there is certainly something to being more efficient. But asking “is it efficient” immediately when something new is presented is completely backwards if you ask me. It focusses our attention on new things even though many old things are WAY more inefficient.

magic_smoke_ee · a year ago

> Computers are INCREDIBLY efficient and still become orders of magnitude more efficient.

That's what a software engineer would say who views resources as unlimited and free.

simgt · a year ago

> In the 80s some people also said that our whole world energy would go to computations in the future. And look today.

Today we consume twice as much energy as we did in the 80s (and that's mostly coming from an increase in fossil fuels consumption). Datacenters alone consume more than 1% of global energy production, that doesn't include the network, the terminals, and the energy necessary to produce all of the hardware.

lxgr · a year ago

> Why are we still talking about this? Computers are INCREDIBLY efficient and still become orders of magnitude more efficient.

Because today is today, and if we can project that the energy consumption of doing a task n times on the client side outweighs the complexity of doing it once and then distributing the result somehow to all n clients, we should arguably still do it.

Sometimes it's better to wait; sometimes it's better to ship the improved version now.

> Computation is really negligible in the grand scheme of things.

Tell that to my phone burning my hand and running through a quarter of its battery for some local ML task every once in a while.

Mashimo · a year ago

> Should I turn off my laptop when I go to the supermarket?

Yes.

_flux · a year ago

What is the alternative?

The results could be cached, but it's probably unlikely that they would need to be used again later, as I imagine most videos are watched only once.

Another option would be to upload the generated subtitles to some service or p2p, but I believe that would also problem (e.g. privacy, who runs the service, firewalls for p2p, etc).

rvnx · a year ago

It can actually send to Google the information on what you are playing:

https://github.com/videolan/vlc/blob/f908ef4981c93a8b76805ad...

and to their own servers:

https://github.com/videolan/vlc/blob/f908ef4981c93a8b76805ad...

should could fetch subtitles as the same time ?

edit: cf, what "a3w" says too.

scarface_74 · a year ago

Tangentially related: funny enough, this site was just submitted yesterday to HN - https://exampl.page/

You can navigate to $foo.exampl.page and it will generate a website on the fly with text and graphics using AI. It will then save and cache the page.

It’s admittedly a useless but cool little demo.

c16 · a year ago

This is exactly the answer imo. We have subtitle files, which is in effect a cache. Process once, read many times.

So while I'm excited this feature is now available, having high quality subtitles cached in one place and generated by AI is the answer imo.

a3w · a year ago

VLC has a "check my media online" feature, next to "check for updates to VLC on startup" already. Could they offer subtitle downloads?

Previously, that was used for mp3 album covers or something?

magic_smoke_ee · a year ago

Authoritatively-correct subtitles rather than distributed generation and/or publication by anyone and everyone, including AI.

I don't know how many times I've seen subtitles that appear to be based on a script or were half-assed, and don't match the dialogue as spoken at all.

CaptainFever · a year ago

OpenSubtitles as a cache, maybe? (With their collaboration, of course.)