This was a fun read, as someone who's both a Korean BW player and a speech recognition researcher.
It's interesting to note that the original Korean transcription already has many errors, seemingly (and impressively) corrected by LLMs later on. For example, 12 안마당 빌드 (12 courtyard build) is actually 12 앞마당 빌드 (12 frontyard build), which might have been more understandable to BW players. Similarly 투에처리 빌드 (processing-at-two build? makes no sense lol) should have been transcribed 투해처리 빌드 (two-Hatchery build).
Therefore it may also be helpful to directly feed the slang dictionary into Whisper's inference process using contextual biasing. There are lots of ways to do this, but the simplest would be to increase the probability of slang words in the dictionary in the final prediction layer of Whisper by a constant factor. This is fairly easy to implement, for example by using HuggingFace's library: https://huggingface.co/docs/transformers/en/internal/generat...
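To make the "constant factor" idea concrete, here is a toy sketch in plain Python. It does not use the HuggingFace API or Whisper itself; the function names, the four-token vocabulary, and the factor of 4 are all made up for illustration. It only shows the arithmetic: adding log(factor) to a token's logit multiplies its unnormalized probability by that factor. Real slang words usually span several subword tokens, which this per-token version ignores.

```python
import math

def bias_logits(logits, boosted_token_ids, factor=4.0):
    """Boost selected tokens by a constant factor (hypothetical helper).

    Adding log(factor) to a logit multiplies exp(logit) by `factor`, so after
    softmax the boosted tokens gain probability relative to all other tokens.
    """
    bonus = math.log(factor)
    return [x + bonus if i in boosted_token_ids else x
            for i, x in enumerate(logits)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary with ids 0..3; pretend id 2 is a token from the slang dictionary.
logits = [2.0, 1.0, 1.5, 0.0]
before = softmax(logits)
after = softmax(bias_logits(logits, {2}, factor=4.0))
```

After biasing, the odds of token 2 against any unboosted token are exactly 4 times what they were, while the distribution still sums to 1. In a real decoder the same adjustment would be applied at every decoding step, before sampling or beam search picks the next token.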
I am a StarCraft fan and I have no idea what a courtyard or a frontyard is supposed to be! However I do know that the names of buildings, units, technologies, and strategies are usually heavily abbreviated in English. Perhaps the same is true in Korean? A 12 barracks build would usually just be called "12 rax", a two hatchery mutalisk build would be called "2 hatch muta", and a three hatchery hydralisk timing attack / all-in would be called "3 hatch hydra bust".
I believe the equivalent term used in English (exhibited in the new translation) is "natural", short for "natural expansion", which refers to the obvious location where the player should build their first expansion. It sounds like the term used in Korean for this concept literally means "front yard" rather than matching the English term.
A lot of Korean slang is a little different. Source: not Korean but have been in the English community a long time and picked some stuff up.
"1rax double" is equivalent to "1rax expand" or "1rax CC". They use multi or double to mean expand in the early game. Instead of "cheese" or "all-in" they use "pil-sal-gi" which means ace/joker card or "han-bang" which means an army or attack on few resources.
I am not sure what short-hand they use for barracks, gateway, etc.
Thanks for the added context on the builds! As a "foreign" BW player and fellow speech processing researcher, I agree shallow contextual biasing should help. While not difficult to implement, most generally available ASR solutions don't make it easy to use. There's a PR in ctranslate2 implementing the same feature so that it could be exposed in faster-whisper: https://github.com/OpenNMT/CTranslate2/pull/1789
Been a while since I played the game, but in all translations I was confused why you'd want to build 12 spawning pools.
If I remember correctly, that building was for enabling creation of zerglings and other units in hatcheries (and also for researching upgrades) - but one building was enough to unlock those units in all hatcheries; and it did not produce any units itself, so building more of them wouldn't increase your unit output either.
You could in theory build several of them to research multiple upgrades in parallel, but there were only like 3 possible upgrades anyway, so it wouldn't make sense to build 12 of them.
The only reason I could think of would be as a sort of redundancy, so you can keep building zerglings even if the enemy destroys some of the pools. But 12 also seems excessive for that.
So what exactly was the motivation here?
Sorry if I'm talking rubbish here, as I said, it's been a while.
12 Pool means you build your spawning pool at 12 supply. Usually the number before a building is the supply at which you build it, and it assumes you're constantly producing workers.
Don’t let the title fool you: this is an extremely thorough and creative take on translating and making more approachable the commentary of StarCraft.
As the author rightly points out, in its 27 years of existence, commentary around the game has become a domain specific language. Not just Korean or English.
This approach of automated scripting and using AI to understand roughly what was said and then make it coherent is really cool.
LOL, as a non-native English speaker, reading this reminds me of EXACTLY the same problem of translating many things, but more precisely, computer articles and software development.
There’s a huge amount of terms that are difficult to translate (sharding? Hash?). The only real solution is to adopt them to your language, more or less adapted, which is what happens over time. But it requires a community that, to some degree, is able to cross the gap between the languages. In this case, learning English.
Talking about software development in Spanish (my native language) is a succession of imported terms from English.
I don’t think there’s a good way of doing that, and I’m interested to see how automatic translations deal with it, because the only way this can work is with a process of mixing both languages in a social way and seeing what terms evolve from that process.
And you need, in the terms the post describes, people that know Korean at least in a non-fluent way. And the game itself, of course.
With Spanish we have the added complexity that there are different linguistic traditions around the world. For example, in Mexico I learned "depurar", an existing Spanish word that closely fits the meaning of "debug". However, many Spanish speakers simply say "debuguear", just directly borrowing the English word. In Mexico I also learned "desempeño" to describe the performance of a computer or software, but in Argentina I've heard "el performance" to say the same.
I think the most common thing is to just use English loanwords without trying to find existing Spanish words that fit the meaning.
In some sense, these terms are extremely trivial to adapt: the German term for sharding is just a literal borrow, just say 'sharding'.
What's almost impossible to translate are everyday words. German Brot has rather different connotations from the nasty stuff Brits call bread, but I don't think there's a better word available, and a straight-up borrow would feel fairly weird in most contexts. Much weirder than borrowing 'sharding' in a technical context.
> The only real solution is to adopt them to your language, more or less adapted, which is what happens over time.
You can see some good examples of that when you look at railway related terms in German. They used to be all English, because that's where we got the technology from. But over time they have been replaced with mostly German native-looking terms. (Well, native looking, but many of them like Lokomotive re-created from the same borrowing from Greek or Latin as in English. But eg station is now Bahnhof. And train is Zug.)
> German Brot has rather different connotations from the nasty stuff Brits call bread, but I don't think there's a better word available, and a straight-up borrow would feel fairly weird in most contexts. Much weirder than borrowing 'sharding' in a technical context.
I'm absolutely puzzled by this. Not British but I've been to both countries and can't say I noticed much difference in their bread.
What do you consider to be the key distinction between German and British bread? Why do you think it is such a dramatic change that you can't countenance using the same word?
I think the words' metaphorical meanings don't help much unless you already know what they mean. If you heard the word "sharding" for the first time and all you knew was that it had something to do with computers, I think you'd have a hard time guessing that it means "partitioning rows of a database across multiple servers to reduce load".
Kinda funny that in an article about translations, the author gets signal-to-noise completely backwards. A high signal to noise (over 9000) is very good. It means you are getting a lot of signal with very little noise. Decreasing signal to noise means getting more noise.
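A quick numeric illustration of the direction, as a sketch: the power values are invented, and `snr_db` is just a name picked here for the usual ratio-in-decibels formula.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(signal_power / noise_power)

clean = snr_db(9000.0, 1.0)   # lots of signal, very little noise
noisy = snr_db(9000.0, 90.0)  # same signal, 90 times more noise
```

Holding the signal fixed and adding noise takes the ratio from 9000:1 down to 100:1, so the SNR drops. A translation that gets cleaner has a rising signal-to-noise ratio, not a falling one.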
I was able to understand the Google Translate version well, but I am very familiar with the intricacies of BW and zerg 12hatch openers.
ChatGPT and Claude did an incredible job translating the Korean text:
Claude:
Today I'll teach you about the 12 Hatchery build. I'll explain the types of 12 Hatchery builds, their advantages and disadvantages, and the build orders in a simple but detailed way.
Against Protoss, this is the build you use when you want to start with the most economic advantage. Against Terran, there are several builds you can do with 12 Hatchery, so I'll explain some of the most commonly used builds.
The first is the two-hatchery build that starts with 12 Hatchery:
12 Hatchery
11 Spawning Pool
10 Gas
This build uses early gas, and it's often used when you want to quickly transition into a three-hatchery build with three gas bases.
The second build is:
12 Hatchery
12 Pool
12 Gas
This build allows for moderately fast tech tree and moderately fast three-hatchery expansion. This build is commonly known as the "safe three-hatchery" build, and you can think of it as a build that enables both quick Mutalisks and quick third base.
Dumb question from someone who only played money-maps as a kid:
What do the numbers in front of the building mean? 12 Hatcheries seems like… well, 12 seems like a possible but implausible number of hatcheries to build (hypothetically it is possible of course). And 12 spawning pools is obviously not useful. So that makes me think it is the position in the build order list. But, they list other builds, like:
> The second is the 12 Hatch, 12 Pool, 12 Gas
Which doesn’t make a ton of sense with that parsing. I mean it must not be a straight list. Maybe it is a tree, and 12 is the depth for this building? But that seems late, I can’t think of 11 buildings to build before gas. Maybe they include units too? Or maybe just drones/overlords?
IIRC it started with "4-pooling" which is when, as Zerg, you build a spawning pool while only having 4 workers (it's been years, forget what they're called), rebuild your 4th worker and then start making zerglings to achieve a super-early attack (a "rush").
Then your opponent calls you all sorts of vile names and questions your sexuality, etc.
That's only if you manage to get the first two zerglings out faster than it takes for opponent's SCVs to arrive at your base and kill your drones (that's the name of Zerg workers) :).
It denotes how much supply you should have when you start the building. All of your supply at this stage comes from workers, so it's also an indication how many workers you should train.
There might be a video where this happens, but I think it's more likely that you're misremembering it; there was a somewhat famous game cast by Husky where Cella[1] (a professional player) was joking around on the Internet playing 2v2 (2 teams of 2 players each).
He asked his partner what strategy he should use, the person responded with "13 gate" (meaning: keep building probes until you have 13, then build your first warpgate). Cella pretended to misunderstand and instead built 13 warpgates, which is a horrible strategy, but they still won the game. They only won because his partner could barely defend him in early game while he was building the warpgates. After surviving early game, it wasn't a fair fight even with a horrible strategy, because it's a professional against "normal" people on the Internet.
I don't think the video exists anymore, Husky famously removed his whole channel with a lot of StarCraft 2 early history, but I found this Reddit thread[2] talking about the game (WeRRa was Cella's team at the time, that's why they call him CellaWeRRa).
In the game you build buildings and units. The units take up "supply" which there is a limit on. At the beginning of the game you are mostly just building workers (unless you scout that your opponent is going for an extremely early attack), who mine resources and construct buildings.
The numbers indicate the supply you should be at when you build the structure.
so a 12 hatch 12 pool 12 gas means you get to 12 workers and then build those 3 buildings in that order as soon as you have the resources for those.
For zerg the workers actually become the building, so I assume you hit 12, build the hatchery, build another worker, build the spawning pool, build another worker, and then build your gas refinery.
Yes as zerg the lost supply is counted, so you can either go 12 hatch 11 pool 10 gas or 12-12-12 if you want to be a little bit more economically greedy at the expense of making it much harder to hold 8rax in ZvT as an example.
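The supply bookkeeping described above can be sketched in a few lines of Python. This is a toy model, not a game-accurate simulator: `expand_build` is a made-up helper, supply is assumed to equal the drone count, every building is assumed to consume one drone, and overlords, minerals, and timing are ignored. The 4 starting drones match the "4 workers" mentioned earlier in the thread.

```python
def expand_build(steps, start_drones=4):
    """Expand supply-annotated Zerg steps into an explicit action list.

    steps: list of (supply, building) pairs, e.g. [(12, "Hatchery"), ...].
    Assumptions: supply == number of drones, each building uses up one drone,
    overlords and resources are not modeled.
    """
    supply = start_drones
    actions = []
    for target, building in steps:
        if supply > target:
            raise ValueError(f"already past {target} supply for {building}")
        # Keep making drones until we reach the supply named in the build.
        while supply < target:
            actions.append("Drone")
            supply += 1
        actions.append(f"{target} {building}")
        supply -= 1  # the drone morphs into the building, so supply drops by one
    return actions

greedy = expand_build([(12, "Hatchery"), (12, "Pool"), (12, "Gas")])
early_gas = expand_build([(12, "Hatchery"), (11, "Pool"), (10, "Gas")])
```

In this model the 12-12-12 build re-makes a drone before each of the later buildings (10 drones trained in total), while 12 hatch, 11 pool, 10 gas places all three buildings back to back (8 drones trained). That is the "economically greedy" difference: more workers, but the pool and gas start later.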
As you get later into the game people who play more seriously also use the in-game clock, or timing a building placement relative to how complete a different building is to determine building timing. This helps with subtleties like whether you lost your scouting worker or not (-1 supply), if the early game got really weird because you had to build more units to hold some aggression, etc.
People already explained that it's how much supply you have.
In practice this is easier for people to use than actual clock timings, because it's more robust to delays or interference. If you remember "third rax at 30 supply" then even if you're playing a little slow, you will still know roughly when to build that. But if you memorized exact clock timings and now you might be 20+ seconds behind, it's hard to know when you should fit in the new building.
It's not perfect of course, and if you get cheesed and the game goes weird then you'll have to start improvising rather than relying on just supply timings. A lot of times after a cheese where neither side definitely wins, the balance between tech and economy is now very non-standard and you can't rely on conventional rules of thumb anymore.
Even when Google Translate got pretty good I was not really able to effectively translate Chinese or Japanese text about Go (the game). I had similar issues to the ones mentioned in this post. Many Chinese and Japanese words (e.g., "ko") have a very specific meaning in the context of Go, but they also have regular meanings (e.g., "robbery") in more normal contexts, so Google Translate would translate text in a generic way, which made everything unintelligible. With modern LLMs, I can now preface my translation requests with instructions such as "I am going to ask you to translate some Chinese text accompanying weiqi diagrams. Your translations should be idiomatic and not shy away from Go jargon. For example, 拆 = extension, 夹 = pincer, 刺 and 觑 = peep.", and it does a fantastic job, enough for me to basically read anything I want. It was lucky for me that evidently enough Go material already existed in the training set that I didn't have to do anything more special.
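If you do this often, the glossary preface is easy to generate from a dictionary. A minimal sketch, assuming you keep your own term list: `build_translation_prompt` is a hypothetical helper, and the resulting string would be sent to whatever LLM you use (no particular API is assumed here).

```python
def build_translation_prompt(glossary, text, game="weiqi"):
    """Prefix the text to translate with jargon hints from a glossary dict."""
    pairs = ", ".join(f"{src} = {dst}" for src, dst in glossary.items())
    return (
        f"I am going to ask you to translate some Chinese text accompanying {game} diagrams. "
        "Your translations should be idiomatic and not shy away from Go jargon. "
        f"For example, {pairs}.\n\n{text}"
    )

# Glossary entries taken from the example instructions above.
glossary = {"拆": "extension", "夹": "pincer", "刺": "peep", "觑": "peep"}
prompt = build_translation_prompt(glossary, "<Chinese text goes here>")
```

The same pattern would work for the StarCraft case by swapping in a Korean slang glossary and changing the framing sentences.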
(Some chess corrections, in case the author is reading: the moves at the start of chess games are called openings in English, not openers; there are not distinct white-piece and black-piece openings, although of course an individual player will probably study a given opening from the point of view of one side or the other; their study is considered fundamental all the way up to the highest level, in fact more so as you increase in skill; and the Sicilian variation in question is the Najdorf, not Najdork.)
That is very silly: just because German bread is different from British bread doesn't make the word "Brot" almost impossible to translate.
That will download up to 720p quality.
I vaguely remember a Husky video where he actually did a "9 pool" with building 9 spawning pools.
[1] https://liquipedia.net/starcraft2/Cella
[2] https://www.reddit.com/r/starcraft/comments/dyjk9/cellawerra...