From Uzbek To Klingon, The Machine Cracks The Code
from the statistical-machine-translation dept
Move over Babelfish, there’s a new translation technology in town. The NY Times has discovered that in just the past few years there have been some fairly impressive advancements in statistical machine translation. Traditional machine translation systems rely on a bilingual programmer who helps map one language onto the other, but with statistical machine translation, you just feed the system the same texts in multiple languages and let the machine figure it out. It sounds like, for now, the technology works in some cases, and is probably most useful for quickly building translation systems that might miss more nuanced language issues. Some of those who believe in the traditional methods scoff at the idea that the statistical method will ever be useful for anything more than very basic translations. However, given the rate of improvement over the past few years, it wouldn’t be surprising to see statistical machine translation systems improve even more in the near future.
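For the curious, here is a rough sketch of what "let the machine figure it out" can look like in practice. It is not the system the NY Times describes. It is a toy version of a classic word-alignment approach (in the style of IBM Model 1, trained with expectation-maximization), and the three-sentence German/English corpus and the function names `train_word_model` and `best_translation` are made up for illustration. Nobody tells the program that "Haus" means "house"; it infers that from which words keep showing up together across the paired sentences.

```python
from collections import defaultdict


def train_word_model(pairs, iterations=20):
    """Estimate P(target_word | source_word) from sentence pairs using EM.

    `pairs` is a list of (source_words, target_words) tuples, where each
    element is a list of tokens from the same sentence in two languages.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # prob[(target_word, source_word)] starts out uniform: no prior knowledge.
    prob = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in pairs:
            for t in tgt:
                # Split one unit of "credit" for t across the source words,
                # in proportion to the current translation probabilities.
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # Re-estimate the probabilities from the expected counts.
        prob = defaultdict(
            float, {(t, s): c / total[s] for (t, s), c in count.items()}
        )
    return prob


def best_translation(prob, src_word):
    """Return the target word with the highest probability for src_word."""
    candidates = {t: p for (t, s), p in prob.items() if s == src_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus: the same three sentences in German and English.
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]

model = train_word_model(corpus)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

Real systems work on millions of sentence pairs and model phrases and word order as well, so treat this only as an illustration of the basic idea: more parallel text gives better statistics, with no bilingual programmer writing the rules.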
Comments on “From Uzbek To Klingon, The Machine Cracks The Code”
There was one passage in there that made the article lose all credibility for me:
“If we can learn how to translate even Klingon into English, then most human languages are easy by comparison,” he said.
Klingon was invented by English-speaking sci-fi fans, with their narrow imagination. It does not represent another culture.
The fact remains that languages have concepts that are not understood in other languages. As with the cliche about the Inuit having 20 words for snow, the Japanese have about 20 ways to refer to “self”, as well as nuanced honorifics that simply do not translate into English. American Sign Language, with its parallel-speaking forms, has no equivalent in written or spoken English. Even in its simple form, a literal translation of an American Sign Language sentence would go something like “Past night movie me see wow me.”
Since ASL is so conceptually different, a new writing system called SignWriting is also evolving.
Re: Oh please
Of course Klingon represents a different culture. No one ever said it had to be a real culture. It’s actually a pretty interesting little language. Kind of like a cross between Hebrew, Russian, and Spanish with thick pronunciation keys. It’s pretty developed for a fictional language.
Re: Re: Oh please
So, a bunch of sci-fi dorks took a bunch of foreign-sounding sounds and made up a “language”. Only within a Western context, too. Pretty lame.
Re: Re: Re: Oh please
No, “a bunch of sci-fi dorks” did not “take a bunch of foreign-sounding sounds and make up a language.” Where does this stuff come from?
Klingon was developed by a professional linguist, Marc Okrand, who did his graduate work in linguistics at Berkeley (his undergrad degree is from Georgetown), on contract to Paramount Pictures for Star Trek III. He is certainly not a “sci-fi dork,” but he is most certainly a very accomplished professional linguist. Although the language is technically “invented,” it has specific and fairly complex rules of grammar, syntax and morphology. There is even an invented culture to support specific terminology, complex multiple meanings, etc., so the semantics are well developed.
Nuances Don't Matter Much
If you’re translating technical or scientific documents, language nuances won’t matter too much. Japanese honorifics probably won’t be used much when discussing particle physics or nanotechnology. I doubt that in discussing object-oriented programming, the 20 words for snow used by the Inuit will be encountered.
Nuances obviously matter when translating literary fiction or popular culture artifacts. I doubt this translation engine would handle most manga well. But it should work at least as well as BabelFish and perhaps, eventually, much better.
Re: Nuances Don't Matter Much
>If you’re translating technical or scientific documents, language nuances won’t matter too much. Japanese honorifics probably won’t be used much when discussing particle physics or nanotechnology.
I’ve done some technical Japanese-to-English translation work before. The problem is the level of ambiguity allowed in the Japanese language, which can make an English translation of a scientific paper sound vague, contradictory, or stupid. Often, one has to talk to the author directly to nail down an exact translation, though he may not be happy with the way it ends up being expressed.
>I doubt that in discussing object-oriented programming, the 20 words for snow used by the Inuit will be encountered.
Depends. If someone writes an Inuit manual and decides to use the 20 words for “snow” to illustrate polymorphism, then an English speaker is in trouble.
>But it should work at least as well as BabelFish and perhaps, eventually, much better.
BabelFish won’t be too hard to beat.
Re: Re: Nuances Don't Matter Much
Technical Japanese translation into English?
Between the specialized kanji and the fscking loanwords, it’s pretty hopeless…
Hell, even the movie “roadshows” are the last on the planet to be released (of course that’s probably just the Japanese movie industry cashing in on their monopoly)… some of the alternative titles are funny though…
Re: Nuances Don't Matter Much
Actually, “language nuances” matter a very great deal in translating scientific and technical documents. For example, a literal (mis)translation of a common phrase in Russian electronics reads, “the scheme is under reverse link.” This is very commonly encountered in MT systems, even statistical ones. You might wonder what the “scheme” is, or what a “reverse link” means. In fact, as anybody who knows electronics and Russian could tell you, the sentence should read, “Feedback was applied to the circuit.”
Even technical terms have hundreds of meanings (“skhema” in Russian has about 3 dozen) all of which depend on subtle clues in the text. Even skilled humans, who are pre-wired for ambiguity, have difficulty sorting them out.
20 words for snow
It doesn’t matter how many words for snow the Inuit have any more than it matters how many words for music (aka “genres”) Americans/Westerners have. There is often no literal meaning in our own language for words that we use on a regular basis. How do we figure out what they are supposed to mean? Usually we don’t consult a dictionary in the middle of conversation. Statistical machine translation may be as good a method of translation as the ones we employ on a daily basis. While there may be more precise idioms for a particular referent in one dialect than in another, words that don’t translate directly can be ported over into a reasonably flexible language; programs that can aid in translation may hasten the merging of today’s languages into a universal pan-human language.
Re: 20 words for snow
>It doesn’t matter how many words for snow the Inuit have any more than it matters how many words for music (aka “genres”) Americans/Westerners have… There is often no literal meaning in our own language for words that we use on a regular basis.
That sums up a street hustler’s anti-intellectual attitude, but in the world of serious translation, such attitudes do not fly. Translations have important legal, scientific, and medical consequences. The wrong instructions given to a patient can kill him. The wrong wording for treaties can lead to wars. The wrong instructions for operating a crane can kill lots of people, and this has happened before.
>words that don’t translate directly can be ported over into a reasonably flexible language; programs that can aid in translation may hasten the merging of today’s languages into a universal pan-human language.
People have tried this with Esperanto or other Klingon-league languages, but they have no real world use.
Re: Re: 20 words for snow
[…] Klingon-league languages, but they have no real world use […]
except for insulting newbies on-line…
…and letting the police try to find a translator who can read you your rights…