How Refugee Applications Are Being Lost In (Machine) Translation

from the AI-not-I dept

As you may have noticed, headlines are full of the wonders of chatbots and generative AI these days. Although often presented as huge breakthroughs, in many ways they build on machine learning techniques that have been around for years. These older systems have been deployed in real-life situations for some time, which means they provide valuable information about the possible pitfalls of using AI for serious tasks. Here is a typical example of what has been happening in the world of machine translation when applied to refugee applications for asylum, as reported on the Rest of the World site:

A crisis translator specializing in Afghan languages, Mirkhail was working with a Pashto-speaking refugee who had fled Afghanistan. A U.S. court had denied the refugee’s asylum bid because her written application didn’t match the story told in the initial interviews.

In the interviews, the refugee had first maintained that she’d made it through one particular event alone, but the written statement seemed to reference other people with her at the time — a discrepancy large enough for a judge to reject her asylum claim.

After Mirkhail went over the documents, she saw what had gone wrong: An automated translation tool had swapped the “I” pronouns in the woman’s statement to “we.”

That’s a tiny difference, and one that today’s machine translation programs can easily miss, especially for languages where training materials are still scarce. And yet the shift from singular “I” to plural “we” can have life-changing consequences: in the case above, it decided whether a refugee fleeing Afghanistan was granted asylum. There are other problems too:

Based in New York, the Refugee Translation Project works extensively with Afghan refugees, translating police reports, news clippings, and personal testimonies to bolster claims that asylum seekers have a credible fear of persecution. When machine translation is used to draft these documents, cultural blind spots and failures to understand regional colloquialisms can introduce inaccuracies. These errors can compromise claims in the rigorous review so many Afghan refugees experience.

In the future it is likely that the number of people seeking asylum will increase, not least because of environmental refugees who are fleeing lands made uninhabitable by climate change. Their applications for asylum elsewhere are likely to involve a wider range of lesser-known languages. Turning to machine translation will be a natural move by the authorities, since it takes time and resources to recruit specialist human translators.

The new generation of AI tools and their high-profile abilities will encourage this trend. They will also encourage the use of AI to evaluate applications and to recommend whether they should be accepted. The Rest of the World article points out that OpenAI, the company behind ChatGPT, updated its user policies in late March with the following as “Disallowed usage of our models”:

High risk government decision-making, including:

  • Law enforcement and criminal justice
  • Migration and asylum

Governments trying to save money will doubtless use them anyway. It will be important for courts and others dealing with asylum claims to bear this in mind when there seem to be serious discrepancies in refugees’ applications. They may be all in the (machine’s) mind.

Follow me @glynmoody on Mastodon.

Companies: openai


Comments on “How Refugee Applications Are Being Lost In (Machine) Translation”

19 Comments
Darkness Of Course (profile) says:

Intelligent it ain't

Modern AI is simply deep-learning ML, a style, if you will, of building a more complex ML solution. LLMs were difficult before because they required processing vast quantities of data. Now, modern solutions make that possible.

But they are intended to be used in conversation. They search for relevant tokens and push out a reply. Additional prompts are interpreted in the context of the previous replies. All of this is meant to produce a conversation that feels and sounds good.

Much like talking with a gracious fool, the conversation seems fine, but a person knowledgeable in a specific field will spot the errors. The problem rears its head when the unwary are entranced by the sound of the replies, even the wrong ones.

Rich (profile) says:

Government

As the Government leans more and more on AI to do its job for it (facial recognition, crime prediction, prison sentencing, bail, parole, etc.), I can only see this sort of thing getting much, much worse. The fewer actual people involved, the more this sort of thing will snowball.

AI systems will be autotranslating Government responses to AI-translated citizen requests, and unpredictable levels of damage will ensue from the epic confusion created by this technologically twisted game of telephone.

Reminds me of something the Great Ron Bennington once said:
“Idiocracy was the funniest movie I couldn’t laugh at”

Anonymous Coward says:

An automated translation tool had swapped the “I” pronouns in the woman’s statement to “we.”

That’s not necessarily specific to translation. This commenter was told in high school to avoid writing “I” in formal contexts, and later had to redo a college work-study report because it wasn’t clear what, if anything, “I” did. It turned out that several other “rules” we learned were bullshit too.

This case is kind of interesting because there’s precedent for referring to oneself as “we”. In English it’s sometimes called the “royal we” and tends to be seen as pompous, so it’s not often used. But other languages differ; e.g., quoting Wikipedia, “In Hindustani and other Indo-Aryan languages, the majestic plural is a common way for elder speakers to refer to themselves”—in which case “we” would arguably be the most accurate translation.

“Disallowed usage of our models”:

…Right. Did they write that in every language, or just the one that we know the users in this case were unfamiliar with? Can this tool actually detect someone trying to use it in prohibited ways?

Anonymous Coward says:

Re: Re:

Making everyone on the planet learn one language isn’t politically, economically, or socially feasible.

Realistically, many countries are already making all their students learn English, to ensure the countries remain relevant to international business and tourism. Some, such as Norway, get very good results; others, such as Japan, less so.

Anonymous Coward says:

Re: Re: Re:

There’s a difference between people deciding it is economically advantageous to learn a language and having a language imposed. Unless you want to argue that capitalism is inherently coercive?

Esperanto has had 150 years to establish itself and has fewer than 200,000 speakers. If you’re evaluating it as a solution to the problem of international communication, it has categorically failed.

BernardoVerda (profile) says:

Re: Re: Re:2

Imposed? Where do you get “imposed” from? It’s funny how critics of the concept of an International Auxiliary Language in general, and Esperanto in particular, are so prone to start off with this tediously tendentious misrepresentation. The whole point of Esperanto is to have a very easy, reasonably neutral, common second language (i.e., not a “replacement” for anyone’s existing native language or “mother tongue”). In any case, such an IAL would certainly be less “imposed” on anybody than English, French, Spanish or Portuguese.

Credible estimates of how many Esperanto speakers there are vary wildly, partly because the definition of “Esperanto speaker” in such estimates varies wildly, and partly because it’s simply hard to do (for much the same reasons that it’s difficult to estimate how many people “play chess”). An estimate of 200,000 is quite low, 2 million is quite credible, and over 4 million seems rather high but isn’t impossible, if one accepts “speaker” as meaning somewhere around CEFR B1/B2 level.

In any case, Esperanto is clearly an “established” language, with more speakers than some European languages that no one would dispute as “real” languages. It’s a living language, used around the world in Europe, Asia, Africa, and the Americas.

As far as “evaluating [Esperanto] as a solution to the problem of international communication, it has categorically failed” goes, it has actually worked rather well wherever it has been tried. The reasons Esperanto hasn’t been employed on a larger scale have had much more to do with language politics than with any inherent problems with Esperanto itself.

BernardoVerda (profile) says:

Re: Re:

Nah. The Esperanto revisionists (starting with Ido) never managed to get the majority, or even a significant minority of actual Esperanto users to follow along — and then they splintered into even more revisions (in part because they could never win over a meaningful number of new users), which would splinter yet again.

Because of course, it turned out that though the revisionists agreed that original (and still official) Esperanto ever so clearly “wasn’t good enough”, and a “wonderful idea, but poorly implemented, needs fixing”, they had rather differing ideas on what exactly* was wrong** with it, or what should be changed and why — one advocate’s notion of an obvious improvement was inevitably another proponent’s notion of a really backwards, objectively terrible idea. (Rinse and repeat.)

Meanwhile, most actual users were going “Hey, it’s pretty darn easy, and it works pretty darn well — and none of the “improved” versions offer nearly enough benefit (if any) to make a switch worth the trouble.” And so they went on their merry way, writing novels and poetry and travel guides and literary translations and meteorological papers… and publishing magazines and creating labor union organizations and post-war family-reunification organizations and…

*) for example:

  • this is “too regular, too mechanical — a good international language needs to be more naturalistic”, versus “this is too naturalistic, a constructed language should be absolutely, perfectly regular and perfectly, mathematically reversible — it needs to be more mechanically simplistic.”
    or
  • “we absolutely need to get rid of that “complicated” accusative case — who even needs a case at all when you can just use stricter word order instead?” versus “no case at all means you need a whole bunch of extra, complicating rules and/or prepositions to govern word order — and you lose very valuable flexibility and fluency; really, why sacrifice proven utility for a purely theoretical objection to having cases at all?”

**) besides being different from the “civilized” European national languages they were already familiar with (and yeah, they often even said the “civilized” part out loud).
