Techdirt's think tank, the Copia Institute, is working with the Trust & Safety Professional Association and its sister organization, the Trust & Safety Foundation, to produce an ongoing series of case studies about content moderation decisions. These case studies are presented in a neutral fashion, not aiming to criticize or applaud any particular decision, but to highlight the many different challenges that content moderators face and the tradeoffs they result in. Find more case studies here on Techdirt and on the TSF website.

Content Moderation Case Study: Google's Photo App Tags Photos Of Black People As 'Gorillas' (2015)

from the automation-isn't-the-answer dept

Summary: In May 2015, Google rolled out its “Google Photos” service. This service allowed users to store their images in Google’s cloud and share them with other users. Unlike some other services, Google’s photo service provided unlimited storage for photos under a certain resolution, making it an attractive replacement for other paid services.

Unfortunately, it soon became apparent the rollout may have outpaced internal quality control. The built-in auto-tagging system utilizing Google’s AI began tagging Black people as “gorillas,” resulting in backlash from users and critics who believed Google’s algorithm was racist.

Google’s immediate response was to apologize to users. The Twitter user who first noticed the tagging error was contacted directly by Google, which began tackling the problem that made it out of beta unnoticed. Google’s Yonatan Zunger pointed out the shortcomings of AI when auto-tagging photos, noting the company’s previous problems with mis-tagging people (of all races) as dogs and struggles with less-than-ideal lighting or low picture resolution. In fact, Google’s rollout misstep mirrored Flickr’s own struggles with auto-tagging photos, which similarly resulted in Black people being labeled as “ape” or “animal.”

Decisions to be made by Google:

  • Would more diversity in product development/testing teams increase the chance issues like this might be caught before services go live?
  • Can additional steps be taken to limit human biases from negatively affecting the auto-tag AI?
  • Should more rigorous testing be performed in the future, given the known issues with algorithmic photo tagging?

Questions and policy implications to consider:

  • Does seemingly inconsequential moderation like this still demand some oversight by human moderators?
  • Will AI ever be able to surmount the inherent biases fed into it by those designing and training it?

Resolution: As of 2018, Google was still unable to completely eliminate this problem. Instead, it chose to eliminate the problematic tags themselves, resulting in no auto-tags for terms like “gorilla,” “chimp,” “chimpanzee,” and “monkey.” An investigation by Wired showed searches of Google Photos images returned zero results for these terms. Google said it was working on “longer-term fixes” but put no end date on when those fixes would arrive. It also acknowledged those terms had been blocked by Google and would remain blocked until the problem was solved.
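The stopgap described above amounts to a blocklist applied to the tagger's output. A minimal sketch of that idea follows, assuming a tagger that returns plain string labels. The `filter_tags` helper and `BLOCKED_TAGS` name are hypothetical and are not taken from Google's code, which is not public; only the four blocked terms come from the reporting.

```python
# Hypothetical sketch: drop blocked labels from a tagger's output.
# The four terms are the ones reported as blocked; everything else is assumed.
BLOCKED_TAGS = frozenset({"gorilla", "chimp", "chimpanzee", "monkey"})


def filter_tags(predicted_tags, blocked=BLOCKED_TAGS):
    """Return the predicted tags with any blocked label removed.

    Matching ignores case and surrounding whitespace.
    """
    return [tag for tag in predicted_tags if tag.strip().lower() not in blocked]


visible = filter_tags(["Dog", "Gorilla", "beach"])
print(visible)
```

A filter like this hides the symptom without retraining the model, which matches the tradeoff in the resolution: the blocked terms return nothing at all, even for photos of actual gorillas.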

Originally published at the Trust & Safety Foundation website.

Companies: google


Comments on “Content Moderation Case Study: Google's Photo App Tags Photos Of Black People As 'Gorillas' (2015)”

Anonymous Coward says:

The thing I still don’t get about this "mistake" is that AI/ML learns by being rewarded for successes and/or punished for failures. What determines success or failure is done by humans. In order for Google Photos AI to have learned that a black face == gorilla someone had to tag photos with "gorilla" for the software to learn from.

Where the bias/racism comes in is that whoever fed the AI photos of gorillas did not also feed it photos of black people tagged "person" so that the AI could learn the difference. Garbage in, garbage out. And a thoroughly trained AI is hard to retrain, a lot like people (ML is modeled on human learning after all).

Google never did take proper responsibility for that fuckup. I’m not sure the proper amount of shame stuck to them either. Good on you for keeping that colossal screw up alive.

This comment has been deemed insightful by the community.
OldMugwump (profile) says:

Re: I'm not even sure it's a "fuckup"

I have a lot of sympathy for Google here (or anybody else who tries this).

This is machine learning, where the machine learns what things are called based on already-tagged photos created by humans. The machine doesn’t decide for itself what a "gorilla" is; it learns based on the tags it sees.

Using that approach and scanning the Internet for tagged photos, a search on "idiot" is going to pull up photos of politicians, because lots of people tag politicians they don’t agree with as "idiots". That doesn’t mean those politicians ARE idiots, it just means that some people call them that. (That is, some significant number more than a few random noise tags that show no pattern.)

Same for "gorilla". As we all know there are lots of racists online who tag photos of black people with names of various apes. The machine is ultimately going to learn that, and that association is going to show up in the machine’s search results.

It’s not the machine’s fault, it’s not Google’s fault, it’s the fault of the racists who post such tags.

It’s not practically feasible to filter out such common, but racist, tags. The only reason this kind of machine learning works at all is because the machine can learn from (literally) millions of pre-existing examples online without manual intervention. If humans have to filter the training set to remove racist (or any other kind of biased) tags, the whole thing becomes impractical.

To the extent we’re going to use this technology at all, we have to accept that biases in the training set are going to show up in the outputs.

Or just choose not to use the tech.

Anonymous Coward says:

Re: Re: Re: You should adjust your expectations.

> "It’s not practically feasible to filter out such common, but racist, tags. "

But it is practical and feasible to sell a defective product to the unsuspecting public.

The "product" here is not "given a picture, tell me what is depicted". The product is "given a picture, tell me what word(s) people most commonly associate with pictures that have similar characteristics." That is a VASTLY different thing.

For instance, do you imagine that "black people" will label pictures of their family as "black people"? Of course not! They’ll label them mom, dad, Uncle Henry, Aunt Jerome, Army Cadet, Space Marine Costume, Hollywood Star, or whatever. So tell me, where does the label "black people" that you are expecting, come from?

Defective? No. Sabotaged? Yes. "This is why we can’t have nice things."

Anonymous Coward says:

Re: Re: Re:2 You should adjust your expectations.

"Defective? No."

If I buy something and it does not meet what was advertised, I consider it to be defective and return it.
If the seller tells me it was designed that way I still want my money back no matter the excuses.
Am I being unreasonable here? I don’t think so.

Possibly, the advertised functionality is not technically feasible at this time, or ever. Why pretend that it is?

Arthur Moore (profile) says:

Re: Re: Re:

Sure they do. You may have noticed face detection and auto white balance on cameras. On the production side, using smart filters to do things like background removal means that we can get pretty good results without having to pay someone to rotoscope every frame by hand.

These sorts of things reduce the barrier to entry and mean that one person can do what once took a team months. They’re invaluable tools, especially for anyone who isn’t a blockbuster movie.

As an example, if the face detector doesn’t recognize a skin color or face shape then it doesn’t work with some actors. Can you imagine that conversation?

"We can’t cast you, our tools don’t work with your skin color." "Prepare to lose a major lawsuit."

Anonymous Coward says:

Re: Re: Re:

Film, and digital cameras. Not film cameras and digital cameras. The problem isn’t AI, but a bias in how the systems were tested and tuned for accurate depiction.

In addition to what Arthur Moore said, (color) film and the process of interpreting data from CCDs (or whatever the image sensor is in cameras these days) both require definition and tuning. Neither simply "captures raw reality". And there seems to be a trade-off with phone cameras, as they are mostly made for capturing faces, leaving many other things with the wrong color entirely. They also seem to have been trained on certain skin tones, causing others to appear different than they really are, although that may or may not have improved in recent years. Color film had and probably still has similar issues, where the color balance works pretty well for some skin tones while washing out or overly darkening others, or leaving them with strange hues. It is entirely common, for instance, to see film photographs, especially older ones, showing black people many shades darker than their actual skin tone. And let's not even get into printing / lithography and photo "touch-up" decisions.

smbryant (profile) says:

Re: Missing paragraph?

Ah, from the original article:

Unfortunately, it soon became apparent the rollout may have outpaced internal quality control. The built-in auto-tagging system utilizing Google’s AI began tagging Black people as "gorillas," resulting in backlash from users and critics who believed Google’s algorithm was racist.

Zonker says:

Interesting in light of this article that just days ago Google fired one of their leading AI ethics researchers, Dr. Timnit Gebru, for essentially refusing to retract on demand a research paper she co-authored and complaining about it just before taking a planned vacation.

Dr Gebru is a well-respected researcher in the field of ethics and the use of artificial intelligence.

She is well-known for her work on racial bias in technology such as facial recognition, and has criticised systems that fail to recognise black faces.

Anonymous Coward says:

Re: Re:

She honestly comes across as a complete diva and drama queen in the article. Posting to her own clique subgroup "You are not worth having any conversations about this, since you are not someone whose humanity… is acknowledged or valued in this company"? Talk about a martyr complex. Even if you don’t believe the claims of Google that it was her tableflipping over bureaucracy for reviewing papers they implicitly advocate, there isn’t any treatment cited which justifies that level of histrionics.

Ed (profile) says:

Re: Re: Re:

Exactly. Unfortunately, she and her supporters are trying to conflate her leaving with racism when it is not so. Google let her go because she was insubordinate and causing problems, thinking herself to be irreplaceable. She gave Google an ultimatum that if they didn’t let her do what she wanted she would leave. Google said, "Bye", so she flipped out and brought the wrath of the woke internet down on Google.

crade (profile) says:

Eliminating any problematic tags seems like the best solution for me.

"Will AI ever be able to surmount the inherent biases fed into it by those designing and training it?"

Not sure it matters. When you do this sort of thing at a basically infinite scale and your AI is never going to be perfect, you should assume you will eventually get every combination of mistaken matches out there. It doesn’t matter if there is any inherent bias or not: the AI could have no bias and be nearly perfect, and the majority of mistaken matches could be innocuous, and it still wouldn’t help you any when you hit a bad one.
I think if you can identify which tags are going to be incredibly offensive and make you look terrible, removing them as options is a perfectly fine solution.

That One Guy (profile) says:

'Should we test this on non-whites?' '... nah.'

Just… how? How does something that huge make it through testing without being caught, did they test literally zero non-white faces or did they just get insanely lucky on the ones they did test?

Whatever the case gotta say, having a multi-billion dollar company not able to solve this does not exactly create confidence in similar tech offered by smaller companies and employed by cities and/or schools.

crade (profile) says:

Re: 'Should we test this on non-whites?' '... nah.'

They are running a probability based guessing game round to infinity times. Their goal is matching correctly "often" and potentially improving that percentage over time, not matching correctly always.

They should expect every possible mismatch of tags to come up. They should be looking at each of their tags and thinking "what’s the worst that could happen" because it will.

OldMugwump (profile) says:

Re: 'Should we test this on non-whites?' '... nah.'

Test what, exactly?

The only way they’d find this out before rolling it out is to search on "gorilla" and discover the result.

Testing it on non-white faces probably worked fine, same as on white faces.

Unless they anticipated the racist "gorilla" result in advance, and went looking for it.

Even for racists, that’s an unlikely thing to try. All the more so for a non-racist person who never would have thought of testing for that outcome in the first place.

That One Guy (profile) says:

Re: Re: 'Should we test this on non-whites?' '... nah.'

Scan a selection of faces of various races in various lighting conditions and make sure the results came out as expected is the first thing that comes to mind, though I suppose they could have done so and it was just dumb luck that they didn’t test any photos that came back with the wrong tag.

Anonymous Coward says:

Re: Re: Re: 'Should we test this on non-whites?' '... nah.'

Scan a selection of faces of various races in various lighting conditions and make sure the results came out as expected…

I think this case is more one of training the AI on a public corpus that was garbage, instead of a hand curated one. If you have 5 pictures tagged "black person" and 50 tagged "gorilla", which is the AI going to choose?

nasch (profile) says:

Re: 'Should we test this on non-whites?' '... nah.'

There could be tens of millions of black faces correctly tagged, and dozens or hundreds or thousands with these bad tags. If the numbers are anything like that, it would be easy to get through testing without encountering any of the problem scenarios. I don’t know if that’s how it went down, or if testing really was inadequate, but from what I’ve read there’s not enough information to tell which.
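The arithmetic behind this point can be sketched directly. If a fraction p of photos would get a bad tag and testers check n photos chosen at random, the chance that none of the bad cases shows up is (1 - p)^n. The figures below (1,000 bad results out of 10 million photos, a 1,000-photo test sample) are hypothetical numbers picked to match the rough scale in the comment, not data from Google, and `chance_of_missing` is an illustrative name.

```python
def chance_of_missing(bad_fraction, sample_size):
    """Probability that a random test sample contains zero bad cases.

    Assumes each sampled photo is independently bad with probability
    `bad_fraction` (a simplification of sampling without replacement).
    """
    return (1.0 - bad_fraction) ** sample_size


# Hypothetical scale: 1,000 bad tags among 10,000,000 photos.
bad_fraction = 1_000 / 10_000_000

# Chance that a 1,000-photo random test sample sees none of them.
p_miss = chance_of_missing(bad_fraction, 1_000)
print(f"{p_miss:.3f}")  # roughly 0.90
```

Under those assumptions a random sample would miss the problem about nine times in ten, which is why targeted tests for known high-harm labels tend to be more useful than random spot checks alone.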

Anonymous Coward says:

Way to go Quality Control department!
(assuming they were tasked with testing this thing)

The general public has for some time now been the QA dept for many a corner cutting enterprise. Get it out by Friday and let the chips fall where they may. Management may say these things but if it actually happens look out cause it’s not their fault at all just because they told you to not test it.

ECA (profile) says:

90% of the problem

Most of you have seen it, but another part is humans being able to SHOW the computer what to look for.
There are a few tricks here with digital cameras that I don't know if you know about.
All of them use IR for the black and white texture (not totally sure about that anymore). And some even use UV to measure distance from reflection. Taking most of these pictures and shifting the spectrum could make it a bit easier to ID most people.
If this isn't really being done, then tell the camera makers to do it. Even if using radar/pulse wave to range a picture, take a sound picture of the face as well as record the UV and IR signals. This should give enough detail in most cases to ID most anyone. At least for police ID.

Anonymous Coward says:

"AI" is way overused and doesn’t imply what the unfamiliar think it does. A more accurate term is Machine Learning (ML).

ML works by setting up an empty "brain" (really just a database of past trials and results) and feeding it tons of pre-answered inputs. The machine tries to produce an answer and is either punished or rewarded depending on accuracy (and whatever other metrics you might wish to apply such as speed). Punishment and reward are simply scores applied to each trial.

With each successive trial the machine uses past results to influence how it tries again. With sufficient training (read: thousands and thousands of trials) the machine gets better and better at achieving a higher score with each trial.

ML’s purpose is to find a pattern that can get from input to result in the fewest steps. Then you can feed it brand new inputs without predetermined solutions and let the pattern find it for you. Unfortunately the results are not as accurate as a human no matter how many inputs and how many trials we throw at the machines. There are just too many factors we use to do things like analyze photos and machines are too hard to train at that level of detail. As a result, computer facial recognition won’t achieve human accuracy for a very very long time and then hopefully it will do a better job because we kinda suck at it, too.
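The score-and-adjust loop described in this comment can be made concrete with a perceptron, one of the simplest learners. This is a swapped-in toy, not the kind of model used for photo tagging, and the function names and the four labelled examples are made up for illustration. It keeps a few numeric weights instead of a database of past trials, and nudges them whenever its answer disagrees with the human-supplied label.

```python
def train_perceptron(examples, epochs=10, lr=1.0):
    """Learn weights from ((x1, x2), label) pairs, with label 0 or 1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in examples:
            predicted = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = label - predicted  # 0 when right, +1 or -1 when wrong
            # Adjust only when the guess was wrong (the "punishment").
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


def predict(model, point):
    """Apply the learned weights to a new point."""
    w, b = model
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0


# Human-supplied labels: here, 1 only when both inputs are 1.
examples = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 1)]
model = train_perceptron(examples)
print([predict(model, point) for point, _ in examples])  # [0, 0, 0, 1]
```

The learner has no notion of what the labels mean; it only reproduces the pattern in `examples`. Change the labels and it will learn the changed pattern just as readily, which is the "garbage in, garbage out" concern raised earlier in the thread.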
