Google Says Clearview's Site Scraping Is Wrong; Clearview Reminds Google It Scrapes Sites All The Time

from the twospidermans.jpg dept

Clearview’s business model has resulted in some mutual finger-pointing. The most infamous of facial recognition tech companies outsources its database development. Rather than seeking input from interested parties, it scrapes sites for pictures of faces and whatever personal info accompanies them. The scraped info forms the contents of its facial recognition database, putting law enforcement only a few app clicks away from accessing over 3 billion images.

The companies being scraped have claimed this is a violation of their terms of service, if not actually illegal. It’s not clear that it’s actually illegal, even if it does violate the restrictions placed on users of these services. Twitter has already sent a cease-and-desist letter to Clearview, but it will probably take a court to make this stick. Unfortunately, Clearview’s actions could lead to some damaging precedent if Twitter forces the issue. Given the number of sites affected by Clearview’s scraping efforts, it’s probably only a matter of time before this ends up in court.

But the finger Google pointed at Clearview hasn’t produced the reaction Google may have hoped for. As CBS News reports, Clearview has returned fire by comparing its business model to Google’s.

Google and YouTube have sent a cease-and-desist letter to Clearview AI, a facial recognition app that scrapes images from websites and social media platforms, CBS News has learned.

[…]

[Clearview CEO Hoan] Ton-That also argued that Clearview AI is essentially a search engine for faces. “Google can pull in information from all different websites,” he said. “So if it’s public and it’s out there and could be inside Google search engine, it can be inside ours as well.”

He’s not wrong. Google’s bots crawl the internet non-stop, building a database for its search engine. But there is one key difference: website owners can opt out of Google’s indexing.

“Most websites want to be included in Google Search, and we give webmasters control over what information from their site is included in our search results, including the option to opt-out entirely. Clearview secretly collected image data of individuals without their consent, and in violation of rules explicitly forbidding them from doing so,” [YouTube spokesperson Alex Thomas] said in the statement to CBS News.
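For readers unfamiliar with the opt-out Google is referring to: site owners publish a robots.txt file naming which crawlers may fetch which paths, and well-behaved crawlers check it before fetching anything. Below is a minimal sketch of that check using Python's standard-library urllib.robotparser. The robots.txt contents, the example.com URLs, and the "ExampleFaceBot" and "SomeOtherBot" user agents are all hypothetical illustrations, not anything Clearview or any real site actually uses. Note that nothing enforces the file: a crawler that never runs this check simply ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Google's crawler may fetch everything except
# /photos/, while a hypothetical face-scraping bot is barred from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /photos/

User-agent: ExampleFaceBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """The question a *cooperative* crawler asks before fetching a URL.

    robots.txt is purely advisory: a crawler that skips this check
    is not stopped by anything in the file itself.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "https://example.com/about"),
        ("Googlebot", "https://example.com/photos/alice.jpg"),
        ("ExampleFaceBot", "https://example.com/about"),
        ("SomeOtherBot", "https://example.com/photos/alice.jpg"),
    ]
    for agent, url in checks:
        print(f"{agent:15} {url:40} allowed={may_fetch(rules, agent, url)}")
```

Run as-is, it should report the first and last checks as allowed and the middle two as blocked. Because this hypothetical file has no catch-all "User-agent: *" group, any crawler it doesn't name (SomeOtherBot here) is allowed everywhere by default, which is part of why opt-out only goes so far.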

There’s no way to opt out of Clearview’s “service,” other than just not existing on the internet. Ton-That is correct in assuming there’s very little legal exposure in scraping publicly available images from the net, but these statements don’t make him or his company any more sympathetic. Ton-That is serving up untested AI to as many law enforcement agencies as possible, encouraging them to test-drive the app using faces of friends and family even as the company states the software should only be used for approved law enforcement purposes.

The company also claims an accuracy rate of 99.6% for searches, but that number hasn’t been rigorously tested. What appears to be happening is a mass rollout of untested AI to law enforcement agencies via demo/trial accounts. Clearview claims to be working with over 600 law enforcement agencies, but very few agencies have stated publicly that they’ve used Clearview to perform investigations.

Clearview’s packaging of public information into a law enforcement app is unpleasant, but likely legal. The same thing goes on behind the scenes at multiple data aggregators that sell info and analytics directly to government agencies. The main difference here is that Clearview hasn’t been shy about its desire to pitch a cheap app/database to law enforcement even as its product remains unproven and untested. And it puts cops a lot closer to their dystopian dream of being able to demand identification from anyone they run into on the streets.

Companies: clearview, clearview ai, google, twitter


Comments on “Google Says Clearview's Site Scraping Is Wrong; Clearview Reminds Google It Scrapes Sites All The Time”

21 Comments
Anonymous Coward says:

Fair Use?

Thinking about it, the true difference between Google and Clearview (aside from the ethics of helping to index the internet in a mutually beneficial way versus selling dubiously accurate facial recognition that shifts the burden of false positives onto innocents) might fall under a broad concept of Fair Use.

Google essentially "snippets" the content for display and indexing, regardless of the internal implementation. Meanwhile, Clearview takes the content and adds it to their training set. They both still gather and process public data, but one is more expansive in its gathering. It may not be defined the same way legally, but it highlights how arbitrary not only the law but the underlying concepts are. Especially when compared to internet archives, which more or less skip the processing altogether.

Dan (profile) says:

Re: Re: Fair Use?

That mechanism has become useless. People who don’t even know such a thing exists can now put up a website in 30 minutes with the tools offered. That is, unless the GoDaddys et al. turn on ‘no web crawling’ as the default. It’s another arms race, just like when telecoms offered robocalling and call blocking became a thing as a result.

Anonymous Coward says:

Re: Re:

This is not accurate at all. Google scrapes information from websites that isn’t used solely for its search engine algorithms.

Google collects and combines other data points and sells this information to advertisers, who then use it, which is how Google profits: its business model is selling ads, not running a search engine.

For those who believe Google respects the "robots.txt" file, you’re misinformed. Google respects the file only to omit the page from its search algorithms.

It does not prevent its spiders from collecting information on the pages.

crade (profile) says:

I don’t see why Google wouldn’t be happy with this reply. Having the ability to opt out isn’t just a key difference; it is the difference between right and wrong. Between collecting data with consent (at least the consent of the site hosting the data; the end user is another story) and without.

This is not accidental. Clearview doesn’t allow opting out because their "service" is not a service but straight exploitation, and every site would just opt out if they could.

That One Guy (profile) says:

'But wait, there's more!'

And it puts cops a lot closer to their dystopian dream of being able to demand identification from anyone they run into on the streets.

It’s actually much worse, as facial recognition tech (assuming it even worked) means they don’t need to demand identification. All they need to do is run your face through whatever database they’re using and you’ve unwittingly provided your identification, with no chance to refuse.

It’s somewhat similar to the encryption debacle: the difference is between them having to get the key/password from someone who can protest and potentially fight back when it actually matters, versus them already having the key/password, with the only chance to object coming after the fact when it’s already too late.

Anonymous Coward says:

He’s not wrong. Google’s bots crawl the internet non-stop, building a database for its search engine. But there is one key difference: website owners can opt out of Google’s indexing.

Why would that be a "key" difference?

1) Opt-out systems are bullshit in general.
2) If Clearview did obey robots.txt, users could only block them from getting pictures off of sites controlled by those users. They’d have no control over Clearview’s access to sites hosted by Google, Twitter, etc.

Dan (profile) says:

Shooting the wrong horse...

I find it amusing that everyone is blaming Clearview. (Not that I’m condoning their actions. Far from it…) The underlying issue is who is getting the data: law enforcement. "Well gee, if the cops don’t have this information, they can’t burst into homes with a battering ram, and possibly kill people."

In the land of sanity, people would be arguing that law enforcement shouldn’t be doing that to start with. Then again, it’s way easier to go after one company than to go after a much bigger organization that is not only its customer, but is also causing the actual problem.
