Clearview Forbids Users From Scraping Its Database Of Images It Scraped From Thousands Of Websites
from the don't-scrape-me-bro dept
Clearview continues to dominate the “Most Hated” category in the facial recognition tech games. And with Amazon tossing aside its “Rekognition” program for the time being (it’s spelled with a K because the AI tried to spell “recognition” correctly and failed), Clearview has opened up what could be an insurmountable lead.
Clearview has been sued, investigated, banned by law enforcement agencies, and suffered numerous self-inflicted wounds. Underneath Clearview's untried and untested AI lies a foundation composed of the internet. The ~4 billion images in Clearview's database have been scraped from public posts and accounts hosted by thousands of websites and dozens of social media platforms.
There’s nothing inherently wrong with scraping sites to make use of information hosted there. In fact, this often controversial power can sometimes be used for good. The last thing we need is Clearview’s questionable tech convincing legislators, prosecutors, and courts that scraping sites is something only criminals do.
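For readers unfamiliar with the term, "scraping" is nothing exotic: it just means programmatically fetching pages and pulling structured data out of the markup. A minimal, hypothetical sketch in Python (using only the standard library's HTML parser, and a made-up sample page rather than any real site) shows the core of what an image scraper does:

```python
from html.parser import HTMLParser

# Minimal illustration of "scraping": walk a page's HTML and collect
# every image URL. A real scraper would also fetch pages over HTTP
# (e.g. with urllib) and crawl outbound links; this just parses a
# hypothetical sample document.
class ImageCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Record the src attribute of every <img> tag encountered
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

sample_page = """
<html><body>
  <img src="https://example.com/profile1.jpg" alt="profile photo">
  <p>Some unrelated text</p>
  <img src="https://example.com/profile2.png">
</body></html>
"""

collector = ImageCollector()
collector.feed(sample_page)
print(collector.image_urls)
```

Run at the scale of thousands of sites and billions of images, that simple loop is essentially Clearview's entire data pipeline.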
Clearview called out Google’s apparent hypocrisy on the subject of site scraping when Google sent a cease-and-desist demanding it stop harvesting images and data from Google’s online possessions. But Clearview is apparently unable to recognize its own hypocrisy. While it’s cool with site scraping when it can benefit from it, it frowns upon others perpetrating this “harm” on its own databases.
Eerily reminiscent of Disney’s take on the public domain (good when Disney uses it, bad when Disney’s copyrights are set to expire) is Clearview’s take on site scraping. Its user agreement [PDF] with the Evansville, Indiana police department (obtained by MuckRock user J Ader) contains this paragraph:
The use of automated systems or software to extract the whole or any part of the Service or Website, the Information or data on or within the Service or the Website, including image search results or source code, for any purposes (including uses commonly known as “scraping”) is strictly prohibited.
Also bundled in this package of public records is Clearview's laughable "accuracy" test. It compares itself to Rekognition and its highly publicized failure. When Amazon's tech was tested by the ACLU, it misidentified several DC legislators as criminals, especially those who weren't white and male.
Clearview touts its own success in this document [PDF], which covers a non-independent test of its AI performed in 2019. Here are the results:
The test compared the headshots from all three legislative bodies against Clearview’s proprietary database of 2.8 billion images (112,000 times the size of the database used by the ACLU). The Panel determined that Clearview rated 100% accurate, producing instant and accurate matches for every one of the 834 federal and state legislators in the test cohort.
LOL. This is proof of nothing. Anyone with access to a reverse image search could perform this test with the same accuracy. While Amazon’s AI was tested against arrestees’ mugshots, Clearview’s was tested against photos and info scraped from social media profiles and public websites. Of course it was able to positively identify politicians, most of whom maintain multiple social media accounts and websites. It would only be notable if the AI had failed to perform this simple task given the wealth of information it had to work with.
In conclusion, Clearview sucks. Its tech is unproven and its policy on scraping is the apex of hypocrisy. On the other hand, the company seems to be harvesting criticism as fast as it's harvesting web content, so the prognosis on its continued survival remains refreshingly bleak.