Using AI To Identify Car Models In 50 Million Google Street Views Reveals A Wide Range Of Demographic Information

from the you-are-what-you-drive dept

Google Street View is a great resource for taking a look at distant locations before travelling, or for visualizing a nearby address before driving there. But Street View images are much more than vivid versions of otherwise flat maps: they are slices of modern life, conveniently sorted by geolocation. That means they can provide all kinds of insights into how society operates, and how it varies from place to place. The tricky part is extracting that information. An article in the New York Times reports on how researchers at Stanford University have applied artificial intelligence (AI) techniques to 50 million Google Street View images taken in 200 US cities. Since analyzing images of people directly is hard and fraught with privacy concerns, the researchers concentrated on a proxy: cars. As an academic paper published by the Stanford team notes (pdf):

Ninety five percent of American households own automobiles, and as shown by prior work cars are a reflection of their owners’ characteristics providing significant personal information.

First, the AI system had to be trained to find cars in the Google Street View images. That’s easy for humans to do, but hard for computers. The next stage of the work, identifying car models, is the reverse: hard for most people, but much easier using AI. As another paper reporting on the research (pdf) explains:

the fine-grained object recognition task we perform here is one that few people could accomplish for even a handful of images. Differences between cars can be imperceptible to an untrained person; for instance, some car models can have subtle changes in tail lights (e.g., 2007 Honda Accord vs. 2008 Honda Accord) or grilles (e.g., 2001 Ford F-150 Supercrew LL vs. 2011 Ford F-150 Supercrew SVT). Nevertheless, our system is able to classify automobiles into one of 2,657 categories, taking 0.2 s per vehicle image to do so. While it classified the automobiles in 50 million images in 2 wk, a human expert, assuming 10 s per image, would take more than 15 y to perform the same task.
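The human-versus-machine comparison in that passage is easy to check with a little arithmetic. The sketch below uses only the figures quoted above (50 million images, 10 seconds per image for a human, two weeks for the system). It assumes a human expert working around the clock with no breaks and a 365.25-day year; the function names are purely illustrative.

```python
# Back-of-the-envelope check of the timing figures quoted from the paper.
IMAGES = 50_000_000            # images analyzed, as quoted
HUMAN_SECONDS_PER_IMAGE = 10   # the paper's assumption for a human expert
MACHINE_WEEKS = 2              # reported time taken by the AI system

SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: nonstop work, no breaks
SECONDS_PER_WEEK = 7 * 24 * 3600


def human_years(images=IMAGES, seconds_per_image=HUMAN_SECONDS_PER_IMAGE):
    """Years a human would need to classify every image back to back."""
    return images * seconds_per_image / SECONDS_PER_YEAR


def speedup(images=IMAGES, seconds_per_image=HUMAN_SECONDS_PER_IMAGE,
            machine_weeks=MACHINE_WEEKS):
    """How many times faster the reported machine run is than the human."""
    human_seconds = images * seconds_per_image
    machine_seconds = machine_weeks * SECONDS_PER_WEEK
    return human_seconds / machine_seconds


if __name__ == "__main__":
    print(f"Human expert: about {human_years():.1f} years")
    print(f"Machine advantage: roughly {speedup():.0f}x")
```

Under those assumptions the human figure comes out a little under 16 years, which is consistent with the paper's "more than 15 y".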

The difference between the two weeks taken by the AI software and the 15 years a human would need means that it is possible to analyze much larger data collections than before, and to extract new kinds of information. This is done by using existing datasets, such as the American Community Survey, which the US Census Bureau conducts each year, to train the AI system to spot correlations between cars and demographics. The New York Times article lists some of the results that emerge from mining and analyzing the Google Street View images, and adding in metadata from other sources:

The system was able to accurately predict income, race, education and voting patterns at the ZIP code and precinct level in cities across the country.

Car attributes (including miles-per-gallon ratings) found that the greenest city in America is Burlington, Vt., while Casper, Wyo., has the largest per-capita carbon footprint.

Chicago is the city with the highest level of income segregation, with large clusters of expensive and cheap cars in different neighborhoods; Jacksonville, Fla., is the least segregated by income.

New York is the city with the most expensive cars. El Paso has the highest percentage of Hummers. San Francisco has the highest percentage of foreign cars.
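To make the training step a little more concrete, here is a minimal sketch of the general idea: fit a model on regions where survey data exists, then use car-derived features to estimate the same quantity elsewhere. It is not the researchers' actual model. The data is synthetic, the single feature (average car price per region), the income relationship and all function names are hypothetical, and a plain least-squares line stands in for whatever the Stanford team really used.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def make_synthetic_regions(n, rng):
    """Invented data: average car price and median income, both in $1000s."""
    prices = [rng.uniform(5, 60) for _ in range(n)]
    # Hypothetical relationship plus noise, made up for this illustration.
    incomes = [20 + 1.5 * p + rng.gauss(0, 5) for p in prices]
    return prices, incomes


rng = random.Random(0)
# "Training" regions: places where survey data supplies the true income.
train_prices, train_incomes = make_synthetic_regions(200, rng)
# "Held-out" regions: only the car feature is used to make the estimate.
test_prices, test_incomes = make_synthetic_regions(50, rng)

slope, intercept = fit_line(train_prices, train_incomes)
predicted = [intercept + slope * p for p in test_prices]
correlation = pearson(predicted, test_incomes)

if __name__ == "__main__":
    print(f"Fitted line: income = {intercept:.1f} + {slope:.2f} * price")
    print(f"Correlation on held-out regions: {correlation:.2f}")
```

The point of the held-out regions is the same as in the research: the predictions only mean something if they are checked against existing data that the model never saw during training.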

The researchers point out that the rise of self-driving cars with on-board cameras will produce even more street images that could be fed into AI systems for analysis. They also note that walking around a neighborhood with a camera — for example, in a smartphone — would allow image data to be gathered very simply and cheaply. And as AI systems become more powerful, it will be possible to extract even more demographic information from apparently innocuous street views. Although that may be good news for academic researchers, datamining offline activities clearly creates new privacy problems at a time when people are already worried about what can be gleaned from datamining their online activities.

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+

Companies: google


Comments on “Using AI To Identify Car Models In 50 Million Google Street Views Reveals A Wide Range Of Demographic Information”

18 Comments
Peter (profile) says:

Is the project finally finished?

“find[ing] cars in the Google Street Map images [is] easy for humans to do, but hard for computers, while the next stage of the work — identifying car models — is much easier using AI.”

Over the last few months, Google captcha required up to several dozen mouse clicks identifying cars or traffic signs before it finally accepted that a user might be human. Essentially, Google turned a considerable part of the world’s population into mechanical turks to help with their project.

Is Google’s project now finished, so we can go back to clicking once or twice to prove we are not machines?

Anonymous Coward says:

Re: Is the project finally finished?

I thought they were just being extra careful about bot IDing of car images to exploit Captchas. Now it seems potentially more sinister.

Just goes to show the orgs who got Captcha functionality could just be being used as well. If it cost anything to implement the Captcha, you’d be paying Google to make use of your users’ judgement too.

Not sure if the org would think it was a fair trade. It would at least make their users/customers a bit more leery of Captchas, if not incensed that a verification system might be co-opting their eyes for Google’s purposes.

Pete Austin says:

There are lots of other proxies

My father always judged a neighborhood by its cars.

Satellite dishes are similarly valuable. For example, the number of satellite dishes on a building is a proxy for the number of apartments, and the type (e.g. in the UK, Sky dishes vs larger hotbird dishes) shows the occupants’ family origin.

Street trees and the type of local shops (most famously in the UK, a Waitrose) are good too.

PaulT (profile) says:

Re: Re: Re: Significance?

This is a proof of concept, and an end in and of itself. There’s a lot more data to be collected overall than the DMV can provide, and you don’t have to bother leaving a trail of requests or anything pesky like that. You can just collect publicly available metadata whenever you wish, and process that to tell you what you want.

Anonymous Coward says:

“But Street View images are much more than vivid versions of otherwise flat maps: they are slices of modern life, conveniently sorted by geolocation.”

Yes, they’re immensely helpful when casing people’s homes and other property, planning heists etc., without ever appearing suspicious on video cameras in the process. Kudos to Google for RobberyAid®

PaulT (profile) says:

Re: Re:

People like you claim this a lot, but I fail to see how a photo that can be months or years old, taken from the road outside, would be of any real value to a would-be robber. Especially compared to, say, driving down the street and looking at the details that the Street View pictures could never show, or at least ensuring you have an up-to-date photo. Surely criminals would want to at least do a drive-by, rather than hoping that a photo from 3 years ago isn’t missing any new security installations for them to be surprised by?

Instead of whining about Google, maybe you could tell me why it’s even remotely useful? I know this requires intellectual honesty and a desire to actually communicate, but maybe one of you people might be interested in such a thing.

btr1701 (profile) says:

Cars

> Chicago is the city with the highest level of income
> segregation, with large clusters of expensive and cheap
> cars in different neighborhoods.

High-end cars aren’t a reliable indicator of income. There are plenty of hood-rats whose homes are a filthy squalor, with roaches running everywhere and malnourished kids sleeping on the floor, who nevertheless have top-of-the-line rides sitting in their driveways. It’s all about what’s important to people and for those dirtbags, the car matters more than anything else.

Valkor says:

Re: Cars

It’s ok that you didn’t read the linked articles. That’s a lot of work. Did you even read the post here? The car image data was compared to EXISTING data, and it checked out.

Moral judgments aside, this is about trends in areas. (Kind of like how BMI is good for populations, but terribly unreliable for individuals.)
