Privacy

by Glyn Moody


Filed Under:
ai, privacy, street view

Companies:
google



Using AI To Identify Car Models In 50 Million Google Street Views Reveals A Wide Range Of Demographic Information

from the you-are-what-you-drive dept

Google Street View is a great resource for taking a look at distant locations before travelling, or for visualizing a nearby address before driving there. But Street View images are much more than vivid versions of otherwise flat maps: they are slices of modern life, conveniently sorted by geolocation. That means they can provide all kinds of insights into how society operates, and how it differs from place to place. The tricky part is extracting that information. An article in the New York Times reports on how researchers at Stanford University have applied artificial intelligence (AI) techniques to 50 million Google Street View images taken in 200 US cities. Since analyzing images of people directly is hard and fraught with privacy concerns, the researchers concentrated on a proxy: cars. As an academic paper published by the Stanford team notes (pdf):

Ninety five percent of American households own automobiles, and as shown by prior work cars are a reflection of their owners' characteristics providing significant personal information.

First, the AI system had to be trained to find cars in the Google Street View images. That is easy for humans to do but hard for computers, whereas the next stage of the work, identifying car models, is much easier using AI. As another paper reporting on the research (pdf) explains:

the fine-grained object recognition task we perform here is one that few people could accomplish for even a handful of images. Differences between cars can be imperceptible to an untrained person; for instance, some car models can have subtle changes in tail lights (e.g., 2007 Honda Accord vs. 2008 Honda Accord) or grilles (e.g., 2001 Ford F-150 Supercrew LL vs. 2011 Ford F-150 Supercrew SVT). Nevertheless, our system is able to classify automobiles into one of 2,657 categories, taking 0.2 s per vehicle image to do so. While it classified the automobiles in 50 million images in 2 wk, a human expert, assuming 10 s per image, would take more than 15 y to perform the same task.
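The arithmetic behind the quoted comparison can be checked with a short sketch. The image count and per-image times come from the passage above; the function names, and the assumption that each Street View image costs exactly one classification run one at a time, are illustrative only and not taken from the paper.

```python
# Back-of-the-envelope check of the figures quoted from the paper.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

NUM_IMAGES = 50_000_000          # Street View images (quoted figure)
HUMAN_SECONDS_PER_IMAGE = 10     # human expert (quoted figure)
MACHINE_SECONDS_PER_IMAGE = 0.2  # per vehicle image (quoted figure)


def human_years(num_images: int = NUM_IMAGES) -> float:
    """Years one expert would need, working non-stop at 10 s per image."""
    return num_images * HUMAN_SECONDS_PER_IMAGE / SECONDS_PER_YEAR


def machine_serial_days(num_images: int = NUM_IMAGES) -> float:
    """Days of compute at 0.2 s per image with no parallelism.

    Assumption (not from the paper): one classification per image,
    processed strictly one after another.
    """
    return num_images * MACHINE_SECONDS_PER_IMAGE / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"human expert: {human_years():.1f} years")
    print(f"machine, serial: {machine_serial_days():.0f} days")
```

Under these assumptions the human figure comes out a little under 16 years, consistent with "more than 15 y". The serial machine figure is well over two weeks, which suggests the work was either run in parallel or counted differently from this simple model; the quoted passage does not say which.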

The difference between the two weeks taken by the AI software and the 15 years a human would need means that it is possible to analyze much larger data collections than before, and to extract new kinds of information. This is done by using existing datasets, for example the American Community Survey, which is performed by the US Census Bureau each year, to train the AI system to spot correlations between cars and demographics. The New York Times article lists some of the results that emerge from mining and analyzing the Google Street View images, and adding in metadata from other sources:

The system was able to accurately predict income, race, education and voting patterns at the ZIP code and precinct level in cities across the country.

Car attributes (including miles-per-gallon ratings) found that the greenest city in America is Burlington, Vt., while Casper, Wyo., has the largest per-capita carbon footprint.

Chicago is the city with the highest level of income segregation, with large clusters of expensive and cheap cars in different neighborhoods; Jacksonville, Fla., is the least segregated by income.

New York is the city with the most expensive cars. El Paso has the highest percentage of Hummers. San Francisco has the highest percentage of foreign cars.
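The training step mentioned earlier, matching car features against survey figures, can be illustrated with a toy sketch. It fits a straight line from one hypothetical feature (average car price per ZIP code) to median household income and reports the correlation. The data, the single-feature model and the function names are all invented for illustration; the Stanford system uses far richer features, and its actual model is not described here.

```python
from math import sqrt

# Invented toy rows: (average car price in a ZIP code, median household income).
# These numbers are made up for illustration and are NOT from the study.
TOY_ZIP_DATA = [
    (12_000, 35_000),
    (18_000, 48_000),
    (25_000, 62_000),
    (31_000, 80_000),
    (40_000, 95_000),
]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(points):
    """Pearson correlation coefficient between the x and y columns."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sqrt(sxx * syy)


def predict(slope, intercept, x):
    """Apply the fitted line to a new feature value."""
    return slope * x + intercept


if __name__ == "__main__":
    slope, intercept = fit_line(TOY_ZIP_DATA)
    print(f"correlation on toy data: {pearson_r(TOY_ZIP_DATA):.3f}")
    print(f"predicted income at $28,000: {predict(slope, intercept, 28_000):,.0f}")
```

A fit like this only says something about areas in aggregate, which matches how the results above are reported: at the ZIP code and precinct level, not for individual households.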

The researchers point out that the rise of self-driving cars with on-board cameras will produce even more street images that could be fed into AI systems for analysis. They also note that walking around a neighborhood with a camera -- for example, in a smartphone -- would allow image data to be gathered very simply and cheaply. And as AI systems become more powerful, it will be possible to extract even more demographic information from apparently innocuous street views. Although that may be good news for academic researchers, datamining offline activities clearly creates new privacy problems at a time when people are already worried about what can be gleaned from datamining their online activities.

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+


Reader Comments


  • Peter, 17 Jan 2018 @ 11:36pm

    Is the project finally finished?

    "find[ing] cars in the Google Street Map images [is] easy for humans to do, but hard for computers, while the next stage of the work -- identifying car models -- is much easier using AI."

    During the last months, Google captcha required up to several dozen mouse clicks identifying cars or traffic signs before it finally accepted that a user might be human - essentially, Google turned a considerable part of the world's population into mechanical turks to help with their project.

    Is Google's project now finished, so we can go back to clicking once or twice to prove we are not machines?

    • Anonymous Coward, 18 Jan 2018 @ 8:01am

      Re: Is the project finally finished?

      I thought they were just being extra careful about bot IDing of car images to exploit Captchas. Now it seems potentially more sinister.

      It just goes to show that the orgs that adopted Captcha functionality could be getting used as well. If it cost anything to implement the Captcha, you'd be paying Google to make use of your users' judgement too.

      Not sure if the org would think it was a fair trade. It would at least make their users/customers a bit more leery of Captchas, if not incensed that a verification system might be co-opting their eyes for Google's purposes.

  • Anonymous Coward, 18 Jan 2018 @ 1:00am (comment flagged by the community)

    Seriously, give up. This website blows. It was ok for like a day or so, but nobody gives a shit.

  • Pete Austin, 18 Jan 2018 @ 4:29am

    There are lots of other proxies

    My father always judged a neighborhood by its cars.

    Satellite dishes are similarly valuable. For example, the number of satellite dishes on a building is a proxy for the number of apartments, and the type (e.g. in the UK, Sky dishes vs larger hotbird dishes) shows the occupants' family origin.

    Street trees and the type of local shops (most famously in the UK, a Waitrose) are good too.

  • Ninja, 18 Jan 2018 @ 4:40am

    Dear AI, you are not prepared.

  • Anonymous Coward, 18 Jan 2018 @ 4:50am

    Significance?

    Interesting, but how is the usefulness of this information much different from that of plain old automobile registration data?

    • PaulT, 18 Jan 2018 @ 5:17am

      Re: Significance?

      You can't get registration data by driving round the street taking photos.

      • Anonymous Coward, 18 Jan 2018 @ 12:55pm

        Re: Re: Significance?

        "You can't get registration data by driving round the street taking photos."

        No, it's even easier to get it already collected from the DMV.

        • PaulT, 19 Jan 2018 @ 1:44am

          Re: Re: Re: Significance?

          This is a proof of concept, and an end in and of itself. There's a lot more data to be collected overall than the DMV can provide, and you don't have to bother leaving a trail of requests or anything pesky like that. You can just collect publicly available metadata whenever you wish, and process that to tell you what you want.

  • steve, 18 Jan 2018 @ 5:13am

    "spot correlations between cars and demographics."

    Didn't the lease game hit the USA?

    In the UK, looking at the cars used to be a great way to spot the difference between the good and bad areas.

    These days, areas I wouldn't want to walk through are full of leased white 4x4s, and the area with 100,000-mile junkers will normally be the better one.

  • Anonymous Coward, 18 Jan 2018 @ 6:04am

    "But Street View images are much more than vivid versions of otherwise flat maps: they are slices of modern life, conveniently sorted by geolocation."

    Yes, they're immensely helpful when casing people's homes and other property, planning heists etc., without ever appearing suspicious on video cameras in the process. Kudos to Google for RobberyAid®

    • PaulT, 18 Jan 2018 @ 6:41am

      Re:

      People like you claim this a lot, but I fail to see how a photo that can be months or years old, taken from the road outside, would be of any real value to a would-be robber. Especially compared to, say, driving down the street and looking at the details that the Street View pictures could never show, or at least ensuring you have an up-to-date photo. Surely criminals would want to at least do a drive-by, rather than hoping that a photo from 3 years ago isn't missing any new security installations for them to be surprised by?

      Instead of whining about Google, maybe you could tell me why it's even remotely useful? I know this requires intellectual honesty and a desire to actually communicate, but maybe one of you people might be interested in such a thing.

    • Anonymous Coward, 18 Jan 2018 @ 10:02am

      Re:

      Congrats, you’ve won the Out Of The Blue Award For Most Detached From Reality Comment.

  • btr1701, 18 Jan 2018 @ 11:37am

    Cars

    > Chicago is the city with the highest level of income
    > segregation, with large clusters of expensive and cheap
    > cars in different neighborhoods.

    High-end cars aren't a reliable indicator of income. There are plenty of hood-rats whose homes are a filthy squalor, with roaches running everywhere and malnourished kids sleeping on the floor, who nevertheless have top-of-the-line rides sitting in their driveways. It's all about what's important to people and for those dirtbags, the car matters more than anything else.

    • Valkor, 18 Jan 2018 @ 3:06pm

      Re: Cars

      It's ok that you didn't read the linked articles. That's a lot of work. Did you even read the post here? The car image data was compared to EXISTING data, and it checked out.

      Moral judgments aside, this is about trends in areas. (Kind of like how BMI is good for populations, but terribly unreliable for individuals.)
