Netflix $1 Million Award Shows The Value Of Collaboration... But Kicks Up New Privacy Questions

from the good...-and-bad dept

Back in July, we wrote about how the Netflix $1 million prize showed how much further research efforts could get by collaborating, rather than hoarding. Now that the official prize has been awarded, we're hearing even more about that point:
The blending of different statistical and machine-learning techniques "only works well if you combine models that approach the problem differently," said Chris Volinsky, a scientist at AT&T Research and a leader of the Bellkor team [which won]. "That's why collaboration has been so effective, because different people approach problems differently."
Indeed. There's plenty of research out there showing the leaps that are made in innovation when people with different approaches collaborate. Yet, with so much of a focus on "patents" representing "innovation," the opposite occurs. The patent system is all about hoarding information and making it harder to collaborate by putting tollbooths in the process. Many of the final "teams" involved a whole bunch of different approaches. Imagine if each one had a patent on their method. Think of how expensive that kind of innovation would be. Then, realize that there are plenty of technologies that face that exact problem today.

In the meantime, Paul Ohm is raising some serious privacy questions about the new Netflix Prize that is being announced. While Netflix claims that the data is anonymized, we've seen before that "anonymous" datasets are almost never actually anonymous, and in Netflix's case, the details are pretty bad:
Although I give Netflix a pass for its past privacy breach, I am astonished to learn from the New York Times that the company plans a second act:
The new contest is going to present the contestants with demographic and behavioral data, and they will be asked to model individuals' "taste profiles," the company said. The data set of more than 100 million entries will include information about renters' ages, gender, ZIP codes, genre ratings and previously chosen movies. Unlike the first challenge, the contest will have no specific accuracy target. Instead, $500,000 will be awarded to the team in the lead after six months, and $500,000 to the leader after 18 months.
Netflix should cancel this new, irresponsible contest, which it has dubbed Netflix Prize 2. Researchers have known for more than a decade that gender plus ZIP code plus birthdate uniquely identifies a significant percentage of Americans (87% according to Latanya Sweeney's famous study.) True, Netflix plans to release age not birthdate, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of "information entropy": even if it is not revealing information tied to a single person, it is revealing information tied to so few that we should consider this a privacy breach.
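To make the "simple arithmetic" in that passage concrete, here is a rough back-of-the-envelope sketch. The population, ZIP code count, and age range below are round illustrative assumptions, not figures from Ohm or Netflix, and the calculation assumes people are spread evenly across every combination, which real populations are not. The function names are hypothetical.

```python
import math

# Illustrative round numbers (assumptions, not taken from the article):
US_POPULATION = 300_000_000   # approximate US population
ZIP_CODES = 42_000            # approximate number of US ZIP codes
GENDERS = 2                   # as recorded in a typical dataset
AGE_VALUES = 80               # distinct ages in years
AGE_BANDS = 8                 # the same range split into 10-year bands


def anonymity_set_size(population, *cardinalities):
    """Average number of people sharing one combination of attribute
    values, assuming a uniform spread across all combinations."""
    return population / math.prod(cardinalities)


def bits_remaining(population, *cardinalities):
    """Bits of information still needed to single out one person after
    the attributes are known (same uniform-spread assumption)."""
    bits_needed = math.log2(population)
    bits_revealed = sum(math.log2(c) for c in cardinalities)
    return bits_needed - bits_revealed


if __name__ == "__main__":
    exact_age = anonymity_set_size(US_POPULATION, ZIP_CODES, GENDERS, AGE_VALUES)
    banded_age = anonymity_set_size(US_POPULATION, ZIP_CODES, GENDERS, AGE_BANDS)
    bits = bits_remaining(US_POPULATION, ZIP_CODES, GENDERS, AGE_VALUES)
    print(f"Average group size with exact age: {exact_age:.0f}")
    print(f"Average group size with 10-year bands: {banded_age:.0f}")
    print(f"Bits still needed to identify one person: {bits:.1f}")
```

Under these assumptions the average group is only a few dozen people with exact age, and a few hundred with ten-year age bands. Real groups vary widely, and sparsely populated ZIP codes would produce far smaller ones, so treat this as an order-of-magnitude illustration only.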
Ohm also points out that the contest may well violate the law:
Because of this, if it releases the data, Netflix might be breaking the law. The Video Privacy Protection Act (VPPA), 18 USC 2710 prohibits a "video tape service provider" (a broadly defined term) from revealing "personally identifiable information" about its customers. Aggrieved customers can sue providers under the VPPA and courts can order "not less than $2500" in damages for each violation. If somebody brings a class action lawsuit under this statute, Netflix might face millions of dollars in damages.

Additionally, the FTC might also decide to fine Netflix for violating its privacy policy as an unfair business practice.
It seems rather surprising that Netflix's lawyers did not consider this.

Filed Under: innovation, netflix prize, privacy
Companies: netflix


Reader Comments


  1. Anonymous Coward, 22 Sep 2009 @ 6:07pm

    Re: Re: Re:

    I wonder if changing it from age to age group (5-10 year ranges) would be enough to satiate those with privacy concerns... Could probably work out something similar with zip codes
