from the good...-and-bad dept
Back in July, we wrote about how the Netflix $1 million prize showed how much further research can go when people collaborate rather than hoard. Now that the official prize has been awarded, we’re hearing even more about that point:
The blending of different statistical and machine-learning techniques “only works well if you combine models that approach the problem differently,” said Chris Volinsky, a scientist at AT&T Research and a leader of the Bellkor team [which won]. “That’s why collaboration has been so effective, because different people approach problems differently.”
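Volinsky’s point about combining models that “approach the problem differently” has a simple statistical core: averaging two predictors helps most when their errors are not correlated. The Python sketch below uses made-up ratings (not Netflix Prize data), a plain 50/50 average, and RMSE, the error measure the Netflix Prize was scored on. The winning teams’ real blends were far more elaborate; `rmse`, `blend` and the three toy models are hypothetical names written just for this illustration.

```python
import math

def rmse(preds, truth):
    """Root-mean-square error between predicted and actual ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth))

def blend(a, b, w=0.5):
    """Weighted average of two models' predictions (weight w on model a)."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

# Made-up true ratings for four user/movie pairs.
truth = [4.0, 3.0, 5.0, 2.0]
# Model A's errors: +0.5, +0.5, -0.5, -0.5
model_a = [4.5, 3.5, 4.5, 1.5]
# Model B's errors: -0.5, +0.5, +0.5, -0.5 (uncorrelated with A's errors)
model_b = [3.5, 3.5, 5.5, 1.5]
# Model C makes exactly the same mistakes as A (same "approach")
model_c = list(model_a)

print(rmse(model_a, truth))                            # 0.5
print(rmse(model_b, truth))                            # 0.5
print(round(rmse(blend(model_a, model_b), truth), 3))  # 0.354: errors partly cancel
print(rmse(blend(model_a, model_c), truth))            # 0.5: nothing to cancel
```

Blending A with B, whose mistakes point in unrelated directions, cuts the error to about 0.5/√2; blending A with its clone C gains nothing at all, which is the sense in which diversity of approach matters.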
Indeed. There’s plenty of research out there showing the leaps in innovation that happen when people with different approaches collaborate. Yet with so much focus on treating “patents” as a stand-in for “innovation,” we get the opposite. The patent system is all about hoarding information and making collaboration harder by putting tollbooths along the way. Many of the final “teams” combined a whole bunch of different approaches. Imagine if each one had held a patent on its method. Think of how expensive that kind of innovation would be. Then realize that plenty of technologies face that exact problem today.
Meanwhile, Paul Ohm is raising some serious questions about the privacy implications of the newly announced Netflix Prize sequel. While Netflix claims that the data is anonymized, we’ve seen before that “anonymized” datasets are often anything but, and in Netflix’s case, the details are pretty bad:
Although I give Netflix a pass for its past privacy breach, I am astonished to learn from the New York Times that the company plans a second act:
The new contest is going to present the contestants with demographic and behavioral data, and they will be asked to model individuals’ “taste profiles,” the company said. The data set of more than 100 million entries will include information about renters’ ages, gender, ZIP codes, genre ratings and previously chosen movies. Unlike the first challenge, the contest will have no specific accuracy target. Instead, $500,000 will be awarded to the team in the lead after six months, and $500,000 to the leader after 18 months.
Netflix should cancel this new, irresponsible contest, which it has dubbed Netflix Prize 2. Researchers have known for more than a decade that gender plus ZIP code plus birthdate uniquely identifies a significant percentage of Americans (87%, according to Latanya Sweeney’s famous study). True, Netflix plans to release age, not birthdate, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of “information entropy”: even if it is not revealing information tied to a single person, it is revealing information tied to so few that we should consider this a privacy breach.
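Ohm’s “simple arithmetic” and his entropy point can be sketched in a few lines. The Python below is a minimal illustration, not Ohm’s or Netflix’s method: it groups records by the quasi-identifiers (gender, ZIP code, age) and counts how many share each combination (the “anonymity set”; a count of 1 means that person is unique), and it measures the identifying information a narrowing reveals as log2(population / set size) bits. The toy records are made up, and the usage figures (roughly 300 million people, on the order of 42,000 ZIP codes, ages 0 to 79) are round-number assumptions, not census data. Spreading people evenly across cells is also a simplification, since real ZIP codes vary enormously in size.

```python
import math
from collections import Counter

def anonymity_sets(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values.
    A count of 1 means that combination singles out one person."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def bits_revealed(population, set_size):
    """Bits of identifying information revealed by narrowing `population`
    people down to `set_size` candidates: log2(population / set_size)."""
    return math.log2(population / set_size)

def average_cell_size(population, n_zips, n_genders, n_ages):
    """Average people per (ZIP, gender, age) cell, under the simplifying
    assumption that people are spread evenly across all cells."""
    return population / (n_zips * n_genders * n_ages)

# Toy records, made up for illustration.
records = [
    {"gender": "F", "zip": "02139", "age": 34},
    {"gender": "F", "zip": "02139", "age": 34},
    {"gender": "M", "zip": "02139", "age": 34},
    {"gender": "F", "zip": "10001", "age": 52},
]
sets = anonymity_sets(records, ["gender", "zip", "age"])
print(min(sets.values()))  # 1: at least one record is unique on these three fields

# Assumed round figures: ~300M people, ~42,000 ZIP codes, 2 genders, ages 0-79.
print(round(average_cell_size(300_000_000, 42_000, 2, 80)))  # 45
print(round(bits_revealed(300_000_000, 300), 1))              # 19.9
```

For scale, singling out one person among 300 million takes about log2(300,000,000) ≈ 28 bits, so narrowing someone down to a few hundred candidates already supplies roughly 20 of them. That is the sense in which a release can be a privacy breach without naming anyone.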
Ohm also points out that the contest may well violate the law:
Because of this, if it releases the data, Netflix might be breaking the law. The Video Privacy Protection Act (VPPA), 18 USC 2710 prohibits a “video tape service provider” (a broadly defined term) from revealing “personally identifiable information” about its customers. Aggrieved customers can sue providers under the VPPA and courts can order “not less than $2500” in damages for each violation. If somebody brings a class action lawsuit under this statute, Netflix might face millions of dollars in damages.
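Because the statutory floor applies per violation, the exposure grows linearly with the number of affected customers. A minimal worked example, assuming (as a hypothetical, not a legal reading of the VPPA) that each affected customer counts as one violation and taking a made-up class size of n = 1,000:

$$
D_{\min} = \$2{,}500 \times n, \qquad n = 1{,}000 \;\Rightarrow\; D_{\min} = \$2{,}500{,}000
$$

On that assumption, a class of just a few thousand customers would push the floor well past the millions Ohm mentions.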
It’s rather surprising that Netflix’s lawyers apparently didn’t consider this.
Filed Under: innovation, netflix prize, privacy