Why Netflix Never Implemented The Algorithm That Won The Netflix $1 Million Challenge

from the times-change dept

You probably recall all the excitement when a group finally won the big Netflix $1 million prize in 2009, improving Netflix’s recommendation algorithm by 10%. But what you might not know is that Netflix never implemented that solution itself. Netflix recently put up a blog post discussing some of the details of its recommendation system, which (as an aside) explains why the winning entry was never used. First, they note that they did make use of an earlier bit of code that came out of the contest:

A year into the competition, the Korbell team won the first Progress Prize with an 8.43% improvement. They reported more than 2000 hours of work in order to come up with the final combination of 107 algorithms that gave them this prize. And, they gave us the source code. We looked at the two underlying algorithms with the best performance in the ensemble: Matrix Factorization (which the community generally called SVD, Singular Value Decomposition) and Restricted Boltzmann Machines (RBM). SVD by itself provided a 0.8914 RMSE (root mean squared error), while RBM alone provided a competitive but slightly worse 0.8990 RMSE. A linear blend of these two reduced the error to 0.88. To put these algorithms to use, we had to work to overcome some limitations, for instance that they were built to handle 100 million ratings, instead of the more than 5 billion that we have, and that they were not built to adapt as members added more ratings. But once we overcame those challenges, we put the two algorithms into production, where they are still used as part of our recommendation engine.
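For anyone wondering what a "linear blend" looks like in practice, here is a minimal Python sketch. To be clear, this is not Netflix's or the Korbell team's code: the ratings are made up, and `rmse`, `blend`, and `best_blend_weight` are hypothetical helper names chosen for illustration. It only shows how RMSE is computed and how a weighted average of two models' predictions can be scored against either model alone.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def blend(preds_a, preds_b, weight_a):
    """Linear blend: weight_a * A + (1 - weight_a) * B for each rating."""
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(preds_a, preds_b)]


def best_blend_weight(preds_a, preds_b, actual, steps=100):
    """Grid-search the weight in [0, 1] that gives the lowest RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda w: rmse(blend(preds_a, preds_b, w), actual))


# Made-up toy ratings on a 1-5 star scale (NOT Netflix Prize data).
actual = [4.0, 3.0, 5.0, 2.0, 4.0]
model_a = [4.5, 3.5, 4.5, 2.5, 3.5]  # stand-in for one model's predictions
model_b = [3.0, 3.0, 5.5, 1.5, 4.5]  # stand-in for a second model

weight = best_blend_weight(model_a, model_b, actual)
blended = blend(model_a, model_b, weight)

print("model A RMSE:", rmse(model_a, actual))
print("model B RMSE:", rmse(model_b, actual))
print("blend weight for A:", weight)
print("blended RMSE:", rmse(blended, actual))
```

Because the grid search includes the weights 0 and 1, the blend can never score worse than the better of the two models on the data it was tuned on. A real system would pick the weight on held-out ratings rather than on the same data it is scored against.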

Neat. But the prize-winning entry? Eh… just not worth it:

We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment.

It wasn’t just that the improvement was marginal, but that Netflix’s business had shifted, and with it the way customers used its product and the kinds of recommendations the company needed to make. Suddenly, the prize-winning solution just wasn’t that useful, in part because many people were streaming videos rather than renting DVDs, and it turns out that recommending something to stream right now is different from recommending a DVD that will be watched a few days later.

One of the reasons our focus in the recommendation algorithms has changed is because Netflix as a whole has changed dramatically in the last few years. Netflix launched an instant streaming service in 2007, one year after the Netflix Prize began. Streaming has not only changed the way our members interact with the service, but also the type of data available to use in our algorithms. For DVDs our goal is to help people fill their queue with titles to receive in the mail over the coming days and weeks; selection is distant in time from viewing, people select carefully because exchanging a DVD for another takes more than a day, and we get no feedback during viewing. For streaming members are looking for something great to watch right now; they can sample a few videos before settling on one, they can consume several in one session, and we can observe viewing statistics such as whether a video was watched fully or only partially.

The viewing data obviously makes a huge difference, but I also find it interesting that there’s a clear distinction in the kinds of recommendations that work if people are going to “watch now” vs. “watch in the future.” I think this is an issue that Netflix probably has faced on the DVD side for years: when people rent a movie that won’t arrive for a few days, they’re making a bet on what they want at some future point. And, people tend to have a more… optimistic viewpoint of their future selves. That is, they may be willing to rent, say, an “artsy” movie that won’t show up for a few days, feeling that they’ll be in the mood to watch it a few days (weeks?) in the future, knowing they’re not in the mood immediately. But when the choice is immediate, they deal with their present selves, and that choice can be quite different. It would be great if Netflix revealed a bit more about those differences, but it is already interesting to see that the shift from delayed gratification to instant gratification clearly makes a difference in the kinds of recommendations that work for people.

Companies: netflix


Comments on “Why Netflix Never Implemented The Algorithm That Won The Netflix $1 Million Challenge”

frosty840 says:

Sure there's a difference

It’s explained in the quote. The difference is that the rental algorithm has to work because satisfactory rentals depend on the suggestion being correct, so you need a 1:1 recommendation:success ratio.

The streaming algorithm, on the other hand, only has to achieve, say, a 5:1 recommendation:success ratio because if you get recommended five movies, stream two or three minutes of each, hate four and only watch one, then you’ve still watched a good movie, and you’re probably satisfied.

It’s a transformation from customer satisfaction correlating directly with the recommendation algorithm under a rental business model to a much looser correlation under the streaming model.
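The 5:1 argument above can be put in rough numbers. The sketch below assumes, purely for illustration, that each recommendation is liked independently with the same probability; `chance_of_at_least_one_hit` is a hypothetical helper, not anything Netflix has described. With a DVD you effectively get one try per mailing, while a streaming viewer can sample several titles in one sitting.

```python
def chance_of_at_least_one_hit(hit_rate, samples):
    """Probability that at least one of `samples` recommendations is liked,
    assuming each is liked independently with probability `hit_rate`."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if samples < 0:
        raise ValueError("samples must be non-negative")
    # P(at least one hit) = 1 - P(every sample is a miss)
    return 1.0 - (1.0 - hit_rate) ** samples


# Hypothetical hit rate of 1 in 5, matching the 5:1 ratio in the comment.
hit_rate = 0.2

dvd = chance_of_at_least_one_hit(hit_rate, samples=1)        # one disc in the mail
streaming = chance_of_at_least_one_hit(hit_rate, samples=5)  # five quick samples

print(f"one try: {dvd:.2f}, five tries: {streaming:.2f}")
```

Under those assumptions, the same mediocre recommender leaves a streaming viewer far more likely to end the evening having watched something good, which is the looser correlation the comment describes.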

Benjo (profile) says:

Re: Sure there's a difference

I agree with you, but I still think accuracy for streaming recommendations can be pretty important. If it takes you five recommendations on average to find one show you like, chances are your faith in the recommendation is lower than say if it only took you one to three (on average). This in turn lowers the value of their recommendation system, which lowers the value of their service.

Call me Al says:

“That is, they may be willing to rent, say, an “artsy” movie that won’t show up for a few days, feeling that they’ll be in the mood to watch it a few days (weeks?) in the future, knowing they’re not in the mood immediately”

This is very much the case for me. I use Lovefilm over in the UK (similar to Netflix) and there is a film I keep adding to my queue, they keep sending me and I keep sending back. I really want to watch it but I’m just never in the right mood but I hate holding onto the DVDs long term so send them back in short order.

Also looking at my queued films vs what I watch on instant there is a noticeable difference. I watch a lot of trash on the instant player, films that I would never buy but that I’m happy to try out of curiosity at that moment. The queue though is full of films which are rated well and which I generally look forward to receiving in the post.

Obviously there is also the consideration that some films which I would watch on instant if available are not there so end up queued.

Anonymous Coward says:

This is called crowdsourcing. Companies do this in order to avoid actually having to – gasp! – pay for an actual employee.

People work for free under the guise of a contest, and the best work is chosen and the author is given some bullshit prize for placation.

Designers are the ones who fall for it, given lack of experience, but programmers and computer scientists?

mattarse (profile) says:

Re: Re: Re:

Exactly – most of my colleagues all have a hobby that is in the same field, but not exactly applicable to their jobs. We enjoy the challenge.
I remember reading about this prize when it was still in competition and at least a few of the challengers didn’t seem to think they had a chance to win, but kept working on it making incremental progress.

Benjo (profile) says:

Re: Re:

I’m sure the Engineers that came up with the Machine learning algorithm don’t give 2 damns if Netflix used their algorithm, since they got paid.

What’s interesting about this is that their suggestion system for streaming videos still affects the company’s value prop. And the whole point of the machine learning algorithm is that based on a sample set of millions of input data, they can extrapolate a rating system that holds true for billions of future cases (at least better than their current system).

What I’m saying is they have even more data now, and they could probably pay some engineers to build a new machine learning algorithm for streaming videos, with a very good chance of making a much more accurate system. I’d rather see them spend money on that than trying to launch their own cable channel.

Benjo (profile) says:

Re: Re: Re:

Also, looking back at the privacy concerns for how they conducted these contests, I’m really surprised Netflix didn’t just keep the data themselves.

I believe when you are coming up with a machine learning algorithm, it is considered bad practice to personally look at the data set you are given. You really only need to know the format and size of the data.

If it was a more collaborative effort, they could have worked with the research teams and basically run the algorithms themselves, giving results back to the teams. This way, the data would never leave Netflix, and there would be no privacy issues.

Mike (user link) says:

Good point on the psychological differences between choosing videos for your future self vs your current self.

Made me immediately think of Natalie Tran’s observations on current self vs past self. If you haven’t seen the video it’s here:

One of the most practical KDDCUP competitions – a pity the competition didn’t match more closely the engineering requirements.

pr (profile) says:

I think they have it nailed.

They describe my behavior exactly. My DVD list had a bunch of deep, culturally significant things I would like to be able to say I’ve seen. Sometimes they would sit around for a month before we sent them back unwatched. Now whenever I want ten minutes of thoughtless pleasure I just slap on an episode of “Shaun the Sheep.”

I usually watch them one skit at a time. I hope they don’t take partially watched to be evidence that I don’t like it, it’s just that that sort of comedy is better in small doses.

dev says:

Canadian selection

in canada, the issue is more of a lack of quality to select from. netflix is constantly recommending things that are barely three stars. when i joined i hoped netflix would recommend new things to me that i hadn’t heard of, instead i get straight to tv knockoffs.

I wish when i searched for a streaming movie, that it could play the movie trailer.

netflix is like searching through the discount dvd bin at walmart.

Asset Tracking Software (user link) says:

Actually, that makes sense

In business they always say you base decisions on the future and not on the past.

My initial response was to say that they spent $1 million getting a better algorithm and they didn’t use all of it, what were they thinking?

But then, if it didn’t make sense to use it when they finally got the algorithm then don’t use it. It would be stupid in this situation to use it.

BlueRaja (user link) says:

Garbage in, garbage out

Garbage in, garbage out. If I have four people in my family all rating movies, they’re not getting a correlation between how a person will rate similar movies, they’re just getting a bunch of seemingly random ratings. Without fixing this, no algorithm could ever do well.

Netflix could have fixed this years ago by allowing multiple users under one account, but they never cared. That’s why the ratings on Netflix have turned to crap, and (that lack of caring) is why Netflix is itself turning to crap.

Sephen says:

steaming netflix movies cannot be recommended accurately

They cannot recommend accurately for me because movies I’m interested in are *not* streamed; that’s why I gave up on that, and when they did the split I stayed with dvds rather than both dvds and streaming or streaming alone.

The movies they offer for streaming could more accurately be offered steaming, as in a steaming pile of dog poop. Until they consistently offer movies I’d be interested in, they can keep their steaming content; I don’t want it.

Hamza Sohrab says:

The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users or the films being identified except by numbers assigned for the contest.
The winning algorithm was never implemented because it did not do what it was planned to do.
