Could Apple's MusicMatch Be A Tool To Identify Infringers?

from the wouldn't-that-be-interesting dept

When Apple announced its iCloud Music Match service, a lot of people started suggesting that it “legitimized” infringement, or in some manner created an amnesty program for infringers: pay $25/year and all of your infringing copies get replaced by licensed copies from Apple. In fact, this really pissed off some copyright holders. The whole thing struck me as a huge exaggeration, because no one was looking at the files on your hard drive to see if they were infringing anyway.

But… what if they started now? And what if they started now because they could… thanks to Apple’s iCloud Music Match?

I still think this is a huge stretch (and I’ll explain why below), but Slashdot points us to a story from someone suggesting that rather than “amnesty” for unauthorized file sharers, Music Match could be used to track down infringers. My initial response was that this was totally crazy, because it wouldn’t know if the non-iTunes-bought tracks were authorized or not. After all, you could have bought them elsewhere or ripped them. However, the author of the article, Daniel Nolte, does a good job walking through both scenarios and explaining why Apple might still be able to figure out what files were likely infringing:

Although the “DRM free” MP3 now being provided from many of the major music download companies can be played anywhere, each download is watermarked with header information specific to the exact purchase and purchaser. This article from Techcrunch gives more details on “dirty” MP3s. Consequently, if you purchase a “DRM free” MP3 file from iTunes and then share it, and the person(s) who received it saves it to their iCloud, then Apple will know both (i) who shared their copy and (ii) whose copy is illegal. For files from other watermarked retailers, the same information would only require coordination with the other site.

Next consider music purchased from sites that sell legal but “clean” MP3s without watermarks. These files will have unique MD5 or SHA-2 signatures that can distinguish them to a particular company. They will certainly have different signatures than the watermarked versions (because of the addition of the watermark) and they will be unique from versions of the same song encoded by others. When Apple’s servers detect a number of copies far in excess of the “clean” mp3 company’s reported sales, they will know where to suspect illegal copying.

Then there will be MP3s that individuals created themselves from, for example, “ripping” their CD collections. While these are not watermarked to the individual, they appear to be unique for each “rip”. To confirm this, I ran a test with fresh installations of the exact same CD ripping software on two different computers. I then had them rip the same track from the exact same CD using the unchanged system default settings on both computers. The MD5 hashes did not match. Small differences between the two reads, the internal timestamps, the system metadata, etc. likely resulted in the mismatch. It will almost certainly also be different from the file hashes from legal download sites, both those that watermark and those that do not. In short, if you and thousands of other people have MP3s of the same song with the same file hash value, you will not be able to credibly claim it occurred because all of you ripped it from your CD collections.
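
The reasoning in the quoted passage, that a large number of byte-identical copies points to a common source, can be sketched in a few lines of Python. This is only an illustration of the idea: the function name `count_identical_copies`, the threshold, and the sample byte strings are all hypothetical, and nothing here reflects how Apple's service actually works.

```python
import hashlib
from collections import Counter


def count_identical_copies(files, threshold):
    """Return {digest: count} for digests seen at least `threshold` times.

    `files` is an iterable of bytes objects standing in for uploaded tracks.
    """
    counts = Counter(hashlib.sha256(data).hexdigest() for data in files)
    return {digest: n for digest, n in counts.items() if n >= threshold}


# Hypothetical uploads: three byte-identical copies and one independent rip.
shared_copy = b"same encoded audio bytes"
own_rip = b"same song, different rip"
uploads = [shared_copy, shared_copy, shared_copy, own_rip]

suspicious = count_identical_copies(uploads, threshold=3)
print(suspicious)
```

Only the digest shared by three uploads is reported. The independent rip has its own digest and stays below the assumed threshold.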

The obvious next response is that Apple would never let its data be used in this manner, as it would kill the service. But Nolte notes that might not matter either:

Apple would not have to.  They would simply have to comply with an information demand from the RIAA, who has had no problem with being seen as the bad guy in hardball enforcement against file sharing.  Moreover consider this:

  1. Apple is the largest music retailer on the planet.
  2. Apple believes, possibly justifiably, that it loses billions of dollars annually to illegal music file sharing.
  3. The easiest way out of the legal jam over challenged content in your iCloud storage would be to convert the suspected iCloud music by buying it from Apple.  Apple becomes almost like a white knight in the process.

While this is possible, I don’t think it’s probable for a variety of reasons. While the RIAA may not have a problem being seen as the bad guy in hardball enforcement, I think that even it has some limits, and this almost certainly passes those limits. I could actually see a random misguided indie label seeking this kind of information, but I’m not convinced it would really get that far. I would hope that most courts would recognize that this was a fishing expedition, rather than a reasonable search based on evidence of specific infringement. Simply demanding all data would (hopefully) be a non-starter.

In the end, I think both claims are probably overblown. Music Match is neither a way to launder infringing files, nor is it a honeypot to get you in trouble for your infringing files. It’s just a service to sync music.

Companies: apple

Comments on “Could Apple's MusicMatch Be A Tool To Identify Infringers?”

72 Comments
Anonymous Coward says:

Re: Re: Re:

No this will not do it. It’s not ‘header’ information like how an mp3 file stores artist/title/etc. Watermarking is designed to be resistant to any type of manipulation, including time distortion, filtering, etc. I’m just speculating at how this works because I haven’t read any papers (although I am an electrical engineer). It’s likely similar to how radio works. You have a carrier signal, information signal, and then mix and send them. On the receiving side, you can easily filter out the information from the carrier because the frequencies are in bands that allow for ‘cheap’ filters to perform the signal separation. The watermarking likely encodes information within the audible frequencies so it cannot be easily filtered out. This also degrades the sound quality although it’s obviously not easily perceptible.

Hashes (MD5,SHA family) are worthless for identification here. Hashes are designed to guarantee data integrity. You as an individual listening don’t care if the data is exactly the same as when you purchased. Think for instance about adding a half a millisecond silence on the end of a song. You wouldn’t even notice anything in the audio, but the data file’s hash value would be completely changed.
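
The point about a tiny change producing a completely different hash is easy to see for yourself. The sketch below uses Python's standard `hashlib`; the byte strings are made-up stand-ins for an audio file and for a few bytes of appended "silence", not real MP3 data.

```python
import hashlib

# Stand-in for the bytes of an audio file (hypothetical data).
original = b"\x00\x01\x02\x03" * 1000
# Hypothetical "extra silence": a few zero bytes appended to the end.
padded = original + b"\x00" * 16

md5_original = hashlib.md5(original).hexdigest()
md5_padded = hashlib.md5(padded).hexdigest()
sha_original = hashlib.sha256(original).hexdigest()
sha_padded = hashlib.sha256(padded).hexdigest()

print(md5_original, md5_padded)
print(sha_original, sha_padded)
```

The two MD5 digests differ from each other, and so do the two SHA-256 digests, even though the inputs share their first 4,000 bytes.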

PrometheeFeu (profile) says:

Re: Re: Re: Re:

No offense meant, but you just admitted you don’t know how it’s done. So while I agree that it’s most likely not in headers, there is no way to know how it is done precisely or what distortions this is resistant to. If you wanted to destroy the watermark, I would recommend taking two music streams that play the same music, then, perform the following operation: (((streamA xor streamB) and random) xor streamA). The watermark HAS to be gone since the only commonality between streamA and the result is what was common between streamA and streamB. Of course, that’s a bit overkill and you could probably just take a single stream and add to it random noise up to the point where it does not degrade quality noticeably and the watermark should be gone. After all, if the watermark is not removed by that, it means the watermark has to degrade sound quality more than the noise you added which you would hopefully notice. Trust me Anon, this is a mission for information theory.
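
For anyone who wants to see what that bitwise expression does, here is a minimal Python sketch. It assumes two equal-length, bit-aligned byte streams, which real compressed audio from two sources would not necessarily be; the function name `mix_streams` and the sample bytes are hypothetical. It shows only the bit-level behaviour, not whether any real watermark would survive.

```python
import secrets


def mix_streams(stream_a: bytes, stream_b: bytes) -> bytes:
    """Compute ((A xor B) and R) xor A byte by byte, with R random."""
    if len(stream_a) != len(stream_b):
        raise ValueError("streams must have the same length")
    random_mask = secrets.token_bytes(len(stream_a))
    return bytes(
        ((a ^ b) & r) ^ a
        for a, b, r in zip(stream_a, stream_b, random_mask)
    )


# Hypothetical streams: the middle byte is identical, the others differ
# only in their low four bits.
stream_a = bytes([0b10100000, 0b11110000, 0b00001111])
stream_b = bytes([0b10100101, 0b11110000, 0b00000000])
mixed = mix_streams(stream_a, stream_b)
print([format(byte, "08b") for byte in mixed])
```

Wherever the two streams agree, the output bit equals that shared bit. Wherever they differ, the random mask decides whether the bit comes from streamA or streamB, so the output changes from run to run in those positions only.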

jsf (profile) says:

Re: Re: Re:

Maybe, sort of. The potential issue is that even without any watermarking you still end up with data in an MP3 file that could be used to identify a particular PC as the source of the encoded MP3. The argument is that some of this MP3 metadata will be the same across all MP3 files created on a given PC, with a given encoder. Thus, you could potentially identify a bunch of different MP3 files as all coming from the same source, track back to that source and prove that they came from your PC. So unless you manipulate the metadata on every MP3 file you have, after you encoded it, it could be tracked back to you.

Of course all of this is speculation at this point, as there has been no proof of concept shown yet. It would also be fairly easy for an encoder to add in some sort of random seed to the metadata to make it more difficult to link tracks to a given PC.

John William Nelson (profile) says:

You can't prove the act with Apple's data

Apple’s service can’t necessarily identify who holds the right to possess the music and who is the infringer. On top of that, it can’t provide evidence of distribution and copying.

In short, it might be able to provide an idea of the scope of copyright violations, but it can’t prove anything for a specific user. More information would be necessary to sustain a claim in court.

Kevin (profile) says:

Re: You can't prove the act with Apple's data

In short, it might be able to provide an idea of the scope of copyright violations…

Might be an interesting way to finally get our hands on some real data about the scope of file sharing. Even though there’s still no hard and fast link between “lost sales” and piracy, it would be interesting to have real facts about it and who it really affects.

PaulT (profile) says:

Re: Re: You can't prove the act with Apple's data

“who it really affects”

It still wouldn’t be possible to prove this with that kind of data. For example, while the assumption is that every infringing download is a lost sale, most music fans can point to a situation where the opposite is true, and a great many habitual infringers wouldn’t buy music if the pirate option wasn’t available.

I could basically predict that the results of such data would be along the lines of “there’s a lot of piracy” and “there’s a great deal of correlation between the most pirated albums and the biggest selling albums”. None of this proves any direct effect or causation of the industry’s woes, nor tells us anything we don’t already know.

Anonymous Coward says:

Re: Re: You can't prove the act with Apple's data

Nah, it still ignores all other avenues of purchase. Looking at pirated tracks doesn’t take into account concert attendance, merch purchases or even people who DID buy the album but didn’t bother re-ripping it since they already had a perfectly working copy; or even those that bought/owned the album but found it easier to download than to rip.

At best, you’d have a rough idea of how rampant music piracy is among the small sample of music listeners that is their customer base, and nothing else.

sheenyglass (profile) says:

Re: You can't prove the act with Apple's data

They could argue that if a file has been copied then all people with copies are infringers either through receiving copies or by making them for other people. So if I rip a CD and send copies to my friends, all of us have infringed the copyright. It really doesn’t matter that I had a license, as my copying exceeded its scope.

Nick says:

For starters, where does he get the figures that Apple believes they lose billions per year because of file sharing? Sure, the big record labels and movie studios say that all the time, but I don’t recall ever reading about Apple making such complaints.

I’m in agreement with you. However, I would add one more point. It is a service to sync music, but first and foremost it is another method of convincing people to buy and use Apple products. That is priority number one for Apple.

I don’t know where concern for file sharing is on their list, but I would wager it is well below getting products out the door and into the hands of customers.

Lord Binky says:

Randomizer

So the solution to their technique is to change stamps in the file to make the hash different? I think I would call my program Hash Masher, where it merely changes any dates and times a file has to change the hash, for your personal hash tables of course. Then I’d make a Hash Masher Pro that sold for $1 that did some goofy simple extra thing that makes people go “Eh, it’s worth a dollar”.

Jimr (profile) says:

Lots of if's.

I wonder what the odds are of the same song having the same file hash value? I doubt it would be truly unique – Maybe 1 in 10,000 or 100,000.
I have a dedicated machine that accepts the CD and automatically makes an MP3 file out of it (in about 10 minutes), fetches artwork, categorizes it, and saves it on an internal drive which is accessible on my home network. Now if that company sold 10,000 units like this, what are the odds that two or more users buy the CD on the release date and make a copy – and what are the odds the hash values match?

Secondly, what if I did download it as copying the CD to an MP3 was impossible. I do have one CD that when I tried to automatically generate the MP3 it created 999 mp3 files (all empty).

If accused then I must settle or, at my own cost, prove my copy is legitimate.

Lastly (as I adjust my tin foil hat), I think the data potential for discovery is far too tempting for the RIAA. The RIAA will do their best to get their hands on all this extremely valuable data, even if the means are less than legitimate.

It is going to take a couple years to earn my trust. With all that being said, I would not mind doing it, as I could have my entire digital library backed up on iTunes, as long as I could re-download the entire library and then end my contract with iTunes (and still be able to play my files).

Rekrul says:

Re: Lots of if's.

It is going to take a couple years to earn my trust. With all that being said, I would not mind doing it, as I could have my entire digital library backed up on iTunes, as long as I could re-download the entire library and then end my contract with iTunes (and still be able to play my files).

Psss… There are these things called “DVD-Rs”, which can store about 1100 average MP3 files. And the best part is that you can buy packs of 100 for around $20 when they’re on sale! That’s over 100,000 MP3 files, and if you spend another $20, you can make extra copies and put them in a safety deposit box for extra security. And nobody can even snoop on what you’re doing!

Rekrul says:

Re: Re: Re: Lots of if's.

With that cheap 2,5″ discs or flash drives, who uses DVD-Rs these days?

The price of hard drives is still double what it would cost to buy the equivalent amount of storage capacity on DVD-R. And Flash Drives are still ridiculously expensive compared to other media. You’d be lucky to find a 16GB flash drive for $20, usually that will only get you 8GB. At that price, you’d have to pay over $1,000 to get the same amount of storage space as a 100 pack of DVD-Rs.

nasch (profile) says:

Re: Re: Re:2 Lots of if's.

The price of hard drives is still double what it would cost to buy the equivalent amount of storage capacity on DVD-R.

One of us is doing math wrong. Here’s what I get:

100 4.7GB DVD-Rs for $20 = about 4 cents per GB
1000 GB hard drive for $65 = about 6.5 cents per GB

So a bit more than 50% more expensive for hard drives. And when you consider a 1TB drive holds the same data as over 200 DVDs, it starts to look pretty good. 25GB Blu-ray (BD-R) for $1 each is only 40 discs per terabyte, and costs the same as DVD-R, but of course you have to have a blu-ray burner.
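
The per-gigabyte arithmetic is simple enough to check with a short Python sketch. The prices and capacities are the ones quoted in this comment; the helper name `cents_per_gb` is just an illustrative choice.

```python
def cents_per_gb(price_dollars: float, capacity_gb: float) -> float:
    """Price in cents for one gigabyte of storage."""
    return price_dollars * 100 / capacity_gb


dvd = cents_per_gb(20, 100 * 4.7)  # 100-pack of 4.7 GB DVD-Rs for $20
hdd = cents_per_gb(65, 1000)       # 1000 GB hard drive for $65
bd = cents_per_gb(1, 25)           # one 25 GB BD-R for $1

print(round(dvd, 2), round(hdd, 2), round(bd, 2))  # cents per GB
print(round(hdd / dvd, 2))  # hard drive cost relative to DVD-R
print(1000 / 25)            # BD-R discs needed per terabyte
```

With these inputs the DVD-Rs come to roughly 4.26 cents per GB, the hard drive to 6.5, and the BD-Rs to 4.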

Scote (profile) says:

Re: Re: DVD-Rs are *not* archival

Burnable DVDs are not archival media. They start to show signs of degradation in less than a year. Frankly, there is no digital archival media. The Internet Archive is backing up digital books on **paper** because digital data is so subject to bit rot. But DVD-Rs are among the least archival of all the digital media.

Rekrul says:

Re: Re: Re: DVD-Rs are *not* archival

Burnable DVDs are not archival media. They start to show signs of degradation in less than a year. Frankly, there is no digital archival media.

I heard the same thing about CDRs. People talked about them de-laminating, having bit-rot, etc. I just checked and one of my earliest CDRs from 2003 is still fine. Also, the first DVD I burned back in 2007 also verifies 100%.

PrometheeFeu (profile) says:

Re: Lots of if's.

The chances of collision are dependent upon the hash function and the size of the hash. So for a 256-bit hash, if you take 2 music files, the chances that their hashes match are 1 in 2^256 (otherwise known as a TON). However, because of the birthday paradox, if you have a lot of music files, the chances that at least one pair will have the same hash are very high. Really the chances that there exists at least one pair of music files in this world with the same MD5, SHA-1 or SHA-2 are for all intents and purposes 100%. However, if they really depend upon hashes for file identification, the whole scheme falls apart if you flip a single bit in the file. So I doubt that’s their approach. They are not THAT stupid.
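
For anyone who wants to put numbers on the birthday paradox, here is a rough Python sketch using the standard approximation p ≈ 1 − exp(−n(n−1)/2^(bits+1)). It assumes an idealized hash whose outputs are uniformly random, and it covers accidental collisions only, not deliberately constructed ones. The collection size of one trillion files is a made-up input, and `collision_probability` is a hypothetical helper name.

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = num_files * (num_files - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


# Hypothetical collection size: one trillion distinct files.
n = 10 ** 12
for name, bits in [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)]:
    print(name, collision_probability(n, bits))
```

The result is very sensitive to the number of files you assume: the estimate only approaches 1 once that number gets near 2^(bits/2), which is about 2^64 for a 128-bit hash.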

TechnoMage (profile) says:

Your lack of faith is disturbing Mike

How dare you not have faith in the RIAA to:

1) Mess a great service up (Napster, and others Turntable.fm)
2) Go Overboard(sued an Amish family for file sharing…)
3) Cry Foul, especially when no crying is needed
4) Not go after Apple’s DB of info on electronic version of files (which in this case happen to be MP3s) which are ALL stored in a neat-little-single place
5) Use #4’s info to sue everyone under the sun
6) ???
7) Profit

…. Seriously, have some faith in the RIAA

cc (profile) says:

CD drives have digital and analog outputs (that 4-pin wire that connects directly from your CD drive to your sound card). If he’s using the analog connector I would guess it’s normal that he gets some noise that screws up the hashes.

It’s also likely that many people don’t use that wire, so they probably get perfect digital output (including better error checking) and identical hashes.

I also don’t believe there’s any information in the standard mp3 headers (like date information or similar) to identify one copy from another. The additional metadata that may be tagged at the end is typically downloaded automatically from online servers, so it’s again normal that independently made copies could have identical tags.

I haven’t tested these things out, so I may be wrong, but my guesses are no better than that guy’s single data point.

Chris Rhodes (profile) says:

Roll Your Own

Grab SubSonic for your home computer, and if whatever you want to stream from has a web browser and Flash, or is a device based on iOS, Android, or WP7 (each of which has apps available), you can stream your music and movies straight from your home computer to that device. If you pair it up with JungleDisk (or some other cloud storage service), you can have all your music stored online for a low price.

And to top it off, you can also give user accounts to your friends and family, so they can enjoy your collection too.

Chris Hoeschen says:

Simple way around this

If you have a MP3 that is not watermarked (or the watermark was removed) simply load your songs into your favorite MP3 tagging program and add a tag to the Comment section that is unique to you. By doing this you are altering the file (the tags are stored in the MP3 file) and therefore will get a different hash of that file.

As a test I took a MP3 from my own collection and computed a hash of it and got:
EA245E0465FEB2D97D6B12C4E73157386BE9845F

I then loaded that file into my MP3 tagging program and added a comment and got this for the hash of the same file with the addition of a MP3 tag comment:
2CF2735AACBB630E4D111816196E6246A06252A8

Done, my file is now unique to me. Someone could (or may have already) make a program that can go through your MP3s and remove any watermarks and add a unique, user editable tag to the comment.

taoareyou (profile) says:

Consider this

I get all my music from sites like Jamendo which offers free licensed music for download and sharing. It is quite possible that a lot of this music has been shared thousands of times over, legally. Just because an mp3 is shared does not mean it is infringement. Will they manually review every file that shows up in this manner? If not, it could get very embarrassing for them.

Scote (profile) says:

Re: Carefully Review? Mass copyright law suits are never about that...

Mass copyright law suits are never about a *careful* review of the facts. So don’t expect the idea that it would be tough to carefully review the facts to prevent lawsuits.

Apple may never intend to release the info, but once collected, it is subject to subpoena by anyone who claims to need it for a law suit. And, frankly, it would be nearly trivial to use that data to find file sharers. The RIAA and other orgs just need to download the hashes from filesharing sites and look for them, as you noted. When one account has a high enough correspondence, that is, multiple hits, they presumptively file suit and sue for discovery, just like they did just recently in their previous mass litigation. Except this time the data will be even richer, because the RIAA will know about their *entire* music collection, not just the few songs that were on their P2P share.

Apple’s music match is mass litigation just waiting to happen–and there is nothing Apple can do to stop it. If you don’t want information to be used then you shouldn’t collect it in the first place. The only thing Apple can do to prevent such suits is to never archive the basis for the match and to only check acoustical matches and never match based on hashes or metadata.
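
To make the matching step described above concrete, here is a minimal Python sketch. Everything in it is hypothetical: the account names, the shortened placeholder digests, the `min_hits` threshold and the function name `flag_accounts` are invented for illustration, and it says nothing about what data Apple actually keeps.

```python
def flag_accounts(account_hashes, known_hashes, min_hits):
    """Return {account: hit_count} for accounts with at least `min_hits` matches."""
    flagged = {}
    for account, hashes in account_hashes.items():
        # Count distinct digests in this account that appear in the known list.
        hits = len(set(hashes) & known_hashes)
        if hits >= min_hits:
            flagged[account] = hits
    return flagged


# Hypothetical data: digests are shortened placeholders, not real hashes.
known_hashes = {"aaa1", "bbb2", "ccc3"}
account_hashes = {
    "account_1": ["aaa1", "bbb2", "zzz9"],
    "account_2": ["yyy8", "ccc3"],
}
flagged = flag_accounts(account_hashes, known_hashes, min_hits=2)
print(flagged)
```

With an assumed threshold of two hits, only the first account is flagged. A single match, as in the second account, stays below the line.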

Anonymous Coward says:

I hope we all realize by now that in no way shape or form does this take into account security vulnerabilities. Anyone ever heard of botnets, worms, trojans, phishing? Just because an individual’s purchased songs are available online in no way guarantees that there is not some malware out there that has a sole purpose of finding content on people’s computers and uploading it into the file sharing space. Seems like a very likely goal of someone in the malware/piracy business.

Rekrul says:

Maybe it’s just me, but storing music on “the cloud” seems like a solution in search of a problem. I mean when did music become so indispensable that people have to have their entire digital music collection available to them 24 hours a day, 7 days a week, from every corner of the planet?

You can buy a memory card the size of an aspirin that can hold over 4,000 MP3 files. Who the hell leaves their house today thinking “Damn, I wish I didn’t have to be so selective about what music I take with me on my half-hour trip to the local grocery store.” Is being limited to “only” 4,000 songs really that much of a hardship? How about buying two memory cards and upping the limit to 8,000 songs? Can anyone here even name 1,000 songs without resorting to looking at a playlist? Does anyone actually listen to 4,000 songs on a regular basis? If each song is an average of three minutes, that’s over eight days worth of music on something that you could mistake for a breath mint.

PaulT (profile) says:

Re: Re:

Hmmm… you sound like you should get out more if the trip to the grocery store is all you use your player for.

Personally, my main music player is a 16Gb iPhone (bad decision on my part, should have gotten the 32Gb), so no expandable memory and that’s memory that also has to store apps, photos, videos, etc. If I’m lucky, I can get around 6-8Gb of music and podcasts on there – and I listen to a lot of podcasts so I maybe get 2-3Gb of actual music on there. My MP3 collection is currently over 120Gb, and that’s off the top of my head, it might be 50Gb+ more (yes, mostly legal stuff not counting things I downloaded during the Napster days).

I also do a lot of travelling and quite often listen at work. That can be 8-10 hours of listening on some days, and if I’m travelling I can’t resync my tunes till I get back. So, yes, I can quite easily tire of music, especially as I have such wide taste in music I might be in the mood for a type of music I don’t have on my iPhone, or only have a single album to go listen to. There’s been no end of times I’ve been in the mood for a bit of 90s metal or grunge, for example, and looked at my selection and realised I mostly synced progressive house, hip hop and breakbeat.

Things have certainly gotten better since I subscribed to Spotify, but they have a lot of gaps in their catalogue, especially from indie labels and albums that don’t have official Spanish releases. A solution that allows me access to my full catalogue wherever I am at any time would definitely be a bonus next time I’m sitting in a London hotel on a trip 1000 miles away from my hard drive for a week (this coming August, by the way).

As is too often the case with this kind of issue, don’t confuse your personal lack of need for a service with the idea of nobody requiring such a service.

Rekrul says:

Re: Re: Re:

Hmmm… you sound like you should get out more if the trip to the grocery store is all you use your player for.

I don’t have any player. I’m not that obsessed with music that I need to have it available to me at all times. Unlike most people, I don’t need a constant stream of music to keep my brain working. Most times I find it a distraction.

I also do a lot of travelling and quite often listen at work. That can be 8-10 hours of listening on some days, and if I’m travelling I can’t resync my tunes till I get back. So, yes, I can quite easily tire of music, especially as I have such wide taste in music I might be in the mood for a type of music I don’t have on my iPhone, or only have a single album to go listen to. There’s been no end of times I’ve been in the mood for a bit of 90s metal or grunge, for example, and looked at my selection and realised I mostly synced progressive house, hip hop and breakbeat.

How did you ever survive before the invention of the MP3 player? It must have been sheer torture having to leave the house knowing that you could only take a couple hours of music with you.

As is too often the case with this kind of issue, don’t confuse your personal lack of need for a service with the idea of nobody requiring such a service.

Music is a luxury, not something you need to live. Nobody is going to go into convulsions and die if they can’t instantly pull up every song they’ve ever heard while walking through the mall.

PaulT (profile) says:

Re: Re: Re: Re:

“I don’t have any player. “

Then why are you criticising others for not using the type of player you’d prefer them to use?

“Unlike most people, I don’t need a constant stream of music to keep my brain working. Most times I find it a distraction.”

Fantastic, you’ve proven that you are an individual with different tastes to some other people, some of whom work better with background music. So?

“How did you ever survive before the invention of the MP3 player? It must have been sheer torture having to leave the house knowing that you could only take a couple hours of music with you.”

Since you’re apparently incapable of reading comprehension, I’ll repeat: I often listen to music on shifts that can last 10 hours, and also spend days or weeks travelling to different countries. That’s more than a couple of hours, and yes only having a few hours of music can get repetitive after a week.

…and to answer your question, yes it can be. Have you ever seen the quality of Italian broadcast television in a hotel room, or been forced to sit through call to prayer sessions in Muslim countries that drown out your own thoughts while trying to write reports? Or, as a British atheist, had to drive through miles of the US where there’s nothing but country music and Christian evangelist stations on the radio? I sure as hell have.

Yes, having your own music can be an extreme blessing, and with modern technology there’s no reason I shouldn’t be able to access my whole collection. What is the problem with wishing to do so?

“Music is a luxury, not something you need to live. Nobody is going to go into convulsions and die if they can’t instantly pull up every song they’ve ever heard while walking through the mall.”

Ah, I see you’re not interested in discussion, just bringing out ridiculous strawmen now. There’s obviously no point talking to you, just try to accept that others do see the usefulness of a service you don’t personally want.

sowhat says:

I can’t imagine why a client oriented company like Apple would suddenly turn towards looking at users as possible infringers but without knowing the detail agreements with partners anything may be possible.
Regardless I take anything anywhere anyways. And consider the price paid fair and the method of purchase convenient.

kurtgreenwood (profile) says:

no thanks

This iCloud crap is just another attempt by Apple to take existing technology and put an “i” in front of it to make it seem hip. One look at this list of online storage businesses and you can see that the market is already saturated. So my question is.. I like listening to music in lossless audio files (AIFF, WAV). If I upload them to iCloud, do I get to hear iTune’s horrible 256 MP3s?
