Elsevier Says Downloading And Content-Mining Licensed Copies Of Research Papers 'Could Be Considered' Stealing

from the gotta-protect-that-39%-profit-margin dept

Elsevier has pretty much established itself as the most hated company in the world of academic publishing, a fact demonstrated most recently when all the editors and editorial board resigned from one of its top journals to set up their own, open access rival. A blog post by the statistician Chris H.J. Hartgerink shows that Elsevier is still an innovator when it comes to making life hard for academics. Hartgerink’s work at Tilburg University in the Netherlands concerns detecting potentially problematic research that might involve data fabrication — obviously an important issue for the academic world. A key technique he is employing is content mining — essentially bringing together large bodies of text and data in order to extract interesting facts from them:

I am trying to extract test results, figures, tables, and other information reported in papers throughout the majority of the psychology literature. As such, I need the research papers published in psychology that I can mine for these data. To this end, I started ‘bulk’ downloading research papers from, for instance, [Elsevier’s] Sciencedirect. I was doing this for scholarly purposes and took into account potential server load by limiting the amount of papers I downloaded per minute to 9. I had no intention to redistribute the downloaded materials, had legal access to them because my university pays a subscription, and I only wanted to extract facts from these papers.
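For readers wondering what a cap of 9 papers per minute looks like in practice, here is a minimal Python sketch of an evenly spaced throttle. It is not Hartgerink's actual script, which the post does not show; the `Throttle` class and its parameters are hypothetical names chosen for illustration.

```python
import time


class Throttle:
    """Space calls evenly so that at most `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between two consecutive requests, in seconds.
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


# Assumed usage: call throttle.wait() before each download.
throttle = Throttle(max_per_minute=9)
print(f"Gap between requests: {throttle.min_interval:.2f} s")
```

At 9 papers per minute the gap works out to roughly one request every 6.7 seconds, which is slower than many people click through search results by hand.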

He spread out the downloads over ten days so as not to hammer Elsevier’s servers — which in any case are doubtless pretty beefy given the 39% profit margin the company enjoys:

I downloaded approximately 30GB of data from Sciencedirect in approximately 10 days. This boils down to a server load of 35KB/s, 0.0021GB/min, 0.125GB/h, 3GB/day.
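Those figures are easy to check. A quick Python sketch, assuming decimal units (1 GB = 1,000,000 KB) and the round numbers quoted above, reproduces them:

```python
# Figures quoted in the blog post.
TOTAL_GB = 30
DAYS = 10
KB_PER_GB = 1_000_000  # assumption: decimal units

gb_per_day = TOTAL_GB / DAYS
gb_per_hour = gb_per_day / 24
gb_per_min = gb_per_hour / 60
kb_per_sec = gb_per_min / 60 * KB_PER_GB

print(f"{gb_per_day:.0f} GB/day, {gb_per_hour:.3f} GB/h, "
      f"{gb_per_min:.4f} GB/min, {kb_per_sec:.1f} KB/s")
# prints: 3 GB/day, 0.125 GB/h, 0.0021 GB/min, 34.7 KB/s
```

The per-second rate comes out at about 34.7 KB/s, which the post rounds to 35 KB/s.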

Elsevier’s response to this super-considerate researcher is a classic:

Approximately two weeks after I started downloading psychology research papers, Elsevier notified my university that this was a violation of the access contract, that this could be considered stealing of content, and that they wanted it to stop. My librarian explicitly instructed me to stop downloading (which I did immediately), otherwise Elsevier would cut all access to Sciencedirect for my university.

There are clear parallels with the situation that Aaron Swartz found himself in, but with a key difference. Elsevier is not only stopping Hartgerink from carrying out his research, but threatening to cut off all access to the company’s journals and books for everyone working at Tilburg University if he tries to continue. Alicia Wise, Elsevier’s Director of Access & Policy, added the following comment on Hartgerink’s blog post:

We are happy for you to text mine content that we publish via the ScienceDirect API, but not via screen scraping.

When she was asked why it was necessary to use the API, rather than simply downloading articles, she replied:

The reason that we require miners to use the API is so that we can meet their needs AND ALSO the needs of our human users who can continue to read, search and download articles and not have their service interrupted in any way.

But that doesn’t make any sense when Hartgerink had taken such pains to avoid any such adverse effects. Moreover, another commenter noted that Elsevier’s API often fails to work, rendering it useless for content mining. Even when it does work:

In many cases the API returns only metadata in the XML, compared to the fulltext PDF I can access on the website. Simply downloading the paper via the normal web service for readers is easy — much easier than using the API.

What is really at stake here is control. Elsevier wants to be acknowledged as the undisputed gatekeeper for all possible uses of the research it publishes — most of which was paid for by the public through taxes. And as far as the company is concerned, daring to use that knowledge in new ways without additional permission is simply “stealing.”

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+

Companies: elsevier


Comments on “Elsevier Says Downloading And Content-Mining Licensed Copies Of Research Papers 'Could Be Considered' Stealing”

71 Comments
DannyB (profile) says:

Re: Re: Re:

But when it is business, you are ‘adding value’.

Elsevier adds value by ‘protecting’ the content behind a troll gate paywall. Since you must pay to access the research, it must now (somehow) have become more valuable.

But I suppose Elsevier is lazy. If they wanted to add even more value, the research papers would have DRM and you would only be able to view them on special viewer software that runs on Windows. (Don’t all scientists run only Windows?)

Copy / Paste and the ability to make screenshots would enable thieving pirates to read the research without paying through the nose.

stimoceiver (profile) says:

I applaud the hackers behind the Library Genesis project.

The project makes tens of thousands of peer reviewed journals, books, and other documents freely available through a search engine. The back end repository for all this data is a torrent pool, invisible to the search engine users, but open to participation via seeding or extending the pool.

It mirrors quite a bit of otherwise paywalled content.

And it’s dedicated to the memory of Aaron Swartz. How cool is that?

DrZZ says:

Peter Murray-Rust has been fighting this for years

One of the most vocal and detailed critics of these kinds of policies has been Peter Murray-Rust. He has been fighting Elsevier specifically for years (see here where he warns about signing Elsevier’s “mining agreement” and here in a 2011 post where he details his “negotiations” with them). Even before that I believe that he got all of Cambridge University’s access to Chemical Abstracts shut down because he was using “too much” data. Elsevier might be the worst, but even some scientific societies will crush research if it will make them some money.

Anonymous Coward says:

Re: Re:

While the smart ones are boycotting Elsevier for the publication of new papers, they still need access to the historical papers that Elsevier controls. The nature of human advancement is that it is built on the works of those who have gone before, and therefore access to Elsevier’s trove of existing papers is what Elsevier is leveraging to keep up their income stream.

Alicia Wise (user link) says:

Elsevier supports content mining, contra your salacious headline

Hi everyone,

As I mentioned in the comment thread to the original blog post, the reason that we require miners to use our API is so that we can meet their needs AND ALSO the needs of our human users. Our platforms provide access to 11 million pieces of content, serve millions of researchers, and provide infrastructure for a number of services including ScienceDirect, Scopus, and ClinicalKey. We are not alone in providing an API for this sort of high-volume content-intensive service – others including Wikipedia and Twitter take the same approach. We also appreciate that researchers might wish to text mine across publisher platforms, and this is why we also participate in the multi-publisher cross-platform text and data mining service by CrossRef (http://tdmsupport.crossref.org/).

With kind wishes,
Alicia

Dr Alicia Wise
Elsevier
Director of Access & Policy
a.wise@elsevier.com
@wisealic

Anonymous Coward says:

Re: Elsevier supports content mining, contra your salacious headline

Dr. Wise,

It’s also been stated that there are shortfalls in your API, and that this user took all reasonable actions to use minimal resources. If your alternate service for high-volume access is not usable for whatever reason, then it’s fully reasonable to use the normal service for your research, so long as you take as much care as this user is stated to have taken to avoid harming it.

If using your normal service in an automated manner is a problem, then please explain why, so that we can take appropriate care. Please do not be afraid to give us the technical reason why this use is an issue, as we will likely be able to understand the issue, and possibly propose a fix to this issue.

Sincerely,
Anonymous Coward

voiceofReason (profile) says:

Re: Elsevier supports content mining, contra your salacious headline

I know I’m going to sound like someone working at your GC’s office down the hall, but to me, I think all of this boils down to several basic questions:

1) Did Elsevier obtain this information in accordance with the law and with existing contractual rights and obligations, yes or no?

2) Does Elsevier have property rights in this information, yes or no?

3) Do Elsevier’s protocols for allowing access to this information comply with law and with existing contractual rights and obligations, yes or no?

If the answer to all of these questions is “yes,” then any access rights that Elsevier grants beyond what it is obligated to grant are irrelevant.

All of you folks, if you think Elsevier needs to do more based on a “moral imperative,” I can think of a lot more things that people don’t do voluntarily that they should which have a greater impact on humanity. Go after them first. If I read one more anonymous coward plugging away at what, in his not so humble opinion, some random company “should” or “shouldn’t” do, I will barf.

DrZZ says:

Re: Re: Elsevier supports content mining, contra your salacious headline

2) Does Elsevier have property rights in this information, yes or no?

Depends on exactly how you interpret “this information” but almost certainly no. Elsevier has the copyright on the specific expression written by the authors, but it does not have any property rights to the underlying facts and information. The big beef is that they are trying to get such rights by only letting you read their text if you sign a license that has such terms in it.

3) Do Elsevier’s protocols for allowing access to this information comply with law and with existing contractual rights and obligations, yes or no?

I am not aware of any copyright law, anywhere, that gives the copyright holder control of how the facts and information in the work are used by someone who legally accesses the work; thus there is no copyright law basis for the distinction between reading and mining. They get this distinction in by putting it into the licensing agreement. Is it legal to put it in the licensing agreement? My understanding is that, at least in the UK, it is not legal. It might be legal in the Netherlands.

voiceofReason (profile) says:

Re: Re: Re: Elsevier supports content mining, contra your salacious headline

Quote: Depends on exactly how you interpret “this information” but almost certainly no. Elsevier has the copyright on the specific expression written by the authors, but it does not have any property rights to the underlying facts and information. The big beef is that they are trying to get such rights by only letting you read their text if you sign a license that has such terms in it.

Response: Agreed. Rephrasing the question, does Elsevier have a legal or contractual obligation to provide non-copyrightable facts and information that has been organized in the way that this information is organized?

If it does, what is the extent of that obligation?

DrZZ says:

Re: Re: Re:2 Elsevier supports content mining, contra your salacious headline

There is certainly no question that it is legal to restrict access to copyrighted material to authorized people, so there is no legal or contractual obligation to provide unauthorized people with the facts and information (although see note below). That isn’t at issue in this case because there is no question that the researcher in question could read any paper he had. The issue is Elsevier wants to create a right to control how you use your legal access via the license agreement. They want to create a separate category of use from just reading and tell you that anything that falls into the separate category of text mining has to use a separate interface with additional agreements that, at least according to some, essentially force you to acknowledge that you are using THEIR content, not un-copyrightable facts and information. As others have mentioned in this thread, the separation is not clearly spelled out (at least from an end user perspective) and does not seem to be related to server load or other practical issues. Others think that even trying to make the distinction between reading a paper and analyzing a paper via computer is absurd and only makes sense as an attempt to gain control over information that isn’t yours. These folks have convinced at least the UK legislature to make such contract terms illegal.

Note: the one caveat to the first sentence is that Elsevier does have a program where authors can pay a fee to make their paper free and open access. Peter Murray-Rust and others found numerous examples of papers where such fees were paid and yet there still was a charge to access the papers. Elsevier claimed it was due to some bugs (funny how the bugs only go one way) and I don’t know how many papers were affected or if the problem persists, but there were certainly concrete cases where Elsevier violated their agreement with the author. Come to think of it, the best way to get stats is to use some kind of web crawler to scan through all the open papers, which of course Elsevier says violates your license agreement. Hmmmm.

Anonymous Coward says:

Re: Re: Re:3 Elsevier supports content mining, contra your salacious headline

Elsevier does have a program where authors can pay a fee to make their paper free and open access.

And the fee that Elsevier considered reasonable is why an entire editorial team has resigned to start an open access journal. See “The editor had requested a price of 400 euros, an APC that is not sustainable”, where according to Elsevier:

The article publishing charge at Lingua for open access articles is 1800 USD. The editor had requested a price of 400 euros, an APC that is not sustainable. Had we made the journal open access only and at the suggested price point, it would have rendered the journal no longer viable – something that would serve nobody, least of which the linguistics community.

As far as I know we are discussing a per page charge.

Anonymous Coward says:

Re: Re: Re:3 Elsevier supports content mining, contra your salacious headline

There is certainly no question that it is legal to restrict access to copyrighted material to authorized people,

Wrong, copyright is the right to control the production of new copies, and not what use is made of the copies once sold. Unfortunately this does not fit well in a digital world, where copyright is being distorted into control over information and the uses that can be made of it.

tqk (profile) says:

Re: Re: Re:5 Elsevier supports content mining, contra your salacious headline

More like insisting that access to a publisher’s copy be unfettered,

You mean insisting that access to research done by scientists and paid for by tuition and grants from taxpayers and philanthropists should be unfettered? I fail to see why anyone needs to suffer the likes of Elsevier sticking their rapaciously greedy, self-entitled noses in there. They’ve long overstayed their welcome.

voiceofReason (profile) says:

Re: Re: Re:6 Elsevier supports content mining, contra your salacious headline

Perhaps they have, but I have not noticed an abundance of corporations in this world who intentionally turn down legal ways of earning money. In fact that is why corporations exist. Perhaps you are confusing them with charities.

As long as they have legally obtained this, it should not matter from a legal perspective if they obtained these texts from Warren Buffett or from teenage orphans living in a nunnery.

Come back to me when you see these noble scientists foregoing nicer homes, cars, etc. if an opportunity arises.

Anonymous Coward says:

Re: Re: Re:7 Elsevier supports content mining, contra your salacious headline

As long as they have legally obtained this, it should not matter from a legal perspective if they obtained these texts from Warren Buffett or from teenage orphans living in a nunnery.

The good old “it is the law”, a common justification for maintaining the status quo used by those benefiting from the labours of others, from nobles enforcing serfdom, through slavery, to modern corporations. Problem is, when those with the money have to fall back on this justification, they are ignoring the winds of change, and will likely lose more by clinging to the old ways than they would if they adapted their business to the changes in society.

tqk (profile) says:

Re: Re: Re:7 Elsevier supports content mining, contra your salacious headline

As long as they have legally obtained this, it should not matter from a legal perspective …

I’m not one to much care about legal perspectives. There are other, far more important, perspectives besides the legal one, such as morality and ethics. Legality should be the last resort tool you reach for. No, I don’t expect corporations to care about morality and ethics (they’re ill equipped to do so, and by law constrained from doing so), but we do, and we should. I understand Elsevier wants to enrich its shareholders. That doesn’t at all mean it would be smart or correct for us to let them get away with what to me looks like outright theft stirred with slavery.

Come back to me when you see these noble scientists foregoing nicer homes, cars, etc. if an opportunity arises.

Wow. Think of where Elsevier gets the content it publishes. Yes, those same “noble scientists” whose face you just spit on. They spent years, or decades, learning their chosen field and the tools they need to understand to practice in their field, competing against all those thousands of others who also want in, yet you can dismiss all of that with “they’re greedy wanting nice homes and cars.” What an asshole!

I look forward to the day Elsevier enters chapter eleven bankruptcy.

Daniel Suarez says:

Re: Kill Decision

so,
you do not want scientists to do a private search
nor private data mining in their private and secure labs,
but you want to have in a file each search associated to each account? just to help us?

hm, that is interesting,
scary but interesting anyway:

-is this information safe? exactly how safe?
-who does have authorized access to this information?
-can this information be used to find out WHAT you are researching into?
-can this information be used to find WHO is researching around specific topics?
-can you think HOW MUCH this information is worth?
-and how dangerous it is for scientists to be in such a list?

have you read Daniel Suarez- Kill Decision?

The Wanderer (profile) says:

Re: Elsevier supports content mining, contra your salacious headline

As I mentioned in the comment thread to the original blog post, the reason that we require miners to use our API is so that we can meet their needs AND ALSO the needs of our human users.

Could you explain in what way the access described in this scenario (data transfer amounting to 35 KB / second, sustained over a week and a half) in any way serves to prevent you from meeting the needs of the human users?

Anonymous Coward says:

11 million pieces of content

Dear Elsevier,

11 million pieces of content could easily fit on a 5 GB DVD or two, or a cheap 64GB USB drive.

Where is your option for universities to get ALL documents on a stick for internal distribution, mining and other ‘approved’ purposes? It saves a bundle on server hosting costs too.

You would have to trust your sole suppliers of pieces of content not to distribute it to the world. But that’s the premise of copyright, isn’t it?

Bill Jackson (profile) says:

Nobel Prize Committee

The Committee could strike a powerful blow against Elsevier et al, by adopting a policy that only scholarly articles submitted to Open Access Academic Publishers would be reviewed by the Committee. If they did this they would repeat and emphasize the creative act that made the Nobel Prize the most important body for advancing Science in history.
Who could dare stand against them?

Anonymous Coward says:

And this is threatening a contract violation...

There is no limitation in the sample contract stating a definitive cap to the number of papers you may download. In fact, it says…

Each Authorized User may:

* access, search, browse and view the Subscribed Products;
* print, download and store a reasonable portion of individual items from the Subscribed Products for the exclusive use of such Authorized User;

While there is a clause

Except as expressly stated in this Agreement or otherwise permitted in writing by Elsevier, the Subscriber and its Authorized Users may not:

* use any robots, spiders, crawlers or other automated downloading programs, algorithms or devices to continuously and automatically search, scrape, extract, deep link, index or disrupt the working of the Subscribed Products;

But that, too, was not being violated. No evidence has been put forth of the use of automated downloading or of disruption of services.

So what we have here is a simple case of someone consuming a much larger amount of the services Elsevier provides than normal, while still not violating the contract terms.

In Comcast terms, “he violated our unannounced bandwidth cap and must be terminated”.

tqk (profile) says:

The reason that we require miners to use the API is so that we can meet their needs AND ALSO the needs of our human users who can continue to read, search and download articles and not have their service interrupted in any way.

As a data center sysadmin with ca. thirty years in the trenches, this is bullshit. She’s a corporate liar. I’d discount anything she says as corporate PR BS. Elsevier lost the moral high ground long ago, but they’re desperate to not learn they’re morally and ethically bankrupt. There’s too much money at stake for them to acknowledge the facts of reality. She’s been told to say this and has no idea what she’s talking about. She’s saying it because her employer told her to.

What is really at stake here is control.

Yes. The corporate bottom line depends on their not accepting the truth of the situation. Elsevier’s shareholders should be ashamed for consorting with the likes of this. Some people can ignore anything as long as it’s to their financial benefit.

Nomad of Norad says:

What would it take to immediately take the ball away from Elsevier?

What would it take to retroactively make all the papers public domain or otherwise open and freely distributable? I have seen floated the idea that, since all the research behind the papers, and thus the papers themselves, are paid for out of taxpayer money, that that means the government or governments could presumably pass a law stating that ALL such papers, going back to the start of the collection, are hereby declared open-access and that they MUST BE made publicly available to whoever has need of them.

Anonymous Coward says:

Re: What would it take to immediately take the ball away from Elsevier?

“What would it take to retroactively make all the papers public domain or otherwise open and freely distributable?”

actually it looks VERY EASY:

1) hack elsevier
2) dump it to the net

the net will then manage to translate everything to searchable open format
and store it in multiple open repositories
If we can do this with movies, tv series, software and videogames I do not see why this has not been done with humanity knowledge

tqk (profile) says:

Re: What would it take to immediately take the ball away from Elsevier?

… since all the research behind the papers, and thus the papers themselves, are paid for out of taxpayer money, that that means the government or governments could presumably pass a law stating that ALL such papers, going back to the start of the collection, are hereby declared open-access and that they MUST BE made publicly available to whoever has need of them.

I don’t understand why universities haven’t yet banded together to do this. It would be a sweet revenue stream that would fund their students’ research and/or university operations. They could charge a tenth of what Elsevier is skimming off just to enrich third party investors, and still make enough to have plenty left over to fund their students’ research.

Letting Elsevier get away with this seems the silliest way possible, or else somebody’s getting a sweet unearned free ride for the lousiest return imaginable.

Anonymous Coward says:

Re: Re: What would it take to immediately take the ball away from Elsevier?

I don’t understand why universities haven’t yet banded together to do this.

Because they will still need to pay the academic publisher for access to existing papers, and that is a big lever that these publishers wield over the universities.

tqk (profile) says:

Re: Re: Re: What would it take to immediately take the ball away from Elsevier?

I don’t understand why universities haven’t yet banded together to do this.

Because they will still need to pay the academic publisher for access to existing papers, and that is a big lever that these publishers wield over the universities.

Yeah, it’s the same problem as moving to Open Source software. The initial cost is expensive and disruptive short term. Explaining you’ll make up that cost big time on the other side doesn’t seem to fly for short term profit addicts.

Susan Reilly (user link) says:

Speaking out for copyright reform

It’s sad that a researcher downloading content which he has accessed legally and in a responsible manner should have his research stopped in its tracks in this manner. We need more researchers to speak out about this in order to make the case for copyright reform. Policy makers are saying that there is not enough evidence that researchers want to text and data mine and therefore licence solutions, such as the one offered by Elsevier, are sufficient. We’re trying to gather such evidence by asking researchers to sign the Hague Declaration on Knowledge Discovery in the Digital Age http://thehaguedeclaration.com/

Bill Jackson (profile) says:

Snake Bit and going to die = Elsevier

Think magazines, that is what Elsevier sells, and they do not buy the content.
They now practice ‘microkerning’, which means that each copy they supply to a college in electronic format has the letter spacing and word spacing changed a little. It is a form of text based steganography. By this method they police the subscribers by threat of service withdrawal. Every researcher makes scans and sends to friends by e-mail for free. Whenever Elsevier finds one, they analyze it to see who made the scan = threat.
That is the club they bear – a product of a forced monopoly that would take government copyright action to rectify.

What governments should do is enforce zero copyright on publicly financed papers. Other paper financiers should do the same. It is in all their interests that papers all become open ASAP. It is only in Elsevier’s monopoly interest that the current systems persist.

Bill Jackson (profile) says:

Re: Re: Snake Bit and going to die = Elsevier

This is an old and mature concept to control leaks in diplomatic circles. Every time they print a document every copy is unique, but to the untrained eye they all look the same, so if a leaker makes a photocopy and hands it out, a scan and some analysis of word and letter gaps will reveal the leaker. It only takes a few changes to cover a group of, say, 10 people. This was developed in the 80’s when word processors came out on many desks. You could for years do this with early word processors and even by the letter press method by inserting slivers of spacers in between certain words/letters, but that was labor intensive.
It is so common now, that leakers have learned to retype and paraphrase things they want to leak.

some clues here, https://www.google.ca/search?q=micro-kerning+document+control&oq=micro-kerning+document+control&aqs=chrome..69i57.11966j0j8&sourceid=chrome&es_sm=93&ie=UTF-8

Peter Murray-Rust says:

Re: Re: Re: Snake Bit and going to die = Elsevier

Bill, Thanks

I knew about micro-kerning and its purpose – I was specifically interested in the actual algorithms used – was it glyph widths, or heights, was it inter-character spacing, etc.

If so, let me know.

(There’s also the cruder annotation of the name of the library subscribing. )

Bill Jackson (profile) says:

Re: Re: Re:2 Snake Bit and going to die = Elsevier

There are a number of ways Embassies, NSA-type organizations, political parties and organizations like Elsevier can use to create a uniquely coded PDF download that links the subscriber’s identity and the date of the download to an individual downloaded document. Bear in mind, all these documents will look superficially identical, same words, same images etc. A single line of text can probably encode 2-3 bits per 5 letter word by microkerning. This form of document control is used to trap leakers of data, as Elsevier desires. Afterwards the document can be scanned and the same software that created it can inspect the text spacings to identify who sent it. Steganography can also be used with photos.
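To put rough numbers on that estimate, the sketch below works out how many marked words a copy would need in order to single out one recipient. It assumes only the figure of 2-3 bits per word given above; the function names are hypothetical, and nothing here reflects any scheme Elsevier is known to use.

```python
import math


def bits_to_identify(n_recipients):
    """Smallest number of bits that gives each recipient a distinct code."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, using exact
    # integer arithmetic instead of floating point.
    return max(1, (n_recipients - 1).bit_length())


def words_needed(n_recipients, bits_per_word=2):
    """Marked words required, assuming each word carries `bits_per_word` bits."""
    return math.ceil(bits_to_identify(n_recipients) / bits_per_word)


for group in (10, 1_000, 1_000_000):
    print(f"{group} recipients: {bits_to_identify(group)} bits, "
          f"{words_needed(group)} words at 2 bits/word")
```

Under these assumptions a group of 10 needs 4 bits, or two marked words, and even a million recipients need only 20 bits, or ten words. That is why a single line of text would be more than enough.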

To combat this, documents need to be OCR recognised and all words re-word processed to standard kerning. Images can also be stripped of steganographic data via projection and re-photographing with a slightly different resolution.

As to the precise methods used, it is hard to say. But suppose a number of different subscribers downloaded the same document at different locations, as discrete subscribers using the Elsevier API, which causes the system to create the uniquely coded document. With a few of these copies in hand, they can be compared against the various methods that might have created them, to see what means is used to encode them.

tqk (profile) says:

Re: Re:

Who appointed you in charge of deciding how much money Elsevier should forego based on your morals?

Who appointed Elsevier in charge of deciding what scientists’ published results would cost other researchers to keep up on and continue their research?

The Jews have a great word for this. It’s chutzpah.

You sicken me moocher, hanger on, know nothing person. I don’t want to share a planet with the likes of you. You’re a predatory a-hole which none of the rest of us wants to be here. Die screaming in a fire. Consider it an act of humanity. Or, just go away. You won’t be missed.

DrZZ says:

Peter Murray-Rust's views.

I don’t think anyone has spent more time and effort working on the problems of text mining than Peter Murray-Rust. Part of the reason he has spent so much time is that he does take laws, licenses, and contracts very seriously, although he does have strong views on how currently many of these are very damaging to the practice of science. He has put together a series of posts that detail his experiences and views related to this matter that can be found starting here.
