by Glyn Moody

Filed Under:
copyright, data mining, education, exceptions, uk

UK Publishers Moan About Content Mining's Possible Problems; Dismiss Other Countries' Actual Experience

from the why-bother-looking-at-the-evidence? dept

One of the recommendations made by the Hargreaves Review in the UK was that a text- and data-mining exception to copyright should be created, with the following explanation of why that made sense (PDF):
We therefore recommend below that the Government should press at EU level for the introduction of an exception allowing uses of a work enabled by technology which do not directly trade on the underlying creative and expressive purpose of the work (this has been referred to as “non-consumptive” use). The idea is to encompass the uses of copyright works where copying is really only carried out as part of the way the technology works. For instance, in data mining or search engine indexing, copies need to be created for the computer to be able to analyse; the technology provides a substitute for someone reading all the documents. This is not about overriding the aim of copyright – these uses do not compete with the normal exploitation of the work itself – indeed, they may facilitate it. Nor is copyright intended to restrict use of facts. That these new uses happen to fall within the scope of copyright regulation is essentially a side effect of how copyright has been defined, rather than being directly relevant to what copyright is supposed to protect.
Who could possibly object to that? Certainly not the UK government, which accepted the recommendation (PDF):
The Government will therefore bring forward proposals in autumn 2011 for a substantial opening up of the UK’s copyright exceptions regime on this basis. This will include proposals for a limited private copying exception; to widen the exception for noncommercial research, which should also cover both text- and data-mining to the extent permissible under EU law.
Nonetheless, the UK Publishers Association, which describes its "core service" as "representation and lobbying, around copyright, rights and other matters relevant to our members, who represent roughly 80 per cent of the industry by turnover", is unhappy. Here's Richard Mollet, the Association's CEO, explaining why it is against the idea of such a text-mining exception:
If publishers lost the ability to manage access to allow content mining, three things would happen. First, the platforms would collapse under the technological weight of crawler-bots. Some technical specialists liken the effects to a denial-of-service attack; others say it would be analogous to a broadband connection being diminished by competing use. Those who are already working in partnership on data mining routinely ask searchers to “throttle back” at certain times to prevent such overloads from occurring. Such requests would be impossible to make if no-one had to ask permission in the first place.
Large-scale academic content mining is a pretty new and specialized field, so it's hardly likely that there is going to be a sudden mass attack of crawler-bots taking down sites. Publishers would have ample time to expand their infrastructure to handle demand as it developed, which would be to their advantage: the more their holdings are mined, the more they are likely to be cited and read. And if content mining did take off suddenly, that would suggest there is a huge pent-up demand that the current system of licensing has stifled - one more reason why it should be abolished.
Then there is the commercial risk. It is all very well allowing a researcher to access and copy content to mine if they are, indeed, a researcher. But what if they are not? What if their intention is to copy the work for a directly competing-use; what if they have the intention of copying the work and then infringing the copyright in it? Sure they will still be breaking the law, but how do you chase after someone if you don’t know who, or where, they are? The current system of managed access allows the bona fides of miners to be checked out. An exception would make such checks impossible.
This makes no sense. Infringing uses are either easy or hard to find using search engines. If they are easy to find, they are easy to pursue. If they are hard - internal uses, for example - then even miners with "bona fides" will be able to use copyright material in exactly these ways, and the publishers won't know.
Which leads to the third risk. Britain would be placing itself at a competitive disadvantage in the European & global marketplace if it were the only country to provide such an exception (oh, except the Japanese and some Nordic countries). Why run the risk of publishing in the UK, which opens its data up to any Tom, Dick & Harry, not to mention the attendant technical and commercial risks, if there are other countries which take a more responsible attitude.
The fact that some countries are already allowing content mining ought to be a hint that the other two fears are groundless. Instead, these inconvenient facts are dismissed out of hand as if the experience of "the Japanese and some Nordic countries" somehow doesn't count for UK publishers.

But as it turns out, there's actually a simple way to allay all of Mollet's fears at a stroke. At the beginning of his post he writes:
In coming to its recommendation on content mining, the [Hargreaves] Review drew heavily on the views of various strands of academia, most of which claimed that their vital research was being hampered by the lack of such an exception. The process of requesting licences of publishers was too time-consuming, it was claimed, and so an exception would make life easier.
This confirms that the text-mining issue is only being considered in an academic context – it's about giving scholars the ability to extract extra information from academic articles by performing analyses on their texts.

Now, most of that academic research is funded by the public through government grants to educational institutions and researchers, both in the UK and elsewhere. The open access movement has been pointing out for a decade that it would therefore not be unreasonable if the general public had free online access to the results of all this research it paid for - the articles published in academic journals. It would also allow many more scholars to access such publicly-funded work – including those who wanted to carry out text mining.

This would answer Mollet's fear that publishers' "platforms would collapse under the technological weight of crawler-bots." Since the papers could be freely downloaded from any one of the servers holding copies around the Internet, and then analysed on the researcher's own machine, there would be no crawler-bots involved at all. Open access would also eliminate the commercial risk: after all, what's the point in pirating material that is already freely available?

As for that competitive disadvantage Mollet is worried about, moving their academic titles to open access would actually give UK publishers a big advantage, since open access continues to sweep through the academic sector. It would mean that UK publishers were leading the way, rather than dragging their heels at the back.

Follow me @glynmoody on Twitter and on Google+

Reader Comments


  • Anonymous Coward, 22 Nov 2011 @ 1:10am

    "Content mining is a pretty new ..." !!!
    I would have thought a techno blogger would be more familiar with the techno world.


  • Ralph-J, 22 Nov 2011 @ 1:11am

    They don't want a meritocracy

    If works could be freely analyzed, and potential customers could find the works that are really interesting to them, publishers would lose control over the marketability of their works.

    To them, there is a risk that previously lesser known artists are going to be more successful than the ones they were hoping to earn big money with, and which they are supporting with big marketing efforts and investments.


  • Dr Evil, 22 Nov 2011 @ 4:39am

    keep your hands off my content

    after all, if you run a crawler bot over my content, I won't have any left to sell or consume myself!


  • Anonymous Coward, 22 Nov 2011 @ 8:49am

    Problem is there are "search engines" that well know they are facilitating consumptive uses, and their entire purpose for existing is to knowingly, actively, and deliberately encourage consumptive use. Then, when they are called to task for what they are doing, they do all in their power to manufacture excuses for what they have been doing, looking everywhere else to place the blame.

    BTW, I am pleased that you have used a quote containing the term "non-consumptive" use. The distinction between it and a "consumptive" use is a very beneficial way to contrast that which is of no moment and that which is problematic.


    • Anonymous Coward, 22 Nov 2011 @ 12:34pm


      What did you expect a search engine to do? Hide content from the eyes of everyone?

      In your view, apparently, search engines should show nothing: you type in "LoL" and it shows nothing, because otherwise they would be facilitating consumptive uses, right?


  • Terry Bucknell, 22 Nov 2011 @ 10:40am

    Surely text mining requires the ability to download the site's entire content (probably in NLM DTD XML), not just the odd researcher downloading a few OA articles to their own PC? And surely the owners of the servers are entitled to know who is harvesting their content, and when, so that they can ensure that their servers can meet all the demands placed upon them?

