UK Publishers Moan About Content Mining's Possible Problems; Dismiss Other Countries' Actual Experience
from the why-bother-looking-at-the-evidence? dept
We therefore recommend below that the Government should press at EU level for the introduction of an exception allowing uses of a work enabled by technology which do not directly trade on the underlying creative and expressive purpose of the work (this has been referred to as “non-consumptive” use). The idea is to encompass the uses of copyright works where copying is really only carried out as part of the way the technology works. For instance, in data mining or search engine indexing, copies need to be created for the computer to be able to analyse; the technology provides a substitute for someone reading all the documents. This is not about overriding the aim of copyright – these uses do not compete with the normal exploitation of the work itself – indeed, they may facilitate it. Nor is copyright intended to restrict use of facts. That these new uses happen to fall within the scope of copyright regulation is essentially a side effect of how copyright has been defined, rather than being directly relevant to what copyright is supposed to protect.
Who could possibly object to that? Certainly not the UK government, which accepted the recommendation (PDF):
The Government will therefore bring forward proposals in autumn 2011 for a substantial opening up of the UK’s copyright exceptions regime on this basis. This will include proposals for a limited private copying exception; to widen the exception for noncommercial research, which should also cover both text- and data-mining to the extent permissible under EU law.
Nonetheless, the UK Publishers Association, which describes its "core service" as "representation and lobbying, around copyright, rights and other matters relevant to our members, who represent roughly 80 per cent of the industry by turnover", is unhappy. Here's Richard Mollet, the Association's CEO, explaining why it is against the idea of such a text-mining exception:
If publishers lost the ability to manage access to allow content mining, three things would happen. First, the platforms would collapse under the technological weight of crawler-bots. Some technical specialists liken the effects to a denial-of-service attack; others say it would be analogous to a broadband connection being diminished by competing use. Those who are already working in partnership on data mining routinely ask searchers to “throttle back” at certain times to prevent such overloads from occurring. Such requests would be impossible to make if no-one had to ask permission in the first place.
Large-scale academic content mining is a pretty new and specialized field, so it's hardly likely that there is going to be a sudden mass attack of crawler-bots taking down sites. Publishers would have ample time to expand their infrastructure to handle demand as it developed, which would be to their advantage: the more their holdings are mined, the more they are likely to be cited and read. And if content mining did take off suddenly, that would suggest there is a huge pent-up demand that the current system of licensing has stifled - one more reason why it should be abolished.
Then there is the commercial risk. It is all very well allowing a researcher to access and copy content to mine if they are, indeed, a researcher. But what if they are not? What if their intention is to copy the work for a directly competing-use; what if they have the intention of copying the work and then infringing the copyright in it? Sure they will still be breaking the law, but how do you chase after someone if you don’t know who, or where, they are? The current system of managed access allows the bona fides of miners to be checked out. An exception would make such checks impossible.
This makes no sense. Infringing uses are either easy or hard to find using search engines. If they are easy to find, they are easy to pursue. If they are hard - internal uses, for example - then even miners with "bona fides" will be able to use copyright material in exactly these ways, and the publishers won't know.
Which leads to the third risk. Britain would be placing itself at a competitive disadvantage in the European & global marketplace if it were the only country to provide such an exception (oh, except the Japanese and some Nordic countries). Why run the risk of publishing in the UK, which opens its data up to any Tom, Dick & Harry, not to mention the attendant technical and commercial risks, if there are other countries which take a more responsible attitude.
The fact that some countries are already allowing content mining ought to be a hint that the other two fears are groundless. Instead, these inconvenient facts are dismissed out of hand as if the experience of "the Japanese and some Nordic countries" somehow doesn't count for UK publishers.
But as it turns out, there's actually a simple way to allay all of Mollet's fears at a stroke. At the beginning of his post he writes:
In coming to its recommendation on content mining, the [Hargreaves] Review drew heavily on the views of various strands of academia, most of which claimed that their vital research was being hampered by the lack of such an exception. The process of requesting licences of publishers was too time-consuming, it was claimed, and so an exception would make life easier.
This confirms that the text-mining issue is only being considered in an academic context – it's about giving scholars the ability to extract extra information from academic articles by performing analyses on their texts.
Now, most of that academic research is funded by the public through government grants to educational institutions and researchers, both in the UK and elsewhere. The open access movement has been pointing out for a decade that it would therefore not be unreasonable if the general public had free online access to the results of all this research it paid for - the articles published in academic journals. It would also allow many more scholars to access such publicly-funded work – including those who wanted to carry out text mining.
This would answer Mollet's fear that publishers' "platforms would collapse under the technological weight of crawler-bots." Since the papers could be freely downloaded from any one of the servers holding copies around the Internet, and then analysed on the researcher's own machine, there would be no crawler-bots involved at all. Open access would also eliminate the commercial risk: after all, what's the point in pirating material that is already freely available?
As for that competitive disadvantage Mollet is worried about, moving their academic titles to open access would actually give UK publishers a big advantage, since open access continues to sweep through the academic sector. It would mean that UK publishers were leading the way, rather than dragging their heels at the back.
Follow me @glynmoody on Twitter or identi.ca, and on Google+