New Research Shows Digitization Results In Routine Lock-Down Of Public Domain Books
from the what-about-our-rights? dept
But the situation is actually far worse than that, because the public is being denied access to many works that are unambiguously in the public domain, thanks to new restrictions placed on them when they are digitized. That's something that Techdirt has discussed before, but such stories have been largely anecdotal. Research from New Zealand provides us with more detailed information about what's going on:
In order to establish the extent to which digitized public domain books are being restricted, a sample of 100 pre-1890 books was selected from the New Zealand National Bibliography (NZNB). This sample was chosen on the assumption that these works had entered the public domain under New Zealand copyright law. Each book in the sample was searched for within six online repositories: Google Books, Hathi Trust, Internet Archive, Early New Zealand Books (ENZB), New Zealand Electronic Text Collection (NZETC) and Project Gutenberg. In addition, Google and Bing searches were conducted for all sample books that could not be located within these repositories.

Here's what the researchers discovered:
The findings of this research suggest that a high proportion of digitized public domain books are being restricted by online repositories. Out of a sample of 100 public domain books, only three are hosted by repositories that do not impose any form of usage restriction. Furthermore, 48 percent (24) of all digitized books [50 out of the 100 public domain sample] are hosted by a repository that restricts or blocks access, with the most restrictive repository limiting or blocking access to 91 percent (21) of sample books within its collection.

They also managed to pinpoint the key problem:
Almost all access restrictions applied to public domain books within the sample were the result of repositories using a process of estimation to assess copyright status. Within the sample, a one-minute search located accurate biographical information about authors two-thirds of the time. This task takes a fraction of the time required to digitize a book, which involves 30 minutes to scan 500 pages (Kelly, 2006).

The researchers also propose a solution:
Digitizers should incorporate the sourcing of copyright information within the overall process of digitization, and copyright estimation should only be used as an option of last resort. Furthermore, copyright estimation periods should better reflect statistical norms regarding the actual duration of copyright protection. The current estimation period of 140 years, used by Google Books and Hathi Trust, is far too conservative. If hosted under this policy, 47 percent of sample books would be restricted. This is despite the fact that all books with locatable biographical information were confirmed as being in the public domain for between 30 and 132 years.

This goes back to the problem of determining whether or not a work is in the public domain. Because that can be complex, those carrying out the digitization of works simply assume the worst, just to be on the safe side. That's something that needs to change; otherwise we risk losing not just the benefits of digitized public domain works, but also our undoubted rights to access them freely.
Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+