New Research Shows Digitization Results In Routine Lock-Down Of Public Domain Books

from the what-about-our-rights? dept

The public domain is supposed to be what we receive in return for, and after the expiry of, time-limited, government-backed intellectual monopolies that are granted to creators. As Mike noted recently, that neat equation does not reflect today's reality for copyright, where the situation is so complicated that it requires a 52-page handbook to determine whether or not something is in the public domain.

But the situation is actually far worse than that, because the public is being denied access to many works that are unambiguously in the public domain because of new restrictions being placed on them when they are digitized. That's something that Techdirt has discussed before, but such stories have been largely anecdotal. Research from New Zealand provides us with more detailed information about what's going on:
In order to establish the extent to which digitized public domain books are being restricted, a sample of 100 pre-1890 books was selected from the New Zealand National Bibliography (NZNB). This sample was chosen on the assumption that these works had entered the public domain under New Zealand copyright law. Each book in the sample was searched for within six online repositories: Google Books, Hathi Trust, Internet Archive, Early New Zealand Books (ENZB), New Zealand Electronic Text Collection (NZETC) and Project Gutenberg. In addition, Google and Bing searches were conducted for all sample books that could not be located within these repositories.
Here's what the researchers discovered:
The findings of this research suggest that a high proportion of digitized public domain books are being restricted by online repositories. Out of a sample of 100 public domain books, only three are hosted by repositories that do not impose any form of usage restriction. Furthermore, 48 percent (24) of all digitized books [50 out of the 100 public domain sample] are hosted by a repository that restricts or blocks access, with the most restrictive repository limiting or blocking access to 91 percent (21) of sample books within its collection.
They also managed to pinpoint the key problem:
Almost all access restrictions applied to public domain books within the sample were the result of repositories using a process of estimation to assess copyright status. Within the sample, a one-minute search located accurate biographical information about authors two-thirds of the time. This task takes a fraction of the time required to digitize a book, which involves 30 minutes to scan 500 pages (Kelly, 2006).
The researchers propose the following solution:
Digitizers should incorporate the sourcing of copyright information within the overall process of digitization, and copyright estimation should only be used as an option of last resort. Furthermore, copyright estimation periods should better reflect statistical norms regarding the actual duration of copyright protection. The current estimation period of 140 years, used by Google Books and Hathi Trust, is far too conservative. If hosted under this policy, 47 percent of sample books would be restricted. This is despite the fact that all books with locatable biographical information were confirmed as being in the public domain for between 30 and 132 years.
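To make the gap between the two approaches concrete, here is a minimal Python sketch. It is only an illustration, not the researchers' or any repository's actual procedure. It assumes the 140-year estimation period is counted from the year of publication, and that New Zealand copyright in a book runs until the end of the 50th year after the author's death. The function names and the example book are hypothetical.

```python
from typing import Optional

# Estimation period quoted in the research for Google Books and Hathi Trust.
ESTIMATION_PERIOD_YEARS = 140
# Assumption: New Zealand's life-plus-50-years term for books.
NZ_TERM_AFTER_DEATH = 50


def restricted_by_estimation(publication_year: int, current_year: int) -> bool:
    """Assumed rule: a book younger than the estimation period is restricted."""
    return current_year - publication_year < ESTIMATION_PERIOD_YEARS


def public_domain_by_death_year(death_year: int, current_year: int) -> bool:
    """Public domain once the 50th calendar year after the author's death has ended."""
    return current_year > death_year + NZ_TERM_AFTER_DEATH


def assess(publication_year: int, death_year: Optional[int], current_year: int) -> str:
    """Use the author's death year when it is known; estimate only as a last resort."""
    if death_year is not None:
        if public_domain_by_death_year(death_year, current_year):
            return "open"
        return "restricted"
    if restricted_by_estimation(publication_year, current_year):
        return "restricted"
    return "open"


if __name__ == "__main__":
    # Hypothetical book: published 1885, author died 1900, checked in 2014.
    print(assess(1885, None, 2014))   # estimation only
    print(assess(1885, 1900, 2014))   # with biographical information
```

For this hypothetical 1885 book, estimation alone keeps it locked up in 2014, while looking up the author's death year (1900) shows that it can be opened.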
This goes back to the problem of determining whether a work is in the public domain or not. Because that can be complex, those carrying out the digitization of works simply assume the worst, just to be on the safe side. That's something that needs to change, otherwise we risk losing not just the benefits of digitized public domain works, but also our undoubted rights to access them freely.

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+

Reader Comments


  1.  
    Anonymous Coward, Jun 24th, 2014 @ 2:09am

    Why don't people denounce copyright law?

    Congress has the power to create copyright laws, not the responsibility. The laws aren't necessary, effective, or proportionate. Enforcing them requires reduction of the common carrier principle and mass monitoring of who is doing what online.

    I'd be curious to see a comparison of the percentage of voters who want marijuana legalized vs who want copyright law reformed. If people can spin that as pragmatic, it says something about our society when we can't spin something that affects more than our private lives as pragmatic enough to get off our asses and tend to.

     


  2.  
    Seegras (profile), Jun 24th, 2014 @ 3:11am

    I already wrote about it, about Copyfraud and about repositories supporting that fraud.

    http://seegras.discordia.ch/Blog/stealing-from-the-public-domain/

     


  3.  
    Anonymous Coward, Jun 24th, 2014 @ 3:31am

    Congress?

    Wait, are you talking about the USA congress? This article is about New Zealand.

     


  4.  
    broken, Jun 24th, 2014 @ 3:51am

    re

    Copyrights are not for the public good. Simplistic Disney effects... Perpetual milking machines for corporations.

     


  5.  
    Anonymous Coward, Jun 24th, 2014 @ 4:30am

    As Mike noted recently, that neat equation does not reflect today's reality for copyright, where the situation is so complicated that it requires a 52-page handbook to determine whether or not something is in the public domain.

    This argument is so stupid. Neither you nor Mike actually try to figure out the public domain status of a given work. If you did, you'd see how simple it is to do. You don't need all 52 pages for one work.

    But the situation is actually far worse than that, because the public is being denied access to many works that are unambiguously in the public domain because of new restrictions being placed on them when they are digitized.

    Even if a work is in the public domain, it can be locked up behind any paywall the owner of the COPY wants. Another stupid argument.

    This goes back to the problem of determining whether a work is in the public domain or not. Because that can be complex, those carrying out the digitization of works simply assume the worst, just to be on the safe side.

    Again, rather than alarmist bullshit, why don't you walk us through the determination of the public domain status of a given work. The handbook is simple to apply. They even released an 8-page flow chart version, and you only need one page for a given work. One page.

    That's something that needs to change, otherwise we risk losing not just the benefits of digitized public domain works, but also our undoubted rights to access them freely.

    "Undoubted rights"?? That's hilarious. If I have a copy of a public domain work on my bookshelf or on my server, you have ZERO rights to access it. Terrible argument, Glyn.

     


  6.  
    Anonymous Coward, Jun 24th, 2014 @ 5:00am

    Re:

    Since you are an apparent expert in the field, why don't you walk us through the determination of the public domain status of a given work? I hear the handbook is simple to apply and you only need one page of a given work, or so I'm told anyways.

     


  7.  
    Anonymous Coward, Jun 24th, 2014 @ 5:19am

    Re: Re:

    Give me the specifics, such as date of publication, date of author's death, whether published with a copyright notice, whether renewed, etc. It might be a while before I can answer since I'm heading out the door right now, but I'll check back this afternoon.

     


  8.  
    Anonymous Coward, Jun 24th, 2014 @ 5:31am

    Re: Re: Re:

    How about you pick a good example, supply all the details and show why the work should or should not be considered public domain.

     


  9.  
    G Thompson (profile), Jun 24th, 2014 @ 6:01am

    Re: Re: Re:

    Ok I'll bite.

    My Brilliant Career (1901) - Miles Franklin, died 1954
    Animal Farm (1945) - George Orwell, died 1950
    The Great Gatsby (1925) - F Scott Fitzgerald, died 1940
    Tender is the Night (1933) - F Scott Fitzgerald, died 1940
    Lady Chatterley's Lover (1928) - D H Lawrence, died 1930
    Gone with the Wind (1936) - Margaret Mitchell, died 1949
    Between the Acts (1941) - Virginia Woolf, died 1941

    All were published with copyright notices except for the first, which had copyright at time of creation under blanket copyright structures.

    Whether renewed or not is irrelevant to the above due to the dates of death.

    So come on.. you are so knowledgeable and have decided that you can determine copyright in a simplistic flowchart. Have a go at them, should be easy. Oh and remember the answer should be contextually based upon the article above too.

     


  10.  
    Zakida Paul (profile), Jun 24th, 2014 @ 6:18am

    The real copyright theft is what this is.

     


  11.  
    Anonymous Coward, Jun 24th, 2014 @ 6:58am

    Re:

    Since I disagree with the entire concept of copyright, I could care less how difficult it is to figure out.

    But I agree that a company is not obligated to make their own copies of public domain works freely available to the public.

    As long as no one gets any crazy ideas that there are any restrictions on what anyone can do, once they have access through a paywall or whatever, with the copies that appear on their own devices.

     


  12.  
    Anonymous Coward, Jun 24th, 2014 @ 7:30am

    Re:

    This argument is so stupid. Neither you nor Mike actually try to figure out the public domain status of a given work. If you did, you'd see how simple it is to do. You don't need all 52 pages for one work.

    That only applies when someone has read and understood the implications of all 52 pages. Until they have done that, they cannot answer the question: do any other pages in the book change anything I have read so far?

     


  13.  
    PaulT (profile), Jun 24th, 2014 @ 8:26am

    Re: Re: Re:

    Missed the point, didn't you? Even if you're given specifics, you still have to follow the steps contained in those 52 pages to determine copyright status. When the answer should actually just be "if the work is over X years old, it's public domain". Or, preferably "has the author got a current registration on file?".

     


  14.  
    PaulT (profile), Jun 24th, 2014 @ 8:29am

    Re: Re:

    "I could care less how difficult it is to figure out"

    So, you do care since if you didn't care then you *couldn't* care less...

     


  15.  
    hutcheson, Jun 24th, 2014 @ 11:07am

    The rules (in the U.S.) are indeed horrifically complex, and include such facts as author's citizenship at the time of creation (and the copyright laws in that jurisdiction), author date of death, location and date (including month) of first publication anywhere, location and date (including month) of first U.S. publication ... and, as impossible as most of this is to find[*] there are additional, even-more-obscure details mentioned in the Stanford SUMMARY of copyright law that could impact the result.

    How can you call something intellectual PROPERTY if nobody can know who it belongs to?

    How can you call something INTELLECTUAL property if most of it is, well, FORGOTTEN?

    [*]Yes, I'm speaking from experience, researching a book by a citizen of the Austro-Hungarian empire who came to the U.S. as a teenager and remained there the rest of his life. How am I as a U.S. citizen supposed to know what the Austro-Hungarian empire's copyright laws were--since the Empire didn't exist or even have a unique successor on the date the book was written! And how can I know whether/when someone became a U.S. citizen?

     


  16.  
    Anonymous Coward, Jun 24th, 2014 @ 6:22pm

    Re: Re: Re:

    It's colloquial, not literal. English is imperfect, make the most of it and try to keep up. This one has been around for more than 50 years, anyway!

    Fun fact: this is one of the perversions that was exported from the UK rather than imported from the colonies

     


  17.  
    Anonymous Coward, Jun 24th, 2014 @ 6:31pm

    Re: Congress?

    That's actually part of the problem. Do the websites in the study need to conform to the copyright legislation of the country they are hosted in, or the country the requests come from, or both, or take the worst case scenario from around the world just to be safe?

    I suspect host law is more likely to be involved than destination law, which means that the article isn't, in fact, about New Zealand (law)... or perhaps it is for those sites which have local hosts in New Zealand (Google?).

    See also https://www.techdirt.com/articles/20131231/23434825735/grinch-who-stole-public-domain.shtml

     


  18.  
    Anonymous Coward, Jun 25th, 2014 @ 5:12am

    Re: Re: Re: Re:

    It has always been 'I couldn't care less', and I am from the UK.

     


  19.  
    1st Dread Pirate Roberts (profile), Jun 25th, 2014 @ 11:32am

    Har!

    Prior to copyright enactment in England, authors had full control of works, essentially forever. Copyright law was intended to force works into the public domain. If you wanted a continuing income stream, you needed to produce new works. You were granted a limited period during which to earn income from your works.

    Copyright has been turned on its head. Thanks to that %$%*@ Sonny Bono, copyright lasts longer than the lifespan of almost the entire population. That's like not having a copyright law at all.

     


  20.  
    Anonymous Coward, Jun 25th, 2014 @ 11:55pm

    Re: Re: Re: Re: Re:

    I always got the impression that both were valid: "couldn't" is a simple statement of fact, while "could" carried a sarcastic tone ("I could care less, but it would be hard.")

    Now, to figuratively run literally into the ground...

     


  21.  
    PaulT (profile), Jun 26th, 2014 @ 12:54am

    Re: Re: Re: Re:

    Well, I'm from the UK and I'd never heard the incorrect term being used until I started seeing it online. Where I'm from, it was always "I couldn't care less", which is accurate.

     


  22.  
    G Thompson (profile), Jun 26th, 2014 @ 1:55am

    Re: Re: Re: Re:

    Yep.. as I suspected...


    *crickets*

     


  23.  
    Sheogorath (profile), Nov 4th, 2014 @ 8:45am

    Re: Re: Re: Re:

    I can answer this one from a UK perspective:
    My Brilliant Career (1901) - Miles Franklin, died 1954: Under copyright until 2025
    Animal Farm (1945) - George Orwell, died 1950: Under copyright until 2021
    The Great Gatsby (1925) - F Scott Fitzgerald, died 1940: Public Domain since 2011
    Tender is the Night (1933) - F Scott Fitzgerald, died 1940: Public Domain since 2011
    Lady Chatterley's Lover (1928) - D H Lawrence, died 1930: Public Domain from 1981-1996 then since 2001
    Gone with the Wind (1936) - Margaret Mitchell, died 1949: Under copyright until 2020
    Between the Acts (1941) - Virginia Woolf, died 1941: Public Domain

     


  24.  
    Sheogorath (profile), Nov 4th, 2014 @ 9:15am

    Re: Re: Re: Re: Re:

    My comment got cut off. The last part should read: Between the Acts (1941) - Virginia Woolf, died 1941: Public Domain since 2012

     
