from the into-the-memory-hole dept
The National Endowment for the Humanities announced last Wednesday the "Chronicling America" contest to create projects out of historical newspaper data. The contest is supposed to showcase the history of the United States through the lens of a popular (and somewhat ephemeral) news format. But looking at the limits of the archival data, another story emerges: the dark cloud of copyright's legal uncertainty is threatening the ability of amateur and even professional historians to explore the last century as they might explore the ones before it.
Consider that the National Digital Newspaper Program holds the history of American newspapers only up until 1922. (It originally focused on material from 1900-1910 and gradually expanded outwards to cover material from as early as 1836.) That cut-off may seem arbitrary—and it makes sense that a historical archive would have some cut-off date—but for copyright nerds 1922 rings some bells: it's the latest year from which people can confidently declare a published work is in the public domain. Thanks to the arcane and byzantine rules created by 11 copyright term extensions in the years between 1962 and 1998, determining the copyright status of anything published later requires consulting a flow chart from hell—the simple version of which, published by the Samuelson Clinic last year, runs to 50 pages.
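That first branch of the analysis—the only easy one—can be sketched as a toy rule of thumb. This is an illustration only, not legal advice: for anything published after 1922, the answer depends on registration, renewal, notice formalities, and the rest of the flow chart, none of which this captures.

```python
def confidently_public_domain(publication_year: int) -> bool:
    """Toy rule of thumb for US works: anything published in 1922 or
    earlier is safely in the public domain. For later works this
    function cannot answer the question -- that's what the 50-page
    flow chart is for."""
    return publication_year <= 1922

# The archive's cut-off falls exactly on the easy side of the line.
print(confidently_public_domain(1922))  # True
print(confidently_public_domain(1923))  # False
```

The point of the sketch is how little it covers: one clean branch, then fifty pages of uncertainty.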
The result is what's been dubbed "The Missing 20th Century," brought to light by the striking research of Paul Heald, which shows that copyright restrictions are tightly correlated with a lack of commercial availability of books. He analyzed the titles available in Amazon's warehouses and found a steep drop-off in titles first published after 1923, a gap that persists until just the last few years. As Heald's research shows, the number of books available from the 1850s is double the number available from the 1950s.
Despite what advocates of copyright term extensions like to say, the data suggests that after the first few years of a book's publication, publishers as a group are much less willing to print a text that's under copyright than one in the public domain.
The situation with newspapers is worse. After all, while a book's value to readers may taper off in the years after publication, for newspapers that same tapering happens in a matter of days. Today's newspaper issue may be incredibly valuable in the right hands, but yesterday's is more likely to line bird cages or wrap fish than to end up preserved for posterity.
The big players keep their own archives. The New York Times, for example, makes articles available dating back to 1851. But that's an incomplete solution for two major reasons. For one thing, it sets up a single point of failure that could allow catastrophic losses. Just last month, flooding threatened a priceless collection of photos in the New York Times archive; had those images been digitized and widely copied, no single flood or fire would pose a risk. But also, even a robust archive from a major publication like the Times can't provide the kinds of insights that come from looking at a diverse collection from multiple different sources.
In the world of media journalism, we talk a lot about the future. But we can't have a coherent conversation about that without thinking about the past and the present. And those thoughts, in turn, rely on access to the history that we've allowed to be locked up under effectively unlimited copyright restrictions or as orphan works.
And this issue is bigger than the entries in a particular contest, or the way today's history students can explore the past. The Atlantic documented last month the near-total disappearance of a groundbreaking series of investigative journalism from just eight years back. If copyright continues to jeopardize the ability of archivists and researchers to preserve and contextualize our history, how much will we lose?