from the too-bad dept
Nagaraj realized that Wikipedians were using this as good source material for Wikipedia pages -- especially on the profiles of older baseball players. He noted that there was little stopping the text from being rewritten, but the real issue was around images. People could use the scanned images to illustrate the profiles, but clearly they could only use the public domain ones without permission.
But what Nagaraj found was that the availability of public domain material dramatically improved the article's images. Before the digitization, players from between '44 and '64 had an average of .183 pictures on their articles. The '64 to '84 group had about .158 pictures. But after digitization, those numbers dramatically changed: there were 1.15 pictures on each of the older group's articles -- but only .667 in the new group. More recent players, covered by privately-owned parts of Baseball Digest, had half as many images on their pages as did old-timers.
And, yes, the article notes that he put in place various controls to correct for unrelated differences. Basically, the only observable factor explaining why some pages have more images than others is the public domain status of the underlying works. Some might argue that this is no big deal, but he found a second bit of useful data as well:
And the effects of this -- of just having an image on the page -- cascaded to other metrics. "Out-of-copyright" players' pages saw a significant boost in traffic. Articles from the pre-'64 group that were already in the top 10 percent saw their hits increase more than 70 percent. Articles from that group in the least-popular ten percent saw traffic to their articles increase by 25 percent. Those pages were more frequently edited across the board, too. And this makes sense: Google rewards updated content, and it rewards images. The out-of-copyright players provided more of both.
I'm reminded, yet again, of that chart of the now infamous gap in books under copyright that you can't find any more -- even though older books in the public domain are widely available. Once again, we're seeing not only the massive value of the public domain, but how much useful content is being locked away by excessively strict (and excessively long) copyright law.