In The Vacuum Of AI Legislation, Libraries Have The Playbook

from the always-listen-to-librarians dept

The White House AI framework made official what we already knew: this administration has no interest in regulating AI. Any legislation that contradicts the framework will be a dead end. In this regulatory vacuum, it is instructive to turn to norms developed by libraries and archives through their decades of experience working through the same core issues now animating the AI debate: understanding copyright law; providing machine access to data; contextualizing information; and adhering to responsible stewardship obligations to communities.

The Google Books Library Project is a case in point. In the mid-2000s, research libraries partnered with Google to digitize and preserve millions of volumes in their collections. To solve the problem of how to store and provide access to a massive number of scanned books, research libraries banded together to create HathiTrust, a secure, searchable repository that remains in use today. Of course, this didn’t happen without legal challenges. The Authors Guild separately sued Google and HathiTrust for copyright infringement in what came to be known as the “Google Books” cases. But those cases ultimately established the legal precedent that copying books to create a digital searchable database is fair use. That precedent is what makes research methods such as text and data mining both possible, through mass digitization, and lawful under fair use.

Building on Google Books and other litigation, libraries have put a stake in the ground when it comes to copyright law: training AI models on copyrighted works generally is fair use, a position articulated by the Library Copyright Alliance (LCA) in 2023 and updated in light of recent court decisions. In two of those decisions, Kadrey v. Meta and Bartz v. Anthropic, judges held that training AI models on copyrighted works is transformative and therefore fair use. It’s worth noting that both cases arose in a commercial context. A court would be at least as likely to rule in favor of AI uses in educational, research, and scholarly contexts, as those are favored uses under fair use.

Meanwhile, disagreements over AI safety, harm prevention, bias mitigation, and abuse have held up federal AI legislation in the US. But these are not new problems for libraries, which have developed norms to balance the collection and preservation of sensitive information in archives and special collections with the imperative to provide the broadest possible user access to digitized content. One example is the 2010 ARL principles to guide vendor/publisher relations in large scale digitization projects with special collections, which calls for libraries to make material available to the public while providing context to aid in the understanding of that material. Libraries have also developed frameworks for stewarding materials of vulnerable communities and historically marginalized groups, like the Library of Congress access policy on culturally sensitive materials relating to Indigenous peoples, which includes transparent procedures for controlled access and use of culturally sensitive materials.

Congress has also been legislating in the dark around issues like transparency and provenance in AI training, and many of the proposals we have seen so misunderstand these concepts that they threaten to bring the university-based research enterprise to a halt. Libraries already do what Congress is trying to mandate — authenticating, contextualizing, and documenting collections — but the legislation is too disconnected from this expertise, and as a result unworkable for the institutions that actually practice rigorous provenance.

As AI governance debates continue to stall on Capitol Hill, library norms offer a foundation for approaching AI training and research in a way that is responsible, steeped in library expertise, and advances the public interest.

With gratitude to Betsy Rosenblatt, Professor of Law, Case Western Reserve University Law School

Katherine Klosek is the Director of Information Policy and Federal Relations at the Association of Research Libraries.



Comments on “In The Vacuum Of AI Legislation, Libraries Have The Playbook”


Kyle K. Courtney says:

spot on!

This is a sharp analysis, Katherine. You’ve hit on the fundamental truth that while the policy world is currently scrambling to define “responsible AI,” libraries have been quietly refining the actual mechanics of ethical data stewardship (and fair use!) for decades. The HathiTrust and Google Books precedents aren’t just legal history; they are the functional blueprints for how we balance massive scale with institutional integrity.

I’m particularly glad you called out the disconnect in current legislative proposals regarding transparency and provenance. There is a real risk that Congress, in an attempt to rein in “Big Tech,” will inadvertently create a regulatory regime that is unworkable for the very research libraries that have been the most rigorous practitioners of these values.

My only caution is that as we lean into these library norms, we must ensure that library expertise guides the policy without letting the “playbook” be co-opted by commercial interests to justify licensing models that erode the permanent role of the library collection.

Grateful for your leadership on this at ARL!
