In The Vacuum Of AI Legislation, Libraries Have The Playbook
from the always-listen-to-librarians dept
The White House AI framework made official what we already knew: this administration has no interest in regulating AI. Any legislation that contradicts the framework will be a dead end. In this regulatory vacuum, it is instructive to turn to the norms developed by libraries and archives through their decades of experience working through the same core issues now animating the AI debate: understanding copyright law; providing machine access to data; contextualizing information; and adhering to responsible stewardship obligations to communities.
The Google Books Library Project can be instructive. In the mid-2000s, research libraries partnered with Google to digitize and preserve millions of volumes in their collections. To solve the problem of how to store and provide access to a massive number of scanned books, research libraries banded together to create HathiTrust, a secure, searchable repository that remains in use today. Of course, this didn’t happen without legal challenges. The Authors Guild separately sued Google and HathiTrust for copyright infringement in what came to be known as the “Google Books” cases. But those cases ultimately established the legal precedent that copying books to create a digital searchable database is fair use. Because of that precedent, research methods such as text and data mining, which mass digitization makes possible, are lawful under fair use.
Based on Google Books and other litigation, libraries put a stake in the ground on copyright law: training AI models on copyrighted works generally is fair use, a position articulated by the Library Copyright Alliance (LCA) in 2023 and updated in light of recent court decisions. In two of those decisions, Kadrey v. Meta and Bartz v. Anthropic, judges held that training AI models on copyrighted works is transformative and therefore fair use. Notably, both cases arose in a commercial context. A court would likely be even more receptive to AI uses in educational, research, and scholarly settings, which are favored purposes under the fair use statute.
Meanwhile, disagreements over AI safety, harm prevention, bias mitigation, and abuse have held up federal AI legislation in the US. But these are not new problems for libraries, which have developed norms to balance the collection and preservation of sensitive information in archives and special collections with the imperative to provide the broadest possible user access to digitized content. One example is the 2010 ARL principles to guide vendor/publisher relations in large-scale digitization projects involving special collections, which call for libraries to make material available to the public while providing context to aid in the understanding of that material. Libraries have also developed frameworks for stewarding materials of vulnerable communities and historically marginalized groups, like the Library of Congress access policy on culturally sensitive materials relating to Indigenous peoples, which includes transparent procedures for controlled access and use of those materials.
Congress has also been legislating in the dark on issues like transparency and provenance in AI training, and many of the proposals we have seen so misunderstand these concepts that they threaten to bring the university-based research enterprise to a halt. Libraries already do what Congress is trying to mandate — authenticating, contextualizing, and documenting collections — but the legislation is so disconnected from this expertise that it is unworkable for the institutions that actually practice rigorous provenance.
As AI governance debates continue to stall on Capitol Hill, library norms offer a foundation for approaching AI training and research in a way that is responsible, steeped in library expertise, and advances the public interest.
With gratitude to Betsy Rosenblatt, Professor of Law, Case Western Reserve University Law School
Katherine Klosek is the Director of Information Policy and Federal Relations at the Association of Research Libraries.
Filed Under: ai, ai policy, copyright, fair use, librarians, libraries


Comments on “In The Vacuum Of AI Legislation, Libraries Have The Playbook”
So. You don’t ever want to get paid for what you write?
Re:
Wut?!?
Re: Re: What's 'her' name, btw?
Shouldn’t you be too busy productivitymaxxing with your AI secretary to pick dumb fights like this?
Re: Re: Re:
Wut?!?
Re: Pseudonymous Coward
You posted this anonymously, is that because you don’t think you should be allowed to use your name online?
spot on!
This is a sharp analysis, Katherine. You’ve hit on the fundamental truth that while the policy world is currently scrambling to define “responsible AI,” libraries have been quietly refining the actual mechanics of ethical data stewardship (and fair use!) for decades. The HathiTrust and Google Books precedents aren’t just legal history; they are the functional blueprints for how we balance massive scale with institutional integrity.
I’m particularly glad you called out the disconnect in current legislative proposals regarding transparency and provenance. There is a real risk that Congress, in an attempt to rein in “Big Tech,” will inadvertently create a regulatory regime that is unworkable for the very research libraries that have been the most rigorous practitioners of these values.
My only concern is that as we lean into these library norms, we must ensure that library expertise guides the policy without letting the “playbook” be co-opted by commercial interests to justify licensing models that erode the permanent role of the library collection.
Grateful for your leadership on this at ARL!