The American Privacy Rights Act’s Hidden AI Ban
from the hidden-problems dept
The release of a bipartisan draft of the American Privacy Rights Act (APRA) reinvigorated the effort to pass a federal consumer privacy law, only to sputter and stall amid concerns raised from across the political spectrum. All that is gone, however, is not forgotten: it is only a matter of time before Congress returns its institutional gaze to consumer privacy. When it does, Congress should pay careful attention to the implications of the APRA’s policy choices on AI development.
The APRA proposed to regulate AI development and use in two key ways. First, it required impact assessments and audits on algorithms used to make “consequential decisions” in areas such as housing, employment, healthcare, insurance, and credit, and provided consumers with rights to opt out of the use of such algorithms. House drafters subsequently struck these provisions. Second, perhaps more importantly – and the focus of this article – the APRA also prohibited the use of personal data to train multipurpose AI models. This prohibition is not explicit in the APRA text. Rather, it is a direct implication of the “data minimization” principle that serves as the bedrock of the entire bill.
Data Minimization as a Framework for Consumer Privacy
Data minimization is the principle that data collection should be limited to only what is required to fulfill a specific purpose. It has both procedural and substantive components. Procedural data minimization, which is a hallmark of both European Union and United States privacy law, focuses on disclosure and consumer consent. Virginia’s Consumer Data Protection Act, for example, requires data collected and processed to be “adequate, relevant, and reasonably necessary” for its purposes as disclosed to the consumer. Privacy statutes modeled on procedural data minimization might make it difficult to process certain kinds of personal information, but ultimately, with sufficient evidence of disclosure, they tend to remain agnostic about the data’s ultimate use.
Substantive data minimization goes further by limiting the ability of controllers to use consumer data for purposes beyond those expressly permitted under the law. Maryland’s Online Data Privacy Act, enacted earlier this year, is an example of this. The Maryland law permits covered businesses to collect, process or share sensitive data when it is “reasonably necessary and proportionate to provide or maintain a specific product or service requested by the consumer.” Although Maryland permits consumers to consent to additional uses, practices that are by default legal under Virginia’s and similar statutes — such as a local boat builder using data on its current customers’ employment or hobbies to predict who else in the area is likely to be interested in its business — would generally not be permissible in Maryland.
The APRA adopts a substantive data minimization approach, but it goes further than Maryland. The APRA mandates that covered entities shall not collect or process covered data “beyond what is necessary, proportionate, and limited to provide or maintain a specific product or service requested by the individual to whom the data pertains,” or alternatively “for a purpose other than those expressly permitted.” The latter category would then permit data to be used only for purposes explicitly authorized in the legislation — described as “permitted purposes” — but does not permit consumers to consent to additional uses, or even to several such “permitted purposes” at the same time.
The APRA proposes what is essentially a white list approach to data collection and processing. It does not permit personal data to be used for a range of socially beneficial purposes, such as detecting and preventing identity theft, fraud and harassment, that are essential to a functioning economy. And because the development of AI models is not among the permitted purposes, no personal data could be used to train AI models – even if consumers were to consent and even if the information was never disclosed. In contrast, current U.S. laws permit collection and processing of personal data subject to a series of risk-based regulations.
The substantive data minimization approach reflected in the APRA represents a potential sea change in norms for consumer privacy law in the United States. Each of the 19 state consumer privacy laws now in effect has by and large adopted a procedural data minimization approach in which data collection and processing is presumptively permissible. They have generally avoided substantive minimization restrictions. Even Maryland, the most stringent of these, has stopped well short of the APRA’s proposal to restrict data collection and processing to only those uses specified in the bill itself.
The GDPR’s Minimization Approach
The APRA’s approach to data minimization has more in common with the EU General Data Protection Regulation (GDPR) than with U.S. state privacy laws. The GDPR follows a substantive data minimization model, allowing collection only for a set of “specified, explicit, and legitimate” purposes. Unlike the APRA, however, a data controller may use data if a consumer provides affirmative express consent. As such, compliance practitioners typically advise companies operating in Europe that intend to “reuse” data for multiple purposes, such as to train multimodal AI models, to simply obtain a consumer’s consent to use any data sets that would undergird future technological development of these models.1
Even with the permission to use data pursuant to consumer consent, the GDPR framework has been widely criticized for slowing innovation that relies on data. Some have attributed the slow pace of European AI development, compared to the United States and China, to the GDPR’s restriction of data use. Notably, enforcement actions by EU regulators, as well as general uncertainty over the legality of training multimodal AI under the GDPR, have already forced even large companies operating in the EU to stop offering their consumer AI applications within the jurisdiction altogether.
How the APRA Would Cut Off AI Development
The APRA, if enacted in its current form, would have a starker impact on AI development than even the GDPR. This is because the APRA would not permit any “reuse” of data, nor permit the use of data for any purpose outside the bill’s white list, even in cases where a consumer affirmatively consents.
That policy choice moves the APRA from the GDPR’s already restrictive framework into a new kind of exclusively substantive privacy regulation that will hamstring AI development. Multifaceted requests by end users form the foundation of generative AI. Flexibility in consumer applications is these models’ purpose and promise. If data collected and processed for one purpose may never be reused for another purpose, regardless of consumer consent or even clear criteria, training and offering multipurpose generative AI applications is rendered facially illegal. The AI developer that could comply with the GDPR by obtaining affirmative consent in order to enable the reuse of data for multiple productive applications could not do so under the APRA.
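The difference between the three approaches, as this article characterizes them, can be put in a few lines of code. The sketch below is a deliberate simplification and not a statement of what any statute actually requires: the `Regime` names, the `DataUse` fields, and the purpose labels in `PERMITTED_PURPOSES` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Regime(Enum):
    # Simplified labels for the article's three models (hypothetical names).
    PROCEDURAL = auto()        # disclosure-based, Virginia-style as described above
    CONSENT_OVERRIDE = auto()  # enumerated purposes plus consent, GDPR-style as described above
    ALLOW_LIST_ONLY = auto()   # enumerated purposes only, APRA-style as described above


@dataclass(frozen=True)
class DataUse:
    purpose: str      # what the data would be used for
    disclosed: bool   # was the use disclosed to the consumer?
    consented: bool   # did the consumer affirmatively consent?


# Hypothetical purpose labels; these are not taken from any bill text.
PERMITTED_PURPOSES = frozenset({"provide_requested_service", "comply_with_legal_obligation"})


def is_permitted(regime: Regime, use: DataUse, permitted=PERMITTED_PURPOSES) -> bool:
    """Return True if the use passes under the simplified model of the regime."""
    if regime is Regime.PROCEDURAL:
        # Disclosure is what matters; the ultimate use is not second-guessed.
        return use.disclosed
    on_list = use.purpose in permitted
    if regime is Regime.CONSENT_OVERRIDE:
        # Off-list uses are still available with consent.
        return on_list or use.consented
    # ALLOW_LIST_ONLY: consent cannot rescue an off-list purpose.
    return on_list


training = DataUse("train_multipurpose_model", disclosed=True, consented=True)
results = {regime.name: is_permitted(regime, training) for regime in Regime}
```

Under these assumptions, a fully disclosed and consented training use passes the first two checks and fails only the third, which is the point the preceding paragraphs make in prose.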
Training entire AI models to serve only one purpose will have negative effects on both safety and reliability. Responsible AI practices include a multitude of safeguards that build off each other and their underlying data set to optimize machine learning applications for accuracy, consumer experience, and even data minimization itself. These improvements would not be feasible if every model used for a new purpose is forced to “start from scratch.” For example, filtering for inaccurate data and efforts to avoid duplicative datasets, both of which depend on well-developed training data, would be rendered ineffective. Consumers would also need to reset preferences, parameters and data output safeguards for each model, leading to user fatigue.
Moreover, the APRA approach would prevent developers from building AI tools designed to enhance privacy. For example, the creation of synthetic data based on well-developed datasets, which is then substituted for consumers’ personal data (a privacy-protective goal), is impossible in the absence of well-developed underlying data. Paradoxically, consumers’ personal data would instead need to be duplicated to serve each model and each purpose.
The sole provision in the APRA that would generally permit personal data to be used in technological development is a specific permitted purpose that allows covered entities to “develop or enhance a product or service.” This subsection, however, applies only to de-identified data. Filtering out all personal data from AI training data sets presents an impossible challenge at scale. Models are not capable of distinguishing whether, for example, a word is a name, or what data may be linked to it. Implementing filters attempting to weed out all personal data from a training data set would inevitably also remove large swaths of non-personal data – a phenomenon known as “false positives.” High false positive rates are especially detrimental to training data sets because they mean that large amounts of valuable training data that are not personal data are removed, leading to unpredictable and potentially biased results.
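A toy example makes the false-positive problem concrete. The sketch below is not how any real de-identification pipeline works; it assumes a deliberately naive filter that drops any sentence containing a word from a short list of first names. The name list, the six-sentence corpus, and its labels are all invented for illustration.

```python
import re

# Hypothetical, deliberately tiny list of first names used as the filter.
NAME_LIST = {"mark", "rose", "bill", "grace", "will"}


def contains_name(sentence: str) -> bool:
    """Flag a sentence if any lowercase word token appears in the name list."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(token in NAME_LIST for token in tokens)


# Invented corpus of (sentence, is_actually_personal_data) pairs.
CORPUS = [
    ("Mark Jones lives at 12 Elm Street.", True),
    ("Please mark the invoice as paid.", False),
    ("The rose garden opens in May.", False),
    ("Congress passed the bill last year.", False),
    ("Rose Smith filed the complaint.", True),
    ("The weather was mild in April.", False),
]


def filter_stats(corpus):
    """Count what the filter removes and how much of that was not personal data."""
    removed = [(s, personal) for s, personal in corpus if contains_name(s)]
    false_positives = [s for s, personal in removed if not personal]
    non_personal_total = sum(1 for _, personal in corpus if not personal)
    return {
        "removed": len(removed),
        "false_positives": len(false_positives),
        "false_positive_rate": len(false_positives) / non_personal_total,
    }


stats = filter_stats(CORPUS)
```

On this invented corpus the filter catches both sentences about real people, but it also discards three of the four harmless ones, because it cannot tell “Rose” the person from “rose” the flower. A more sophisticated filter would do better than this, but the trade-off between missed personal data and lost non-personal data does not go away.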
Even if this were feasible, filtering all personal data out from training data would itself lower the quality of the data set, further biasing outputs. Furthermore, many AI models include anti-bias output safeguards that would also be diminished in the absence of the data they use to control for bias. Thus, a lack of relevant training data can bias outputs, yet so too can an inherently biased model whose output safeguards are rendered ineffective because they lack the necessary personal information to accomplish their task. Unfortunately, both of these harms are almost certain to materialize under a regime that wholly eschews personal information from inclusion in training data.
Where to Go From Here
As the APRA falters and Congress looks forward to a likely redraft of federal privacy legislation, it is critical to avoid mothballing domestic AI development with a poorly-scoped overhaul of U.S. privacy norms. For several years preceding the APRA’s introduction, privacy advocates have advanced a narrative that the U.S. experiment with “notice and choice,” or notifying consumers and presenting an opportunity to opt out of data collection, has failed to protect consumer data. Improving this framework in a way that gives consumers greater control over their data is possible, and even desirable, via federal legislation. Yet a framework built around permitting only predetermined uses of data would have unintended, unforeseen and potentially disastrous consequences both for domestic technological development and U.S. competitiveness on the world stage.
1 The GDPR does not generally permit data collected for one permitted purpose to be used for others, except subject to a series of vague criteria. These include 1) a link between the new and original purpose, 2) the context of collection, “in particular regarding the relationship between data subjects and the controller,” 3) the nature and sensitivity of the personal data, 4) the possible consequences of the new processing to data subjects, and 5) appropriate technical safeguards. The GDPR also specifically articulates that these criteria may not include contextual considerations, rendering compliance uncertain in the majority of cases.
Paul Lekas is Senior Vice President and Head of Global Public Policy and Government Affairs at the Software & Information Industry Association (SIIA). Anton van Seventer is Counsel for Privacy and Data Policy at SIIA.
Filed Under: ai, apra, data minimization, gdpr, generative ai, permission, privacy


Comments on “The American Privacy Rights Act’s Hidden AI Ban”
After reading the article, and looking into the SIIA, I am not impressed.
The APRA would hinder corporate AI garbage? Good. The hypotheticals sound stupid.
What would this even be used for?
This piece is just the same vague gesturing toward “innovation”, from an industry firm that doesn’t actually care about consumers or users or anything outside of profit. Just last year, they wrote a piece called “The Case For Right To Repair Has Not Been Made”.
Re:
Thanks for the context. That article is a huge red flag. It praises section 1201 of the DMCA, which chills not only independent repairs, but also security research, accessibility, and creative remixing of videos.
People should certainly have the right to provide data as they see fit.
But I don’t see a reason to buy into the idea of harms to LLMs or other objects in the field of AI here. If you need AI to balance your checkbook or whatever, you and society as a whole have problems that will absolutely not be solved or mitigated by any AI ever. We don’t need AI everywhere. It’s already annoyingly intrusive.
This is merely self-serving PR
There’s a lot of hand-waving and plaintive whining here, but the bottom line is “give us all your private data so that our bullshit AI products can make us some money before everyone realizes that they’re bullshit AI products and the bottom drops out of the market”.
No.
If you can’t develop your bullshit AI product without private data, then you shouldn’t develop it at all. (Which isn’t a bad idea.) Privacy, and security, which completely depends on privacy, is vastly more important and will still be vastly more important after you’ve cashed out and gone on to shill for the next round of bogus “innovators”. (So far: virtual reality, crypto, AI.)
I’m not really seeing the problem. This sounds like a natural outcome of consumer privacy. Which is... good. That’s the whole point.
You do realize that every single one of these monolithic “rights” laws, in the real world, transforms into “Check this box to waive all your rights under the XYZ Privacy Law or you’re not allowed to do anything here.” It’s utterly meaningless. The GDPR is a gigantic joke that serves no purpose other than to give lawyers a reason to sue each other. APRA, if it’s passed, will be much the same. No one will be protected. Nothing will be saved.
Hey, if a policy kills AI development, it is more, not less, likely to be a reasonable policy.
Re:
You do realize that AI, as a concept, is not inherently bad, right?
Re: Re:
You do realize that the current methods of AI development are, to put it lightly, inherently bad?
Re: Re: Re:
“A.I.” has never been much more than a buzzword to get funding. It was big in the 1960s, then there was an “A.I. winter” in the 1970s once people realized it had been vastly over-hyped. It came back into vogue briefly in the 1980s, before another longer “winter” quickly started. Of course it happened again with the dot-com boom and crash; then we had Alexa and Siri, which are still around but no longer really seen as “A.I.”; and now we’re back in a major boom.
Every time it happens, the new hypesters try to distance themselves from the goals of the past, and the claim of “intelligence” in particular. Right now we’re at the phase of “it’s fun to play with, even if its bizarre mistakes make it less useful than promised”.
Re: Re: Re:2
Problem is, the development lab leaks untested product and marketing treats the customer as though they are an efficient QA dept. ymmv
Re: Re: Re:
What on earth are you smoking and how the hell did this get insightful? Are idiots really that enamored with their AI-hatred?
What I wanna know is, who’s gonna police all of the “can’t do that!” follow-on uses of legitimately collected data, hmmmm?
And regardless of that little factor, it remains true that when doing something is outlawed, newly-minted outlaws will continue to do it. At this point, the risk is far outweighed by the potential rewards.
Re:
“who’s gonna police all of the ‘can’t do that!’”
I guess it is up to the civil court system then. We have already seen some of this; there was a story a while back where a woman was declared pregnant by some chain store that leaks purchase records. That sort of info was sensitive back then but has now become a huge concern for many people.
If you show that you can’t be trusted to behave responsibly with the toys you’ve gotten, you don’t get to act surprised when someone decides to take them away, and if that results in them going a tad overboard then again, maybe you should have shown a bit more restraint to begin with.
While I’m slightly more inclined to think that making it so that people cannot give consent to their data being used might be a bit too far, I can also see how leaving that window open could result in the requirement being trivialized as it becomes just another ‘Click to accept cookies’ button for services to use. ‘Click to allow us to collect and use your data, we pinky-promise we’ll be responsible with it’.
Of course the real test will be how widespread the limitations are; if they’re only aimed at a select few companies/industries rather than applying to data collection as a whole, the act will end up being just another performative ‘Tech companies are evil, look at us Doing Something about/to them!’ bill.
For starters, someone should not be discriminated against for not opting into the training and it shouldn’t be thrust in someone’s face when they’re there to use some functionality which doesn’t involve it.
Is this really 2 shills that are being allowed to do their spin and propaganda here?
What is going on, Techdirt?
We have already seen several cases where this supposed AI has been used to harass those least able to defend themselves. But I’m sure more benevolent minds will prevail... not.
https://projects.tampabay.com/projects/2020/investigations/police-pasco-sheriff-targeted/intelligence-led-policing/