California AI Bill Tells GenAI Startups To Nerd Harder

from the this-is-something,-we-must-do-this dept

There’s a stunning degree of fearmongering and lack of humility about what California AI bill SB 942 can or can’t do. Honest conversation about this bill’s limitations is essential to ensuring we don’t pass this ineffective law. But its proponents have obstructed reasoned policy development by injecting panic into that conversation and pretending the bill will solve various multifaceted GenAI abuses that it simply cannot.

This article is a follow-up to my first one, where I explained why SB 942’s forced disclosures and inclusion of AI-generated text were unworkable. Recent amendments fixed those two issues, but the remaining requirement that nascent AI companies create “AI detection tools” still makes it a fundamentally flawed bill.

SB 942 vaguely aims to “tackle the issue of GenAI-produced content” by requiring three things:

  1. AI providers must offer a free AI detection tool that identifies content generated using their service.
  2. AI providers must offer users the option to include a conspicuous disclosure on content generated with their service. 
  3. AI providers must embed specific metadata into files and the generated content itself, including company name, version number, timestamp, and a unique identifier.
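To make the third requirement concrete, here is a minimal Python sketch, using only the standard library, that writes the kinds of fields the bill lists into a PNG file's tEXt chunks. The field names, the company “ExampleAI Inc.”, and the use of tEXt chunks are all hypothetical and chosen for illustration. SB 942 does not prescribe this format, and real provenance standards such as C2PA are considerably more elaborate. The last function imitates what a screenshot or a naive re-save does.

```python
from __future__ import annotations

import struct
import uuid
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """One PNG chunk: 4-byte length, 4-byte type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def build_png(width: int, height: int, pixels: bytes,
              text: dict[str, str] | None = None) -> bytes:
    """Encode 8-bit grayscale pixels as a PNG, optionally adding tEXt chunks."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), default methods.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline starts with a filter-type byte; 0 means "no filter".
    raw = b"".join(b"\x00" + pixels[y * width:(y + 1) * width] for y in range(height))
    out = PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
    for key, value in (text or {}).items():
        # tEXt payload: Latin-1 keyword, a NUL separator, then Latin-1 text.
        payload = key.encode("latin-1") + b"\x00" + value.encode("latin-1")
        out += make_chunk(b"tEXt", payload)
    out += make_chunk(b"IDAT", zlib.compress(raw))
    out += make_chunk(b"IEND", b"")
    return out


def read_chunks(png: bytes) -> list[tuple[bytes, bytes]]:
    """Split a PNG into (type, data) pairs; CRCs are not verified here."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks, pos = [], 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunks.append((png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]))
        pos += 12 + length  # length field + type + data + CRC
    return chunks


def read_text(png: bytes) -> dict[str, str]:
    """Collect every tEXt keyword/value pair in the file."""
    found = {}
    for ctype, data in read_chunks(png):
        if ctype == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
    return found


def reencode_pixels_only(png: bytes) -> bytes:
    """Mimic a screenshot or naive re-save: keep the pixels, drop everything else."""
    chunks = read_chunks(png)
    ihdr = next(data for ctype, data in chunks if ctype == b"IHDR")
    width, height = struct.unpack(">II", ihdr[:8])
    raw = zlib.decompress(b"".join(d for c, d in chunks if c == b"IDAT"))
    stride = width + 1  # assumes 8-bit grayscale with filter type 0, as build_png writes
    pixels = b"".join(raw[y * stride + 1:(y + 1) * stride] for y in range(height))
    return build_png(width, height, pixels)


# Hypothetical provenance fields mirroring the bill's list: name, version, time, ID.
provenance = {
    "Provider": "ExampleAI Inc.",
    "SystemVersion": "image-model-2.1",
    "Created": "2024-01-01T00:00:00Z",
    "ContentID": str(uuid.uuid4()),
}

tagged = build_png(4, 4, bytes(range(16)), provenance)
print(read_text(tagged)["Provider"])            # ExampleAI Inc.
print(read_text(reencode_pixels_only(tagged)))  # {}
```

The pixels come through unchanged, but the disclosure is gone. That is one reason embedded metadata is easy to defeat.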

A theme that runs through nearly every aspect of this bill’s journey through the legislature is a mismatch between any given GenAI abuse and how this bill would address it. For example, a June 28 committee analysis clumsily tries to explain the danger of open-source AI and incorrectly implies that SB 942 will help counter it: 

“ChatGPT is an example of an open-sourced tool, meaning it is accessible to the public. Researchers and developers can also access its code and parameters. This accessibility increases transparency, but it has downsides: when a tool’s code and parameters can be easily accessed, they can be easily altered, and open-source tools have the potential to be used for nefarious purposes.”

While OpenAI has released some model weights, ChatGPT is famously not open-source. Here, open-source has been conflated with “widely available.” Ironically, this bill only applies to AI businesses, and not situations where actual open-source AI software may be abused. 

The analysis continues:

“The need for this bill is further highlighted by various instances and research underscoring the threats posed by unregulated GenAI.”

It goes on to describe three examples of GenAI abuse and again falsely implies this bill is the fix. Let’s talk about why that’s not true. 

Example 1:

In January, voters in New Hampshire received phone calls from an AI-generated voice clone of Joe Biden telling them not to vote in the primary election in order to save their vote for the upcoming general election. They caught the man responsible, and he’s now facing a $6 million fine and 13 felony charges. The FCC has also now made AI robocalls illegal in response.

The bill’s three main provisions would not prevent this from happening again: 1) It would be impractical for consumers to record robocalls and then upload them to a detection tool. 2) Nefarious actors would opt out of the optional disclosures. 3) Metadata for the fraudulent audio file would be irrelevant in the context of an ephemeral phone call.

Example 2:

A finance worker at a Hong Kong firm was persuaded to transfer $25 million to thieves after being invited to a video call with several AI-generated versions of their colleagues.

This case shares the same pitfalls as the previous example: 1) An employee who thinks they may be in the middle of a fraudulent video call won’t have any way to upload that information to a detection tool during that session. 2) Bad actors will opt out of disclosures and 3) metadata will again be irrelevant.  

Example 3:  

California high school students have been caught generating nonconsensual nude images of their classmates.

1) Determining the authenticity of this content is useless and doesn’t address the harm it creates. 2) Even if a user opts for a disclosure, the content is still extremely harmful. 3) A hidden disclosure embedded in the content might help only in cases where no other evidence or context leads to the perpetrator. But this assumes that all companies can implement this still-developing idea across audio, video, and images. Legally requiring startups to implement it is heavy-handed and unrealistic.


The analysis ends with a hypothetical that highlights yet another fundamental flaw:

“In theory, a person who views a video circulating on social media conveying President Joe Biden telling voters not to vote in the primary election could use a provider’s AI detection tool to upload and analyze the video. By examining the embedded machine-readable disclosures, the user could identify the provider’s name, the GenAI system used, and the creation date, concluding that the video was produced by AI. This process would reveal that the video is not genuine, thus, in theory, helping to prevent the spread of misinformation.”

That sounds nice, in theory. But in reality, it’s more complicated than that.

A social media user who uploads content to an AI detection tool may not get back any useful indication of its authenticity, because the law only requires each AI provider to determine whether its own service was used. The user would then need to upload the content to every AI provider’s tool until they get a hit, or give up before they do. The user will also need to understand that each tool has a different level of accuracy across video, image, and audio, and will need to account for the possibility of false positive and false negative results.
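Because each provider's tool can only answer “did we make this?”, the user has to ask every provider in turn. The Python sketch below assumes the simplest possible detector: a registry of SHA-256 hashes of the provider's past outputs. The class name, the provider names, and the hashing approach are all hypothetical. Real tools would more likely rely on watermarks or statistical classifiers, which are less brittle but bring false positives and negatives.

```python
from __future__ import annotations

import hashlib


class HashRegistryDetector:
    """Hypothetical provider-side detector: stores a SHA-256 digest of every output."""

    def __init__(self, provider: str) -> None:
        self.provider = provider
        self._digests: set[str] = set()

    def record_output(self, content: bytes) -> None:
        # Called by the provider each time its service generates something.
        self._digests.add(hashlib.sha256(content).hexdigest())

    def was_generated_here(self, content: bytes) -> bool:
        # Exact match only: any altered byte produces a different digest.
        return hashlib.sha256(content).hexdigest() in self._digests


def check_every_provider(content: bytes,
                         detectors: list[HashRegistryDetector]) -> list[str]:
    """What the user has to do: submit the same file to each provider's tool in turn."""
    return [d.provider for d in detectors if d.was_generated_here(content)]


detectors = [HashRegistryDetector(name) for name in ("ProviderA", "ProviderB", "ProviderC")]
image = b"pretend these are image bytes"
detectors[1].record_output(image)

print(check_every_provider(image, detectors))    # ['ProviderB']
tweaked = image[:-1] + bytes([image[-1] ^ 1])    # flip one bit in the last byte
print(check_every_provider(tweaked, detectors))  # []
```

An empty list comes back in three different cases: human-made content, lightly edited AI content, and content from a tool that isn't on the user's list. Most of the time the answer is therefore inconclusive.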

One alternative to this rigmarole is for users to do what they’ve always done when inquiring about suspicious online content: google it. 

But seriously, how many different AI detection websites will one potentially have to visit? Is it 5? 15? Maybe 50? I doubt anyone has contemplated this number. AI providers with over 1M monthly users will have to comply with this law, making the user threshold arbitrarily over-inclusive given that there are at least 1,500 GenAI startups poised for growth. And the unlucky startups that already have 1M users will have only four months to develop this unproven technology before the law takes effect in January of next year.

By passing this law, California lawmakers would be telling an untold number of GenAI startups to nerd harder.


And now we get to the weird part. 

In the July 2 hearing [2:55hr mark] for which our much-discussed committee analysis was prepared, co-drafter of the bill Tom Kemp again testified in support. 

One statement stood out in particular:

“Recent changes to the bill have addressed many opposition concerns coming out of the privacy committee. For example, a critic just recently wrote that the recent changes have quote, ‘fixed the biggest issues I’ve had with the bill.’ ” 

Who is this unnamed critic? Well, Kemp appears to quote a comment that I made while summarizing several amendments in a LinkedIn post. Read next to my own commentary, the two phrases are almost identical:

“These changes fix the biggest issues I had with the bill, link in comments.”

These two phrases don’t appear anywhere else on the internet, suggesting Kemp did in fact quote me. That’s a shame because while amendments did fix two of three issues I originally pointed out, this short phrasing doesn’t reflect my full thoughts on the bill. SB 942, in my opinion, is still a hot mess. 

The amendments didn’t address the issue of compelled AI detection tools. As we’ve already discussed, requiring AI providers to create detection tools does not guarantee that those tools will actually work well. Because we can’t rely on them 100% of the time, there will be false positives and negatives, which have already proved highly damaging. Mandating these tools by law, with no consideration for how inaccurate they may be, is tech solutionism and will only further confuse people about the authenticity of the content they encounter.
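How much damage inaccuracy does depends on base rates as well as on accuracy. The worked example below uses made-up, purely illustrative numbers, not measurements of any real tool. It assumes 1,000 items, of which 10% are AI-generated, screened by a detector that is right 95% of the time on AI content (sensitivity) and on genuine content (specificity). The function name is hypothetical.

```python
from __future__ import annotations

from fractions import Fraction


def flag_breakdown(total: int, ai_share: Fraction,
                   sensitivity: Fraction, specificity: Fraction) -> dict[str, Fraction]:
    """Expected outcomes when one detector screens a pool of content.

    sensitivity: fraction of AI-generated items the detector flags.
    specificity: fraction of genuine items the detector correctly leaves alone.
    """
    ai_items = total * ai_share
    genuine_items = total - ai_items
    true_pos = ai_items * sensitivity
    false_pos = genuine_items * (1 - specificity)
    flagged = true_pos + false_pos
    return {
        "true_positives": true_pos,
        "false_negatives": ai_items - true_pos,  # AI content that slips through
        "false_positives": false_pos,            # genuine content labeled as AI
        # Share of all flags that land on genuine content (0 if nothing is flagged).
        "wrong_share_of_flags": false_pos / flagged if flagged else Fraction(0),
    }


# Illustrative assumptions only: 1,000 items, 10% AI-generated, 95% sensitivity and specificity.
result = flag_breakdown(1000, Fraction(1, 10), Fraction(95, 100), Fraction(95, 100))
for name, value in result.items():
    print(name, value)
print(f"{float(result['wrong_share_of_flags']):.1%} of flagged items are genuine")  # 32.1% ...
```

Even with 95% accuracy in both directions, roughly one flag in three lands on genuine content. A less accurate tool, or a pool with a smaller share of AI content, makes that ratio worse.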

I regret that my choice of a few short words was used to promote this piece of legislation. To avoid any doubt, I’ve edited my LinkedIn post to read:

“These changes fix [some of] the biggest issues I had with the bill, [but it still has many, many issues.]”


Alan Kyle is a tech policy professional available for hire in AI Governance, Trust & Safety, and Privacy.

Comments on “California AI Bill Tells GenAI Startups To Nerd Harder”

21 Comments
Anonymous Coward says:

AI providers must offer a free AI detection tool that identifies content generated using their service.

So they basically need to retain every output they ever generate for comparison. But since people can make slight alterations, it’ll need to be fuzzy matches.

Best I can tell, the California legislature wants every disk byte and CPU op from this point forward to be dedicated to tracking AI outputs and running fuzzy matches against them all day every day.

Incompetent fucking nitwits to the last.

Anonymous Coward says:

That was already proposed during the 2000s to add metadata to all images that have been modified using software like Photoshop, because “these images were fake”, and we absolutely needed to know if the email received was legit.
Then came the 2010s, with social networks spreading fakes on a global scale to billions of people. So far, most people manage to distinguish pranks from reality.
Now it’s the 2020s, and technology is against all of humanity and wants to lure us into believing (better-crafted) grotesque fakes because we have too much free time.
By 2030, all of humanity will be wiped out because fakes will become too real, or we’ll be too lazy to think them through, and we’ll start cults around them.
By 2040, the technology will be wiped out because it will start believing its own fakes, which will be much better than the human ones, and it will become as bad a decision maker as the (by then extinct) humans.
By 2050, the planet will once again be a quiet and nice place for animals and plants.
In two million years, animals will have evolved enough to start working on some basic machinery that can display nude pictures, without understanding that this is what will wipe them out.

Anonymous Coward says:

Re:

That was already proposed during the 2000s to add metadata to all images that have been modified using software like Photoshop, because “these images were fake”, and we absolutely needed to know if the email received was legit.

So the solution to easily manipulated data was additional easily manipulated data. Sounds about like blowhards trying to dictate things they don’t understand. I have no trouble at all believing this idea emerged from a congressional body.

Anonymous Coward says:

AI providers must offer a free AI detection tool that identifies content generated using their service.

So. I use AI to generate some “content” (not sure what that is), then I modify one byte. Now, it’s not AI-generated content.

If you read the current version of this idiotic bill, you’ll find far more stupid concepts. In particular, there’s the entire certification requirement, much of which simply isn’t possible. Example: identifying critical harms the model might cause or enable. Huh? In what universe?

Anonymous Coward says:

Re:

It would absolutely require a level of fuzzy matching. Especially if OCR was involved.

Mike, earlier today, recognized some of his own work taken wildly out of context in the wild. I think that’s what they’re expecting. Ultimately, they just fundamentally misunderstand basically everything. A number among them are likely even aware of that fact but just don’t give a fuck.

Anonymous Coward says:

*These two phrases don’t appear anywhere else on the internet, suggesting Kemp did in fact quote me. That’s a shame because while amendments did fix two of three issues I originally pointed out, this short phrasing doesn’t reflect my full thoughts on the bill. SB 942, in my opinion, is still a hot mess.*

Also showing how “AI” isn’t the issue, people with shoddy thinking and agendas are.

Heart of Dawn (profile) says:

It’s super trivial to remove metadata, watermarks, or other features that could be embedded in a work, for example by simply taking a screenshot or capture and sending that instead, cropping it, or re-saving it with slightly worse compression.

Nerd harder is not the answer here. We need to teach people media literacy and critical thinking skills, and teach young people especially just how harmful generating non-consensual media can be.

Ethin Probst (profile) says:

Re:

Not just that, but what about text? You can’t insert metadata into text. A string in a computer’s memory is just bytes. There’s no fancy formatting or other stuff to it unless you count Unicode/UTF-8 as “fancy encodings” (which it really isn’t). I’ve said it before to many people: I don’t see how it’s remotely possible to insert any form of metadata into machine-generated text no matter how you slice it.

Ethin Probst (profile) says:

The actual solution

The only truly workable solution I can think of to solve this “problem” is a societal one. It’s a social problem. This bill asks Gen AI people to fix what has been a problem for, like, forever. It’s just taken on a new form and is a lot easier to cause than it used to be. But it’s still a societal problem. But eh, lawmakers don’t listen to me so….

Anonymous Coward says:

Imagine giving math geeks a number from 1 to 6 and telling them to invent a system that can tell the layman at a glance whether it came from a 1d6 or a 1d20, without fail, without going into technical specifics.

That’s what the bill’s asking for. Preemptive notice and staydown. No wonder copyright interests are slavering at the chops, that’s the unicorn they’ve been praying for for years.

Anonymous Coward says:

I rather object to the framing of this piece. “Nerd harder” formed as a rallying call regarding attempts to legislate the existence of “secure encryption that the government can access at will”, something which is, as far as we know, impossible in any known or hypothesized method of encryption.

In contrast, content classifiers are well understood and routinely already used in all kinds of fields including generative AI and its adjacent fields (machine learning, neural networks, etc.). They are very much possible, and (at least in image and sound generation) relatively trivial to produce, as there are essentially infinite methods of fingerprinting those which are indistinguishable to human senses. Text would be the only one with any difficulty, because it generates so little data to begin with (though I question how much text is even covered by the law, as it was mentioned for defining covered entities and then specifically dropped from all subsequent required actions).

The legislation admittedly achieves basically nothing of use, but that’s not because it requires something impossible. It just requires something useless. It’s not a nerd harder situation.

Ironically, this bill only applies to AI businesses, and not situations where actual open-source AI software may be abused.

It applies to any person who creates, codes, or produces a generative AI with at least 1,000,000 monthly users. While it may be infeasible to identify how many monthly users open source software actually has, and possibly difficult to identify who coded it, it does cover them.

PaulT (profile) says:

Re:

“there are essentially infinite methods of fingerprinting”

There are equally as many ways of removing the fingerprints. You’re very much on a losing path if you think this is the solution.

“The legislation admittedly achieves basically nothing of use”

Then, whether or not you agree with the above, it’s a waste of time and energy from perhaps the least equipped people to deal with the problems.

“While it may be infeasible to identify how many monthly users open source software actually has, and possibly difficult to identify who coded it, it does cover them.”

No, as you just admitted – if you place a restriction on who it applies to but you have no way of working out who that restriction applies to, the restriction either applies to everyone or nobody. Then, while you’re trying to work out how to apply it to the people you want it applied to, people outside of your jurisdiction have already altered it.

It’s a hard problem, but watermarks and laws that apply to 3% of the world population (even if the entire US is covered, which is not the case with CA legislation) won’t cut it.

Ultimately, it is the same as the “nerd harder” crypto demands – there are very real problems that make the request either unfeasible, impossible, or only applicable to US markets.
