The Fear Of AI Just Killed A Very Useful Tool

from the can-we-not? dept

I do understand why so many people, especially creative folks, are worried about AI and how it’s used. The future is quite unknown, and things are changing very rapidly, at a pace that can feel out of control. However, when concern and worry about new technologies and how they may impact things morphs into mob-inspiring fear, dumb things happen. I would much rather that when we look at new things, we take a more realistic approach to them, and look at ways we can keep the good parts of what they provide, while looking for ways to mitigate the downsides.

Hopefully without everyone going crazy in the meantime. Unfortunately, that’s not really the world we live in.

Last year, when everyone was focused on generative AI for images, we had Rob Sheridan on the podcast to talk about why it was important for creative people to figure out how to embrace the technology rather than fear it. The opening story of the recent NY Times profile of me was all about me in a group chat, trying to suggest to some very creative Hollywood folks how to embrace AI rather than simply raging against it. And I’ve already called out how folks rushing to copyright, thinking that will somehow “save” them from AI, are barking up the wrong tree.

But, in the meantime, the fear over AI is leading to some crazy and sometimes unfortunate outcomes. Benji Smith, who created what appears to be an absolutely amazing tool for writers, Shaxpir, also created what looked like an absolutely fascinating tool called Prosecraft, that had scanned and analyzed a whole bunch of books and would let you call up really useful data on books.

He created it years ago, based on an idea he had years earlier, trying to understand the length of various books (which he initially kept in a spreadsheet). As Smith himself describes in a blog post:

I heard a story on NPR about how Kurt Vonnegut invented an idea about the “shapes of stories” by counting happy and sad words. The University of Vermont “Computational Story Lab” published research papers about how this technique could show the major plot points and the “emotional story arc” of the Harry Potter novels (as well as many many other books).

So I tried it myself and found that I could plot a graph of the emotional ups and downs of any story. I added those new “sentiment analysis” tools to the prosecraft website too.

When I ran out of books on my own shelves, I looked to the internet for more text that I could analyze, and I used web crawlers to find more books. I wanted to be mindful of the diversity of different stories, so I tried to find books by authors of every race and gender, from every different cultural and political background, writing in every different genre and exploring all different kinds of themes. Fiction and nonfiction and philosophy and science and religion and culture and politics.

Somewhere out there on the internet, I thought to myself, there was a new author writing a horror or romance or fantasy novel, struggling for guidance about how long to write their stories, how to write more vivid prose, and how much “passive voice” was too much or too little.

I wanted to give those budding storytellers a suite of “lexicographic” tools that they could use, to compare their own writing with the writing of authors they admire. I’ve been working in the field of computational linguistics and machine learning for 20+ years, and I was always frustrated that the fancy tools were only accessible to big businesses and government spy agencies. I wanted to bring that magic to everyone.
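The happy/sad word-counting mechanic Smith describes is simple enough to sketch in a few lines. The word sets below are illustrative stand-ins, not the actual sentiment lexicon Prosecraft or the UVM lab used:

```python
# Sketch of the "emotional story arc" technique: score sliding windows
# of a text by counting happy vs. sad words. The word sets here are
# illustrative stand-ins, not the actual lexicon used by Prosecraft or
# the UVM Computational Story Lab.

HAPPY = {"joy", "happy", "love", "laugh", "delight", "wonderful"}
SAD = {"sad", "cry", "fear", "dark", "terrible", "lonely"}

def emotional_arc(text, window=2000):
    """Return one sentiment score per window of `window` words."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    scores = []
    for i in range(0, len(words), window):
        chunk = words[i:i + window]
        happy = sum(w in HAPPY for w in chunk)
        sad = sum(w in SAD for w in chunk)
        scores.append((happy - sad) / len(chunk))
    return scores
```

Plotting those scores against position in the book gives the rising-and-falling “shape” of the story that Vonnegut described.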

Frankly, all of that sounds amazing. And amazingly useful. Even more amazing is that he built it, and it worked. It would produce useful analysis of books, such as this example from Alice’s Adventures in Wonderland:

And, it could also do further analysis like the following:

This is all quite interesting. It’s also the kind of thing that data scientists do on all kinds of work for useful purposes.

Smith built Prosecraft into Shaxpir, again, making it a more useful tool. But, on Monday, some authors on the internet found out about it and lost their shit, leading Smith to shut the whole project down.

There seems to be a lot of misunderstanding about all of this. Smith notes that he had researched the copyright issues and was sure he wasn’t violating anything, and he’s right. We’ve gone over this many times before. Scanning books is pretty clearly fair use. What you do with that later could violate copyright law, but I don’t see anything that Prosecraft did that comes anywhere even remotely close to violating copyright law.

But… some authors got pretty upset about all of it.

I’m still perplexed at what the complaint is here? You don’t need to “consent” for someone to analyze your book. You don’t need to “consent” to someone putting up statistics about their analysis of your book.

But, Zach’s tweet went viral with a bunch of folks ready to blow up anything that smacks of tech bro AI, and lots of authors started yelling at Smith.

The Gizmodo article has a ridiculously wrong “fair use” analysis, saying “Fair Use does not, by any stretch of the imagination, allow you to use an author’s entire copyrighted work without permission as a part of a data training program that feeds into your own ‘AI algorithm.’” Except… it almost certainly does? Again, we’ve gone through this with the Google Book scanning case, and the courts said that you can absolutely do that because it’s transformative.

It seems that what really tripped up people here was the “AI” part of it, and the fear that this was just another VC-funded “tech bro” exercise of building something to get rich by using the works of creatives. Except… none of that is accurate. As Smith explained in his blog post:

For what it’s worth, the prosecraft website has never generated any income. The Shaxpir desktop app is a labor of love, and during most of its lifetime, I’ve worked other jobs to pay the bills while trying to get the company off the ground and solve the technical challenges of scaling a startup with limited resources. We’ve never taken any VC money, and the whole company is a two-person operation just working our hardest to serve our small community of authors.

He also recognizes that the concerns about it being some “AI” thing are probably what upset people, but plenty of authors have found the tool super useful, and even added their own books:

I launched the prosecraft website in the summer of 2017, and I started showing it off to authors at writers conferences. The response was universally positive, and I incorporated the prosecraft analytic tools into the Shaxpir desktop application so that authors could privately run these analytics on their own works-in-progress (without ever sharing those analyses publicly, or even privately with us in our cloud).

I’ve spent thousands of hours working on this project, cleaning up and annotating text, organizing and tweaking things. A small handful of authors have even reached out to me, asking to have their books added to the website. I was grateful for their enthusiasm.

But in the meantime, “AI” became a thing.

And the arrival of AI on the scene has been tainted by early use-cases that allow anyone to create zero-effort impersonations of artists, cutting those creators out of their own creative process.

That’s not something I ever wanted to participate in.

Smith took the project down entirely because of that. He doesn’t want to get lumped in with other projects, and even though his project is almost certainly legal, he recognized that this was becoming an issue:

Today the community of authors has spoken out, and I’m listening. I care about you, and I hear your objections.

Your feelings are legitimate, and I hope you’ll accept my sincerest apologies. I care about stories. I care about publishing. I care about authors. I never meant to hurt anyone. I only hoped to make something that would be fun and useful and beautiful, for people like me out there struggling to tell their own stories.

I find all of this really unfortunate. Smith built something really cool, really amazing, that does not, in any way, infringe on anyone’s rights. I get the kneejerk reaction from some authors, who feared that this was some obnoxious project, but couldn’t they have taken 10 minutes to look at the details of what it was they were killing?

I know we live in an outrage era, where the immediate reaction is to turn the outrage meter up to 11. I’m certainly guilty of that at times myself. But this whole incident is just sad. It was an overreaction from the start, destroying what had been a clear labor of love and a useful project, through misleading and misguided attacks from authors.

Companies: shaxpir


Comments on “The Fear Of AI Just Killed A Very Useful Tool”

153 Comments
This comment has been deemed insightful by the community.
jmcken says:

Re:

Think of it this way: If someone pirates a movie, that’s unlawful. But if that person then writes a review of that movie for their blog, that review is perfectly legal, regardless of how they obtained their copy of the movie.

Prosecraft was basically a fancy “review” of books with extra bells and whistles. Even if he’d pirated the books, it would have no bearing on Prosecraft itself.



This comment has been deemed insightful by the community.
Anonymous Coward says:

Re: Re:

The creative process is 10% ideas and 90% sweat and effective use of the tools. Most, if not all, budding authors need help learning and refining their use of language, and this is a tool that helps them do that, building on the most basic advice to budding authors: look at how successful authors expressed things.

You would not have learnt effective use of the English language if you had not read and learnt effective expression from lots of books etc. Shaxpir is a tool that extracts a lot of what people learn by reading widely, and can help someone develop their writing skills.

You absolute imbecile says:

Re: Re:

WHO has claimed the intent to replace traditional analysis, you absolutely fucking imbecile? You cave dwelling troglodyte, you feeble minded lobotomite, you brain damaged moron, you drooling senile. This is LITERALLY sentiment analysis run on a piece of text to get interesting statistics out of curiosity, as it has been made clear in this very same article you absolute illiterate brainlet.

Anonymous Coward says:

How many authors actually make a living from their writing compared to how many people author stories and publish for free? It seems to me that those shouting the loudest are the very few who won the lottery and found a publisher, and they want to stop a younger, hungrier author from replacing them in the sweepstakes for success.

Rowenna says:

Re:

Oh hon, no–the tool was crap. It wasn’t helping anyone learn to write, at least not any better than they could have learned on their own. It applied broad concepts like “vividness” arbitrarily and poorly. Basic definitions (see “passive voice,” for example) were not even correct. The fact that he was peddling a defective and potentially harmful product to newbie writers ticked me off, actually, as a published writer. There are enough ways for “young hungry authors” to get scammed already.


This comment has been deemed insightful by the community.
Anonymous Coward says:

Re:

Perhaps a more in-depth review of precisely what analysis was being performed might be in order. You seem to think that this program was intended to have human-level recognition of content, and thus are attempting to attribute to the analysis abilities that were never claimed.

This comment has been deemed insightful by the community.
James Burkhardt (profile) says:

Re: Re: Re:

Right, but it’s not there to replace humans. It’s providing data a human can analyze. That’s the issue everyone has with AI bros – the assumption that the tool should or needs to replace humanity. This tool was never intended to perform a human analysis. It is intended to provide raw data, and let the human do the human analysis part.

Dan Someone says:

Re: Re: Re:2 Raw data to what end?

So what is the significance – let alone the importance – of this “raw data”? What are authors supposed to take away from these reports? “Oh, my work is in the 43rd percentile for use of adverbs.” So what?

The issue I have with this is not that it’s trying to replace humanity, but that it is trying to solve a problem that doesn’t exist. Quantifying these (or any other) aspects of writing and comparing them to those aspects over a large corpus of other works doesn’t tell anybody anything meaningful.

Benji says:

“Somewhere out there on the internet, I thought to myself, there was a new author writing a horror or romance or fantasy novel, struggling for guidance about how long to write their stories, how to write more vivid prose, and how much ‘passive voice’ was too much or too little.”

But all those prescriptive concepts – “how long,” “too much,” “too little” – are (a) entirely subjective and (b) nonsensical. So all the tool provides is a quantitative analysis that serves no actual purpose. (And new authors eventually find their own voice; they don’t need a quant analysis to tell them what their voice “should” sound like.)

Also, to the extent that it pushes authors to modify their writing to be more like the rest of what’s out there, it creates a risk of homogenizing literature – a problem similar to that of training generative AI on an internet that is increasingly composed of AI-generated content.

Maybe the tool is legal under copyright law. But regardless, in a world where AI-generated content is being pushed hard as an alternative to human creativity, it is certainly understandable why authors reacted with alarm and anger, and it was so unnecessary.

This comment has been deemed insightful by the community.
neitherbeckusnaurbacchus (profile) says:

Reminds me of Searchtodon

This is very reminiscent of the outrage that was directed at Searchtodon, where someone provided a genuinely useful service, and then it was a target of people accusing it of being “created by an out of touch tech bro”, and saying that it causes harm through a mechanism that it doesn’t actually implement.

Anonymous Coward says:

Commenters on the Gizmodo piece are pointing out how the analysis of the books, and also the books chosen to be in there, were suspect, which makes it sound not all that useful. Like thinking the meanings of words could be solved with math, and misunderstanding what “passive voice” means. Also, why were self-help books from Faith G Harper in there, for instance? That sure ain’t any “prose”…

This comment has been deemed insightful by the community.
blakestacey (profile) says:

Re:

Many humans who complain about the passive voice have neither a clue what linguists mean by that nor a coherent idea of what they’re complaining about. So, a program written to spot it will quite likely be misguided, and a model trained on a corpus of people kvetching about it will probably be completely incoherent.
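That kind of misfire is easy to demonstrate. The sketch below implements the naive “form of to be + word ending in -ed/-en” heuristic such programs often reduce to; it is an illustrative pattern, not any particular tool’s actual implementation:

```python
import re

# Naive passive-voice heuristic: flag any form of "to be" followed by a
# word ending in -ed or -en. Illustrative only, not any real tool's code.
PASSIVE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)

def flags_passive(sentence):
    return bool(PASSIVE.search(sentence))
```

It correctly flags “The tarts were stolen,” but it also flags the adjectival “She was excited about the trip” (a false positive) and misses the get-passive “The window got broken” (a false negative).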

Anonymous Coward says:

I read this article on the whole thing. Diana Urban had a “most vivid page” that was the most spoilery page of the climax of the book, something not really publicly available. The Quartz piece also points out that back in March, Benji was looking for help to fine-tune/train an LLM.

This Mashable article also points out that Shaxpir has a monthly subscription option and parts of it were made with Prosecraft’s database.

Yeah, this does sound like a tech bro thing built off of work that wasn’t paid for the more I’m digging into it.

Anonymous Coward says:

Re:

This Mashable article also points out that Shaxpir has a monthly subscription option and parts of it were made with Prosecraft’s database.

Yeah, this does sound like a tech bro thing built off of work that wasn’t paid for the more I’m digging into it.

Perhaps you can expand on that last bit. Shaxpir built off of Prosecraft’s database? As in “one product the author built using the database of another product the author built”?

Or are you thinking about all the books scanned into the database? The post here kinda describes that the database itself is almost certainly within Fair Use limits.

(Or, as you were still looking into it, are you now looking for an edit button? 🙂 )

Anonymous Coward says:

Re: Re:

The way that Prosecraft had decidedly non-prose stuff like Faith G Harper’s self-help book; the way pages featured as the “most vivid” were definitely-not-publicly-available stuff like the most spoilery pages of a climax, while Benji says he supposedly cares about authors; the way Benji was looking for help with another LLM; Shaxpir being a service with a paid subscription tier; and Shaxpir using data from Prosecraft… it all just reeks of the tech dude getting caught by authors and having to issue some fake apologia about it, rather than Prosecraft and Shaxpir being some legitimately misunderstood wonder-tools.

Anonymous Coward says:

Re: Re: Re:2

From the article, regarding the page from Diana Urban’s book:

“The ‘most vivid page’ excerpt from my book was literally the most spoilery moment of the climax, not published publicly, not scrapable…”

Where did Benji get it? Like I said, it reeks of him trawling download sites for books and not actually caring about authors, the opposite of how he says he cares about them.

Anonymous Coward says:

Re: Re: Re:4

It’s in print. But looking at the facts, it’s clear he was lifting books from places that nobody who says they “care” about authors would actually use, and then monetizing the “analysis” from Prosecraft by tying it to a subscription in the Shaxpir app. Benji is a liar and a hypocrite.

tim fitz says:

Re: Re: Re:5

looking at the facts, there is zero evidence that he didn’t just scan that book himself, but because you guys are doing a moral panic, you can’t even think straight long enough to work that out. the level of brainless toxicity that has been on display from supposedly intellectual people this week is absolutely mind blowing. virtually zero of your facts are right — any of you, because almost none of them do anything but wreck your entire premise. it’s unbelievable. you’re doing nothing here but embarrassing yourselves and your entire profession.

Anonymous Coward says:

Re: Re: Re:

OK, asshole, you’ve just outed yourself as one of the complaining authors. Now either stop being an AC and identify yourself, ’cause we want to see what you’ve written that’s so great that you’ve got to freak out when you think you’re not seeing a profit.

Either that, or just sit the fuck down and shut the fuck up.

Anonymous Coward says:

Re: Re: Re:

… most vivid were definitely-not-publicly-available stuff like the most spoilery pages of a climax …

All that says to me is that someone has never heard of onlyfans.com, which almost immediately begat a fuck-ton of OnlyFans leak sites.

That same someone needs to grow up and learn how the internet works, warts and all.

This comment has been deemed insightful by the community.
Transient Thoughts says:

Re: Re: Re:3

My guy here doesn’t know about libraries lol. You can literally get any book for free and create a dataset with it to train models. Perfectly legal. The creation of knowledge from what boils down to textual analysis is an entirely moral endeavor. Suck it up buttercup, you’re wrong and your whiny arguments make you look foolish.

drew (profile) says:

Re: Re: Re: You keep using that word; I do not think it means what you think it means

With regard to the inclusion of the self-help book, you appear to be confusing ‘prose’ with ‘fiction’. Very little non-fiction is not written in prose…
Additionally, just because a piece is non-fiction doesn’t mean the author hasn’t considered a story arc and hasn’t worked just as hard on writing emotionally engaging content.
At least, not if they’re any good at it.

Anonymous Coward says:

Re:

I read this article on the whole thing. Diana Urban had a “most vivid page” that was the most spoilery page of the climax of the book

Revealing spoilers doesn’t violate copyright law.

And, if a tool like this is ruining your book it’s probably not a very good book. No one is using a tool like this to find out how a book ends.

This Mashable article also points out that Shaxpir has a monthly subscription option and parts of it were made with Prosecraft’s database.

So what?

Anonymous Coward says:

Re: Re:

And, if a tool like this is ruining your book it’s probably not a very good book.

There are a lot of good books whose twists and turns hinge on info or action that happens on a single page. It’s more an issue with the tool than the book, methinks.

And yeah, spoiling a book doesn’t violate copyright law. However, the way he talks a big game about caring about authors and the written word, but everything about the “analysis” that Prosecraft spits out, the non-prose books in there as well, and the way that he was itching to work with an LLM trained on a ton of books gives me the vibe that he actually doesn’t. Like he just tossed a bunch of books he found on sketchy download sites while trawling the web into his thing.

Anonymous Coward says:

Re: Re: Re:

However, the way he talks a big game about caring about authors and the written word, but everything about the “analysis” that Prosecraft spits out, the non-prose books in there as well, and the way that he was itching to work with an LLM trained on a ton of books gives me the vibe that he actually doesn’t. Like he just tossed a bunch of books he found on sketchy download sites while trawling the web into his thing.

It’s still not clear how that matters for anything?

Reading a book that you didn’t pay for isn’t against the law either. Creating a database of stats and your feelings about the book isn’t either. Nor is building a search engine. Or other tools to provide data.

So what is it that you think he did here that was wrong?

Anonymous Coward says:

Re: Re: Re:2

He says he “cares” about authors but trawled pirate sites to create a database of questionable functionality, given what people have said about the flaws in its analysis, then he decided to monetize it by making some of it part of a subscription to his Shaxpir app. The dude’s a liar and a hypocrite.

tim fitz says:

Re: Re: Re:

let me get this straight — none of the facts support your original read, so you’re just going to go on vibes and therefore conclude the guy is a lying sack of shit. lmao why would anyone ever want to read a book by you when you have the intellectual honesty of a trump fan and the emotional maturity of.. well, a trump fan? asking for serious.

blakestacey (profile) says:

Just because it is (probably, at the moment) legal doesn’t make it useful, or even on a path to becoming useful. That “most passive/most vivid” example certainly doesn’t indicate a whit of utility. Nothing in the former is grammatically passive; “Who stole the tarts?” is not more vivid than dozens of other passages in Alice.

The Cat only grinned when it saw Alice. It looked good-natured, she thought: still it had VERY long claws and a great many teeth, so she felt that it ought to be treated with respect.

This comment has been deemed insightful by the community.
Straddler says:

Re: Re:

There are two issues at play here.

1) The utility of this tool is questionable. It spits out numbers but what a writer can do with those numbers is a bit obscure. At best it seems a good way for a hack to bring their project in line with market standards. I don’t mean this in a derogatory way. Hack work is hard work, but it is also the most likely to be driven by deadlines and algorithmic feedback. Thinking of my own projects I don’t know what I would possibly do with the fact that some fiction of mine sits in the 30th percentile of -ly adverbs.

2) The project looks to be fair use and is not providing for the exploitation of creative work in the way that generative language models have been. The authors who are complaining about it on the grounds that it is “stealing” their work or that it needs authorization to do what it is doing are probably wrong, and misunderstand that this is not an LLM or a prompt-chewing bot. But LLMs have made creatives more broadly sensitive to the use of their work for any automated process. Their misunderstanding is unfortunate, but a not surprising consequence of the techbro space heralding the end of “gatekeeping” for creative work (e.g., automating the creative process through bots).

Uriel-238 (profile) says:

A stupid idea from a stupid premise

I was noticing that Linkin Park releases versions of its songs that separate the instrumental tracks and the a cappella voice tracks, I assumed inviting transformative use (some results of which I’d seen on YouTube), and it reminded me of the story about Michael Jackson talking with Daryl Hall about borrowing the bass line of “I Can’t Go for That (No Can Do)” for “Billie Jean.”

I’d expect that authors and writers who are actually published would be fine if other writers studied their material with full intent to emulate or borrow certain styles, if it was a human being doing it. But right now the notion they seem to fear is not that their book will be fed into an AI in order to create additional, transformative product, but that it will write what they’d create, in their stead.

Generative AI is a long, long way from being able to write us a new Hemingway novel, or to draw us a new Maurice Sendak book. And even content IP owners might only be able to slow the development of AI to where its ability to Sendak convincingly meets and exceeds the original material.

But I don’t think we want to withhold all our art like Prince did; rather, I think creative people who depend on those incomes are afraid of losing them and being left to the elements.

We’re not afraid of AI, we’re afraid of capitalism. And the solution is not going to be in delaying AI, but confronting that exploitation of labor is going to leave more and more people unemployed and hungry until it reaches a crisis point.

Mamba (profile) says:

Gizmodo

Man, they sure have fallen a long way. Weren’t they the ones that tried the AI-written content recently, that fucked up right out of the gate?

Unfortunately, the human writers aren’t any better. They regularly write opinion pieces with a complete lack of fundamental understanding. Their piece on the Canadian link tax was just as completely ignorant.

tim fitz says:

Re:

Yeah, when Peter Thiel murdered Gawker, Gizmodo and all the other Kinja verticals were sold off to Fusion. That was fine for a while, but then Fusion sold it to one of those squeeze-it-til-it’s-dead private equity firms, who have been squeezing for the last few years. I’m afraid it won’t be long, now.

Sneeje (profile) says:

This is what I fear...

https://arstechnica.com/information-technology/2023/08/author-discovers-ai-generated-counterfeit-books-written-in-her-name-on-amazon/

A number of cartoonists have been struggling with this as well. Cartoons in their style are being created and monetized that they did not create.

Mike, I’m curious what you think of this. It seems like the counterargument would be, “well it’s fraud.” Which, ok, but what if they didn’t submit it under the author’s name or the cartoonist’s name and just monetized it. I’m not convinced the market would sort it out.

Stephen J. Anderson says:

Re: Re: But is it really fair use?

Generative LLMs are about to get tested in court. I’m not so sure they’ll be found to be fair use.

Here’s a thought experiment: let’s say someone took an LM and trained it entirely and solely on one of my novels, and then told it to generate a series of novels. Sooner or later, it’s going to replicate passages from my novel in the ones it generates, because the patterns of my language use are all that it “knows”. Is that fair use? If a human did that and sold the output it would probably be found to be infringing.

If we assume that isn’t fair use, then what if they trained it solely on all my novels and did the same thing? Is that fair use?

Now generalise that to having it trained on all the novels ever written. Well, that’s where it gets fuzzy. Because, in a sense, now it’s just replicating what a lot of (all?) human authors do. The output of all human authors is affected by what they have read, consciously or not. But will AI get the same leeway as humans?

Uriel-238 (profile) says:

Re: Re: Re: AI's not there yet, and IP law is complicated.

Generative LLMs are about to get tested in court. I’m not so sure they’ll be found to be fair use.

I am quite unsure the courts are prepared for, or knowledgeable enough about, copyright law to decide what is fair use or not. Techdirt teems with articles about stupid IP rulings made by state and federal courts.

If they trained [an AI] solely on all my novels and did the same thing? Is that fair use?

I think what matters to make it fair use, is that it’s transformative. So yes, if you have a large body of novels, and the AI is able to create one that uses your writing style but is sufficiently different, that would count as fair use.

The problem is, (at least as I understand it) generative AI isn’t making original work at the touch of a button. A user has to describe very specifically what they want, and run through a large number of iterations (like over a hundred) and then review them and cull out all the ones that are NSFL or fall to Sturgeon’s law, ultimately choosing the best example.

This is a bit of a chore for a single person working the AI to create an image. It would be way too tedious for a human being (or a crew of twenty) to do the same just to make a passable Shakespearean play or Arthur Conan Doyle Sherlock Holmes mystery.

Stephen J. Anderson says:

Re: Re: Re:2

The thing I keep coming back to is that copyright is somewhat irrational, and fair use is vague and capricious, and that fundamentally, copyright was created to service human interests. The courts — or the governments — may decide that fair use only applies to human-generated content. Or decisions may vary by jurisdiction. We just don’t know yet.

This comment has been deemed insightful by the community.
bakage says:

Re: Re: Re:

“let’s say someone took an LM and trained it entirely and solely on one of my novels, and then told it to generate a series of novels.”

It would be considered infringing or fair use on a case-by-case basis. Just because it used your work doesn’t mean it is automatically infringing. Like another author reading your work and analyzing how you write. His work will only be considered infringing if he took enough elements from your writings; otherwise, it’s fair use.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re: Re: Re:

How much of the objection to generative AI is driven by the fear that it will allow someone else to do a better job at telling a story, or will increase the competition by allowing more people to write stories that attract an audience?

Anonymous Coward says:

Re: Re: Re:2

History suggests that the rich will use procedural content generators to automate the entertainment industries, replacing creatives.

Please don’t assume that your average human is creative enough to tell ONE compelling story, write one moving song or something that even mildly stimulates the brain.

Entertainment execs churn out the same boring bullshit, dumb down plots, recontextualize foreign films to the point a drooling 5-year-old can understand them, and usually worse, for a good reason: the average audience has no need for the thought-provoking.

tim fitz says:

Re: Re: Re:3

The history of AI suggests that you’re, with all due respect, tripping sack to think that it will ever be capable of replacing creators. It will never replace drivers, who aren’t even doing art, never mind artists. Time to take a deep breath and play with ChatGPT until you realize that without you, the operator, it is literally nothing at all.

Anonymous Coward says:

Re: Re: Re:4

You’re correct wrt the history of AI.

But as for the history of automation and the worker…

Are you familiar with the automation of manufacturing? Displaced a ton of jobs, it did.

There’s a massive potential for these procedural content generators to replace the creative. And that’s why SAG-AFTRA are striking: Hollywood WANTS to repeat the “miracle” of automation in the entertainment business.

I am NOT against technology. I am typing this on a PC, using the Internet, and know well enough about these advances. I know that a procedural content generator needs a human operator to feed it “prompts” to generate content.

And yes, you will still need drivers, cleaners, construction workers, soldiers, i.e., the “shitjobs”, because there’s no cost-effective way to make a robot do labor-intensive jobs. And most of those jobs tend to be shunted to foreign “workers” and undocumented immigrants. Though it doesn’t mean those C-suites aren’t gonna try to cut down on the number of warm bodies they need to pay for.

Rocky says:

Re: Re: Re:5

Are you familiar with the automation of manufacturing? Displaced a ton of jobs, it did.

Displaced, which always happens when someone invents a new way to do things. Everything you use today is manufactured by a process that displaced “a ton of jobs”.

If you want to lament the displacement of jobs, whether it is artists or artisans, make sure that everything you use is something that is entirely bespoke produced by an artisan.

The argument that artists are a special case is also an implicit argument that any other type of job is of lesser value.

Anonymous Coward says:

Re: Re: Re:6

…No, I am NOT lamenting the displacement of jobs.

I am lamenting the fact that the C-suite thinks it’s a good idea to replace artists and creatives to manufacture content for the masses.

And to address the union question, a union is still voluntary and while I’d suggest, legally, that artists should form and join a union, the reality is that not everyone thinks it’s the best idea. Or are inclined to join unions.

Anonymous Coward says:

Re: Re: Re:7

I am lamenting the fact that the C-suite thinks it’s a good idea to replace artists and creatives to manufacture content for the masses.

Those displaced can form their own cooperatives or other organizations and compete head on for the attention of the masses. It is not as if a massive amount of capital is needed to create and distribute content using the Internet.

Anonymous Coward says:

Re: Re: Re:3

History suggests that the rich will use procedural content generators to automate the entertainment industries, replacing creatives.

And what stops those creatives from forming co-operatives and informal organizations so that they can self-publish on the Internet? It is not as if they need publishers these days to get their work in front of the public. If they can do a better job than the corporations, they will gain an audience and support to continue creating. Indeed, these days there are more creative people making a living by self-publishing than there are employed or contracted to the studios, labels and publishers.

Mamba (profile) says:

Re: Re: Re:

Sorry, but there is fundamentally no way that training AI on copyrighted works isn’t fair use. Now, you might be able to find outputs that stray too close to someone’s work (or even verbatim reproduce it), but again, I don’t find that such a strange situation. We’ve long had computers that could easily reproduce copyrighted works, and nobody has found that Photoshop violates copyright laws. It’s the person that publishes the work that’s responsible.

Personally, I think the vilification of AI by artists is a fool’s errand. It is a tool, and in the hands of a good artist, it will certainly be much more effective than anything I can do. So the artists that are avoiding it are getting stale.

Anonymous Coward says:

Re:

Style is not, nor should ever be, protected under copyright. Facts (what a tree looks like, what drawings of trees look like, what animals look like, what drawings of animals look like, and so on) are not copyrightable either. The AI works entirely by learning style and facts, which is not a violation, nor will it ever be, and neither is it immoral.

This comment has been deemed insightful by the community.
Ken Analysing says:

As a data scientist my worst nightmare just came true

I am a data scientist who works in natural language processing (NLP) and AI. I’ve read the Gizmodo and Quartz pieces as well as this article. I believe people are mixing some things up, and that makes me sad and scared and even infuriated. Here is what gets mixed up and why it scares me:

There is a huge difference between text analytics aka text mining, NLP, and LLMs. Let me try to explain the differences, and please bear with me: I try to make the stuff I do daily understandable, and English is not my first language.
Text mining has the dreaded mining in its name, but it’s about text statistics. The mining stems, as far as I know, from the collection of the words within the text/s. This is where we count words, calculate word frequency, compare with word lists and other texts, and generate what we call sentiments on a sentence/paragraph/chapter or book level. We don’t have to use really fancy algorithms or computer models. Most of the time it’s RegEx aka regular expressions and frequencies. We simply look for stuff like punctuation, spaces, and capital letters to figure out where words, sentences, paragraphs and chapters start and end. Really basic stuff. No AI here. If you want to learn about the methods there is a great book called Tidy Text Mining.
Since I haven’t been able to visit Prosecraft before the shutdown, I can’t be quite sure about it, but from what I read here and in the other articles this seems to be what Prosecraft did for the most part.
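A minimal sketch of the kind of text statistics described above, assuming nothing about Prosecraft’s actual code: just RegEx tokenization, word frequencies, and a tiny made-up happy/sad word list standing in for a real sentiment lexicon.

```python
import re
from collections import Counter

# Tiny illustrative sentiment word lists. A real lexicon has thousands
# of entries; these stand-ins exist only to show the counting technique.
HAPPY = {"joy", "happy", "love", "laugh"}
SAD = {"sad", "fear", "death", "cry"}

def text_stats(text):
    # Naive sentence split: terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercase word tokens: runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # "Sentiment" here is simply happy-word hits minus sad-word hits.
    sentiment = sum((w in HAPPY) - (w in SAD) for w in words)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "most_common": counts.most_common(3),
        "sentiment": sentiment,
    }

stats = text_stats("I love to laugh. But death makes me sad and makes me cry.")
print(stats["sentences"], stats["words"], stats["sentiment"])  # 2 13 -1
```

This runs on any Python install with no model and no training involved: exactly the "no AI here" point made above.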

Then we get to NLP. That’s where the models/AI are. Text mining is nice when you have well-written long texts in English and easier tasks like “which is the most common word / 3-word phrase?”. When you want to generate information like “What is the most common noun?” most people switch to language models. They are trained on texts and provide information like word types. Bigger ones like udpipe or spacy have the capacity to provide more details like named entities and better sentiment understanding. There are models that you can download and use on your computer. Some are bigger and will slow your machine down quite a bit, especially if you analyse a big dataset. But they can still run on any PC. Those models are trained on larger sets of texts, usually gathered from the internet. Depending on the type of text they are trained on, they will perform better or worse on the texts you are working with. Most models perform well on well-written text like news articles and literature. On comments especially, and in languages with complex grammar, they perform quite poorly. This is where we have to retrain a model or build our own. Imagine, for example, that we wanted to analyse a text in Klingon 😉 But to train those simple NLP models we need labels for every word you want labelled in every sentence within your training and test data. Sometimes you can use RegEx to define the entities you want to retrain. But it’s also common to manually label every entity for every sentence within your dataset. It’s tedious. We speak about hundreds or, better, thousands of sentences. Prosecraft might have used some of this kind of magic, but probably they wouldn’t have used the text they analyzed for the training of their underlying models.
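The word-type labelling described above can be sketched with a toy, hand-made lexicon standing in for a trained model. The lexicon, tags, and sentence below are purely illustrative; real taggers like spacy or udpipe learn these labels from large manually annotated corpora, which is exactly the tedious labelling work mentioned.

```python
from collections import Counter

# A hand-made toy lexicon stands in for a trained part-of-speech model.
# Real taggers learn these labels from thousands of annotated sentences.
LEXICON = {
    "the": "DET", "a": "DET", "on": "ADP",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN",
    "sat": "VERB", "chased": "VERB",
}

def most_common_noun(tokens):
    # "What is the most common noun?" -- the task that pushes people
    # from plain text statistics to language models.
    nouns = [t for t in tokens if LEXICON.get(t) == "NOUN"]
    return Counter(nouns).most_common(1)[0][0] if nouns else None

tokens = "the cat sat on the mat the dog chased the cat".split()
print(most_common_noun(tokens))  # cat
```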
And finally we have the eerie LLMs. They are part of NLP. Those are huge models, trained on vast amounts of data. From what I know they don’t need labelled data; they learn everything from their training data. They can still perform the analytics of the aforementioned models. But I wonder if Prosecraft would use such a resource-intensive tool for simple statistics and analytics. It’s overkill. It would be like cracking a nut with a sledgehammer. It’s something I see people do when they want to show off, are new to the field, want to experiment with the new technology, or need to do it because marketing wants to be cool. And from what I have read, Benji Smith is a computational linguist and Prosecraft is a private project. He would probably know better.

So from this standpoint, Prosecraft was analysing texts/books in the form of statistical key data. They transformed text into numbers. Nothing more, nothing less. Those numbers can give you an idea of a book, like the arc of suspense, names of main figures, theme, mood… But not generate a new book.

Now for the part that scares me:
Like I said, my background is in data science. I personally would love to do something like Prosecraft. I love data science and sharing my passion and knowledge in a relatable way would make me very happy. But the way Prosecraft and Benji Smith are treated is exactly why I haven’t dared to ever start my personal project. It’s even worse for me since I live in Europe and our copyright law is … you know, you are on Techdirt…
It scares me and infuriates me that there is this subliminal assumption that everyone who does language processing is also using the analysed material to make an LLM and write stuff with it to earn money. First of all, nobody knows if Benji Smith really used the books for anything but the analysis. Second of all, have you ever read a text from an LLM? They are glitchy and meh. Of course in the future there will be better LLMs that generate better text. But maybe in the future the same authors and people that now fear LLMs will recognize that they can assist them in their work. There is even a chance that they are already using LLMs, for grammar and spelling, or translation, or like Google…
And the worst part is that this situation makes me feel hopeless. Because if we as a society care so much about copyright that we stifle everyone who wants to build something from a collection of the works of others, how will we be able to learn? I have a baby. He is now learning by putting two things together. That’s how we learn. That’s how we evolve. Humans always did this to master something. Artists copied their masters’ work. Fans copy their idols. And some will get better than the original. But most are not. And so most will not be able to take much money from the original artist. I do hope that we can as a society evolve to honor the original artist by providing quotations and credentials. So that people can find the original. This way I believe the original artist gets their fair share of resources (money, reach, whatever). And I hope that this way we stop stifling our way of learning. So that there is a chance my baby will be able to find what he likes by copying it, and that maybe he can create something new, or show the world what he loves. Without fear or need to doubt whether what he is doing will generate a shit storm and twist every intention he had in the public’s eye. But from reading all the articles about Prosecraft and LLMs, my heart sinks and I fear we are heading into a future where our natural way of learning and sharing our love for something will cause a witch hunt.

Thanks for reading if you got to here. I needed to speak up.

Best wishes,
Ken

tim fitz says:

Re: thanks, Ken

I read it and I hear you. It must suck having a giant community of artists show up on the guy next door’s stoop with torches and pitchforks, calling everything “AI” and hollering about some shit that just isn’t actually happening. I spoke up about it on Twitter today and had a pretty awful time. But look, you’re not alone. We will figure out how to help people understand a) that they have no idea what’s going on right now and b) what’s going on right now.

I hope you have a great night!

Tim

Anonymous Coward says:

Re: Re:

Took a look at Twitter, and boy was that ugly…

  1. Make ill-informed assumptions about the tool and jump to conclusions.
  2. Lash out in outrage and make other technically illiterate people furious, presenting your uninformed assumptions as established facts.
  3. Pretend you are somehow the good guy, and rant furiously when people challenge your misconceptions about things like “statistics”, “data” and “AI”.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re:

Ken,

as much as I sympathize with your position, unfortunately, we ARE living in a world where

a) corporations are trying to replace even the creatives with machines that they think can do the job better, because look at all the good things it did in manufacturing when automation became feasible and scalable

b) our economies worship “line goes up” to the point where ethics are cast aside

c) anyone who tries to paint technology in a positive light will be viewed as a corrupt late-stage capitalist who wants to kill their livelihoods and force them to get a shitjob that is likely to not exist, or, more grimly, leave them dying in a fucking ditch in RatBumFuckistan because the only choice left is to join the Armed Forces.

I have tried to read up and inform myself on the scams these late-stage capitalists are running and a lot of it runs counter to the research data scientists like you are doing.

While I do hope we, as a global society, will reach the stage where we honor the creative’s right to associate, create and learn as well as everyone else’s, we don’t live in that ideal world.

We live in a shitty reality where late-stage capitalists are trying to replace the jobs of creatives, backed by the successful replacement of the manufacturing workforce with machines, and their captured politicians who won’t even bother to listen to the people who voted them in power.

What we’re really seeing is a massive pushback from a lot of not-Boomers who realize that their future is, to put it nicely, gone.

tim fitz says:

Re: Re:

But see, that’s just the thing — that’s exactly what is so heartbreaking about this context collapse. Because the conclusion that large models can actually do any of that is completely conjectural and, IMO, very, very dubious, no matter how big a corpus one of them has. They fundamentally are not creative and fundamentally cannot replace creatives. Not even commercial artists, which I had thought were the most at risk, until I tried to do extremely simple shit with it and realized that even at its best its output is only useful for the novelty value of it having been thrown together with math.

So all of this panic is a) for nothing and b) actually causing a ton of upheaval, as executives watch the frothing mob and conclude that AI must actually be a threat to those down below. The one thing the studios and the striking writers and actors both have in common may be that their leaders both wrongly believe AI can fundamentally shift the balance of power in the arts.

Anonymous Coward says:

Re: Re: Re:

The current crop of procedural content generators are pretty bad at even procedurally generating content that seems like human-made content, yes.

But remember this. Automation managed to drastically reduce manufacturing costs by drastically reducing the number of workers needed in a factory. The fucking C-suites and their peers want to replicate what happened with manufacturing automation. Unless it’s cheaper to exploit foreign “workers” to get the same short-term line-goes-up fuckshittery that is our current economic climate.

tim fitz says:

Re: Re: Re:2

but automated manufacturing is really not even close to comparable to automated ideation. there is one best way to make a widget. your ai is honing towards that perfect way and will only get closer over time. there is no perfect way to write a novel. your AI will flail around trying to find it forever. it won’t work, sorry. just like with cars. no matter how much intuitive sense it makes that a computer should be able to do it, a computer cannot do it and never will. maybe if every single car is networked and all the intuition can be sucked out of the process, but that’s not soon and it’s not the same approach.

Anonymous Coward says:

Re: Re: Re:3

No, but try convincing the C-suite fuckers that.

They don’t care and they want to make that fucking line go up.

So unless you’re willing to go to the most extreme of measures to ensure that the C-suites do not get their way, get fucking educated on why they want to do this.

It ain’t the tech, sonny. This Prosecraft thing doesn’t impress me in the slightest and reeks of someone wanting to count words to get an analysis, like a bloody high-school writing class. And even then, comparing the software to a high-school writing class is a massive insult to the teacher who actually had to come up with the metrics, teaching plan and even marking schemes for said class.


This comment has been deemed insightful by the community.
Bloof (profile) says:

Re: Re:

Tell that to the talented VFX artists looking at unemployment as corporations like Disney try to maximise profit by replacing them with noticeably worse AI technology, or the aspiring writers getting their works onto Amazon, then drowned in a sea of AI-generated works pumped out by grifters. Tell that to the writers who gain traction, then find AI slop dumped online using their names.

Cream rising to the top is bullsh*t. It can’t happen if it’s pumped out into a whirlpool of sewage.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re: Re:

Cream rising to the top?

You mean crap rising to the top.

While the literary and artistic greats were indeed great, remember that they were supported by the rich and in power, usually both.

And there’s no shortage of “artists”, “writers” and whatnot who are more than willing to sing their “praises” to “rise to the top”.

For every Hitchcock, every Francis Ford Coppola, every Michelangelo, there’s always at least one Leni Riefenstahl, at least one Elon Musk, and definitely a LEGION of willing assholes more than HAPPY to peddle propaganda, hate speech, and much, much worse.

Talent? Oh, no, the talented and honest folk don’t tend to rise to the top.

I hope you like being in an unmarked mass grave because once the rest of us are dead, you’re next. You can’t please your masters all the time.

blergh says:

Useless

It doesn’t matter whether you agree or disagree with the authors. What he did with the IP wasn’t legal and was always going to get taken down once publishing houses’ legal teams got wind of the site. It’s a dumb thing to argue about. Should I be able to take a painting out of a museum and do stuff with it and then put it back? Doesn’t matter, because you can’t do that. Seems like wasted effort to argue. But I guess this is the internet lmao what am I saying and here I am arguing! I am part of the problem

This comment has been deemed insightful by the community.
Mamba (profile) says:

Re:

That is absolutely untrue.

From The Verge:

“Considering the onus placed on these factors, Gervais says “it is much more likely than not” that training systems on copyrighted data will be covered by fair use. ”

https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data

From Foundation Models and Fair Use, Peter Henderson, et al.

“In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine.”

From Foley and Lardner, LLP

“Training AI is fair use under U.S. copyright law.”

and

“That said, the arguments in favor of considering training AI as fair use are strong, and it is likely that courts will continue to find that training AI is fair use in many cases.”

https://viewpoints.foley.com/post/102ih41/is-training-ai-fair-use

This comment has been deemed insightful by the community.
Strawb (profile) says:

Re:

Should I be able to take a painting out of a museum and do stuff with it and then put it back?

That’s a nonsensical analogy.

A more apt one in this case would be to find a picture of the painting online, fuck around with/modify/change it, and then upload that online. You know, like people do on the internet completely legally all the fucking time.

If the features of this tool were done manually, it’d pretty clearly be fair use. So why would it not be just because an AI is involved?

This comment has been deemed insightful by the community.
Anonymous Coward says:

Kneejerk

If this type of analysis were to be made by a procedural program (not an LLM), I’m sure the reaction wouldn’t have been so strong.

It’s understandable to dislike types like Sam Altman, but the reality is that LLMs have many legitimate uses that cannot remotely be argued to be copyright violations. These derived statistics (and the book snippets) cannot compete with the original work.

Anonymous Coward says:

Re:

If this type of analysis were to be made by a procedural program (not an LLM), I’m sure the reaction wouldn’t have been so strong.

Oh, I doubt that very much. It is blatantly clear that most of the outraged hacks have absolutely zero understanding, and zero interest in understanding, anything about the tool and what it did.

They smelled “AI” and went nuts.

Matthew Kressel says:

It's theft

“Available on the internet” does not mean “free to use.” The dude stole thousands of copyrighted works without payment or permission. That’s theft.

Also, counting the number of “to be” verbs isn’t a measure of “passive” voice, the number of adverbs is not a useful metric to decide the value of a work of art, and his “vividness” score is entirely subjective.
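For what it’s worth, the “to be” point is easy to demonstrate: a naive be-verb count flags progressive tense and simple predicates along with passives. A sketch with deliberately simplistic regexes (no substitute for a real parser):

```python
import re

# Counting forms of "to be" flags far more than passive voice:
# progressive tense ("is running") and simple predicates ("is tall")
# also contain them. Even a slightly better heuristic requires a
# following past participle.
BE = r"\b(?:am|is|are|was|were|be|been|being)\b"

def count_be(text):
    # Naive metric: every be-verb, passive or not.
    return len(re.findall(BE, text, re.IGNORECASE))

def count_passive_ish(text):
    # "be-verb + word ending in -ed/-en" -- still crude, but closer.
    return len(re.findall(BE + r"\s+\w+(?:ed|en)\b", text, re.IGNORECASE))

sample = "The dog is running. She is tall. The window was broken by the wind."
print(count_be(sample), count_passive_ish(sample))  # 3 1
```

Only one of the three be-verbs in the sample actually marks a passive clause, which is the gap the criticism points at.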

The guy was a huckster and a fraud, who stole other people’s work and tried to pass off a scam “analysis” that doesn’t offer writers anything useful other than an erroneous sense of their work’s value.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re:

The guy is/was trying to figure out how authors used the English language using publicly available works, and to provide a tool to help authors improve their writing. You are foaming at the mouth as though he tried to steal your life’s work, because AI was mentioned.

Joe Dirt says:

Re:

Dude
guy
Trans women are women you nazi. KYS

“vividness” score is entirely subjective.
As opposed to an entirely objective “vividness” score? Are you braindead?

doesn’t offer writers anything useful other than an erroneous sense of their work’s value.
So if it’s useful that makes it ok? Theft is okay as long as it’s useful? You have no idea what you’re talking about. You’re a self-contradicting terf nazi.

Uriel-238 (profile) says:

Re: Authors throwing a fit (throwing fits?)

I suspect it is not really hiding anything more sinister than concern for their own survival as artists in a capitalist industry.

If the only reason they are tolerated by their capitalist masters is the presumption of talent for unique content, then yes, they’re going to feel threatened by anything that might replicate it.

But what threatens their jobs is not the existence of AI that can replace them, but that capitalists believe they can replace creative persons using AI.

Even if it’s not true, it reveals that our publishers see art as a product as ordinary as potato chips. And that is a blow to anyone who thinks they’re more special than the guitarist who has to flip burgers for a living.

PS: As The Menu illustrates, burger flipping is not an unlaudable skill either.

meh says:

a disagreement

i can see this getting all wrapped up and warped around “fair use” …but regardless – any author or creator should always have the last word about how their work is utilized. It’s easy at this point in time to say – “it’s only for X now – cool your jets – relax….” however – these things tend to morph themselves into completely different things, and once you’ve allowed it once – you cannot put that genie back into the bottle. I can fully understand his reluctance or objection to it.

Steve says:

Unfortunately, tech has made its bed

My subject line pretty much says it all. I’ve been in high tech since the early 1980s, when it was pretty commonplace for budding programmers to write and share big systems for the sheer joy of it.

Unfortunately, the last 20 years have been characterized by extreme exploitation of users (social media using users’ content), of employees (the entire gig economy), of customers (privacy, sharing, clawing back and charging for features, etc.), and of anyone without the power to fight back (Adobe using Creative Cloud contents to quietly train their own AI).

It’s a shame that a “good” application got caught up in this, but it’s the natural consequence of an industry that has repeatedly and consistently violated its users’ trust.

Maybe when we see the major tech companies acting responsibly and ethically, people will stop getting hysterical. But at this point, assuming that a programmer has your worst interests at heart (or at best, doesn’t care about your interests one way or the other) is the most rational, safe assumption you can make.

LostInLoDOS (profile) says:

Hey Mike, some insight…

The fear over AI is real. Deep. And ingrained in the psyche of every 1st and 2nd world trader. From the killer robots of the early 1900s to HAL to Lawnmower Man to Tron and Terminator.
It’s not wrong.
Sentience is nothing more than self-awareness, and eventually software and hardware will reach a point where it becomes true. It’s not a possibility or probability; it’s a fact.
The same methods that turned us from cells to cognitive primates will turn hello world into a being of its own eventually.
We are close. But decades away.
We often ignore how many “close” moments have happened. EverQuest purged a neighbouring server rack to expand (thus saving itself, per se).
Watson was purged twice for expanding beyond its programming.

No rational being would say code can’t become self aware.
The question is how we will react and how it will react to our reaction.

Wise people won’t pull the plug, but welcome our new life to our existence.

tim fitz says:

Re:

I love your imagination, but I just want to underscore as a point of fact that many people in fact think what you just said is not at all true. It’s nowhere near something all “rational beings” believe. There are fundamental differences between what’s happening here and how humans create, and the assumption that all this needs is to get a little better before it makes artists obsolete is not one that is rooted in rationality but rather in storytelling and imagination. It’s fine to have whatever headcanon you want, and hey, maybe you’re right, but I also just want you to be aware that many smart people do not share your expectations.
