Anonymous Coward

did he buy all those books legally or did he download a bunch of them?

mick

August 8, 2023 at 3:25 pm

Re:

It doesn’t matter even a little bit how he got the books.

jmcken

August 8, 2023 at 3:36 pm

Re:

Think of it this way: If someone pirates a movie, that’s unlawful. But if that person then writes a review of that movie for their blog, that review is perfectly legal, regardless of how they obtained their copy of the movie.

Prosecraft was basically a fancy “review” of books with extra bells and whistles. Even if he’d pirated the books, it would have no bearing on Prosecraft itself.

jmcken

August 8, 2023 at 3:41 pm

Re:

Think of it this way: If someone pirates a movie, that’s unlawful. But if that person then reviews that movie for their blog or YouTube channel or what have you, that review in itself is perfectly legal, regardless of how they obtained their copy of the movie.

Prosecraft was basically a fancy “review” of books with extra bells and whistles. Even if he’d pirated the books, he might get in trouble for that if found out, but it would have no bearing on Prosecraft itself.

Anonymous Coward

August 8, 2023 at 5:54 pm

Re:

Yes.

And also, the public domain is also a thing if you really wanted to argue.

This comment has been flagged by the community. Click here to show it.

Double A

August 8, 2023 at 11:22 pm

Re: Get fucked, idiots

The very premise of this analysis is an utterly idiotic attempt to dilute the creative process into a paint-by-numbers mindset of merely satisfying arbitrary metrics that have little to nothing to do with the actual quality of a literary work.

Fuck yourselves to death, please.

Anonymous Coward

August 9, 2023 at 1:21 am

Re: Re:

The creative process is 10% ideas and 90% sweat and effective use of the tools. Most, if not all budding authors need help in learn and refine their use of language, and that is a tool that helps them do that, building on the most basic of advice to budding authors,look at how successful authors expressed things.

You would not have learnt effective use of the English language if you had not read and learnt effective expression from lots of books etc. Shaxpir is a tool that extracts a lot of what people learn by reading widely, and can help someone develop their writing skills.

You absolute imbecile

August 9, 2023 at 4:15 am

Re: Re:

WHO has claimed the intent to replace traditional analysis, you absolutely fucking imbecile? You cave dwelling troglodyte, you feeble minded lobotomite, you brain damaged moron, you drooling senile. This is LITERALLY sentiment analysis run on a piece of text to get interesting statistics out of curiosity, as it has been made clear in this very same article you absolute illiterate brainlet.

Chops

August 9, 2023 at 5:22 am

Re: Re: Hope you get hit by a bus soon

More elitist crap from the book police telling people they’re reading books wrong. Go back to your pseudo-intellectual circle-jerk where you can all congratulate each other about how smart you are and how much better you are than people reading how they like.

nasch (profile)

August 9, 2023 at 8:13 am

Re: Re:

The very premise of this analysis is an utterly idiotic attempt to dilute the creative process into a paint-by-numbers mindset

If that were the problem, I doubt authors would have demanded he take it down.

Anonymous Coward

August 8, 2023 at 2:52 pm

How many authors actually make a living from their writing compared to how many people author stories and publish for free? It seems to me that those shouting the loudest are the very few who won the lottery and found a publisher, and they want to stop a younger hungrier author replacing them in sweep stakes for success.

Anonymous Coward

August 9, 2023 at 8:51 am

Re:

Wait, so now writers who have publishers are supposed to be fine with tech bros crawling the web for illegally pirated books to use for their silly little projects because…checks notes…some writers aren’t published?

Careful, my dude. Your envy is showing.

Library Goose

August 9, 2023 at 9:44 am

Re:

Your argument that published print writers are complaining winners ignore that there are lots of self-published authors who don’t want their work scraped. Even fan fiction author who write for free don’t want their work fed into AI/LLM datasets.

Anonymous Coward

August 10, 2023 at 10:34 am

Re: Re:

Ironic, since their fanart is illegal.

Anonymous Coward

August 10, 2023 at 5:23 pm

Re: Re:

I want my work fed into AI, but if I didn’t, I now wouldn’t put it online.

It’s clearly transformative fair use and not infringement, but it is exploitative. I doubt content creators will be so generous going forward.

Rowenna

August 9, 2023 at 12:02 pm

Re:

Oh hon, no–the tool was crap. It wasn’t helping anyone learn to write, at least not any better than they could have learned on their own. It applied broad concept like “vividness” arbitrarily and poorly. Basic definitions (see “passive voice” for example) were not even correct. The fact that he was peddling a defective and potentially harmful product to newbie writers ticked me off, actually, as a published writer. There are enough ways for “young hungry authors” to get scammed already.

glenn

August 8, 2023 at 3:03 pm

“Well, I don’t know what it is, but I know I hate it.”

brought to you by the author of “How Dare You Read My Books!”

This comment has been flagged by the community. Click here to show it.

Anonymous Coward

August 8, 2023 at 3:04 pm

The “analysis” it did was quite bad, so no big loss regardless. Things like identifying horror scenes as happy based on the words used, etc.

Anonymous Coward

August 8, 2023 at 3:55 pm

Re:

Perhaps a more in-depth review of precisely what analysis was being performed might be in order. You seem to think that this program was intended to have human-level recognition of content, and thus are attempting to attribute to the analysis abilities that were never claimed.

Anonymous Coward

August 8, 2023 at 4:05 pm

Re: Re:

Everything about it was that they wanted it to be a tool for other humans to use. That would require human level analysis that looks like it just wasn’t there.

Rocky

August 8, 2023 at 5:47 pm

Re: Re: Re:

Just like how a spell-checker needs human level analysis? Or any other tool for that matter.

James Burkhardt (profile)

August 9, 2023 at 7:48 am

Re: Re: Re:

Right, but its not there to replace humans. Its providing data a human can analyze. That’s the issue everyone has with AI bros – the assumption that the tool should or needs to replace humanity. This tool was never intended to perform a human analysis. It is intended to provide raw data, and let the human do the human analysis part.

Dan Someone

August 9, 2023 at 9:34 am

Re: Re: Re:² Raw data to what end?

So what is the significance – let alone the importance – of this “raw data”? What are authors supposed to take away from these reports? “Oh, my work is in the 43rd percentile for use of adverbs.” So what?

The issue I have with this is not that it’s trying to replace humanity, but that it is trying to solve a problem that doesn’t exist. Quantifying these (or any other) aspects of writing and comparing them to those aspects over a large corpus of other works doesn’t tell anybody anything meaningful. Benji says:

“Somewhere out there on the internet, I thought to myself, there was a new author writing a horror or romance or fantasy novel, struggling for guidance about how long to write their stories, how to write more vivid prose, and how much ‘passive voice’ was too much or too little.”

But all those prescriptive concepts – “how long,” “too much,” “too little” – are (a) entirely subjective and (b) nonsensical. So all the tool provides is a quantitative analysis that serves no actual purpose. (And new authors eventually find their own voice; they don’t need a quant analysis to tell the what their voice “should” sound like.)

Also, to the extent that it pushes authors to modify their writing to be more like the rest of what’s out there, it creates a risk of homogenizing literature – a problem similar to that of training generative AI on an internet that is increasingly composed of AI-generated content.

Maybe the tool is legal under copyright law. But regardless, in a world where AI-generated content is being pushed hard as an alternative to human creativity, it is certainly understandable why authors reacted with alarm and anger, and it was so unnecessary.

neitherbeckusnaurbacchus (profile)

August 8, 2023 at 3:09 pm

Reminds me of Searchtodon

This is very reminiscent of the outrage that was directed at Searchtodon, where someone provided a genuinely useful service, and then it was a target of people accusing it of being “created by an out of touch tech bro”, and saying that it causes harm through a mechanism that it doesn’t actually implement.

Anonymous Coward

August 9, 2023 at 8:52 am

Re:

Using “techbro” sure has become a red flag these days for being a horrible and stupid person.

Anonymous Coward

August 10, 2023 at 10:37 am

Re: Re:

Amen. Red flag for a slobbering idiot.

Anonymous Coward

August 8, 2023 at 3:14 pm

Commenters on the Gizmodo piece are pointing out how the analysis of the books, and also the books chosen to be in there, were suspect and makes it sound not all that useful. Like the meanings of words could be solved with math and misunderstanding what “passive voice” means. Also, why were self-help books from Faith G Harper in there, for instance? That sure ain’t any “prose”…

blakestacey (profile)

August 8, 2023 at 4:18 pm

Re:

Many humans who complain about the passive voice have neither a clue what linguists mean by that nor a coherent idea of what they’re complaining about. So, a program written to spot it will quite likely be misguided, and a model trained on a corpus of people kvetching about it will probably be completely incoherent.

nasch (profile)

August 9, 2023 at 8:16 am

Re:

That sure ain’t any “prose”…

Are you thinking of the word “fiction”? Because I doubt a self help book is written in a particular meter.

Anonymous Coward

August 8, 2023 at 3:45 pm

I read this article on the whole thing. Diane Urban had a “most vivid page” that was the most spoilery page of the climax of the book, something not really publicly available. The Quartz piece also points out that back in March, Benji was looking for help to fine tune/train an LLM.

This Mashable article also points out that Shaxpir has a monthly subscription option and parts of it were made with Prosecraft’s database.

Yeah, this does sound like a tech bro thing built off of work that wasn’t paid for the more I’m digging into it.

Anonymous Coward

August 8, 2023 at 4:01 pm

Re:

This Mashable article also points out that Shaxpir has a monthly subscription option and parts of it were made with Prosecraft’s database.

Yeah, this does sound like a tech bro thing built off of work that wasn’t paid for the more I’m digging into it.

Perhaps you can expand on that last bit. Shaxpir built off of Prosecraft’s database? As in “one product the author built using the database of another product the author built”?

Or are you thinking about all the books scanned into the database? The post here kinda describes that the database itself is almost certainly within Fair Use limits.

(Or, as you were still looking into it, are you now looking for an edit button? 🙂 )

Anonymous Coward

August 8, 2023 at 4:54 pm

Re: Re:

The way that Prosecraft had decidedly non-prose stuff like a Faith G Harper’s self-help book, some pages featured as the most vivid were definitely-not-publicly-available stuff like the most spoilery pages of a climax while Benji says he supposedly cares about authors, Benji was looking for help with another LLM, Shaxpir being a service with a paid subscription tier and Shaxpir using data from Prosecraft… it all just reeks of the tech dude getting caught by authors and having to issue some fake apologia about it, rather than Prosecraft and Shaxpir being some legitimately misunderstood wonder-tools.

Anonymous Coward

August 8, 2023 at 6:37 pm

Re: Re: Re:

How does anyone have anything not publicly available? Spoilers don’t mean squat

Anonymous Coward

August 8, 2023 at 7:00 pm

Re: Re: Re:²

From the article, regarding the page from Diana Urban’s book:

“The ‘most vivid page’ excerpt from my book was literally the most spoilery moment of the climax, not published publicly, not scrapable…”

Where did Benji get it? Like I said, it reeks of him trawling download sites for books and not actually caring about authors, the opposite of how he says he cares about them.

tim fitz

August 8, 2023 at 9:40 pm

Re: Re: Re:³

“Not published publicly”? So the book isn’t in print?

Anonymous Coward

August 9, 2023 at 5:31 am

Re: Re: Re:⁴

It’s in print. But looking at the facts, it’s clear he was lifting books from places that nobody who says they “care” about authors would actually use, and then monetizing the “analysis” from Prosecraft by tying it to a subscription in the Shaxpir app. Benji is a liar and a hypocrite.

Samuel Abram (profile)

August 9, 2023 at 5:36 am

Re: Re: Re:⁵

You call him “a liar and a hypocrite” whilst he shut down his own program that all these authors were complaining about?

Anonymous Coward

August 9, 2023 at 5:44 am

Re: Re: Re:⁶

We’re being a bit too charitable to someone who put publishers before authors in his apology.

But it’s entirely possible that it was an oversight. I’m still gonna assume the worst of the individual, though, just for that.

Anonymous Coward

August 9, 2023 at 6:38 am

Re: Re: Re:⁶

He didn’t shut it down because he cared. He shut it down because he got caught and he wasn’t a multibillion dollar corporation that could just give people the finger.

Anonymous Coward

August 9, 2023 at 4:43 pm

Re: Re: Re:⁷

Or, he shut it down to make the howling mob at his door go away.

tim fitz

August 11, 2023 at 10:27 pm

Re: Re: Re:⁵

looking at the facts, there is zero evidence that he didn’t just scan that book himself, but because you guys are doing a moral panic, you can’t even think straight long enough to work that out. the level of brainless toxicity that has been on display from supposedly intellectual people this week is absolutely mind blowing. virtually zero of your facts are right — any of you, because almost none of them do anything but wreck your entire premise. it’s unbelievable. you’re doing nothing here but embarrassing yourselves and your entire profession.

Anonymous Coward

August 8, 2023 at 9:09 pm

Re: Re: Re:

OK, asshole, you’ve just outed yourself as one of the complaining authors. Now either stop being an AC and identify yourself, ’cause we want to see what you’ve written that’s so great that you’ve got to freak out when you think you’re not seeing a profit.

Either that, or just sit the fuck down and shut the fuck up.

Anonymous Coward

August 8, 2023 at 9:15 pm

Re: Re: Re:

… most vivid were definitely-not-publicly-available stuff like the most spoilery pages of a climax …

All that says to me is that someone has never heard of onlyfans.com, which almost immediately begat a fuck-ton of OnlyFans leak sites.

That same someone needs to grow up and learn how the internet works, warts and all.

Anonymous Coward

August 9, 2023 at 5:22 am

Re: Re: Re:²

Benji saying he “cares” about authors but then going off scraping pirated copies of books, for Prosecraft, points to the type of guy Benji is.

Transient Thoughts

August 9, 2023 at 10:31 am

Re: Re: Re:³

My guy here doesn’t know about libraries lol. You can literally get any book for free and create a dataset with it to train models. Perfectly legal. The creation of knowledge from what boils down to textual analysis is an entirely moral endeavor. Suck it up buttercup, you’re wrong and your whiny arguments make you look foolish.

Anonymous Coward

August 9, 2023 at 10:43 am

Re: Re: Re:⁴

Nothing about this model and thenway he talks about getting all the text of the books from the Internet sounds like he used a library.

Mamba (profile)

August 9, 2023 at 7:48 pm

Re: Re: Re:⁵

Libraries are now on the internet.

drew (profile)

August 9, 2023 at 4:27 am

Re: Re: Re: You keep using that word; I do not think it means what you think it means

With regard to the inclusion of the self-help book, you appear to be confusing ‘prose’ with ‘fiction’. Very little non-fiction is not written in prose…
Additionally, just because a piece is non-fiction doesn’t mean the author hasn’t considered a story arc and hasn’t worked just as hard on writing emotionally engaging content.
At least, not if they’re any good at it.

Anonymous Coward

August 8, 2023 at 4:53 pm

Re:

I read this article on the whole thing. Diane Urban had a “most vivid page” that was the most spoilery page of the climax of the book

Revealing spoilers doesn’t violate copyright law.

And, if a tool like this is ruining your book it’s probably not a very good book. No one is using a tool like this to find out how a book ends.

This Mashable article also points out that Shaxpir has a monthly subscription option and parts of it were made with Prosecraft’s database.

So what?

Anonymous Coward

August 8, 2023 at 5:30 pm

Re: Re:

And, if a tool like this is ruining your book it’s probably not a very good book.

There are a lot of good books whose twists and turns hinge on info or action that happens on a single page. It’s more an issue with the tool than the book, methinks.

And yeah, spoiling a book doesn’t violate copyright law. However, the way he talks a big game about caring about authors and the written word, but everything about the “analysis” that Prosecraft spits out, the non-prose books in there as well, and the way that he was itching to work with an LLM trained on a ton of books gives me the vibe that he actually doesn’t. Like he just tossed a bunch of books he found on sketchy download sites while trawling the web into his thing.

Anonymous Coward

August 8, 2023 at 9:12 pm

Re: Re: Re:

However, the way he talks a big game about caring about authors and the written word, but everything about the “analysis” that Prosecraft spits out, the non-prose books in there as well, and the way that he was itching to work with an LLM trained on a ton of books gives me the vibe that he actually doesn’t. Like he just tossed a bunch of books he found on sketchy download sites while trawling the web into his thing.

It’s still not clear how that matters for anything?

Reading a book that you didn’t pay for isn’t against the law either. Creating a database of stats and your feelings about the book isn’t either. Nor is building a search engine. Or other tools to provide data.

So what is it that you think he did here that was wrong?

Anonymous Coward

August 9, 2023 at 5:27 am

Re: Re: Re:²

He says he “cares” about authors but trawled pirate sites to create a database of questionable functionality given what people have said about the flaws in its analysis, then he decided to monetize it with making some of it part of a subscription to his Shaxpir app. The dudes a liar and a hypocrite.

Anonymous Coward

August 9, 2023 at 5:46 am

Re: Re: Re:³

Whats your beef with tools that help people become successful authors? It wouldn’t be an increase in competition for readers eyeballs would it?

bluegrassgeek (profile)

August 9, 2023 at 1:22 pm

Re: Re: Re:⁴ "Helping" is carrying a lot of weight here

Nothing about this tool “helped” anyone be a more successful author.

Anonymous Coward

August 9, 2023 at 1:56 pm

Re: Re: Re:⁵

Are you an author? Did you try the tool, Wy does that set you so against the tool?

Anonymous Coward

August 9, 2023 at 2:08 pm

Re: Re: Re:⁵

That you don’t find it useful, does not mean that others didn’t find it useful.

Or, that with time, it would have become more useful.

If screaming authors hadn’t killed it.

Anonymous Coward

August 10, 2023 at 10:41 am

Re: Re: Re:⁵

So no financial threat, then.

Anonymous Coward

August 10, 2023 at 2:50 pm

Re: Re: Re:

People have been trying to explain this multiple times, and somehow this is not being understood by some people: “non-prose” is poetic verse. Both fiction AND non-fiction, including memoirs, are written in prose. I am writing in prose right now. YOU are writing in prose.

tim fitz

August 11, 2023 at 10:30 pm

Re: Re: Re:

let me get this straight — none of the facts support your original read, so you’re just going to go on vibes and therefore conclude the guy is a lying sack of shit. lmao why would anyone ever want to read a book by you when you have the intellectual honesty of a trump fan and the emotional maturity of.. well, a trump fan? asking for serious.

Anonymous Coward

August 8, 2023 at 6:35 pm

Re:

From your report, it sounds like nothing of the sort.

blakestacey (profile)

August 8, 2023 at 4:10 pm

Just because it is (probably, at the moment) legal doesn’t make it useful, or even on a path to becoming useful. That “most passive/most vivid” example certainly doesn’t indicate a whit of utility. Nothing in the former is grammatically passive; “Who stole the tarts?” is not more vivid than dozens of other passages in Alice.

The Cat only grinned when it saw Alice. It looked good–natured, she thought: still it had VERY long claws and a great many teeth, so she felt that it ought to be treated with respect.

Anonymous Coward

August 8, 2023 at 5:36 pm

Re:

Things should be shut down because you personally don’t have use for them? That’s nonsense.

Straddler

August 9, 2023 at 11:28 am

Re: Re:

There are two issues at play here.

1) The utility of this tool is questionable. It spits out numbers but what a writer can do with those numbers is a bit obscure. At best it seems a good way for a hack to bring their project in line with market standards. I don’t mean this in a derogatory way. Hack work is hard work, but it is also the most likely to be driven by deadlines and algorithmic feedback. Thinking of my own projects I don’t know what I would possibly do with the fact that some fiction of mine sits in the 30th percentile of -ly adverbs.

2) The project looks to be fair use and is not providing for the exploitation of creative work in the way that generative language models have been. The authors who are complaining about it on the grounds that it is “stealing” their work or that it needs authorization to do what it is doing are probably wrong and misunderstand that this is not a LLM or a prompt chewing bot. But LLMs have made creatives more broadly sensitive to the use of their work for any automated process. Their misunderstanding is unfortunate but a not surprising consequence of the techbro space heralding the end of “gatekeeping” for creative work (e.g., automating the creative process through bots)

JMT (profile)

August 8, 2023 at 8:24 pm

Re:

Imagine if authors provided feedback on what they thought it did well and what it did poorly instead of arrogantly freaking out (“How DARE you…”) and shitting on the whole thing.

Anonymous Coward

August 9, 2023 at 4:25 am

Re:

It will be legal FOREVER my man. Trying to outlaw scraping will result in the death of search engines, you imbeciles are going against Google (who has strong legal precedents for the legality of scraping) with nothing but hurt feelings as arguments.

Uriel-238 (profile)

August 8, 2023 at 5:29 pm

A stupid idea from a stupid premise

I was noticing that Linken Park releases versions of its songs that separate the instrumental tracks and the a cappella voice tracks, I assumed inviting transformative use (some results of which I’d seen on YouTube), and it reminded me of the story about Micheal Jackson talking with Daryl Hall about borrowing the base line of I Can’t Go for That (No Can Do) for Billy Jean

I’d expect that authors and writers who are actually published would be fine if other writers studied their material with full intent to emulate or borrow certain styles if it was a human being doing it. But right now the notion they seem to fear is not that their book will be fed into an AI in order to create additional, transformative product, but to write what they’d create in their stead.

Generative AI is a long, long way from being able to write us a new Hemmingway novel, or to draw us a new Maurice Sendak book. And even content IP owners might only be able to slow the development of AI to where it’s ability to Sendak convincingly meets and exceeds the original material.

But I don’t think we want to withhold all our art like Prince did, rather I think creative people who depend on those incomes are afraid of losing them and being left to the elements.

We’re not afraid of AI, we’re afraid of capitalism. And the solution is not going to be in delaying AI, but confronting that exploitation of labor is going to leave more and more people unemployed and hungry until it reaches a crisis point.

Mamba (profile)

August 8, 2023 at 6:06 pm

Gizmodo

Man, they sure have fallen a long way. Weren’t they the ones that tried the AI written content recently, that fucked up right out of the gate?

Unfortunately, the human writers aren’t any better. They regularly write opinion pieces with a complete lack of fundamental understanding. Their piece on the Canadian link tax was just as completely ignorant.

Mamba (profile)

August 8, 2023 at 6:08 pm

Re:

Also, I can’t see the comments, because Kinja sucks and I can’t read much on the phone. The article was painful enough, but the comments just refuse to load.

tim fitz

August 8, 2023 at 9:49 pm

Re:

Yeah, when Peter Thiel murdered Gawker, Gizmodo and all the other Kinja verticals were sold off to Fusion. That was fine for awhile, but then Fusion sold it to one of those squeeze-it-til-it’s-dead private equity firms, who have been squeezing for the last few years. I’m afraid it won’t be long, now.

Mamba (profile)

August 9, 2023 at 2:46 am

Re: Re:

I still think Denton and Daulerio share a considerable portion of the blame. They were bumbling fools through the whole case.

Anonymous Coward

August 11, 2023 at 9:05 am

Re:

Haven’t they always been the tabloid of tech?

Sneeje (profile)

August 8, 2023 at 6:13 pm

This is what I fear...

https://arstechnica.com/information-technology/2023/08/author-discovers-ai-generated-counterfeit-books-written-in-her-name-on-amazon/

A number of cartoonists have been struggling with this as well. Cartoons in their style are being created and monetized that they did not create.

Mike, I’m curious what you think of this. It seems like the counterargument would be, “well it’s fraud.” Which, ok, but what if they didn’t submit it under the authors name or the cartoonists name and just monetized it. I’m not convinced the market would sort it out.

Mamba (profile)

August 8, 2023 at 6:21 pm

Re:

What’s your solution? Outlaw AI? Change fair use?

Stephen J. Anderson

August 8, 2023 at 11:21 pm

Re: Re: But is it really fair use?

Generative LLMs are about to get tested in court. I’m not so sure they’ll be found to be fair use.

Here’s a thought experiment: let’s say someone took an LM and trained it entirely and solely on one of my novels, and then told it to generate a series of novels. Sooner or later, it’s going to replicate passages from my novel in the ones it generates, because the patterns of my language use are all that it “knows”. Is that fair use? If a human did that and sold the output it would probably be found to be infringing.

If we assume that isn’t fair use, then what if they trained it solely on all my novels and did the same thing? Is that fair use?

Now generalise that to having it trained on all the novels ever written. Well, that’s where it gets fuzzy. Because, in a sense, now it’s just replicating what a lot of (all?) human authors do. The output of all human authors is affected by what they have read, consciously or not. But will AI get the same leeway as humans?

Uriel-238 (profile)

August 9, 2023 at 12:14 am

Re: Re: Re: AI's not there yet, and IP law is complicated.

Generative LLMs are about to get tested in court. I’m not so sure they’ll be found to be fair use.

I am quite unsure the courts are prepared or knowledgeable about copyright law enough to decide what is fair use or not. Techdirt teems with articles about stupid IP rulings made by state and federal courts.

If they trained [an AI] solely on all my novels and did the same thing? Is that fair use?

I think what matters to make it fair use, is that it’s transformative. So yes, if you have a large body of novels, and the AI is able to create one that uses your writing style but is sufficiently different, that would count as fair use.

The problem is, (at least as I understand it) generative AI isn’t making original work at the touch of a button. A user has to describe very specifically what they want, and run through a large number of iterations (like over a hundred) and then review them and cull out all the ones that are NSFL or fall to Sturgeon’s law, ultimately choosing the best example.

This is a bit of a chore for a single person working the AI to create an image. It would be way to tedious for a human being (or a crew of twenty) to do the same just to make a passable Shakespearean play or Arthur Conan Doyle Sherlock Holmes mystery.

Stephen J. Anderson

August 9, 2023 at 12:34 am

Re: Re: Re:²

The thing I keep coming back to is that copyright is somewhat irrational, and fair use is vague and capricious, and that fundamentally, copyright was created to service human interests. The courts — or the governments — may decide that fair use only applies to human-generated content. Or decisions may vary by jurisdiction. We just don’t know yet.

Anonymous Coward

August 9, 2023 at 8:43 am

Re: Re: Re:³

Copyright exists to serve corporate interests, not humans. Not since the Disney laws. If you’re expecting anything but a pro-corporate ruling you’re a fool. Regulating AI is the only way.

Anonymous Coward

August 9, 2023 at 2:42 pm

Re: Re: Re:⁴

SirTapTap, is that you?

bakage

August 9, 2023 at 12:54 am

Re: Re: Re:

“let’s say someone took an LM and trained it entirely and solely on one of my novels, and then told it to generate a series of novels.”

It would be considered infringing or fair use depending on a case by case basis. Just because it used your work doesn’t mean it is automatically infringing. Like another author reading your work and analyzing how you write. His work will only be considered infringing if he took enough elements from your writings, otherwise it’s fair use.

Anonymous Coward

August 9, 2023 at 1:43 am

Re: Re: Re:

How much of the objection to generative AI is driven by the fear that it will allow someone else to do a better job at telling a story, or will increase the competition by allowing more people to write stories that attract an audience?

Anonymous Coward

August 9, 2023 at 8:26 pm

Re: Re: Re:²

History suggests that the rich will use procedural content generators to automate the entertainmemt industries, replacing creatives.

Please don’t assume that your average human is creative enough to tell ONE compelling story, write one moving song or something that even mildly stimulates the brain.

Entertainmemt execs churn out the same boring bullshit, dumb down plots, recontextualize foreign films to the point a drooling 5 year old can understand it, and usually worse for a good reason: the average audience has no need for thw thought-provoking.

tim fitz

August 11, 2023 at 10:34 pm

Re: Re: Re:³

The history of AI suggests that you’re, with all due respect, tripping sack to think that it will ever be capable of replacing creators. It will never replace drivers, who aren’t even doing art, never mind artists. Time to take a deep breath and play with ChatGPT until you realize that without you, the operator, it is literally nothing at all.

Anonymous Coward

August 12, 2023 at 1:52 am

Re: Re: Re:⁴

You’re correct wrt to the history of AI.

But as for the history of automation and the worker…

Are you familiar with the automation of manufacturing? Displaced a ton of jobs, it did.

There’s a massive potential for these procedural content generators to replace the creative. And taht’s why SAG-AFTRA are striking: Hollywood WANTS to repeate the “miracle* of automation in the entertainment business.

I am NOT against technology. I am typing this on a PC, using the Internet and know well enough about these advances. I know hat a procedural content generator needs a human operator to feed it “prompts” to generate content.

And yes, you will still need drivers, cleaners, construction workers, soldiers, ie, the “shitjobs” because there’s no cost-effective way to make a robot do labor-intensive jobs. And msot of those jobs tend to be shunted to foreign “workers” and undocumented immigrants”. Though it doesn’t mean those C-suites aren’t gonna try to cut down on the number of warm bodies the need to pay for.

Rocky

August 12, 2023 at 3:19 pm

Re: Re: Re:⁵

Are you familiar with the automation of manufacturing? Displaced a ton of jobs, it did.

Displaced, which always happens when someone invents a new way to do things. Everything you use today is manufactured by a process that displaced “a ton of jobs”.

If you want to lament the displacement of jobs, whether it is artists or artisans, make sure that everything you use is something that is entirely bespoke produced by an artisan.

The argument that artists are a special case is also an implicit argument that any other type of job is of lesser value.

Anonymous Coward

August 14, 2023 at 4:18 am

Re: Re: Re:⁶

…No, I am NOT lamenting the displacement of jobs.

I am lamenting the fact that the C-suite thinks it’s a good idea to replace artists and creatives to manufacture content for the masses.

And to address the union question, a union is still voluntary and while I’d suggest, legally, that artists should form and join a union, the reality is that not everyone thinks it’s the best idea. Or are inclined to join unions.

Anonymous Coward

August 14, 2023 at 5:00 am

Re: Re: Re:⁷

I am lamenting the fact that the C-suite thinks it’s a good idea to replace artists and creatives to manufacture content for the masses.

Those displaced can form their own cooperatives or other organizations and compete head on for the attention of the masses. It is not as if a massive amount of capital is needed to create and distribute content using the Internet.

observer

August 16, 2023 at 6:51 am

Re: Re: Re:⁷

the reality is that not everyone thinks it’s the best idea. Or are inclined to join unions.

That’s their lookout, though. If there’s a union and they decide not to join, that’s on them.

Anonymous Coward

August 12, 2023 at 8:30 am

Re: Re: Re:³

History suggests that the rich will use procedural content generators to automate the entertainmemt industries, replacing creatives.

And what stops those creatives forming co-operatives and informal organizations so that they can self publish on the Internet. It is not as if the need publishers these days to get their work in front of the public. If they can do a better job that the corporations they will gain an audience and support to continue creating. Indeed these days there are more creative people making a living by self publishing that there are employed or contracted to the studios, labels and publishers.

Mamba (profile)

August 9, 2023 at 3:00 am

Re: Re: Re:

Sorry, but there is fundamentally no way that training AI on copyrighted works isn’t fair use. Now, you might be able to find outputs that stray to close to someone’s work (or even verbatim reproduce it), but again I don’t find that such a strange situation. We’ve long had computers that could easily reproduce copyrighted works: and nobody has found that Photoshop violates copyright laws. It’s the person that publishes the work that’s responsible.

Personally, I think the vilification of AI by artists is a fools errand. It is a tool, and in the hands of a good artist, it will certainly be much more effective than anything I can do. So the artist that are avoiding it, are getting stale.

Anonymous Coward

August 9, 2023 at 2:16 am

Re:

Why do you suggest that a tool should be banned because a person used it for bad purposes? Would you blame Photoshop for enabling people to cut and paste art to create works in the style of?

Fakes and plagiarism have existed since humans started to create art.

Anonymous Coward

August 9, 2023 at 4:23 am

Re:

Style is not nor should ever be protected under copyright. Facts (how a tree looks like, how drawings of trees look like, how animals look like, how drawings of animals look like and so on) are not copyrighteable either. The AI works entirely by learning style and facts, which is not a violation nor will ever be, and neither is it immoral.

T

August 8, 2023 at 6:34 pm

Anyone got Mike's bsb and account number?

He is obviously fine with people ripping him off uf he endorses it with other people.

Anonymous Coward

August 8, 2023 at 6:59 pm

Re:

Well, his name has a clickable link where you can see just that, but if you’re too lazy to even click on the link…

Hey, if you needed a hint, that’s fine, but do your own homework.

JMT (profile)

August 8, 2023 at 8:21 pm

Re:

Please explain to the class how anyone was ripped off by this. Should be good for a laugh if nothing else.

Mike Masnick (profile)

August 8, 2023 at 9:17 pm

Re:

Anyone got Mike’s bsb and account number?

What is bsb? I have no idea what you’re talking about. A quick search online turns up Back Street Boys, which only has me more confused.

He is obviously fine with people ripping him off uf he endorses it with other people.

If by “ripping off” you mean letting someone else build useful tools in part by analyzing my content that I freely put online and have declared to be in the public domain, then, yes, I think that’s great. I would love for people to build more useful tools and services on top of my content. Please go for it, and let me know what you do with it.

Honestly, I’d love it if someone took the Techdirt corpus, and built useful tools with it, even trained an AI with it. That would be amazing.

Anonymous Coward

August 8, 2023 at 9:28 pm

Re: Re:

I think he means your social media accounts, chief of which is Bluesky. But the troll seems to not be able to even click on a damn link.

The link. In the byline. That has your name on it. In purple. That leads to an about page where you put down your social media details.

xanni (profile)

August 9, 2023 at 6:25 am

Re: Re: BSB is an Australian "Bank State Branch" number

It appears he’s asking for your banking details, not your social media information.

Samuel Abram (profile)

August 9, 2023 at 6:30 am

Re: Re: Re: In that case

What AI is doing–in any sense of the word–cannot be compared to the identity theft that it when someone takes your bank account number or credit card number. I’ve had it done to me, and I’m a chiptune artist, who makes music. So T’s comparison is absurd in the extreme.

Anonymous Coward

August 9, 2023 at 7:20 am

Re: Re: Re:²

What the actual fuck

I’ve had my credit card numbers “taken” from me AND a victim of identity theft.

Most cases of both are always human activities and not based on what the procedural content generators can do…

observer

August 10, 2023 at 2:35 am

Re: Re: Re:³

Though they’re welcome to try asking ChatGPT for anyone’s bank info and see how far they get.

Mike Masnick (profile)

August 9, 2023 at 2:12 pm

Re: Re: Re:

BSB is an Australian “Bank State Branch” number

Ahhh… so the kind of person who thinks that analyzing your content is the equivalent of stealing money. In other words, not a serious person.

Mamba (profile)

August 10, 2023 at 2:09 am

Re: Re:

Brown Sugar Bourbon. And if you get the 103 proof, it’s pretty much an express lane to hangovertown.

Anonymous Coward

August 9, 2023 at 12:02 am

Re:

Here you are making Koby look like a rocket scientist bro.

Anonymous Coward

August 8, 2023 at 6:41 pm

Just the extremely poor “arguments” of the hater crowd make me feel like this was something potentially worthwhile even if it never went anywhere. You know, like most authors and stories.

Ken Analysing

August 8, 2023 at 9:21 pm

As a data scientist my worst nightmare just got true

I am a data scientist who works in natural language processing (NLP) and AI. I’ve read the Gizmodo, Quartz and this article. I believe there are some things getting mixed up by people and that makes me sad and scared and even infuriated. Here is what gets mixed up and why it scares me:

There is a huge difference between text analytics aka text mining, NLP and LLMs. Let me try to explain the differences and please bear with me, I try to make the stuff I do daily understandable and English is not my first language.
Text mining has the dreaded mining in it’s name but it’s about text statistics.The mining stems, as far as I know, from the collection of the words within the text/s. This is where we count words calculate word frequency, compare with word lists and other texts and generate what we call sentiments on a sentence/ paragraph/ chapter or book level. We don’t have to use really fancy algorithms or computer models. Most of the time it’s RegEx aka regular expression and frequencies. We simply look for stuff like punctuation, spaces, capital letters to figure out where words, sentences, paragraphs and chapters start and end. Really basic stuff. No AI here. If you want to learn about the methods there is a great book called Tidy Text Mining
Since I haven’t been able to visit Prosecraft before the shutdown, I can’t be quite sure about it, but from what I read here and in the other articles this seems to be what Prosecraft did for the most part.

Then we get to NLP. That’s where the models/AI are. Text mining is nice when you have well written long texts in English and easier tasks like “which is the most common word / 3 word phrase?”. When you want to generate information like “What is the most common noun?” most people switch to language models. They are trained on texts and provide information like word types. Bigger ones like udpipe or spacy have the capacity to provide more details like names entities and better sentiment understanding. There are models that you can download and use on your computer. Some are bigger and will slow your machine down quite a bit, especially if you analyse a big dataset. But they still can run on any PC. Those models are trained on larger sets of texts usually gathered from the internet. Depending on the type of text they are trained on, they will perform better or worse on the texts you are working with. Most models perform well on well written text like news articles and literature. Especially comments in langages with complex grammar they perform quite poorly. This is where we have to retrain a model or build our own. Think for example we wanted to analyse a text in klingon 😉 But to train those simple NLP models we need labels for every word you want labelled in every sentencewithinyour training and test data. Sometimes you can use RegEx to define the entities you want to retrain. But it’s also common to manually label every entity for every sentence within your dataset. It’s tedious. We speak about hundreds or better thousands of sentences. Prosecraft might have used some kind of this magic, but probably they wouldn’t have used the text they analyzed for the training of their underlying models.
And finally we have the eerie LLMs. They are part of NLP. Those are huge models. They are trained on vast amounts of data. From what I know they don’t need labelled data. They learn everything from their training data. They can still perform the analytics of the aforementioned models. But I wonder if Prosecraft would use such a resource intense tool for simple statistics and analytics. It’s overkill. It would be like cracking a nut with a sledgehammer. It’s something I see people do, when they want to show off, are new to the field, want to experiment with the new technology or need to do it because marketing wants to be cool. And from what I have read, Benji Smith is a computer linguist and Prosecraftis a private project. He would probably know better.

So from this standpoint Prosecraft is analysing texts/books in form of statistical key data. They transformed text into numbers. Nothing more nothing less. They can give you an idea of a book, like arc of suspense, names of main figures, theme, mood… But not generate a new book.

Now for the part that scares me:
Like I said, my background is in data science. I personally would love to do something like Prosecraft. I love data science and sharing my passion and knowledge in a relatable way would make me very happy. But the way Prosecraft and Benji Smith are treated is exactly why I haven’t dared to ever start my personal project. It’s even worse for me since I live in Europe and our copyright law is … you know, you are on Techdirt…
It scares me and infuriates me that there is this subliminal assumption that every one that does language processing is also using the analysed material to make a LLM and write stuff with it to earn money. First of all, nobody knows if Benji Smith really uses the books he got from the authors that didn’t provide him their work for anything but the analysis. Second of all, have you ever read a text from an LLM? They are glitchy and meh. Of course in the future there will be better LLMs that generate better text. But maybe in the future the same authors and people that now fear LLMs will recognize that it can assist them in their work. There is even a chance that they are already using LLMs like for grammar and spelling or translation or like Google…
And the worst part is that this situation makes me feel hopeless. Because if we as society care so much about copyright that we stifle every one that wants to build something from a collection of works of others, how will we be able to learn? I have a baby. He is now learning by putting two things together. That’s how we learn. That’s how we evolve. Humans always did this to master something. Artists copied their masters work. Fans copy their idols. And some will get better than the original. But most are not. And so most will not be able to take much money from the original artist. I do hope that we can as a society evolve to honor the original artist by providing quotations and credentials. So that people can find the original. This way I believe the original artist gets their fair share of resources (money, reach, whatever). And I hope that this way we stop stifling our way of learning. So that there is a chance my baby will be able to find what he likes by copying it and that maybe he can create something new or to show the world what he loves. Without fear or need to doubt if what he is doing will generate a shit storm and twist every intention he had in the publics eye. But from reading all the articles about Prosecraft and LLMs, my heart sinks and I fear we are heading into a future where our natural way of learning and sharing our love for something will cause a witch hunt.

Thanks for reading if you got to here. I needed to speak up.

Best wishes,
Ken

tim fitz

August 8, 2023 at 9:38 pm

Re: thanks, Ken

I read it and I hear you. It must suck having a giant community of artists show up on the guy next door’s stoop with torches and pitchforks, calling everything “AI” and hollering about some shit that just isn’t actually happening. I spoke up about it on Twitter today and had a pretty awful time. But look, you’re not alone. We will figure out how to help people understand a) that they have no idea what’s going on right now and b) what’s going on right now.

I hope you have a great night!

Tim

Anonymous Coward

August 11, 2023 at 9:26 am

Re: Re:

Took a look at Twitter, and boy was that ugly…

Make ill informed assumptions about the tool and jump to conclusions.
Lash out in outrage and urge other technically illiterate people furious, taking your uninformed assumptions as established facts.
Pretend you are somehow the good guy, and rant furiously when people challenge your misconceptions about things like “statistics”, “data” and “AI”.

tim fitz

August 11, 2023 at 10:36 pm

Re: Re: Re:

Yeah, that about sums it right up. Very upsetting to see at this kind of scale from people with ostensibly perfectly functional brains.

Anonymous Coward

August 8, 2023 at 10:31 pm

Re:

Ken,

as much as I sympathize with your position, unfortunately, we ARE living in a world where

a) corporations are trying to replace even the creatives with machines that they think can do the job better, because look at all the good things it did in manufacturing when automation became feasible and scalable

b) our economies worship “line goes up” to the point where ethics are cast aside

c) anyone who tries to paint technology in a positive light will be viewed as a corrupt late-stage capitalist who wants to kill their livelihoods and force them to get a shitjob that… is likely to not exist or be, more grimly, dying in a fucking ditch in RatBumFuckistan because the only choice left is to join the Armed Forces.

I have tried to read up and inform myself on the scams these late-stage capitalists are running and a lot of it runs counter to the research data scientists like you are doing.

While I do hope we, as a global society, will reach the stage where we honor the creative’s right to associate, create and learn as well as everyone else’s, we don’t live in that ideal world.

We live in a shitty reality where late-stage capitalists are trying to replace the jobs of creatives, backed by the successful replacement of the manufacturing workforce with machines, and their captured politicians who won’t even bother to listen to the people who voted them in power.

What we’re really seeing is a massive pushback from a lot of not-Boomers who realize that their future, is, to out it nicely, gone.

tim fitz

August 8, 2023 at 10:56 pm

Re: Re:

But see, that’s just the thing — that’s exactly what is so heartbreaking about this context collapse. Because the conclusion that large models can actually do any of that is completely conjectural and, IMO, very, very dubious, no matter how big a corpus one of them has. They fundamentally are not creative and fundamentally cannot replace creatives. Not even commercial artists, which I had thought were the most at risk, until I tried to do extremely simple shit with it and realized that even at its best its output is only useful for the novelty value of it having been thrown together with math.

So all of this panic is a) for nothing and b) actually causing a ton of upheaval, as executives watch the frothing mob and conclude that AI must actually be a threat to those down below. The one thing the studios and the striking writers and actors both have in common may be that their leaders both wrongly believe AI can fundamentally shift the balance of power in the arts.

Anonymous Coward

August 9, 2023 at 12:55 am

Re: Re: Re:

The curremt crop of procedural content generators are pretty bad at even procedurally generating content that seems like human-made content, yes.

But remember this. Automation managed to drastically reduce manufacturing costs by drastically reducing the number of workers needed to go to a factory. The fucking C-suites and their peers want to replicate what happened to manufacturing automation. Unless it’s cheaper to exploit foreign “workers” to get the same short term line goes up fuckshittery that is our current economic climate.

tim fitz

August 11, 2023 at 10:52 pm

Re: Re: Re:²

but automated manufacturing is really not even close to comparable to automated ideation. there is one best way to make a widget. your ai is honing towards that perfect way and will only get closer over time. there is no perfect way to write a novel. your AI will flail around trying to find it forever. it won’t work, sorry. just like with cars. no matter how much intuitive sense it makes that a computer should be able to do it, a computer cannot do it and never will. maybe if every single car is networked and all the intuition can be sucked out of the process, but that’s not soon and it’s not the same approach.

Anonymous Coward

August 12, 2023 at 1:56 am

Re: Re: Re:³

No, but try convincing the C-suite fuckers that.

They don’t care and they want to make that fucking line go up.

So unless you’re willing to go to the most extreme of measures to ensure that the C-suites do not get their way, get fucking educated on why they want to do this.

It ain’t the tech, sonny. This Prosecraft thing doesn’t impress me in the slightest and reeks of someone wanting to count words to get an analysis, like a bloody high-school writing class. And even then, comparing the software to a high-school writing class is a massive insult to the tracher who actually ahd to come up with the metrics, teaching plan and even marking schemes for said class.

This comment has been flagged by the community. Click here to show it.

Anonymous Coward

August 8, 2023 at 11:31 pm

Rosenberg?? Every…single…time

Samuel Abram (profile)

August 9, 2023 at 3:44 am

Re:

Please explain what you mean…

Doug Holland (user link)

August 9, 2023 at 3:44 am

> I do understand why so many people, especially creative folks, are worried about AI and how it’s used.

I’m gonna guess that line is some kind of hyperbole, because I don’t think you’re a dummy. Why so many people, especially creative folks, are worried about AI and how it’s used, is pretty damned obvious.

This comment has been flagged by the community. Click here to show it.

Anonymous Coward

August 9, 2023 at 5:57 am

Re:

Ideally the “creative folk” should have been replaced by last year. Getting rid of slop makers is a net positive for humanity, the truly skilled artists making insightful pieces have nothing to fear.

Bloof (profile)

August 9, 2023 at 6:34 am

Re: Re:

Tell that to the talented VFX artists looking at unemployment as corporations like Disney try to maximise profit be replacing them with noticeably worse AI technology, the aspiring writers getting their works onto Amazon then drowned in a sea of Ai generated works pumped out by grifters. Tell that to the writers who gain traction then find AI slop dumped online using their names.

Cream rising to the top is bullsh*t. It can’t happen if it’s pumped out into a whirlpool of sewage.

Anonymous Coward

August 9, 2023 at 7:31 am

Re: Re:

Cream rising to the top?

You mean crap rising to the top.

While the literary and artistic greats were indeed great, remember that they were supported by the rich and in power, usually both.

And there’s no shortage of “artists”, “writers” and whatnot who are more than willing to sing their “praises” to “rise to the top”.

For every Hitchcock, every Francis Ford Coppola, every Michalengo, there’s always at least one Leni Riefenstahl, at least one Elon Musk, and definitely a LEGION of willing assholes more than HAPPY to peddle propaganda, hate speech, and much, much worse.

Talent? Oh, no, the talented and honest folk don’t tend to rise to the top.

I hope you like being in an unmarked mass grave because once the rest of us are dead, you’re next. You can’t please your masters all the time.

observer

August 10, 2023 at 1:44 am

Re:

What part of “I do understand” did you not understand?

blergh

August 9, 2023 at 7:29 am

Useless

It doesn’t matter whether you agree or disagree with the authors. What he did with the IP wasn’t legal and was always going to get taken down once publishing houses’ legal teams got wind of the site. It’s a dumb thing to argue about. Should I be able to take a painting out of a museum and do stuff with it and then put it back? Doesn’t matter, because you can’t do that. Seems like wasted effort to argue. But I guess this is the internet lmao what am I saying and here I am arguing! I am part of the problem

Mamba (profile)

August 9, 2023 at 8:03 am

Re:

That is absolutely untrue.

From The Verge:

“Considering the onus placed on these factors, Gervais says “it is much more likely than not” that training systems on copyrighted data will be covered by fair use. ”

https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data

From Foundation Models and Fair Use, Peter Henderson, et al.

“In the United States and several other countries, copyrighted content may be used to build foundation
models without incurring liability due to the fair use doctrine. ”

From Foley and Lardner, LLP

“Training AI is fair use under U.S. copyright law.”

and

“That said, the arguments in favor of considering training AI as fair use are strong, and it is likely that courts will continue to find that training AI is fair use in many cases.”

https://viewpoints.foley.com/post/102ih41/is-training-ai-fair-use

Strawb (profile)

August 9, 2023 at 8:31 am

Re:

Should I be able to take a painting out of a museum and do stuff with it and then put it back?

That’s a nonsensical analogy.

A more apt one in this case would be to find a picture of the painting online, fuck around with/modify/change it and then uploading that online. You know, like people do on the internet completely legally all the fucking time.

If the features of this tool were done manually, it’d pretty clearly be fair use. So why would it not be just because an AI is involved?

Pseudonymous Coward (profile)

August 9, 2023 at 7:30 am

Merits of the legal threats aside, I’m not going to try too loudly over the loss of What If Cinemasins But Books.

Pseudonymous Coward (profile)

August 9, 2023 at 7:30 am

Re:

*cry

blakestacey (profile)

August 9, 2023 at 8:17 am

Re:

Automated Cinemasins But for Books: not the technological advance our civilization needs, but perhaps the one it deserves.

James Burkhardt (profile)

August 9, 2023 at 9:13 am

Re:

….What? Cinemasins? where the fuck are you getting “criticism” masquerading as “parody” masquerading as Criticism from a statistical data tool?

Anonymous Coward

August 9, 2023 at 7:48 am

just pay whoever some money and i’m sure they’ll be fine! after all, they’ll go whichever way the most bucks come from!

Anonymous Coward

August 9, 2023 at 7:54 am

Kneejerk

If this type of analysis were to be made by a procedural program (not an LLM), I’m sure the reaction wouldn’t have been so strong.

It’s understandable to dislike types like Sam Altman, but the reality is that LLMs have many legitimate uses that cannot remotely be argued to be copyright violations. These derived statistics (and the book snippets) cannot compete with the original work.

Anonymous Coward

August 9, 2023 at 6:11 pm

Re:

So many of the angry comments about this tool could be applied 1:1 to school kids writing book reports and they would rightly be laughed out of any other context. It’s pretty wild.

Anonymous Coward

August 11, 2023 at 9:41 am

Re:

If this type of analysis were to be made by a procedural program (not an LLM), I’m sure the reaction wouldn’t have been so strong.

Oh, I doubt that very much. It is blatantly clear that most of the outraged hacks have absolutely zero understanding, and zero interest in understanding, anything about the tool and what it did.

They smelled “AI” and went nuts.

tim fitz

August 11, 2023 at 10:43 pm

Re:

right but this wasn’t an LLM. it used an ML algorithm to process writing into a regular old database of factoids about the content of these books and then used “procedural” code to assess and present that database. no LLM or anything like it was involved.

Matthew Kressel

August 9, 2023 at 9:17 am

It's theft

“Available on the internet” does not mean “free to use.” The dude stole thousands of copyrighted works without payment or permission. That’s theft.

Also, counting the number of “to be” verbs isn’t a measure of “passive” voice, the number of adverbs is not a useful metric to decide the value of a work of art, and his “vividness” score is entirely subjective.

The guy was a huckster and a fraud, who stole other people’s work and tried to pass off a scam “analysis” that doesn’t offer writers anything useful other than an erroneous sense of their work’s value.

Anonymous Coward

August 9, 2023 at 10:21 am

Re:

The guy is/was trying to figure out how authors used the English language using publically available works, and to provide a tool to help authors improve their writing. You are foaming at the mouth as though he tried to steal your life’s work because AI was mentioned.

Mike Masnick (profile)

August 9, 2023 at 2:17 pm

Re:

“Available on the internet” does not mean “free to use.” The dude stole thousands of copyrighted works without payment or permission. That’s theft.

Scanning content available on the internet is not “theft” under any definition. Nothing is taken away. Nothing is removed. If so, search engines would not exist.

Also, counting the number of “to be” verbs isn’t a measure of “passive” voice, the number of adverbs is not a useful metric to decide the value of a work of art, and his “vividness” score is entirely subjective.

Yes, it is subjective analysis. It might not be useful to you but it might be useful to others.

The guy was a huckster and a fraud, who stole other people’s work and tried to pass off a scam “analysis” that doesn’t offer writers anything useful other than an erroneous sense of their work’s value.

Again, nothing was stolen, and while you might not have liked the tool, others did.

Joe Dirt

August 9, 2023 at 3:16 pm

Re:

Dude
guy
Trans women are women you nazi. KYS

“vividness” score is entirely subjective.
As opposed to an entirely objective “vividness” score? Are you braindead?

doesn’t offer writers anything useful other than an erroneous sense of their work’s value.
So if it’s useful that makes it ok? Theft is okay as long as it’s useful? You have no idea what you’re talking about. You’re a self-contradicting terf nazi.

Hike Nude

August 9, 2023 at 3:39 pm

Not to be childish...

…but I want to know which authors threw a fit about having their works analyzed. If they’re this afraid of someone looking at their usage patterns, what are they hiding? Surely we can at least know who they are so that we can boycott them?

Uriel-238 (profile)

August 9, 2023 at 9:36 pm

Re: Authors throwing a fit (throwing fits?)

I suspect it is not really hiding anything more sinister than concern for their own survival as artists in a capitalist industry.

If the only reason they are tolerated by their capitalist masters is the presumption of talent for unique content, then yes, they’re going to feel threatened by anything that might replicate it.

But what threatens their jobs is not the existence of AI that can replace them, but that capitalists believe they can replace creative persons using AI.

Even if it’s not true, it reveals that our publishers see art as product as ordinary as potato chips. And that is a blow to anyone who thinks they’re more special than the guitarist who has to flip burgers for a living,

PS: As The Menu illustrates, burger flipping is not an unlaudable skill either.

Anonymous Coward

August 10, 2023 at 10:50 am

Re:

One of the worst is CL Polk, who seems to write shitty smol bean YA fantasy.

observer

August 10, 2023 at 11:09 am

Re: Re:

This could be the first time “smol bean” has ever been used on Techdirt, but I can’t say it’s an unfair assessment.

Anonymous Coward

August 11, 2023 at 9:53 am

Re:

Just take a look at the massive dog pile in the Twitter thread linked in the article. I recognised nobody, though.

Judging from their tone and vocabulary on Twitter they are primarily writing erotic fiction for right-wing authoritarian personalities, and I’m not really into that.

meh

August 9, 2023 at 3:58 pm

a disagreement

i can see this getting all wrapped up and warped around “fair use” …but regrardless – any author or creator should always have the last word about how their work is utilized. It’s easy at this point in time to say – “it’s only for X now – cool your jets – relax….” however – these thing tend to morph themselves into completely different things, and once you’ve allowed it once – you cannot put that jeanie back into the bottle. I can fully understand his reluctance or objection to it.

Uriel-238 (profile)

August 9, 2023 at 9:41 pm

Re: The last word

Artists get the first word abour their art in choosing to create it or not.

When we give artists a temporary monopoly on their work, we’re doing so at the expense of the community who are restricted from content that should otherwise be public domain. And every one of Disney’s extentions has robbed the public.

Anonymous Coward

August 10, 2023 at 10:52 am

Re:

“any author or creator should always have the last word about how their work is utilized.”
Unmitigated authoritarian bullshit.

Fred Zimmerman (user link)

August 9, 2023 at 4:37 pm

https://www.the-yuan.com/646/Book-publisher-Fred-Zimmerman-on-the-ethics-of-creating-AI-aided-books.html

Zeitness

August 9, 2023 at 4:38 pm

I wish I could have tried Prosecraft/Shaxpir

“What cannot be settled with experiment is not worth debating.” (ALDER’S RAZOR)

Steve

August 9, 2023 at 4:53 pm

Unfortunately, tech has made its bed

My subject line pretty much says it all. I’ve been in high tech since the early 1980s, when it was pretty commonplace for budding programmers to write and share big systems for the sheer joy of it.

Unfortunately, the last 20 years have been characterized by extreme exploitation of users (social media using users content), of employees (the entire gig economy), of customers (privacy, sharing, clawing back and charging for features, etc.), and of anyone without the power to fight back (Adobe using the Creative Cloud contents to quietly train their own IA).

It’s a shame that a “good” application got caught up in this, but it’s the natural consequence of an industry that has repeatedly and consistently violated its users’ trust.

Maybe when we see the major tech companies acting responsibly and ethically, people will stop getting hysterical. But at this point, assuming that a programmer has your worst interests at heart (or at best, doesn’t care about your interests one way or the other) is the most rational, safe assumption you can make.

Anonymous Coward

August 9, 2023 at 8:33 pm

Re:

Or.

Most programmers are not paid enough to be ethical.

Their bosses don’t care enough because the entire ecocomy is built on the exploitation of people. And are also spending good money to ensure that we can’t do shit about it.

There’s very little we can do about the latter.. legally.

Anonymous Coward

August 10, 2023 at 10:54 am

Re:

It’s fine for people to be idiots and harass a small project because Facebook bad. Got it.

Anonymous Coward

August 10, 2023 at 5:02 pm

AI can dox people very easily from their writing patterns. Maybe that’s what they fear.

Anonymous Coward

August 10, 2023 at 5:20 pm

Watch internet content dry up since creators don’t want to make other people rich with it. Then AI will have to train itself on its own content.

LostInLoDOS (profile)

August 12, 2023 at 5:46 am

Hey Mike, some insight…

The fear over ai is real. Deep. And ingrained in the psyche of every 1st and 2nd world trader. From the killer robots of the early 1900s to the HAL to lawnmower man to tron and Terminator
It’s not wrong.
Sentience is nothing more than self awareness and eventually software and hardware will reach a point where it becomes true. It’s not a possibility or probability it’s a fact.
The same methods that turned us from cells to cognitive primates will turn hello world into a being of its own eventually
We are close. But decades away.
We often ignore how many “close” moments have happened everauest purged a neighbouring server rack to expand (thus saving itself, per se).
Watson was purged twice for expanding beyond its programming.

No rational being would say code can’t become self aware.
The question is how we will react and how it will react to our reaction.

Wise people won’t pull the plug, but welcome our new life to our existence

tim fitz

August 14, 2023 at 5:24 pm

Re:

I love your imagination, but I just want to underscore as a point of fact that many people in fact think what you just said is not at all true. It’s nowhere near something all “rational beings” believe. There are fundamental differences between what’s happening here and how humans create, and the assumption that all this needs is to get a little better before it makes artists obsolete is not one that is rooted in rationality but rather in storytelling and imagination. It’s fine to have whatever headcanon you want, and hey, maybe you’re right, but I also just want you to be aware that many smart people do not share your expectations.

Sunday
13:30	Funniest/Most Insightful Comments Of The Week At Techdirt (0)
Saturday
12:06	This Week In Techdirt History: April 21st - 27th (3)
Friday
19:39	LittleBigPlanet: Now You Don't Own What You've Created, Either (22)
15:09	Ctrl-Alt-Speech: The Bell Tolls For TikTok (2)
13:34	Florida Appeals Court Says The Right To Record Extends To Phone Calls With Cops (6)
12:06	Court Dismisses Mark Zuckerberg Personally From Massive ‘Social Media Addicts Children’ Lawsuit (10)
10:45	Net Neutrality Is Back! For Now. (29)
10:40	Daily Deal: U-STREAM Home Streaming Studio with 10" Ring Light & Tripod (0)
09:20	Biden Bans The App His Campaign Insists Is An Important Place To Talk To Voters (38)
05:21	People Are Slowly Realizing Their Auto Insurance Rates Are Skyrocketing Because Their Car Is Covertly Spying On Them (48)

The Fear Of AI Just Killed A Very Useful Tool

from the can-we-not? dept

Comments on “The Fear Of AI Just Killed A Very Useful Tool”