The Fear Of AI Just Killed A Very Useful Tool
Date: 2023-08-08
I do understand why so many people, especially creative folks, are worried about AI and how it’s used. The future is quite unknown, and things are changing very rapidly, at a pace that can feel out of control. However, when concern and worry about new technologies and how they may impact things morphs into mob-inspiring fear, dumb things happen. I would much rather that when we look at new things, we take a more realistic approach to them, and look at ways we can keep the good parts of what they provide, while looking for ways to mitigate the downsides.
Hopefully without everyone going crazy in the meantime. Unfortunately, that’s not really the world we live in.
Last year, when everyone was focused on generative AI for images, we had Rob Sheridan on the podcast to talk about why it was important for creative people to figure out how to embrace the technology rather than fear it. The opening story of the recent NY Times profile of me was all about me in a group chat, trying to suggest to some very creative Hollywood folks how to embrace AI rather than simply raging against it. And I’ve already called out how folks rushing to copyright, thinking that will somehow “save” them from AI, are barking up the wrong tree.
But, in the meantime, the fear over AI is leading to some crazy and sometimes unfortunate outcomes. Benji Smith, who created what appears to be an absolutely amazing tool for writers, Shaxpir, also created what looked like an absolutely fascinating tool called Prosecraft, that had scanned and analyzed a whole bunch of books and would let you call up really useful data on books.
He created it years ago, based on an idea he had years earlier, trying to understand the length of various books (which he initially kept in a spreadsheet). As Smith himself describes in a blog post:
I heard a story on NPR about how Kurt Vonnegut invented an idea about the “shapes of stories” by counting happy and sad words. The University of Vermont “Computational Story Lab” published research papers about how this technique could show the major plot points and the “emotional story arc” of the Harry Potter novels (as well as many many other books).
So I tried it myself and found that I could plot a graph of the emotional ups and downs of any story. I added those new “sentiment analysis” tools to the prosecraft website too.
When I ran out of books on my own shelves, I looked to the internet for more text that I could analyze, and I used web crawlers to find more books. I wanted to be mindful of the diversity of different stories, so I tried to find books by authors of every race and gender, from every different cultural and political background, writing in every different genre and exploring all different kinds of themes. Fiction and nonfiction and philosophy and science and religion and culture and politics.
Somewhere out there on the internet, I thought to myself, there was a new author writing a horror or romance or fantasy novel, struggling for guidance about how long to write their stories, how to write more vivid prose, and how much “passive voice” was too much or too little.
I wanted to give those budding storytellers a suite of “lexicographic” tools that they could use, to compare their own writing with the writing of authors they admire. I’ve been working in the field of computational linguistics and machine learning for 20+ years, and I was always frustrated that the fancy tools were only accessible to big businesses and government spy agencies. I wanted to bring that magic to everyone.
Frankly, all of that sounds amazing. And amazingly useful. Even more amazing is that he built it, and it worked. It would produce useful analysis of books, such as this example from Alice’s Adventures in Wonderland:
And, it could also do further analysis like the following:
This is all quite interesting. It’s also the kind of thing that data scientists do on all kinds of work for useful purposes.
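For a sense of how simple this kind of analysis can be, here is a minimal sketch of the "count happy and sad words" idea Smith describes. To be clear, this is not Prosecraft's code: the word lists, the function names, and the window size are all hypothetical and chosen purely for illustration.

```python
import re

# Tiny illustrative word lists. These are assumptions for the sketch,
# not the lexicon Prosecraft or the Computational Story Lab used.
HAPPY_WORDS = {"happy", "joy", "love", "laugh", "smile", "delight"}
SAD_WORDS = {"sad", "cry", "fear", "death", "grief", "alone"}


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def emotional_arc(text, window=50):
    """Score consecutive, non-overlapping windows of `window` words.

    Each score is (happy count - sad count) / words in that window,
    so a positive value marks a happier stretch of the story.
    """
    words = tokenize(text)
    scores = []
    for start in range(0, len(words), window):
        chunk = words[start:start + window]
        happy = sum(1 for w in chunk if w in HAPPY_WORDS)
        sad = sum(1 for w in chunk if w in SAD_WORDS)
        scores.append((happy - sad) / len(chunk))
    return scores


if __name__ == "__main__":
    story = (
        "They laugh and smile with joy. " * 3
        + "Then comes grief, and she is alone with her fear. " * 3
    )
    # One score per window; plotting them in order gives the "shape" of the story.
    print(emotional_arc(story, window=10))
```

Plotting those scores in order gives a rough emotional arc. Note that nothing here reproduces the text of the book: the output is a list of numbers about the book, which is the kind of statistic at issue.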
Smith built Prosecraft into Shaxpir, again, making it a more useful tool. But, on Monday, some authors on the internet found out about it and lost their shit, leading Smith to shut the whole project down.
There seems to be a lot of misunderstanding about all of this. Smith notes that he had researched the copyright issues and was sure he wasn’t violating anything, and he’s right. We’ve gone over this many times before. Scanning books is pretty clearly fair use. What you do with that later could violate copyright law, but I don’t see anything that Prosecraft did that comes anywhere even remotely close to violating copyright law.
But… some authors got pretty upset about all of it.
I’m still perplexed at what the complaint is here? You don’t need to “consent” for someone to analyze your book. You don’t need to “consent” to someone putting up statistics about their analysis of your book.
But, Zach’s tweet went viral with a bunch of folks ready to blow up anything that smacks of tech bro AI, and lots of authors started yelling at Smith.
The Gizmodo article has a ridiculously wrong “fair use” analysis, saying “Fair Use does not, by any stretch of the imagination, allow you to use an author’s entire copyrighted work without permission as a part of a data training program that feeds into your own ‘AI algorithm.’” Except… it almost certainly does? Again, we’ve gone through this with the Google Books scanning case, and the courts said that you can absolutely do that because it’s transformative.
It seems that what really tripped people up here was the “AI” part of it, and the fear that this was just another VC-funded “tech bro” exercise of building something to get rich by using the works of creatives. Except… none of that is accurate. As Smith explained in his blog post:
For what it’s worth, the prosecraft website has never generated any income. The Shaxpir desktop app is a labor of love, and during most of its lifetime, I’ve worked other jobs to pay the bills while trying to get the company off the ground and solve the technical challenges of scaling a startup with limited resources. We’ve never taken any VC money, and the whole company is a two-person operation just working our hardest to serve our small community of authors.
He also recognizes that the concerns about it being some “AI” thing are probably what upset people, but plenty of authors have found the tool super useful, and even added their own books:
I launched the prosecraft website in the summer of 2017, and I started showing it off to authors at writers conferences. The response was universally positive, and I incorporated the prosecraft analytic tools into the Shaxpir desktop application so that authors could privately run these analytics on their own works-in-progress (without ever sharing those analyses publicly, or even privately with us in our cloud).
I’ve spent thousands of hours working on this project, cleaning up and annotating text, organizing and tweaking things. A small handful of authors have even reached out to me, asking to have their books added to the website. I was grateful for their enthusiasm.
But in the meantime, “AI” became a thing.
And the arrival of AI on the scene has been tainted by early use-cases that allow anyone to create zero-effort impersonations of artists, cutting those creators out of their own creative process.
That’s not something I ever wanted to participate in.
Smith took the project down entirely because of that. He doesn’t want to get lumped in with other projects, and even though his project is almost certainly legal, he recognized that this was becoming an issue:
Today the community of authors has spoken out, and I’m listening. I care about you, and I hear your objections.
Your feelings are legitimate, and I hope you’ll accept my sincerest apologies. I care about stories. I care about publishing. I care about authors. I never meant to hurt anyone. I only hoped to make something that would be fun and useful and beautiful, for people like me out there struggling to tell their own stories.
I find all of this really unfortunate. Smith built something really cool, really amazing, that does not, in any way, infringe on anyone’s rights. I get the kneejerk reaction from some authors, who feared that this was some obnoxious project, but couldn’t they have taken 10 minutes to look at the details of what it was they were killing?
I know we live in an outrage era, where the immediate reaction is to turn the outrage meter up to 11. I’m certainly guilty of that at times myself. But this whole incident is just sad. It was an overreaction from the start, destroying what had been a clear labor of love and a useful project, through misleading and misguided attacks from authors.
did he buy all those books legally or did he download a bunch of them?
Re:
It doesn’t matter even a little bit how he got the books.
Re:
Think of it this way: If someone pirates a movie, that’s unlawful. But if that person then writes a review of that movie for their blog, that review is perfectly legal, regardless of how they obtained their copy of the movie.
Prosecraft was basically a fancy “review” of books with extra bells and whistles. Even if he’d pirated the books, it would have no bearing on Prosecraft itself.
Re:
Think of it this way: If someone pirates a movie, that’s unlawful. But if that person then reviews that movie for their blog or YouTube channel or what have you, that review in itself is perfectly legal, regardless of how they obtained their copy of the movie.
Prosecraft was basically a fancy “review” of books with extra bells and whistles. Even if he’d pirated the books, he might get in trouble for that if found out, but it would have no bearing on Prosecraft itself.
Re:
Yes.
And the public domain is also a thing, if you really wanted to argue.
How many authors actually make a living from their writing compared to how many people author stories and publish for free? It seems to me that those shouting the loudest are the very few who won the lottery and found a publisher, and they want to stop a younger, hungrier author from replacing them in the sweepstakes for success.
“Well, I don’t know what it is, but I know I hate it.”
brought to you by the author of “How Dare You Read My Books!”
The “analysis” it did was quite bad, so no big loss regardless. Things like identifying horror scenes as happy based on the words used, etc.
Re:
Perhaps a more in-depth review of precisely what analysis was being performed might be in order. You seem to think that this program was intended to have human-level recognition of content, and thus are attempting to attribute to the analysis abilities that were never claimed.
Re: Re:
Everything about it was that they wanted it to be a tool for other humans to use. That would require human level analysis that looks like it just wasn’t there.
Re: Re: Re:
Just like how a spell-checker needs human level analysis? Or any other tool for that matter.
Reminds me of Searchtodon
This is very reminiscent of the outrage that was directed at Searchtodon, where someone provided a genuinely useful service, and then it was a target of people accusing it of being “created by an out of touch tech bro”, and saying that it causes harm through a mechanism that it doesn’t actually implement.
Commenters on the Gizmodo piece are pointing out how the analysis of the books, and also the books chosen to be in there, were suspect, which makes it sound not all that useful. Like treating the meanings of words as something that could be solved with math, and misunderstanding what “passive voice” means. Also, why were self-help books from Faith G Harper in there, for instance? That sure ain’t any “prose”…
Re:
Many humans who complain about the passive voice have neither a clue what linguists mean by that nor a coherent idea of what they’re complaining about. So, a program written to spot it will quite likely be misguided, and a model trained on a corpus of people kvetching about it will probably be completely incoherent.
I read this article on the whole thing. Diane Urban had a “most vivid page” that was the most spoilery page of the climax of the book, something not really publicly available. The Quartz piece also points out that back in March, Benji was looking for help to fine tune/train an LLM.
This Mashable article also points out that Shaxpir has a monthly subscription option and parts of it were made with Prosecraft’s database.
Yeah, this does sound like a tech bro thing built off of work that wasn’t paid for the more I’m digging into it.
Re:
Perhaps you can expand on that last bit. Shaxpir built off of Prosecraft’s database? As in “one product the author built using the database of another product the author built”?
Or are you thinking about all the books scanned into the database? The post here kinda describes that the database itself is almost certainly within Fair Use limits.
(Or, as you were still looking into it, are you now looking for an edit button? 🙂 )
Re: Re:
The way that Prosecraft had decidedly non-prose stuff like Faith G Harper’s self-help book, some pages featured as the most vivid were definitely-not-publicly-available stuff like the most spoilery pages of a climax while Benji says he supposedly cares about authors, Benji was looking for help with another LLM, Shaxpir being a service with a paid subscription tier and Shaxpir using data from Prosecraft… it all just reeks of the tech dude getting caught by authors and having to issue some fake apologia about it, rather than Prosecraft and Shaxpir being some legitimately misunderstood wonder-tools.
Re: Re: Re:
How does anyone have anything not publicly available? Spoilers don’t mean squat
Re:
Revealing spoilers doesn’t violate copyright law.
And, if a tool like this is ruining your book it’s probably not a very good book. No one is using a tool like this to find out how a book ends.
So what?
Re: Re:
There are a lot of good books whose twists and turns hinge on info or action that happens on a single page. It’s more an issue with the tool than the book, methinks.
And yeah, spoiling a book doesn’t violate copyright law. However, the way he talks a big game about caring about authors and the written word, but everything about the “analysis” that Prosecraft spits out, the non-prose books in there as well, and the way that he was itching to work with an LLM trained on a ton of books gives me the vibe that he actually doesn’t. Like he just tossed a bunch of books he found on sketchy download sites while trawling the web into his thing.
Re:
From your report, it sounds like nothing of the sort.
Just because it is (probably, at the moment) legal doesn’t make it useful, or even on a path to becoming useful. That “most passive/most vivid” example certainly doesn’t indicate a whit of utility. Nothing in the former is grammatically passive; “Who stole the tarts?” is not more vivid than dozens of other passages in Alice.
Re:
Things should be shut down because you personally don’t have use for them? That’s nonsense.
A stupid idea from a stupid premise
I was noticing that Linkin Park releases versions of its songs that separate the instrumental tracks and the a cappella voice tracks, I assumed inviting transformative use (some results of which I’d seen on YouTube), and it reminded me of the story about Michael Jackson talking with Daryl Hall about borrowing the bass line of I Can’t Go for That (No Can Do) for Billie Jean.
I’d expect that authors and writers who are actually published would be fine if other writers studied their material with full intent to emulate or borrow certain styles if it was a human being doing it. But right now the notion they seem to fear is not that their book will be fed into an AI in order to create additional, transformative product, but to write what they’d create in their stead.
Generative AI is a long, long way from being able to write us a new Hemingway novel, or to draw us a new Maurice Sendak book. And even content IP owners might only be able to slow the development of AI to where its ability to Sendak convincingly meets and exceeds the original material.
But I don’t think we want to withhold all our art like Prince did, rather I think creative people who depend on those incomes are afraid of losing them and being left to the elements.
We’re not afraid of AI, we’re afraid of capitalism. And the solution is not going to be in delaying AI, but confronting that exploitation of labor is going to leave more and more people unemployed and hungry until it reaches a crisis point.
Gizmodo
Man, they sure have fallen a long way. Weren’t they the ones that tried the AI written content recently, that fucked up right out of the gate?
Unfortunately, the human writers aren’t any better. They regularly write opinion pieces with a complete lack of fundamental understanding. Their piece on the Canadian link tax was just as completely ignorant.
Re:
Also, I can’t see the comments, because Kinja sucks and I can’t read much on the phone. The article was painful enough, but the comments just refuse to load.
This is what I fear...
https://arstechnica.com/information-technology/2023/08/author-discovers-ai-generated-counterfeit-books-written-in-her-name-on-amazon/
A number of cartoonists have been struggling with this as well. Cartoons in their style are being created and monetized that they did not create.
Mike, I’m curious what you think of this. It seems like the counterargument would be, “well it’s fraud.” Which, ok, but what if they didn’t submit it under the author’s name or the cartoonist’s name and just monetized it? I’m not convinced the market would sort it out.
Re:
What’s your solution? Outlaw AI? Change fair use?
Anyone got Mike's bsb and account number?
He is obviously fine with people ripping him off if he endorses it with other people.
Just the extremely poor “arguments” of the hater crowd make me feel like this was something potentially worthwhile even if it never went anywhere. You know, like most authors and stories.