submitted 1 year ago by dl007@lemmy.ml to c/technology@lemmy.ml
[-] dartos@reddthat.com 25 points 1 year ago

I’ve noticed that the lemmy crowd seems more accepting of AI stuff than the Reddit crowd was

[-] aniki@lemm.ee 60 points 1 year ago

I mean for tech stuff it's fantastic. I could spend 30 minutes working out a regex to grep the logs in the format I need, or I could have a back-and-forth with ChatGPT and get it sorted in 5.

I still don't want it to write my TV or movies. Or code to a significant degree.
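(For the curious: the kind of one-off log filter being described might look like the sketch below. The log format, the field names, and the ERROR filter are all made-up assumptions for illustration, not anything from the thread.)

```python
import re

# Hypothetical log format: "2023-07-10 12:34:56 ERROR worker-3 timeout after 30s"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<source>\S+) (?P<message>.*)$"
)

def grep_errors(lines):
    """Yield (time, source, message) for ERROR-level entries."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") == "ERROR":
            yield m.group("time"), m.group("source"), m.group("message")

sample = [
    "2023-07-10 12:34:56 ERROR worker-3 timeout after 30s",
    "2023-07-10 12:34:57 INFO worker-1 heartbeat ok",
]
print(list(grep_errors(sample)))
```

Hashing out the named groups is exactly the fiddly part a chat back-and-forth speeds up.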

[-] Karyoplasma@discuss.tchncs.de 2 points 1 year ago

I use ChatGPT to romanize Farsi script from song lyrics and such. There is no other tool that works even remotely well, and the AI somehow knows how to properly transliterate.

[-] TWeaK@lemm.ee 2 points 1 year ago

I for one welcome our SkyNet overlords. They can't be much worse than the current global leaders...

[-] Karyoplasma@discuss.tchncs.de 2 points 1 year ago

I always say "please" and "thank you" when using ChatGPT. When the AI finally takes over and subsequently and inevitably concludes that the world would be a better place without humans, it may remember that I specifically was always friendly. Maybe it'll then have the courtesy to nuke my house directly instead of making me ultimately succumb to nuclear winter.

[-] gamer@lemm.ee 1 points 1 year ago

That’s genius! I’ve been trying to figure out how to incorporate ChatGPT-like bots into my work, but haven’t found it to be that useful. I don’t write a lot of regex, but hate it every time I do, so I’ll definitely be trying this next time I need it.

[-] AlexWIWA@lemmy.ml 4 points 1 year ago

Accepting of AI as a concept yes. But we're not too accepting of the current generation of theft-markov-generators that companies want to try and replace us with.

[-] dartos@reddthat.com 2 points 1 year ago

They’re a lot more than markov generators, but yeah. I don’t really think, in the long run, we’re going to see too many jobs displaced by AI.

I'm not convinced that our statistics-based training methods will lead to true I, Robot-style AGI.

And any company (except maybe visual novel shops) that fires people in favor of AI is going to regret it within 2 years.

[-] AlexWIWA@lemmy.ml 2 points 1 year ago

Yeah I'm being facetious when I call them markovs. I'm mainly just saying that they are basically regurgitating copyrighted material based on statistics, so I believe they are just automated copyright violations.

Completely agree with your comment.
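(A toy illustration of the "markov" being invoked facetiously: a bigram chain that can only regurgitate word transitions it has seen. The training sentence is made up, and real LLMs are far more than this; that's the joke.)

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)
    return chain

def generate(chain, start, length=8, seed=0):
    """Walk the chain from a start word, picking random seen followers."""
    random.seed(seed)
    out = [start]
    for _ in range(length):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

chain = build_chain("the cat sat on the mat and the cat slept")
print(generate(chain, "the"))
```

Every pair of adjacent output words was copied verbatim from the training text, which is the "automated regurgitation" being gestured at.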

[-] MaxPower@feddit.de 25 points 1 year ago* (last edited 1 year ago)

I like her and I get why creatives are panicking because of all the AI hype.

However:

In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.

A summary is not a copyright infringement. If anything is a case for fair use, it's a summary.

The comic's suit questions if AI models can function without training themselves on protected works.

A language model does not need to be trained on the text it is supposed to summarize. She clearly does not know what she is talking about.

IANAL though.

[-] jmcs@discuss.tchncs.de 19 points 1 year ago

I guess they will get to analyze OpenAI's dataset during discovery. I bet OpenAI didn't have authorization to use even 1% of the content they used.

[-] maynarkh@feddit.nl 12 points 1 year ago

That's why they don't feel they can operate in the EU, as the EU will mandate AI companies to publish what datasets they trained their solutions on.

[-] Jaded@lemmy.dbzer0.com 7 points 1 year ago

Things might change, but right now you simply don't need anyone's authorization.

Hopefully it doesn't change, because only a handful of companies have the data or the funds to buy the data; it would kill any kind of open-source or low-priced endeavour.

[-] Flaky@iusearchlinux.fyi 4 points 1 year ago

FWIW, Common Crawl - a free and open dataset of crawled internet pages - was used by OpenAI for GPT-2 and GPT-3, as well as EleutherAI's GPT-NeoX. Maybe for GPT-3.5/ChatGPT as well, but they've been hush about that.

[-] Riptide502@lemm.ee 15 points 1 year ago

AI is a double-edged sword. On one hand, you have an incredible piece of technology that can greatly improve the world. On the other, you have technology that can easily be misused to a disastrous degree.

I think most people can agree that an ideal world with AI is one where it is a tool to supplement innovation/research/creative output. Unfortunately, that is not the mindset of venture capitalists and technology enthusiasts. The tools are already extremely powerful, so these parties see them as replacements to actual humans/workers.

The saddest example has to be graphic designers/digital artists. It's not some job that "anyone can do." It's an entire profession that takes years to master and perfect. AI replacement doesn't just mean taking away their job; it renders years of experience worthless. The frustrating thing is that it's doing all of this with their works, their art. Even with more regulations on the table, companies like Adobe and DeviantArt are still using shady practices to con users into unknowingly building their AI algorithms (quietly making OPT-IN automatic and making OPT-OUT difficult). It's sort of like forcing someone to dig their own grave.

You can’t blame artists for being mad about the whole situation. If you were in their same position, you would be just as angry and upset. The hard truth is that a large portion of the job market could likely be replaced by AI at some point, so it could happen to you.

These tools need to be TOOLS, not replacements. AI has its downfalls, and expert knowledge should be used as a supplement to both improve these tools and the final product. There was a great video that covered some of those fundamental issues (such as not actually "knowing" or understanding what a certain object/concept is), but I can't find it right now. I think the best outcome comes when everyone cooperates.

[-] Steeve@lemmy.ca 11 points 1 year ago

Even as tools, every time we increase worker productivity without a similar adjustment to wages we transfer more wealth to the top. It's definitely time to seriously discuss a universal basic income.

[-] TheSaneWriter@lemm.ee 14 points 1 year ago

If the models were trained on pirated material, the companies here have stupidly opened themselves to legal liability and will likely lose money over this, though I think they're more likely to settle out of court than lose. In terms of AI plagiarism in general, I think that could be alleviated if an AI had a way to cite its sources, i.e. point back to where in its training data it obtained information. If AI cited its sources and did not word for word copy them, then I think it would fall under fair use. If someone then stripped the sources out and paraded the work as their own, then I think that would be plagiarism again, where that user is plagiarizing both the AI and the AI's sources.

[-] ayaya@lemmy.fmhy.ml 8 points 1 year ago* (last edited 1 year ago)

It is impossible for an AI to cite its sources, at least in the current way of doing things. The AI itself doesn't even know where any particular text comes from. Large language models are essentially really complex word predictors, they look at the previous words and then predict the word that comes next.

When it's training it's putting weights on different words and phrases in relation to each other. If one source makes a certain weight go up by 0.0001% and then another does the same, and then a third makes it go down a bit, and so on-- how do you determine which ones affected the outcome? Multiply this over billions if not trillions of words and there's no realistic way to track where any particular text is coming from unless it happens to quote something exactly.

And if it did happen to quote something exactly, which is basically just random chance, the AI wouldn't even be aware it was quoting anything. When it's running it doesn't have access to the data it was trained on, it only has the weights on its "neurons." All it knows is that certain words and phrases either do or don't show up together often.
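(To make the provenance point concrete, here's a deliberately tiny counting "predictor." Real models use learned neural weights rather than raw counts, and everything below, corpus included, is an illustrative assumption; but the key property is the same: once the counts are merged, there is no record of which source contributed them.)

```python
from collections import Counter, defaultdict

def train(corpora):
    """Accumulate next-word counts across all sources; provenance is lost."""
    counts = defaultdict(Counter)
    for source_text in corpora:  # each string stands in for one training document
        words = source_text.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict(counts, word):
    """Return the most likely next word given the merged counts."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

counts = train([
    "the model predicts the next word",
    "the next word is chosen by weight",
])
# The merged counts cannot say which document produced any given tally.
print(predict(counts, "next"))
```

Asking this structure to "cite its sources" is asking for information that was never stored in the first place.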

[-] Zetaphor@zemmy.cc 12 points 1 year ago* (last edited 1 year ago)

Quoting this comment from the HN thread:

On information and belief, the reason ChatGPT can accurately summarize a certain copyrighted book is because that book was copied by OpenAI and ingested by the underlying OpenAI Language Model (either GPT-3.5 or GPT-4) as part of its training data.

While it strikes me as perfectly plausible that the Books2 dataset contains Silverman's book, this quote from the complaint seems obviously false.

First, even if the model never saw a single word of the book's text during training, it could still learn to summarize it from reading other summaries which are publicly available. Such as the book's Wikipedia page.

Second, it's not even clear to me that a model which only saw the text of a book during training, but not any descriptions or summaries of it, would even be particularly good at producing a summary.

We can test this by asking for a summary of a book which is available through Project Gutenberg (which the complaint asserts is Books1, and therefore part of ChatGPT's training data) but for which there is little discussion online. If the ability to summarize comes from having the book itself in the training data, the model should be just as able to summarize the rare book as Silverman's.

I chose "The Ruby of Kishmoor" at random. It was added to PG in 2003. ChatGPT with GPT-3.5 hallucinates a summary that doesn't even identify the correct main characters. The GPT-4 model refuses to even try, saying it doesn't know anything about the story and it isn't part of its training data.

If ChatGPT's ability to summarize Silverman's book comes from the book itself being part of the training data, why can it not do the same for other books?

As the commenter points out, I could recreate this result using a smaller offline model and an excerpt from the Wikipedia page for the book.

[-] RoundSparrow@lemm.ee 9 points 1 year ago

The comic's suit questions if AI models can function without training themselves on protected works.

I doubt a human can compose chat responses without having trained at school on previous language. Copyright favors the rich, powerful, and established, like Silverman.

[-] RedCanasta@lemmy.fmhy.ml 7 points 1 year ago

Copyright laws are a recent phenomenon and should have never been a thing imo. The only reason it's there is not to "protect creators," but to make sure upper classes extract as much wealth over the maximum amount of time possible.

Music piracy has shown that it's got too many holes in it to be effective, and now AI is showing us its redundancy as it uses data to give better results.

It stifles creativity to the point it makes us inhuman. Hell, Chinese writers used to praise others if they used a line or two from other writers.

[-] TheSaneWriter@lemm.ee 6 points 1 year ago

I think that copyright laws are fine in a vacuum, but that if nothing else we should review the amount of time before a copyright enters the public domain. Disney lobbied to have it set to something awful like 100 years, and I think it should almost certainly be shorter than that.

[-] Marxine@lemmy.ml 4 points 1 year ago

VC-backed AI makers and billionaire-run corporations should definitely pay for the data they use to train their models. The common user should definitely check the licences of the data they use as well.

[-] vlad76@lemmy.sdf.org 3 points 1 year ago

I was under the impression that there was no real definitive way to tell what ChatGPT or similar AIs use for their training. Am I wrong?

[-] Asafum@lemmy.world 1 points 1 year ago

I feel like, when confronted about a "stolen comedy bit," a lot of these people complaining would also argue that "no work is entirely unique, everyone borrows from what already existed before." But now they're all coming out of the woodwork for a payday or something... It's kinda frustrating, especially if they kill any private use too...

this post was submitted on 10 Jul 2023
334 points (94.9% liked)
