Technology


2 authors say OpenAI 'ingested' their books to train ChatGPT. Now they're suing, and a 'wave' of similar court cases may follow. (www.businessinsider.com)

submitted 1 year ago by L4s@lemmy.world to c/technology@lemmy.world

Two authors sued OpenAI, accusing the company of violating copyright law. They say OpenAI used their work to train ChatGPT without their consent.

[–] Eccitaze@yiffit.net 0 points 1 year ago (1 children)

The distinction between "right" and "probable" text is one without a difference. The simple fact is that an AI is incapable of handling anything outside its learning dataset. If you ask an AI to talk like a pirate, and it hasn't had any pirate speak fed to it by a human via its training dataset, it will utterly fail. If I ask an AI to produce a PowerShell script, and it hasn't had code fed to it by a human via its training dataset, it will fail utterly. An AI cannot proactively buy a copy of Learn PowerShell in a Month of Lunches and teach itself how to use PowerShell. That fundamental shortcoming--the inability to self-improve, to proactively teach itself and apply that new knowledge to existing concepts--means AI lacks a crucial, necessary element of the transformative effort required to produce a derivative work (or qualify as fair use).

When that happens, maybe I'll buy that AI is anything more than the single biggest copyright infringement scheme the world has ever seen. Until then, though, I will wholeheartedly support the efforts of creative minds to defend their intellectual property rights against this act of blatant theft by tech companies profiting off their work.

[–] ClamDrinker@lemmy.world 1 points 1 year ago* (last edited 1 year ago) (1 children)

You realize LLMs are designed not to self-improve, right? It's totally possible and has been tried - it's just that they usually don't end up very well once they do. And LLMs do learn new things; the results are just called new models, because it takes time and resources to retrain LLMs with new information in mind. It's up to the human guiding the AI to guide it towards something that isn't copyright infringement. AIs don't just generate things on their own without being prompted to by a human.

You're asking for a general intelligence AI, which would most likely be composed of different specialized AIs working together, similar to our brains having specific regions dedicated to specific tasks. This just doesn't exist yet, but one of its parts now does.

Also, you say "right" and "probable" are without difference, yet once again bring something into the conversation which can only be "right". Code. You cannot create code that is incorrect or it will not work. Text and creative works cannot be wrong. They can only be judged by opinions, not by rule books which say "it works" or "it doesn't".

The last line is just a bit strange honestly. The biggest users of AI are creative minds, and it's why it's important that AI models remain open source so all creative minds can use them.

[–] Eccitaze@yiffit.net -1 points 1 year ago

> You realize LLMs are designed not to self improve by design right? It’s totally possible and has been tried - It’s just that they usually don’t end up very well once they do.

Tay is yet another example of AI lacking comprehension and intelligence; it produced racist and antisemitic content because it had no comprehension of ethics or morality, and so it just responded to the input given to it. It's a display of "intelligence" on the same level as a slime mold seeking out the biggest nearby source of food--the input Tay received was largely racist/antisemitic, so its output became racist/antisemitic.

> And LLMs do learn new things, they’re just called new models. Because it takes time and resources to retrain LLMs with new information in mind. It’s up to the human guiding the AI to guide it towards something that isn’t copyright infringement.

And the way that humans do that is by not using copyrighted material for the training dataset. Using copyrighted material to produce an AI model infringes on the rights of the people who created the material, the vast majority of whom are small-time authors and artists, and open-source projects composed of individuals contributing their time and effort to said projects. Full stop.

> Also, you say “right” and “probable” are without difference, yet once again bring something into the conversation which can only be “right”. Code. You cannot create code that is incorrect or it will not work. Text and creative works cannot be wrong. They can only be judged by opinions, not by rule books which say “it works” or “it doesn’t”.

Then why does ChatGPT invent, out of whole cloth, PowerShell cmdlets that don't exist, yet purport to accomplish exactly the task the prompter asked for?

> The last line is just a bit strange honestly. The biggest users of AI are creative minds, and it’s why it’s important that AI models remain open source so all creative minds can use them.

The biggest users of AI are techbros who think that spending half an hour crafting a prompt to get Stable Diffusion to spit out the right blend of artists' labor is anywhere near equivalent to the literal collective millions of man-hours spent by artists honing their skill in order to produce the content that AI companies took without consent or attribution and ran through a woodchipper. Oh, and corporations trying to use AI to replace artists, writers, call center employees, tech support agents...

Frankly, I'm absolutely flabbergasted that the popular sentiment on Lemmy seems to be so heavily in favor of defending large corporations taking data produced en masse by individuals without even so much as the most cursory of attribution (to say nothing of consent or compensation) and using it for the companies' personal profit. It's no different morally or ethically than Meta hoovering all of our personal data and reselling it to advertisers.