this post was submitted on 01 Aug 2023
91 points (91.7% liked)
Technology
59223 readers
3330 users here now
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related content.
- Be excellent to each another!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, to ask if your bot can be added please contact us.
- Check for duplicates before posting, duplicates may be removed
Approved Bots
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Of course, once the AI is trained, you can't look at some arbitrary output and determine whether that specific output came due to some specific training data set. In principle, if some of your training data is found to violate copyrights you either have to compensate the copyright holder or re-train the model without that data set.
Finding out whether a copyrighted work is part of the training data is a matter of going through it, and should be the responsibility of the people training the model. I would like to see a case where it has been shown that a copyrighted dataset has been used to train a model, and those violating the copyright by doing so are held responsible.
I agree that under the current system of "idea ownership" someone needs to be held responsible, but in my opinion it's ultimately a futile action. The moment that arbitrary individuals are allowed to download these models and use them independently (HuggingFace, et al), all control of whatever is in the model is lost. Shutting down Open AI or Anthropic doesn't remove the models from people's computers, and doesn't eliminate the knowledge of how to train them.
I have a gut feeling this is going to change the face of copyright, and it's going to be painful. We collectively weren't ready.