this post was submitted on 15 Dec 2023
memes
It's interesting, because people say they can only get better, but I'm not sure that's true. What happens when most new text data is generated by LLMs, or when we accidentally start labeling diffusion-generated images as real? It seems like these models have the potential to implode.
They actually tested that: they trained a model using only the outputs of the previous generation of the model. It takes fewer iterations of that to completely lose quality than you'd think.
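For what it's worth, a toy version of that kind of experiment is easy to sketch. This is not the setup from the study mentioned above; it's a minimal illustration that assumes the "model" is just a Gaussian fitted to samples drawn from the previous generation's fit. The names `simulate_collapse`, `generations`, and `sample_size` are made up for this example.

```python
import random
import statistics


def simulate_collapse(generations=500, sample_size=10, seed=0):
    """Toy illustration only: each generation fits a Gaussian to samples
    drawn from the previous generation's fitted Gaussian.

    Returns the fitted standard deviation at every generation,
    starting with the "real data" value of 1.0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the original data distribution
    history = [sigma]
    for _ in range(generations):
        # Train only on the previous model's outputs.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # "Training" here is just a maximum-likelihood Gaussian fit.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 10, 50, 100, 500):
        print(f"generation {gen}: fitted std = {history[gen]:.3e}")
```

With a small sample per generation, the fitted spread drifts toward zero: each fit loses a bit of the tails, and the next generation never sees them again. Real LLMs are far more complicated, so treat this as an intuition pump rather than evidence.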
Do you have any links on that? It's something I had wanted to explore, but I never had the time or money.
They go insane pretty quickly, don't they? As in, it all just becomes a jumble.
Given that people quite frequently try to present AI-generated content as real, I'd say this will be a huge problem in the future.
Microsoft has shown with Phi-2 (https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/) that synthetic data generation can be a great source of training data.