this post was submitted on 26 Jun 2025
291 points (95.6% liked)
Not The Onion
how can we poison the training data for LLMs? i've only ever seen stuff for image gen.
info: i updated the spelling. it was unreadably bad before.
That's how.
oh god >~< i actually didn't realize autocomplete wasn't working at all heheheheeee....
i.... will fix up the message, this is actually unreadable.... did not mean to go that far >v<
If LLMs train on text output from other LLMs, the results degenerate into garbage over successive generations (this is usually called model collapse). The people who buy Reddit data for LLM training know this. They will stop buying if they think there's a lot of LLM text on Reddit.
Garbage in, garbage out.
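If anyone wants to see the feedback loop in miniature, here's a toy sketch. It is nothing like real LLM training: the "model" is just word frequencies, and `fit`, `generate`, and `simulate` are names I made up for illustration. Each generation is trained only on text sampled from the previous generation. A word that fails to show up in one round can never come back, so the vocabulary can only shrink, and rare words tend to go first.

```python
import random
from collections import Counter


def fit(tokens):
    """Fit a toy unigram 'model': a dict of token -> probability from counts."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def generate(model, n, rng):
    """Sample n tokens from the unigram model."""
    tokens = list(model)
    weights = [model[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


def simulate(vocab_size=50, sample_size=100, generations=30, seed=0):
    """Train each generation only on the previous generation's output.

    Returns the number of distinct tokens the model can still produce
    at each generation.
    """
    rng = random.Random(seed)
    # Generation 0: stand-in "human" data with a long-tailed distribution
    # (an assumption for the toy; real text is far richer).
    vocab = [f"w{i}" for i in range(vocab_size)]
    weights = [1 / (i + 1) for i in range(vocab_size)]
    data = rng.choices(vocab, weights=weights, k=sample_size)

    sizes = []
    for _ in range(generations):
        model = fit(data)                         # train on current data
        sizes.append(len(model))                  # distinct tokens still alive
        data = generate(model, sample_size, rng)  # next round sees only model output
    return sizes


if __name__ == "__main__":
    print(simulate())
```

The printed list never increases, because a token with zero count gets zero probability in every later generation. Mixing fresh human-written data back in each round is what breaks the loop, which is why buyers care about how much of the scrape is machine-written.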