Huge proportion of internet is AI-generated slime, researchers find (futurism.com)

submitted 9 months ago by haxor@derp.foo to c/hackernews@derp.foo

8 comments

There is a discussion on Hacker News, but feel free to comment here as well.

[-] autotldr@lemmings.world 2 points 9 months ago

This is the best summary I could come up with:

Amazon has also had a notably rough go with AI content; in addition to its serious AI-generated book listings problem, a recent Futurism report revealed that the e-commerce giant is flooded with products featuring titles such as "I cannot fulfill this request it goes against OpenAI use policy."

Elsewhere, beyond specific platforms, numerous reports and studies have made clear that AI-generated content abounds throughout the web.

But while the English-language web is experiencing a steady — if palpable — AI creep, this new study suggests that the issue is far more pressing for many non-English speakers.

What's worse, the prevalence of AI-spun gibberish might make effectively training AI models in lower-resource languages nearly impossible in the long run.

To train an advanced LLM, AI scientists need large amounts of high-quality data, which they generally get by scraping the web.

If a given area of the internet is already overrun by nonsensical AI translations, the possibility of training advanced models in rarer languages could be stunted before it even starts.

The original article contains 465 words, the summary contains 169 words. Saved 64%. I'm a bot and I'm open source!

this post was submitted on 20 Jan 2024

34 points (92.5% liked)

Hacker News

4123 readers

This community serves to share top posts on Hacker News with the wider fediverse.

Rules

0. Keep it legal
1. Keep it civil and SFW
2. Keep it safe for members of marginalised groups

founded 1 year ago

MODERATORS

haxor@derp.foo