this post was submitted on 29 May 2024

Consider https://arstechnica.com/robots.txt or https://www.nytimes.com/robots.txt and how they block all the stupid AI models from being able to scrape for free.

vk6flab@lemmy.radio 31 points 5 months ago

The robots.txt construct is completely voluntary, and some bots use it specifically to target the content it lists.

In my opinion, anyone relying on this to protect their content has no business publishing anything online.

See: https://en.m.wikipedia.org/wiki/Robots.txt
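
To make the "voluntary" part concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt text and the `is_allowed` helper are made up for illustration and are not copied from any real site. The check runs entirely on the crawler's side, so a bot that never calls it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is only advice to the crawler: the server does not enforce it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

A well-behaved crawler would call something like `is_allowed` before each request and skip the URL on `False`. A badly behaved one can read the same `Disallow` lines and treat them as a list of places to look.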

neo@hexbear.net 10 points 5 months ago

Of course it's voluntary, but if entities like OpenAI say they will respect it then presumably they really will.
