this post was submitted on 29 May 2024
39 points (100.0% liked)


Consider https://arstechnica.com/robots.txt or https://www.nytimes.com/robots.txt and how they block all the stupid AI models from being able to scrape for free.
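For anyone unfamiliar with how per-agent blocking works, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt below is hypothetical and is NOT the real contents of the Ars or NYT files. It only assumes the common pattern of naming AI crawlers individually: `GPTBot` is OpenAI's published crawler user agent and `CCBot` is Common Crawl's. The example URL and the generic `Mozilla/5.0` agent are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the style the post describes; this is NOT the
# actual contents of arstechnica.com/robots.txt or nytimes.com/robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/articles/some-story"  # illustrative URL
    # The AI crawlers are refused everything; other agents only lose /admin/.
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, rp.can_fetch(agent, url))
```

Run as-is, this prints `False` for `GPTBot` and `CCBot` and `True` for the generic agent. Note that the check happens entirely inside the crawler's own code.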

vk6flab@lemmy.radio 31 points 1 year ago

The robots.txt construct is completely voluntary, and some bots use it specifically to target content.
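A rough sketch of both halves of that point, assuming Python's standard `urllib.robotparser`. A polite crawler has to choose to call the check before fetching, and nothing enforces it. Meanwhile, the `Disallow` lines are a public list of paths that a rude crawler can read directly. The robots.txt, its paths, the URLs and the `ExampleBot` agent are all hypothetical, and `disallowed_paths` is a deliberately simplified illustration rather than a full parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /internal-reports/
"""


def disallowed_paths(text: str) -> list[str]:
    """Collect every non-empty Disallow path, which any bot can read for free."""
    paths = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def polite_fetch_allowed(text: str, agent: str, url: str) -> bool:
    """A polite crawler calls this before fetching; nothing forces it to."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/drafts/unpublished-story"
    print("polite crawler allowed:", polite_fetch_allowed(ROBOTS_TXT, "ExampleBot", url))
    print("paths a rude crawler learns about:", disallowed_paths(ROBOTS_TXT))
```

The polite path returns `False` for the draft URL, while the same file hands a non-compliant bot the exact list of paths the site would rather keep quiet.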

In my opinion, anyone relying on this to protect their content has no business publishing anything online.

See: https://en.m.wikipedia.org/wiki/Robots.txt

neo@hexbear.net 10 points 1 year ago

Of course it's voluntary, but if entities like OpenAI say they will respect it, then presumably they really will.
