this post was submitted on 22 Jun 2023
Technology
A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.
Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.
If Reddit had just charged the AI people for API access and left 3rd party apps alone, I doubt anyone would have given a shit, but they had to go and two-birds-with-one-stone it. Then they insisted on digging their hole deeper by running their mouths and making the situation worse.
They would have gone straight to scraping if they couldn't reach a deal. Sam Altman is on the board of Reddit. He knows which way the wind blows there.
LLMs already rely on web scraping and always have. They get their data from the entire Internet. Do people really think OpenAI is doing individual integrations with every single website on the Internet?! Are Google and Bing doing that, too?
It's complete FUD.
There may be some legal complexity here, though. Obviously Google and other search engines already have most of Reddit's content indexed, but there are some legal arguments as to whether they can use the content to create derivative works.
If Reddit opens up its API and specifically allows AI companies to use the content to create LLMs and other AI tools, then from a legal point of view those companies may find this far preferable to facing potential legal action further down the road.
Reddit could reach the same agreement without an API, too.
I suspect they have signed an exclusivity deal with some kind of third party to use the API. It could be for "AI" or it could be for more nefarious purposes.
That's why it's important to go back thru our comment history and replace them with linguistic garbage. To ensure Reddit can't profit off our donations. I'm not in the business of subsidizing Reddit, after all.
"Plonked up behind the radio them ready the plastic manuscript who observe Jerry's can." Or whatever.
If I were nefarious Reddit implementing this, I probably wouldn't have edits wipe out the original data. It's certainly not necessary to implement edits that way.
In fact, the editing log itself can be used as more data.
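A minimal sketch of the storage design described here, assuming a simple in-memory store: every edit appends a new revision instead of overwriting the old text, so the original survives and the full history doubles as an edit log. `Comment`, `post`, and all field names are hypothetical illustrations and say nothing about how Reddit actually stores comments.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Comment:
    """Hypothetical comment record that keeps every revision."""
    comment_id: str
    # Each entry is (timestamp, body); nothing is ever overwritten.
    revisions: list[tuple[datetime, str]] = field(default_factory=list)

    def edit(self, new_text: str) -> None:
        # Append rather than overwrite: the previous body stays in history.
        self.revisions.append((datetime.now(timezone.utc), new_text))

    @property
    def current_text(self) -> str:
        # What the site would display: only the latest revision.
        return self.revisions[-1][1]

    @property
    def original_text(self) -> str:
        # What the operator can still read: the first revision.
        return self.revisions[0][1]

    def edit_log(self) -> list[tuple[datetime, str]]:
        # The full history, which could itself be mined as extra data.
        return list(self.revisions)


def post(comment_id: str, text: str) -> Comment:
    # Hypothetical helper: posting is just the first revision.
    comment = Comment(comment_id)
    comment.edit(text)
    return comment
```

With this layout, replacing a comment with nonsense only adds a newer revision; `current_text` shows the garbage, while `original_text` and `edit_log()` still return what was first posted.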
We actually know for a fact they don't do it that way, since Reddit has already been caught undoing people's "delete" edits after they've gone