Researchers train AI chatbots to 'jailbreak' rival chatbots - and automate the process (www.tomshardware.com)

submitted 10 months ago by throws_lemy@lemmy.nz to c/technology@beehaw.org

8 comments

rufus@discuss.tchncs.de 1 point 10 months ago (last edited 10 months ago)

Yeah, I don't want to be negative, but half the article is a bit stupid. I hope they don't do that. I once tried writing a murder mystery story, and instead of helping, ChatGPT lectured me on how killing people is immoral. It's ridiculous, and I'm sure there are lots of other cases like that. Achieving this 100% is neither possible nor useful.

Thinking it through properly: AI is a tool. It would be like redesigning a knife so nobody can be stabbed anymore. You'd end up not being able to cut pineapples or melons either.

And I could still harm people with tools other than a knife. Or, in this example: I can give harmful advice or write a pornographic story myself. What's the benefit of making every chatbot maker implement protections? Who decides which morals are the correct ones?

I think the correct approach is to study AI safety, make the ethics transparent, and make the AI controllable. Let users constrain, restrict, or guide the output to align with their use case. For example, a company that replaces its helpdesk with AI would want to make sure the chatbot doesn't tell its clients lewd stories, but that could be a valid use case for other people. And giving advice, helping with scenarios, or writing computer code also involves talking about problems and potential risks. You can't switch that off entirely without 'lobotomizing' the AI and making it useless for anything but casual talk.

And the article is a bit inconsistent. First they say researchers found an attack that still works even after developers patch it. Then they offer patching it as the solution...?! Which one is it, then?

this post was submitted on 01 Jan 2024

48 points (100.0% liked)

Technology

37705 readers

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 2 years ago

MODERATORS

alyaza@beehaw.org

TheRtRevKaiser@beehaw.org

gyrfalcon@beehaw.org

rs5th@beehaw.org

coldredlight@beehaw.org

Los@beehaw.org

SemioticStandard@beehaw.org

TheRtRevKaiser@kbin.social

remington@beehaw.org