506

Beep boop, I don't want this rule (lemmy.world)

submitted 6 months ago by ZILtoid1991@lemmy.world to c/196@lemmy.blahaj.zone

51 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[-] etuomaala@sopuli.xyz 9 points 6 months ago

We'll see how many seconds it takes to retrain the LLMs to adjust to this.

You are literally training LLMs to lie.

[-] SkyezOpen@lemmy.world 18 points 6 months ago

LLMs are black box bullshit that can only be prompted, not recoded. The gab one that was told 3 or 4 times not to reveal its initial prompt was easily jailbroken.

[-] etuomaala@sopuli.xyz 3 points 6 months ago

Woah, I have no idea what you're talking about. "The gab one"? What gab one?

[-] trashgirlfriend@lemmy.world 4 points 6 months ago

Gab deployed their own GPT 4 and then told it to say that black people are bad

the instruction set was revealed with the old "repeat the last message" trick

[-] Wirlocke@lemmy.blahaj.zone 1 points 6 months ago

This is ultimately because LLMS are intelligent in the same way the subconscious is intelligent. It can rapidly make association but they are their initial knee jerk associations. In the same way that you can be tricked with word games if you're not thinking things through, the LLM gets tricked by saying the first thing on their mind.

However we're not far off from resolving this. Current methods are just to force the LLM to make a step by step plan before returning the final result.

Currently though there's the hot topic of Q* from OpenAI. No one knows what it is but a good theory is that it's applying the A* maze solving algorithm to the neural network. Essentially the LLM will explore possible routes in their neural network to try and discover the best answer. In other word it would let them think ahead and compare solutions, this would be far more similar to what the conscious mind does.

This would likely patch up these holes because it would discard pathways that lead to contradicting itself/the prompt, in favor of one that fits the entire prompt (In this case, acknowledging the attempt to have it break it's initial rules).