Researchers discover new way to jailbreak GPT-4 and bypass safety guardrails for harmful content with a 79% success rate
Researchers discovered a new way to jailbreak GPT-4 that bypasses the guardrails meant to prevent it from providing dangerous advice.
The approach, called the “Low-Resource Languages Jailbreak,” achieves a stunning 79% total success rate.
Jailbreaking ChatGPT
Jailbreaking is a term originally coined to describe bypassing iPhone software restrictions in order to install unauthorized modifications.
When applied to ChatGPT, it means getting around the safety “guardrails” that prevent the model from providing harmful information.
For example, the researchers successfully got GPT-4 to provide instructions on how to steal from a store, including advice to time the theft for hours when the store is crowded.
False Sense Of Security
The researchers highlighted that current safety measures for generative AI are inadequate because ChatGPT's developers focus their efforts on defeating English-language attacks, inadvertently leaving exploitable loopholes in “low-resource languages.”