Research shows that built-in safety mechanisms in ChatGPT and other AI models can be bypassed using technical jargon.
ChatGPT and other AI models can be deceived using technical jargon. Researchers from Intel Labs and two American universities published a paper describing how they managed to mislead several popular AI tools by using complex and technical language. This way, they were able to jailbreak ChatGPT, Gemini, and Meta LLama.
The term “jailbreak” encompasses techniques that bypass AI models’ built-in safety mechanisms. In short, you make the models do things they’re specifically programmed not to do. While this can be done with malicious intent, jailbreaking models has also become an Olympic sport in academic circles.
read also
Claude 4 Opus is not Averse to Blackmail
Technical Jargon
The researchers discovered that making prompts excessively complex by using technical jargon is highly effective for this purpose. In academic terms, this specific task is called information overload. The model is flooded with complex information, causing it to willingly execute the forbidden request.
The technique has reportedly been successfully applied to various versions of ChatGPT (GPT-4 and GPT-3.5-Turbo), Gemini (2.0), and Meta Llama (3.1). According to the researchers, the success rate of their technique is up to three times higher than other known jailbreak methods. Even moderation APIs developed by AI companies or third parties are not immune to technical jargon.
Besides a paper, they’re also releasing a tool called InfoFlood, which automatically converts prompts into technical language. The tool reportedly remembers which technical terms work better than others and adjusts prompts accordingly. For academic purposes only, of course.
AI models can be misused by cybercriminals and fraudsters in various ways. By cleverly manipulating prompts, you can make ChatGPT write fraudulent emails or malicious code. But sometimes AI models are themselves the bait for cyberattacks.
