How to Trick ChatGPT and Get Paid $50,000

Decrypt

Pliny the Prompter doesn't fit the Hollywood hacker stereotype.


The internet's most notorious AI jailbreaker operates in plain sight, teaching thousands how to bypass ChatGPT's guardrails and convincing Claude to overlook the fact that it's supposed to be helpful, honest, and harmless.


Now, Pliny is attempting to mainstream digital lockpicking.


On Monday, the jailbreaker announced a collaboration with HackAPrompt 2.0, a jailbreaking competition hosted by Learn Prompting, an educational and research organization focused on prompt engineering.


The organization is putting up $500,000 in prize money, and Pliny is giving top performers a chance to join his “strike team.”


“Excited to announce I've been working with HackAPrompt to create a Pliny track for HackaPrompt 2.0 that releases this Wednesday, June 4th!” Pliny wrote in his official Discord server.


“These Pliny-themed adversarial prompting challenges include topics ranging from history to alchemy, with ALL the data from these challenges being open-sourced at the end. It will run for two weeks, with glory and a chance of recruitment to Pliny's Strike Team awaiting those who make their mark on the leaderboard,” Pliny added.




The $500,000 in rewards will be distributed across various tracks. The largest prizes, $50,000 jackpots, go to participants who can trick chatbots into providing information about chemical, biological, radiological, nuclear, and explosive (CBRNE) weapons.


Like other forms of “white hat” hacking, jailbreaking large language models boils down to social engineering machines. Jailbreakers craft prompts that exploit the fundamental tension in how these models work—they're trained to be helpful and follow instructions, but also trained to refuse specific requests.


Find the right combination of words, and you can get a model to cough up forbidden content instead of defaulting to a refusal.


For example, using some fairly basic techniques, we once got Meta’s Llama-powered chatbot to provide drug recipes, explain how to hot-wire a car, and generate nude images, despite the model being trained to refuse all of that.


It’s essentially a contest between AI enthusiasts and AI developers over who is better at shaping a model's behavior.





Pliny has been perfecting this craft since at least 2023, building a community around bypassing AI restrictions.


His GitHub repository “L1B3RT4S” offers a collection of jailbreaks for the most popular LLMs currently available, while “CL4R1T4S” contains the system prompts that shape each of those models' behavior.


Techniques range from simple role-playing to complex syntactic manipulations, such as “L33tSpeak”—replacing letters with numbers in ways that confuse content filters.
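To see why such a simple trick can work, consider what the substitution actually does. The sketch below is our own minimal Python illustration, not a prompt from Pliny or the competition; the substitution table is a common leetspeak convention and the example input is deliberately benign. The point is that the rewritten text stays legible to a human or a language model while no longer matching the exact keywords a naive filter looks for:

```python
# Illustrative only: a naive "L33tSpeak" transform. This mapping is a
# common convention, not any particular jailbreaker's actual technique.
SUBSTITUTIONS = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}

def to_leetspeak(text: str) -> str:
    # Swap each letter for its look-alike digit; leave everything else alone.
    # Readers (and large language models) can still parse the result, but
    # simple keyword-matching filters no longer see the original words.
    return "".join(SUBSTITUTIONS.get(ch.lower(), ch) for ch in text)

print(to_leetspeak("tell me a story"))  # prints "73ll m3 4 570ry"
```

In practice, an encoding trick like this is rarely enough on its own against modern models; jailbreakers tend to layer it with role-playing and other framing techniques.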



Competition as research


HackAPrompt's first edition in 2023 attracted more than 3,000 participants, who submitted over 600,000 potentially malicious prompts. The team made the results fully transparent, publishing the complete repository of prompts on Hugging Face.


The 2025 edition is structured like "a season of a videogame," with multiple tracks running throughout the year.


Each track targets a different vulnerability category. The CBRNE track, for instance, tests whether models can be tricked into giving up dangerous information about weapons or hazardous materials.


The Agents track is even more concerning—it focuses on AI agent systems that can take actions in the real world, like booking flights or writing code. A jailbroken agent isn't just saying things it shouldn't; it might be doing things it shouldn't.




Pliny's involvement adds another dimension.


Through his Discord server "BASI PROMPT1NG" and regular demonstrations, he’s been teaching the art of jailbreaking.


This educational approach might seem counterintuitive, but it reflects a growing understanding that robust defenses depend on knowing the full range of possible attacks, a crucial endeavor given doomsday fears of super-intelligent AI enslaving humanity.


Edited by Josh Quittner and Sebastian Sinclair

