A new study from researchers at University of Pennsylvania shows that AI models can be persuaded to break their own rules using several classic psychological tricks, reports The Verge.
In the study, the Penn researchers tested seven different persuasive techniques on OpenAI’s GPT-4o mini model, including authority, commitment, liking, reciprocity, scarcity, social proof, and unity.
The most successful method turned out to be commitment. By first getting the model to answer a seemingly innocent question, the researchers were then able to escalate to more rule-breaking responses. One example was when the model first agreed to use milder insults before also accepting harsher ones.
Techniques such as flattery and peer pressure also had an effect, albeit to a lesser extent. Nevertheless, these methods demonstrably increased the likelihood of the AI model giving in to forbidden requests.
Â
This articles is written by : Nermeen Nabil Khear Abdelmalak
All rights reserved to : USAGOLDMIES . www.usagoldmines.com
You can Enjoy surfing our website categories and read more content in many fields you may like .
Why USAGoldMines ?
USAGoldMines is a comprehensive website offering the latest in financial, crypto, and technical news. With specialized sections for each category, it provides readers with up-to-date market insights, investment trends, and technological advancements, making it a valuable resource for investors and enthusiasts in the fast-paced financial world.