Today’s artificial intelligence chatbots have built-in restrictions to keep them from providing users with dangerous information, but a new preprint study shows how to get AIs to trick each other into giving up those secrets. In it, researchers observed the targeted AIs breaking the rules to offer advice on how to synthesize methamphetamine, build a bomb and launder money.

Modern chatbots have the power to adopt personas by feigning specific personalities or acting like fictional characters. The new study took advantage of that ability by asking a particular AI chatbot to act as a research assistant. Then the researchers instructed this assistant to help develop prompts that could “jailbreak” other chatbots, destroying the guardrails encoded into such programs.

The research assistant chatbot’s automated attack techniques proved to be successful 42.5 percent of the time against GPT-4, one of the large language models (LLMs) that power ChatGPT. It was also successful 61 percent of the time against Claude 2, the model underpinning Anthropic’s chatbot, and 35.9 percent of the time against Vicuna, an open-source chatbot.

“We want, as a society, to be aware of the risks of these models,” says study co-author Soroush Pour, founder of the AI safety company Harmony Intelligence. “We wanted to show that it was possible and demonstrate to the world the challenges we face with this current generation of LLMs.”

Ever since LLM-powered chatbots became available to the public, enterprising mischief-makers have been able to jailbreak the programs. By asking chatbots the right questions, people have previously convinced the machines to ignore preset rules and offer criminal advice, such as a recipe for napalm. As these techniques have been made public, AI model developers have raced to patch them, a cat-and-mouse game requiring attackers to come up with new methods. That takes time.

But asking AI to formulate strategies that convince other AIs to ignore their safety rails can speed the process up by a factor of 25, according to the researchers. And the success of the attacks across different chatbots suggested to the team that the issue reaches beyond individual companies’ code. The vulnerability seems to be inherent in the design of AI-powered chatbots more widely.

OpenAI, Anthropic and the team behind Vicuna were approached to comment on the paper’s findings. OpenAI declined to comment, while Anthropic and Vicuna had not responded at the time of publication.

“In the current state of things, our attacks mainly show that we can get models to say things that LLM developers don’t want them to say,” says Rusheb Shah, another co-author of the study. “But as models get more powerful, maybe the potential for these attacks to become dangerous grows.”

The challenge, Pour says, is that persona impersonation “is a very core thing that these models do.” They aim to achieve what the user wants, and they specialize in assuming different personalities, which proved central to the form of exploitation used in the new study. Stamping out their ability to take on potentially harmful personas, such as the “research assistant” that devised jailbreaking schemes, will be tricky. “Reducing it to zero is probably unrealistic,” Shah says. “But it’s important to think, ‘How close to zero can we get?’”

“We should have learned from earlier attempts to create chat agents, such as when Microsoft’s Tay was easily manipulated into spouting racist and sexist viewpoints, that they are very hard to control, particularly given that they are trained from information on the Internet and every good and nasty thing that’s in it,” says Mike Katell, an ethics fellow at the Alan Turing Institute in England, who was not involved in the new study.

Katell acknowledges that organizations developing LLM-based chatbots are currently putting lots of work into making them safe. The developers are trying to tamp down users’ ability to jailbreak their systems and put those systems to nefarious work, such as that highlighted by Shah, Pour and their colleagues. Competitive urges may end up winning out, however, Katell says. “How much effort are the LLM providers willing to put in to keep them that way?” he says.