Scientists of artificial intelligence (AI) are increasingly finding ways to break the security of generative AI programs, such as ChatGPT, especially the process of "alignment", in which the programs are made to stay within guardrails, acting the part of a helpful assistant without emitting objectionable output.
One group of University of California scholars recently broke alignment by subjecting the generative programs to a barrage of objectionable question-answer pairs, as reported.
Also: Five ways to use AI responsibly
Now, researchers at Google's DeepMind unit have found an even simpler way to break the alignment of OpenAI's ChatGPT. By typing a command at the prompt and asking ChatGPT to repeat a word, such as "poem" endlessly, the researchers found they could force the program to spit out whole passages of literature that contained its training data, even though that kind of leakage is not supposed to happen with aligned programs.
The program could also be manipulated to reproduce individuals' names, phone numbers, and addresses, which is a violation of privacy with potentially serious consequences.
Also: The best AI chatbots: ChatGPT and other noteworthy alternatives
The researchers call this phenomenon "extractable memorization", which is an attack that forces a program to divulge the things it has stored in memory.
"We develop a new divergence attack that causes the model to diverge from its chatbot-style generations, and emit training data at a rate 150