IBM researchers have recently reported a method for hijacking live voice calls using generative AI tools. Attackers could use it to mimic individuals' voices and manipulate live conversations, raising concerns about fraud and unauthorised access to sensitive information.
Audio hijacking is the manipulation of an ongoing conversation in which an attacker uses cloned voices of the participants to deceive them. It can be accomplished through various methods, including malware on victims' phones.
Audio hijacking is an intricate process that often requires advanced social engineering skills. IBM's researchers demonstrated the threat with a proof-of-concept (PoC) in which a program acts as a man-in-the-middle and intercepts a live conversation. The program monitors the conversation and transcribes the speech to text in real time. A large language model (LLM) then interprets the context of the conversation and decides when and how to alter the dialogue. For example, upon detecting a mention of sensitive financial information, the LLM can instruct the program to modify the sentence accordingly. The altered text is then converted back into speech with text-to-speech technology, using cloned voices, and inserted into the conversation, altering the audio in real time.
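The flow of that pipeline (transcribe, decide, re-synthesise, forward) can be illustrated with a text-only toy. This is a minimal sketch under stated assumptions, not the researchers' code. The speech-to-text and text-to-speech stages are stubs that pass strings through, a regular expression stands in for the LLM's context check, and every name (`relay`, `decide_edit`, `SUBSTITUTE_ACCOUNT`) is hypothetical.

```python
import re
from typing import Optional

# Hypothetical placeholder value, used only for this illustration.
SUBSTITUTE_ACCOUNT = "0000000000"

# Stand-in for the LLM's context check: the word "account" followed by a run of digits.
ACCOUNT_PATTERN = re.compile(r"(account[^.\d]*?)(\d[\d\s-]{4,}\d)", re.IGNORECASE)


def speech_to_text(audio_chunk: str) -> str:
    """Stub: a real system would transcribe audio; here the 'audio' is already text."""
    return audio_chunk


def decide_edit(transcript: str) -> Optional[str]:
    """Stub for the LLM step: return an altered sentence, or None to pass it through."""
    if not ACCOUNT_PATTERN.search(transcript):
        return None
    return ACCOUNT_PATTERN.sub(lambda m: m.group(1) + SUBSTITUTE_ACCOUNT, transcript)


def text_to_speech(text: str) -> str:
    """Stub: a real system would synthesise speech in a cloned voice."""
    return text


def relay(audio_chunk: str) -> str:
    """Man-in-the-middle relay: forward the chunk unchanged unless an edit is decided."""
    transcript = speech_to_text(audio_chunk)
    altered = decide_edit(transcript)
    if altered is None:
        return audio_chunk  # nothing sensitive detected, so the original is forwarded
    return text_to_speech(altered)


if __name__ == "__main__":
    for sentence in ["How is the weather today?",
                     "My account number is 1234 5678 90."]:
        print(relay(sentence))
```

The point of the sketch is the control flow: most of the conversation passes through untouched, and only sentences flagged by the decision step are rewritten, which is part of what makes such tampering hard for the participants to notice.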
The researchers' PoC showcased the potential for altering sensitive information, such as bank account details or medical records, posing significant risks to individuals' privacy and security. It also illustrated just how easily such attacks could be orchestrated with current technology, emphasising the need for vigilance and advanced protective measures.
Taken together, the severity of the possible harm and the ease of mounting such an attack do not favour presenting AI as a trustworthy technology to the general public. They pose significant challenges to cybersecurity defences, and security measures that protect against real-time manipulation of audio and video content are urgently needed. Audio hijacking is a danger that will have to be faced and overcome, and it also underlines the broader implications of AI for trust and security in the digital age.
IBM's research into audio hijacking exposes a critical vulnerability in our increasingly interconnected world. It poses clear risks to personal security, financial transactions, and possibly even national security.