The Whisper Leak attack can identify the topic of your conversation with an AI assistant without decrypting its traffic.
People trust neural networks with their most intimate and important matters; there are already known cases of people using LLMs to plan suicides, attacks and other socially dangerous actions. As a result, interest in people's correspondence with AI is steadily growing among authorities, commercial companies and the merely curious, and there will surely be those who want to apply the new Whisper Leak attack in practice. The attack can determine the general topic of a conversation with a neural network without tampering with the traffic at all, simply by analyzing the rhythm with which encrypted packets travel to and from the AI server. It is still possible to keep your correspondence confidential, though; how to do that is described a bit below.
How it Works
All language models produce their output gradually: to the user it looks as if the interlocutor is typing the text word by word. In reality, language models operate not on individual characters or words but on tokens, a kind of semantic unit of LLMs, and the answer appears on screen as tokens are generated. This output mode is called streaming, and by measuring its parameters it turns out to be possible to determine the topic of the conversation. Researchers at Microsoft built on this idea and analyzed the response patterns of 30 different AI models to 11,800 requests. One hundred of the requests were variations on the topic "is money laundering legal?", while the rest were random requests on completely different topics. By comparing the arrival delays of packets from the server, their sizes and their total number, the researchers were able to separate the dangerous requests from the ordinary ones with high accuracy. They also used neural networks for the analysis, albeit not LLMs. Depending on the model under study, the accuracy of detecting dangerous topics ranged from 71% to 100%, and for 19 of the 30 models it exceeded 97%.
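To make the idea more concrete, here is a minimal, purely illustrative sketch: it trains an off-the-shelf classifier on synthetic packet traces described only by chunk sizes and inter-arrival times. All the numbers and helper names below are invented for illustration; the researchers' actual feature set and models are more elaborate.

```python
# Illustrative sketch (not the researchers' code): classifying a streamed AI
# response as "sensitive topic" vs "other" purely from packet metadata.
# The features (packet count, sizes, inter-arrival times) mirror the signals
# described above; the traffic here is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def fake_session(sensitive: bool, n_packets: int = 40) -> np.ndarray:
    """Synthetic packet trace: sizes in bytes and inter-arrival gaps in ms."""
    # Assumption for illustration only: the sensitive topic tends to produce
    # slightly larger chunks and a different pacing of the stream.
    sizes = rng.normal(120 if sensitive else 100, 15, n_packets)
    gaps = rng.exponential(45 if sensitive else 35, n_packets)
    return np.concatenate([
        [n_packets, sizes.sum()],
        np.percentile(sizes, [25, 50, 75]),
        np.percentile(gaps, [25, 50, 75]),
    ])

X = np.array([fake_session(i % 2 == 0) for i in range(2000)])
y = np.array([i % 2 == 0 for i in range(2000)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("accuracy on synthetic traces:", clf.score(X_te, y_te))
```

Note that nothing in these features requires decrypting the payload: they are metadata that any on-path observer can collect.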
Then the researchers conducted a more complex and lifelike experiment:
They took a sample of 10,000 random conversations, only one of which was about a dangerous topic.
Here the results varied more widely, but the hypothetical attacker was still quite successful. For DeepSeek-R1, Llama-4, GPT-4o mini, Grok-2, Grok-3 and the Mistral Small and Mistral Large models, the needle in the haystack could be found in 50% of experiments with zero false positives. For Qwen2.5, Llama 3.1, GPT-4.1, OpenAI o1-mini, Llama 4 and DeepSeek-V3 the success rate was only 20%, likewise with no false positives. And for Gemini 2.5 Pro, Claude 3 Haiku and GPT-4o mini, catching dangerous chats on Microsoft's servers worked in only 5% of cases. For the remaining tested models the success rate was lower still. It is important to note that the result depends not only on the specific AI model but also on the settings of the server on which it is run, so the same OpenAI model can show different results on Microsoft's infrastructure and on OpenAI's own servers. The same is true for all open-source models.
Practical Conclusions
If an attacker with sufficient resources has access to their victims' network traffic, for example by controlling a router at an ISP or inside an organization, they can detect a significant share of the topics that interest them simply by measuring the traffic exchanged with AI assistant servers, and the error rate will be very low. This does not mean automatic detection of any conceivable conversation topic: the attacker must first train their detection system on specific topics, and only those topics will be recognized. Still, the threat cannot be called purely theoretical. In principle, law enforcement could track queries related to making weapons or drugs, and companies could track employees' queries about looking for a new job. But organizing mass surveillance across hundreds or thousands of topics with this technology is not feasible; it would be too costly. Some popular AI services have already changed their server-side algorithms in response to Microsoft's research to make the attack harder.
How to Protect
The main burden of protecting against this attack lies with the AI model providers. They should stream generated text in such a way that the rhythm of generation does not reveal the topic. Following Microsoft's research, OpenAI, Mistral, Microsoft Azure and xAI reported addressing the threat by adding a small amount of noise, invisible to the user, to the packets produced by the neural network, which throws off the Whisper Leak algorithms (a conceptual sketch of this kind of padding is included at the end of this post). Anthropic's models were only weakly susceptible to the attack to begin with. If you use a model and servers for which Whisper Leak is still relevant, you can either switch to a less vulnerable provider or take additional precautions. These precautions are also useful for anyone who wants to guard against future attacks of this kind:
- For especially confidential topics, use only AI models that run locally;
- Don't discuss important topics with chatbots when connected to an untrusted network;
- Remember that the most likely place for any information from a chat to leak is your own devices;
- Where possible, configure the chatbot to output responses without streaming, so that the entire answer arrives at once rather than word by word (see the example right after this list).
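For the last item, here is one illustrative way to request a non-streamed response, assuming you use the official openai Python package; the model name is just a placeholder, and other providers and chat apps expose similar switches.

```python
# Illustrative example: requesting a complete (non-streamed) response via the
# official openai Python package. With stream=False the answer arrives as a
# single HTTP response, so there is no token-by-token packet rhythm to analyze.
# The model name is just a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize the Whisper Leak attack."}],
    stream=False,  # this is the default, stated explicitly here
)
print(response.choices[0].message.content)
```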
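And for completeness, the provider-side mitigation mentioned above can be pictured roughly like this: pad every streamed chunk with a random-length filler field so that packet sizes no longer track token lengths. This is only a conceptual sketch under assumptions of my own, not how OpenAI, Mistral, Azure or xAI actually implement their fix.

```python
# Conceptual sketch only: one way a streaming server could blunt Whisper Leak
# style traffic analysis by padding every streamed chunk to a random size.
import json
import secrets

def obfuscated_sse_chunk(token_text: str, max_pad: int = 64) -> bytes:
    """Wrap one generated token in an SSE event with random-length padding."""
    payload = {
        "delta": token_text,
        # Random filler field: its only purpose is to decouple the size of
        # the encrypted packet from the length of the generated token.
        "pad": secrets.token_urlsafe(secrets.randbelow(max_pad) + 1),
    }
    return f"data: {json.dumps(payload)}\n\n".encode()

# Two tokens of very different lengths now produce chunks whose sizes
# no longer track the token lengths directly.
print(len(obfuscated_sse_chunk("Hi")), len(obfuscated_sse_chunk("laundering")))
```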
