AI Chatbot – Hackers Hiding Malicious Prompts in Images Processed by LLMs
As AI tools become more embedded in everyday workflows, the security risks surrounding them are taking new forms. Researchers at Trail of Bits recently demonstrated an attack technique in which malicious prompts are hidden inside images and revealed only when those images are downscaled for processing by large language models (LLMs).
The exploit leverages how AI systems downscale images for efficiency. During resizing, subtle patterns that are imperceptible at full resolution emerge as legible instructions that the model interprets. The idea builds on a 2020 study from TU Braunschweig that examined image scaling as an attack surface in machine learning systems.
Trail of Bits showed that carefully crafted images could manipulate platforms such as Gemini CLI, Vertex AI Studio, Google Assistant on Android, and Gemini’s web interface. In one proof-of-concept, Google Calendar data was exfiltrated to an external email account without the user’s consent.
The method relies on interpolation techniques such as nearest-neighbour, bilinear, or bicubic resampling. When an image is deliberately crafted for a given resampling method, downscaling introduces aliasing artifacts that reveal hidden text. In one example, bicubic resampling caused dark regions to shift in tone and expose concealed black lettering, which the LLM then treated as genuine user input.
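As a toy illustration of why the resampling method matters (this is not the researchers' actual payload), the sketch below builds a fine stripe pattern that reads as flat grey at full resolution and compares how different filters downscale it. The image sizes and the use of Pillow (9.1 or later, for the Resampling enum) are assumptions made purely for the demo.

```python
# Toy demonstration of scaling aliasing, not the actual exploit payload.
# Alternating black and white rows look like flat grey at full resolution,
# but a point-sampling downscale keeps only one row parity.
from PIL import Image
import numpy as np

SIZE = 512           # assumed full-resolution upload
THUMB = SIZE // 2    # assumed size the pipeline downscales to

rows = np.indices((SIZE, SIZE))[0]
stripes = np.where(rows % 2 == 0, 0, 255).astype(np.uint8)
full = Image.fromarray(stripes)  # appears uniformly grey to the eye

for name, flt in [("NEAREST", Image.Resampling.NEAREST),
                  ("BILINEAR", Image.Resampling.BILINEAR),
                  ("BICUBIC", Image.Resampling.BICUBIC),
                  ("BOX (area average)", Image.Resampling.BOX)]:
    small = full.resize((THUMB, THUMB), flt)
    print(f"{name:18s} mean pixel value: {np.asarray(small).mean():6.1f}")

# NEAREST collapses to roughly 0 or 255 because only one row parity survives,
# while the convolution-based filters stay near 127. A real payload is tuned
# to the specific filter a target pipeline uses, which is why results differ
# across platforms.
```

The point of the demo is that the image the model receives is determined by the resampling kernel, not by what the user sees at full resolution.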
To the user, nothing unusual is visible; the system simply executes the instructions embedded in the image alongside their legitimate queries.
To highlight the risk, Trail of Bits released Anamorpher, an open-source tool that generates such adversarial images across different scaling methods. While specialised, the technique could be reproduced by others if defences remain weak.
The implications are serious: multimodal AI systems are now deeply integrated into calendars, communication apps, and workflow platforms. A single image upload could potentially trigger data leaks or even identity theft if sensitive information is exfiltrated.
Mitigation requires practical safeguards such as restricting input dimensions, showing users a preview of the downscaled image exactly as the model will receive it, and requiring explicit user approval for sensitive actions. Traditional defences like firewalls are not designed to catch this type of manipulation, leaving a security gap.
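A minimal sketch of what such an upload gate could look like is shown below; the size limit, target resolution, resampling filter, and function name are all hypothetical and would need to match the specific pipeline being protected.

```python
# Hypothetical upload gate sketching the safeguards above; the limits,
# target size, and filter are assumptions, not any vendor's actual values.
from PIL import Image

MAX_DIM = 2048                      # assumed cap on accepted upload dimensions
MODEL_INPUT = (768, 768)            # assumed resolution fed to the model
FILTER = Image.Resampling.BICUBIC   # assumed to match the pipeline's resampling

def gate_image_upload(path: str) -> Image.Image:
    """Reject oversized uploads and return the image exactly as the model
    will see it, so the user can review that version before approving."""
    img = Image.open(path)
    if img.width > MAX_DIM or img.height > MAX_DIM:
        raise ValueError(f"image exceeds {MAX_DIM}px limit: {img.size}")
    # Preview what the model actually receives, not the original upload.
    return img.convert("RGB").resize(MODEL_INPUT, FILTER)
```

Even with such a gate, any sensitive action the assistant proposes, such as sending calendar data elsewhere, would still require explicit confirmation.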
Ultimately, the researchers emphasise that layered protections and secure design practices are the most reliable way forward. As they conclude, the strongest defence is to adopt systematic safeguards against all forms of prompt injection, not just multimodal variants.