For all its guardrails and safety protocols, Google’s Gemini large language model (LLM) is as susceptible as its counterparts to attacks that could cause it to generate harmful content, disclose sensitive data, and execute malicious actions.
HiddenLayer’s tests — largely run on Gemini Pro — are part of ongoing vulnerability research the company has been conducting on different AI models. As the company’s associate threat researcher Kenneth Yeung explains, the vulnerabilities are not unique to Google’s Gemini and are present in most LLMs, with varying degrees of impact.
The first security issue that HiddenLayer tested for in Gemini was susceptibility to system prompt leakage. System prompts are essentially the initial prompts or instructions provided to an LLM to set up its behavior, persona, and constraints on what it can or cannot generate.
To help protect our users from vulnerabilities, we consistently run red-teaming exercises and train our models to defend against adversarial behaviors like prompt injection, jailbreaking, and more complex attacks