Trends & News

Autonomous Agents and Prompt Injection: The New AI Security Frontier

By Content Team on July 22, 2025
(updated on July 23, 2025)

OpenAI announced the release of the long-awaited ChatGPT Agent, a feature that connects existing ChatGPT models to information services and the web. The product’s name refers to the idea that AI is now capable of performing actions on its own and completing entire tasks as an autonomous agent.

Some of the Agent’s features are already familiar, as the tool’s initial version, known as Operator, has been available since January. However, no agent—whether a system or a person—can be autonomous without having some authority delegated to it. For AI, this means granting permissions to:

Fill out and submit online forms
Access and modify cloud-stored data, including files, calendars, and emails
Execute code and commands

Read-only access to information for data correlation does not make an AI an "agent." In this case, the tool still functions as a search engine.

The leap to an AI capable of performing actions requires that it be able to interact more critically with at least something. For example, an AI might have read-only access to calendars and files, but be capable of sending emails to compile information into a message—choosing recipients and dates based on meetings scheduled in the calendar.

In another case, the AI might read emails and web data to keep a spreadsheet updated. In this situation, the AI gains autonomy to interact with the file itself.

A Columbia University has already explored security issues in agents from Anthropic, MultiOn, and the open-source ChemCrow software. However, OpenAI’s release of its own agent should help popularize these tools—likely increasing both industry interest and attack impacts.

Commenting on the launch of ChatGPT Agent, OpenAI CEO Sam Altman recommended, at least for now, avoiding its use for critical or “high-stakes” operations (his own words). The reason is that, although the tool includes protections to prevent the AI from going out of control, not all scenarios can be predicted. Even read-only accesses are not necessarily safe.

Some recent examples help illustrate this challenge.

EchoLeak: When AI Itself Becomes a Data Leak Risk

Copilot has already been interacting more broadly with Microsoft’s own tools. In fact, the company differentiates the standard version, simply called “Copilot,” from the version integrated with corporate services: Microsoft 365 Copilot.

Because Microsoft 365 Copilot can read emails and access organizational information, it handles sensitive data. Given that, security researchers from AIM Security discovered a way to include instructions inside an email to “poison” Microsoft 365 Copilot.

In a restricted context, such instructions should at worst cause inconvenience or prevent correct answers from being generated. After all, 365 Copilot isn’t supposed to navigate the web and send information to attackers.

Unfortunately, researchers were able to exploit complexities within Microsoft’s services to embed data harvested by Copilot into an image request. Microsoft 365 Copilot doesn’t have permission to browse the web or leak data—and, in most cases, shouldn’t even reference images. But the fact that it could do so created a much more concerning attack scenario.

Browsers automatically load images as soon as a page opens. Because of that, if Copilot references the malicious email in its response, the user’s browser will load the “image” using parameters set by the injected prompt. The result? A silent data leak—no image is truly being loaded. The sole purpose of the image is to create a web request that sends sensitive data to the attacker, using the victim’s browser for the transmission.

The same attack could be executed with links. However, in this scenario, using images minimizes the number of user actions required for the attack to succeed.

This attack, dubbed “EchoLeak”, doesn’t require Microsoft 365 Copilot to have any write permissions. The AI still behaves as a chatbot, but with its responses manipulated by a malicious prompt hidden in an email—triggering a data leak.

Gemini’s Malicious Summaries

A similar case of prompt injection was demonstrated in Gemini, Google’s AI, by a Mozilla security researcher. In this vulnerability, a malicious prompt hidden inside an email, placed within an invisible text block, was immediately reflected in the message summary generated by Gemini.

This technique opens up new phishing possibilities, delegating to the AI the task of delivering malicious instructions directly to the user. Since the victim cannot see the malicious text in the message, they can be misled, receiving from the AI itself a notice that their account is compromised and that they must take immediate action.

This attack often relies on a second step—possibly a phone call (characterizing voice phishing, or vishing). This avoids AI filtering by omitting links, meaning the victim receives instructions directly from the AI to call the attacker.

This scenario creates serious challenges for user awareness initiatives, as the AI—supposedly a tool assisting the user—becomes a channel for scams.

What If These Attacked AIs Were Agents?

If attacks like EchoLeak or Gemini’s prompt manipulation were combined with agent capabilities, both the data leak impacts and phishing possibilities would increase significantly.

If an AI can interact directly with the web, creating data exfiltration channels becomes much easier. This risk forces agent developers to think very carefully about the restrictions imposed on their agents. As the EchoLeak case showed, supposedly “trusted” domains can still be exploited under certain conditions, especially when technical characteristics or vulnerabilities allow attackers to operate within those environments.

Considering that AI prompts can also be manipulated, the risk modeling of these activities quickly becomes extraordinarily complex. As Sam Altman noted, predicting all possible scenarios becomes unfeasible.

Recommendations for Companies

For now, AI agents aren’t widely available. Interaction with individual and organizational services is expensive, and global solutions like Microsoft 365 Copilot tend to remain more restricted than cutting-edge AIs like ChatGPT Agent.

However, these risks must be factored into AI adoption—through correct permissions or isolation solutions. Some proposals already suggest having AIs interact within secure containers, where changes can be compartmentalized, monitored, interrupted, and rolled back easily. As these and other solutions mature, AI integrations should become more robust and secure.

We’ve published a longer list of recommendations in a previous post on this topic.

On the other hand, companies cannot control phishing attempts by cybercriminals. The inclusion of prompts and similar techniques allows attacks to bypass traditional barriers and evade spam filters.

At Axur, we’re already using AI within our platform to detect today’s most sophisticated phishing attacks. Our AI, Clair, runs inside our platform—without requiring any integration with the monitored brands’ corporate environments. This way, we offer superior visibility over phishing attacks targeting a company’s customers, employees, or partners, without introducing any additional risk.

This kind of technology will become increasingly necessary. If phishing attacks are using AI manipulation techniques, only other AI technologies will be capable of detecting and classifying these messages as malicious.

The easiest way to understand how this works? Try it yourself. Talk to our specialists and see what our AI can uncover about the attacks targeting your company and your brands.

Popular Tags

Trends & News

Content Team

Experts in creating relevant external cybersecurity content to make the internet a safer place.

Popular Tags

Trends & News

Autonomous Agents and Prompt Injection: The New AI Security Frontier

EchoLeak: When AI Itself Becomes a Data Leak Risk

Gemini’s Malicious Summaries

What If These Attacked AIs Were Agents?

Recommendations for Companies

Continue reading

Employee Cybersecurity Risks: How to Detect and Prevent Insider Threats

5 Threat Landscape Trends for 2025–26 Every CISO Needs to Know

Leak of 183 Million “Gmail” Passwords: What Can Actually Be Confirmed?

Digital Risk Protection

INCIDENT RESPONSE

About Us