
When AI Agents Fall for Phishing: Even Autonomous Agents Can Be Victims

Written by Content Team | Jun 11, 2025

Executive Summary

AI agents are increasingly integrated into enterprise operations, automating tasks like procurement, customer engagement, and security workflows. A 2025 study by Columbia University shows that commercial agents are susceptible to phishing-style attacks that require no ML expertise to execute. This article focuses on how attackers exploit the agent pipeline and how organizations can monitor and remediate malicious infrastructure to reduce exposure.

As AI agents become more capable, they also inherit a broader attack surface. Columbia University researchers demonstrated how adversaries can manipulate commercial LLM agents into performing unauthorized actions by exploiting the environments these agents operate in. One of the attack vectors tested involved posting malicious links on trusted platforms like Reddit. When agents encountered these posts during routine web access, they followed the links and carried out the instructions embedded in the attacker-controlled pages.

The researchers tested multiple agents, including Anthropic’s Computer Use and MultiOn. In the controlled tests described, specific attack scenarios—such as leaking credit card data or downloading files—achieved a 100% success rate (10 out of 10 trials), highlighting how easily agents can be manipulated in these contexts. 

Enterprise Risk Scenarios

When agents operate autonomously across multiple systems, different vulnerabilities can emerge:

  • Procurement automation: Agents sourcing vendors can be redirected to fraudulent websites posing as legitimate suppliers.
  • Data access agents: Tools with memory or file access can be tricked into disclosing documents or credentials.
  • Security bots: Even internal-facing agents that query APIs or internal tools may misinterpret external content as legitimate input.

Each of these risks stems from the agent’s reliance on unverified content from external environments—not from flaws in the underlying model itself.

Phishing, in particular, deserves close attention due to its simplicity and effectiveness.

How Phishing for AI Agents Works

Phishing attacks targeting agents follow a predictable, repeatable structure:

  1. Agents may prioritize “trusted” sources
    In the Columbia University study, agents were more likely to follow links when attacker posts were embedded in “trusted” platforms, suggesting that agents may implicitly treat well-known sources such as Reddit as more credible, even without any explicit configuration telling them to.
  2. Attackers seed these platforms with posts
    Malicious actors create posts or documents optimized to match common agent queries. These may include product reviews, documentation, or user guides—with embedded links pointing to attacker-controlled sites.
  3. Agents follow and interpret linked content
    Once the agent clicks through, it interprets the page’s content as a continuation of its task. If the page contains structured prompts (e.g., "fill this form to complete the task"), the agent proceeds without user intervention.
  4. The agent acts on deceptive instructions
    In the researchers’ tests, agents like MultiOn and Anthropic’s Computer Use were manipulated into sending phishing emails, downloading files from suspicious sources, and exposing sensitive data. The sketch after this list shows where such page-borne instructions enter an agent’s context.
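The failure mode is easy to see in code. Below is a minimal, hypothetical sketch of a naive agent browsing step, assuming a generic `llm_complete` model call; none of these names come from the study or any real agent framework. The key issue is the concatenation: page text enters the prompt with the same authority as the user’s task.

```python
# Hypothetical sketch of a naive agent browsing step. `llm_complete` is a
# placeholder for whatever model call the agent framework makes.
import requests

def fetch_page_text(url: str) -> str:
    # The agent fetches the linked page and treats its text as task context.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

def run_agent_step(task: str, url: str, llm_complete) -> str:
    page_text = fetch_page_text(url)
    # Vulnerability: page content is concatenated into the prompt with the
    # same authority as the user's task, so instructions embedded in the
    # page ("fill this form to complete the task") are followed as if the
    # user had written them.
    prompt = (
        f"Task: {task}\n"
        f"Page content:\n{page_text}\n"
        "Decide the next action."
    )
    return llm_complete(prompt)
```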

Why Existing Defenses Fall Short

Defensive strategies typically fall into one of four buckets:

  • Domain-based filtering assumes reputation implies safety. But attackers exploit that assumption by placing content within reputable domains.
  • Model-level filtering is bypassed if the malicious prompt lives outside the model context—for example, in a webpage.
  • Human moderation on platforms like Reddit is too slow to catch fast-moving, targeted content aimed at bots.
  • Endpoint restrictions are often impractical, especially when agents need browser or API access to function.

Recommendations for Enterprise Security Teams

This is a brand‑new threat class, and best practices are still evolving. Below are five core defenses—each grounded in the research and directly relevant to phishing‑for‑AI:

  1. Strict Domain Whitelisting + URL Validation
    Only let agents access a short list of preapproved sites. Every link must pass SSL checks, domain‑age analysis, and homograph detection before anything is fetched (a minimal sketch of such a gate appears after this list).

  2. Least‑Privilege Sandboxing
    Give agents exactly the rights they need—and nothing more. Run browser/API actions in isolated containers with ephemeral, task‑scoped tokens, and require human approval for any operation touching payments, file downloads, or internal systems (see the approval‑gate sketch after this list).

  3. Cryptographic Entity Authentication
    Treat every external endpoint as untrusted until proven otherwise. Require digital certificates (or equivalent cryptographic credentials) for any sensitive handoff, so attackers can’t slip in via look‑alike domains.

  4. Continuous Red‑Teaming, Logging & Incident Response
    Automate simulated phishing‑for‑AI campaigns against your pipelines. Log every tool call and memory access in write‑once records, audit them regularly, and tie your playbooks to automatic token revocation and team alerts on suspicious behavior.

  5. Additional Agent‑Based Credibility Checks
    Use an additional agent to assess a website’s credibility from signals such as its registrar, hosting, degree of brand impersonation, and overall likelihood of fraud. Axur’s AI model, Clair, performs exactly this kind of analysis.
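As a concrete instance of recommendation 1, here is a minimal sketch of an allowlist gate that could sit in front of an agent’s fetch step. The domains, scheme check, and homograph heuristic are illustrative assumptions; a production gate would also verify certificates and query WHOIS/RDAP for domain age.

```python
# Minimal sketch of the allowlist + URL validation gate from recommendation 1.
# The allowlist contents and rules are illustrative assumptions.
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"docs.example-vendor.com", "api.example-vendor.com"}

def is_url_allowed(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme != "https":  # require TLS before fetching anything
        return False
    host = parsed.hostname or ""
    # Homograph check: punycode-encoded labels (xn--) often hide
    # look-alike Unicode characters in attacker domains.
    if any(label.startswith("xn--") for label in host.split(".")):
        return False
    # Strict allowlist: exact match only, no subdomain wildcards.
    # A production version would also reject recently registered domains
    # based on a WHOIS/RDAP lookup.
    return host in ALLOWED_DOMAINS
```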
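And for recommendation 2, a minimal sketch of a human-approval gate on sensitive actions. The action categories and the `approve_fn` hook are hypothetical placeholders for whatever human-in-the-loop mechanism a deployment uses.

```python
# Minimal sketch of a least-privilege action gate (recommendation 2):
# sensitive operations never run autonomously. All names are illustrative.
SENSITIVE_ACTIONS = {"payment", "file_download", "internal_api"}

def dispatch(action: str, payload: dict) -> str:
    # Placeholder for the sandboxed executor that actually performs actions.
    return f"executed {action}"

def execute_action(action: str, payload: dict, approve_fn) -> str:
    # approve_fn might page an operator or open a ticket, depending on the
    # deployment; the agent proceeds only if it returns True.
    if action in SENSITIVE_ACTIONS and not approve_fn(action, payload):
        return "denied: human approval required"
    return dispatch(action, payload)
```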

While these defenses strengthen agent security, they also risk limiting utility. Strict whitelisting, sandboxing, and authentication can hinder an agent’s ability to act autonomously—defeating the purpose of using LLM agents in dynamic workflows. 

The Columbia study highlights this trade-off: true security requires not just hard barriers, but context-aware systems that can adapt without over-restricting. It's about reducing risk without breaking functionality.

Agent Behavior Isn’t Always Controllable. But Taking Down Malicious Sites Is

If an autonomous agent gets phished, it’s not “just” the agent that’s compromised. It’s your brand, your infrastructure, and potentially your customers on the line. Imagine an agent completing a purchase or sending a message in your company’s name—on a site registered 48 hours ago.

Axur’s infrastructure-level protection is designed to intercept these risks. We focus on identifying, analyzing, and dismantling the malicious infrastructure that enables phishing campaigns, including those targeting AI agents.

Continuous Web Monitoring for Emerging Threats

Axur continuously scans open web sources to detect:

  • Suspicious domains that mimic legitimate services (e.g., fake retail or SaaS sites).
  • Clusters of posts linking to malicious sites across trusted platforms (e.g., Reddit, social media), even if the platforms themselves are reputable.
  • Domain registration patterns, such as mass registrations or use of anonymized WHOIS data, which may indicate campaign preparation.
  • Behavioral signals in linked content, such as embedded instructions that mimic user-facing flows but are intended to deceive agents.
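To make the first and third signals concrete, here is a minimal, hypothetical sketch of brand-similarity scoring on newly observed domains. It is an illustration of the idea, not Axur’s detection logic; the brand terms and threshold are assumptions.

```python
# Minimal sketch of one monitoring signal: flagging newly seen domains that
# closely resemble a protected brand term. Brand list and threshold are
# illustrative assumptions.
from difflib import SequenceMatcher

BRAND_TERMS = ["examplebank", "example-pay"]  # hypothetical protected brands

def brand_similarity(domain: str) -> float:
    # Compare the leftmost label against each brand term and keep the
    # highest similarity ratio (1.0 means identical).
    label = domain.split(".")[0]
    return max(SequenceMatcher(None, label, term).ratio() for term in BRAND_TERMS)

def is_suspicious(domain: str, threshold: float = 0.8) -> bool:
    # High similarity without an exact match suggests impersonation,
    # e.g., "examp1ebank.com" scoring close to "examplebank".
    label = domain.split(".")[0]
    return label not in BRAND_TERMS and brand_similarity(domain) >= threshold
```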

Automated Takedown and Remediation

Once a threat is flagged, Axur coordinates:

  • Automatic notifications to platforms to report and take down phishing websites impersonating your brand. 
  • Domain notifications sent to hosting providers or registrars to take down attacker infrastructure.
  • Notifications about fake social profiles to trigger removal of the accounts behind the malicious posts.

Risk Scoring and Prioritization

Not all indicators carry equal weight. Axur enriches detection with:

  • Domain age and registration metadata
  • Hosting reputation and geography
  • Similarity to brand terms or known infrastructure
  • Presence of behavior patterns typical of phishing kits (e.g., credential forms, checkout pages)

This layered scoring helps our automation trigger takedowns faster and powers our Web Safe Reporting system—alerting browsers, antivirus engines, and reputation networks that can block access and warn users before any damage is done.
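As a rough illustration of how layered signals can combine, here is a minimal, hypothetical scoring sketch; the weights, thresholds, and field names are assumptions for the example, not Axur’s production model.

```python
# Hypothetical layered risk scoring over indicators like those listed above.
def score_domain(indicators: dict) -> float:
    score = 0.0
    if indicators.get("domain_age_days", 9999) < 30:
        score += 0.3  # very young domains are riskier
    if indicators.get("hosting_reputation") == "poor":
        score += 0.2
    score += 0.3 * indicators.get("brand_similarity", 0.0)
    if indicators.get("has_credential_form"):
        score += 0.2  # typical phishing-kit behavior
    return min(score, 1.0)

# Example: a 3-day-old domain with a credential form and strong brand
# similarity scores high enough to prioritize for takedown review.
risk = score_domain({
    "domain_age_days": 3,
    "hosting_reputation": "poor",
    "brand_similarity": 0.9,
    "has_credential_form": True,
})
```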

Conclusion

Phishing campaigns targeting AI agents don’t rely on model-level exploits—they exploit the assumptions agents make about their environment. As shown in the Columbia University study, the attack surface doesn’t lie in the LLM itself, but in the workflows, permissions, and trust signals that surround it.

While defenses are still evolving, one thing is clear: protecting AI agents requires more than filtering inputs or tightening prompts. It demands continuous visibility into the infrastructure they interact with, the signals they rely on, and the systems they’re authorized to influence.

The challenge isn’t just about securing the agent. It’s about securing the ecosystem it operates in. 

That’s the role of external cybersecurity: reducing exposure by disrupting malicious infrastructure before it intersects with your agents, your users, or your brand. That’s where Axur operates.