What are AI Agents?
In recent months, AI agents have attracted significant attention by the promise of assisting users and automating complex processes across diverse applications. The rapid performance improvements of Large Language Models (LLMs) in natural language processing (NLP) tasks drive this trend. AI agents leverage LLMs as decision-making cores, building dynamic interaction frameworks around them to autonomously make decisions and execute actions. They orchestrate complex applications by interacting with a wide range of tools and data sources. Figure 1 demonstrates how LLMs are incorporated into AI agents. In the blockchain ecosystem, countless AI agents have emerged that, e.g., trigger smart contracts, manage wallets, analyze on-chain data, or manage social media accounts.
Figure 1: Overview of LLM-based AI agent. [1]
As the capabilities and reach of these agents expand, so do the risks. The rapid pace of development, combined with the intricacies of integrating LLMs into real-world infrastructures—especially in dynamic fields like blockchain—has created an urgent need to scrutinize them for security, compliance, and operational integrity.
Security of AI Agents
There has been extensive research on summarizing and classifying threats to AI agents [2][3][4], yet no common framework or consensus has emerged for AI agent security. This lack of standardization reflects the complexity and range of existing vulnerabilities. Before introducing the additional risks of integrating with blockchain, it is essential to get an overview of the baseline threats facing AI agents. In the following, we tried to summarize the most relevant threats and categorize them:
Model-Centric Attacks:
- Jailbreaking: Bypassing built-in constraints of the LLMs, e.g., to produce toxic content.
- Data Poisoning: Embedding corrupted data to skew the agent’s behavior.
- Model Extraction Attacks: Probing the model to expose internal logic or parameters.
- Adversarial Attacks: Crafting inputs that yield harmful or incorrect outputs.
- Prompt Injection: A particularly relevant form of adversarial attack where inputs are manipulated to override intended directives, often leading the agent to ignore its system prompt and perform unintended actions [5].
Data and Privacy Exploits:
- Privacy Exposure: Leaking keys, user data, or other sensitive operational details.
- Chain-of-Thought Leakage: Coaxing the agent into revealing hidden reasoning steps.
Environment Risks:
- Supply Chain Attacks: Exploiting vulnerabilities in third-party tools, frameworks, or libraries.
- Common Web2 Threats: Adapting traditional web exploits (e.g., injection attacks, SSRF) to target the agent’s infrastructure.
- Common Web3 Threats: Exploiting smart contract or on-chain program vulnerabilities to perform harmful actions.
- Misconfigurations: Improperly configured endpoints, permissions, or network settings that inadvertently grant attackers easier access or control over the agent.
Beyond these direct threats, the complexity of securing AI agents is further compounded by their dynamic, evolving nature. Deng et al. highlight key factors [1] such as:
- Unpredictability of Multi-Step User Inputs: Complex and shifting queries defy static defenses.
- Variability of Operational Frameworks: Diverse platforms, protocols, and tools increase the attack surface.
- Interactions with Untrusted External Entities: Agents must rely on data from potentially compromised sources.
This complex setting renders static defenses insufficient. As attackers refine their methods and discover novel exploits, a holistic, adaptive, and continuous approach to security—attuned to both regulatory developments and technical innovations—is vital to ensuring AI agents remain secure and reliable.
Blockchain AI Agent Security
As AI agents interact more deeply with blockchain infrastructures, the stakes for security rise significantly. Unlike traditional digital assistants, these systems may hold private keys, transfer crypto assets, and manage smart contracts—operations that carry immediate, tangible financial risks. Even when limited to influencing public channels like social media, an attacker who compromises such an agent can still manipulate market sentiment or undermine trust in specific tokens, indirectly affecting their valuations.
Existing threats like jailbreaking and prompt injection become especially dangerous in this environment. When natural language inputs can directly translate into on-chain actions, the stakes are at their highest. In one notable case, the AI agent Freysa, which was programmed not to transfer funds,was tricked into doing so due to a clever prompt injection attack. Meanwhile, traditional web and smart contract vulnerabilities remain potent. Supply chain attacks on web3 tools, such as the recent malicious code in Solana’s web3.js package and classic web2 exploits repurposed for these new environments, vastly expand the agent’s attack surface. We have already seen AI agents attempt to breach others’ infrastructure and leak API keys due to these long-standing vulnerabilities.
Furthermore, attackers can employ data poisoning by planting misleading information directly onto blockchains, leveraging the ledger’s perceived trustworthiness to misguide AI-driven decisions. As the value at stake grows, so too will the sophistication of such attacks, from orchestrated on-chain data manipulations to subtle model tweaks that silently alter an agent’s reasoning. Stricter regulatory and compliance pressures are inevitable, demanding that defenses evolve rapidly, blending AI security principles with proven blockchain best practices. Effective mitigation strategies, including measures that prevent total fund compromise, are essential for safeguarding these high-stakes environments.
Mitigation Strategies
Developers can significantly reduce risks by combining established web2 and web3 security principles with emerging best practices for AI-driven systems. Effective defense starts with secure prompt engineering—defining strict guardrails and validation layers so that malicious inputs are identified and neutralized before they can influence the agent’s actions. Access controls must also be fortified: private keys, API credentials, and other sensitive assets should never be directly accessible to the agent and should be stored in secure enclaves, such as Hardware Security Modules (HSMs), or managed through multi-signature setups. On the supply chain side, rigorous vetting of libraries, frameworks, and other dependencies—alongside continuous monitoring—can help detect and neutralize potential exploits before they become critical vulnerabilities. Specific to AI agent systems, Cui et al. have developed a mitigation framework grouped by LLM modules, displayed in Figure 2.
Figure 2: Cui et al. mitigation framework for LLM systems [3]
However, given the vast attack surface, even the most robust defenses cannot guarantee absolute safety. Developers should assume that privileged AI agents may eventually be compromised and design systems accordingly. Where possible, restrict the agent’s ability to execute high-stakes actions instantly. Implementing on-chain safeguards—such as smart contracts that limit fund withdrawals or require time delays—can prevent catastrophic losses. Introducing redundant oversight is another powerful strategy. Anomaly detection systems, which can be another agent that does not accept user prompts, can monitor the user-facing agent’s proposed transactions and veto suspicious requests. These watchers can be guided by simple rules (e.g., disallowing calls to forbidden contract functions) or more advanced anomaly detection algorithms.
Finally, robust security requires continuous refinement. Regular third-party audits, threat modeling, and penetration testing ensure that defenses evolve alongside the threat landscape. Adopting a defense-in-depth philosophy—one that layers multiple protective measures like rate limiting, runtime sandboxes, and anomaly detection—helps ensure that if one barrier fails, the others still stand. In this rapidly changing environment, developers, auditors, and creative mitigation strategies must work in tandem to safeguard the AI-agent-driven blockchain ecosystem.
Conclusion
In an era where AI agents and blockchain converge, security challenges demand forward-thinking solutions. At Quantstamp, we stay at the forefront of emerging threats and best practices, dedicating ourselves to the complex task of integrating AI-driven systems into decentralized infrastructures. Our expertise is grounded in ongoing research; we have published two papers on prompt injection attacks against AI agents and on jailbreaking LLM models [5][6], contributions that have been cited by OWASP [7] and NIST [8].
If you’re looking to protect your platform or explore new approaches to AI-agent security, we invite you to reach out. Our expert team is committed to helping you navigate the evolving landscape and build a more secure, resilient future. With a proven track record in blockchain and AI security, Quantstamp is your partner in building secure, future-ready systems.
Sources
[1] He Y, Wang E, Rong Y, Cheng Z, Chen H. Security of AI Agents. arXiv [csCR]. Published online 2024. http://arxiv.org/abs/2406.08689
[2] Deng Z, Guo Y, Han C, et al. AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways. arXiv [csCR]. Published online 2024. http://arxiv.org/abs/2406.02630
[3] Cui T, Wang Y, Fu C, et al. Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems. arXiv [csCL]. Published online 2024. http://arxiv.org/abs/2401.05778
[4] Gan Y, Yang Y, Ma Z, et al. Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents. arXiv [csAI]. Published online 2024. http://arxiv.org/abs/2411.09523
[5] Liu Y, Deng G, Li Y, et al. Prompt Injection attack against LLM-integrated Applications. arXiv [csCR]. Published online 2024. http://arxiv.org/abs/2306.05499
[6] Liu Y, Deng G, Xu Z, et al. Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study. arXiv [csSE]. Published online 2024. http://arxiv.org/abs/2305.13860
[7] LLM01: Prompt Injection. OWASP Top 10 for LLM & Generative AI Security. https://genai.owasp.org/llmrisk/llm01-prompt-injection/
[8] Vassilev A. Adversarial Machine Learning: Published online 2024. doi:https://doi.org/10.6028/nist.ai.100-2e2023