What is prompt injection?

Prompt injection is when an attacker crafts input that causes an AI agent to behave unexpectedly — executing unintended commands, leaking data, or bypassing restrictions. It is the most common attack vector against AI systems today, and defences must be built into the system design.

What tools should an AI agent NOT have access to?

AI agents should not have access to tools with irreversible or high-blast-radius actions unless there is a human review gate. This includes: sending bulk communications, deleting data, making financial transactions above a threshold, and modifying access controls. Apply the principle of least privilege.

AI Security

How Do I Secure AI Agents?

Securing AI agents requires controlling what tools they can access, what data they can read or write, and how they respond to unexpected inputs. The core principles are least-privilege access, prompt injection defences, output validation, and continuous audit logging.

By Maksym Miedvied

AI agents are different from traditional software in one critical way: they make decisions. A conventional application executes code exactly as written. An AI agent interprets instructions and chooses actions. That flexibility is what makes them useful — and what makes them a security concern when not properly constrained.

The most important control is tool access. Every tool an agent can invoke is a potential attack surface. An agent that can send emails, query a database, and browse the web has a vastly larger blast radius than one that can only read a calendar. Apply least-privilege: give the agent only the tools it needs for its specific function, and nothing more.

Prompt injection is the most common attack vector. It occurs when an attacker crafts input — through a user message, a document, or a web page the agent reads — that causes the agent to deviate from its instructions. Defences include structured system prompts with explicit boundaries, input sanitisation, and output validation before any tool is called.

Every tool call should be logged. When an agent acts unexpectedly, logs are the only way to diagnose what happened. For high-risk actions — sending bulk communications, modifying data, executing financial transactions — add a human review gate. The agent proposes the action; a human approves it.

Key Points

Apply least-privilege: agents should only have tools they need
Harden system prompts with explicit boundaries and refusal instructions
Validate all outputs before tool execution
Log every tool call and response for auditability
Add human review gates for high-risk or irreversible actions
Test for prompt injection before deploying any agent