AI Security
How Do I Secure AI Agents?
Securing AI agents requires controlling what tools they can access, what data they can read or write, and how they respond to unexpected inputs. The core principles are least-privilege access, prompt injection defences, output validation, and continuous audit logging.
AI agents are different from traditional software in one critical way: they make decisions. A conventional application executes code exactly as written. An AI agent interprets instructions and chooses actions. That flexibility is what makes them useful — and what makes them a security concern when not properly constrained.
The most important control is tool access. Every tool an agent can invoke is a potential attack surface. An agent that can send emails, query a database, and browse the web has a vastly larger blast radius than one that can only read a calendar. Apply least-privilege: give the agent only the tools it needs for its specific function, and nothing more.
Prompt injection is the most common attack vector. It occurs when an attacker crafts input — through a user message, a document, or a web page the agent reads — that causes the agent to deviate from its instructions. Defences include structured system prompts with explicit boundaries, input sanitisation, and output validation before any tool is called.
Every tool call should be logged. When an agent acts unexpectedly, logs are the only way to diagnose what happened. For high-risk actions — sending bulk communications, modifying data, executing financial transactions — add a human review gate. The agent proposes the action; a human approves it.
Key Points
- Apply least-privilege: agents should only have tools they need
- Harden system prompts with explicit boundaries and refusal instructions
- Validate all outputs before tool execution
- Log every tool call and response for auditability
- Add human review gates for high-risk or irreversible actions
- Test for prompt injection before deploying any agent