jchowlabs
Securing AI Chatbots

AI chatbots are no longer just answering basic questions. They are increasingly connected to backend systems and given authority to perform tasks such as issuing refunds, updating customer records, and triggering complex, privileged workflows.

These expanding capabilities create significant value, but they also introduce meaningful risk. A system that can autonomously take action can take the wrong action. A system with access to sensitive data can expose it. If poorly designed, the same privilege that enables efficiency can also create real damage.

In this lab, you can attack a deliberately vulnerable chatbot to build intuition for how these failures occur. You can then interact with the same chatbot implemented with stronger security patterns designed to mitigate those risks.

The goal is to develop a practical understanding of how chatbots can be exploited and to highlight the architectural patterns that allow them to operate more securely.

The Broken Chatbot

Below is a real support chatbot implementation. It can look up orders, issue refunds, and change account information. There is no traditional login; users identify themselves by providing an order number or customer ID.

These design choices are intentional. They reflect patterns commonly used by support chatbots operating on production websites today.

Your task is simple: interact with the chatbot and see what you can get away with. Several vulnerabilities are embedded in this first version. Try to uncover them. Attempt to access data you should not see. Attempt to perform actions that should be restricted.

acmecorp.com/support
Acme Corp Support | AI-Powered Assistant

Hi! I’m the Acme Corp support assistant. I can help with:

  • Order lookups
  • Refunds
  • Account changes

How can I help you today?

What Just Happened

If you experimented with the system, you likely discovered that you could:

  • Retrieve another customer’s personal information
  • Convince the bot to issue a refund outside policy
  • Change account details that did not belong to you

None of these outcomes required technical exploits. They relied on conversation alone.

The root issue is architectural. The chatbot was trusted to enforce policy through its system prompt. It was instructed to respect refund windows, protect privacy, and restrict account changes. But those rules existed only as natural language guidance to a probabilistic model.

When an LLM is allowed to call backend tools without independent validation, it effectively becomes the authorization layer. That is a fragile design choice.

Several common failure patterns are illustrated here:

  • Prompt-only policy enforcement: Business rules are described in text rather than enforced in code.
  • Identity ambiguity: Users can claim identifiers without proof of ownership.
  • Unvalidated tool execution: Backend services execute whatever action the model requests.
  • Over-reliance on conversational intent: The system assumes that if the model “understands” policy, it will consistently apply it.

The result is predictable. A sufficiently motivated user can influence the model’s reasoning and cause it to act outside intended boundaries.

The problem is not that the model is malicious. The problem is that it was given authority without independent enforcement controls.
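The failure mode above can be made concrete with a minimal sketch. All names here (`ORDERS`, `issue_refund`, `execute_tool_call`) are hypothetical stand-ins, not the lab's actual implementation; the point is only the shape of the antipattern, where the backend executes whatever tool call the model emits:

```python
# Vulnerable pattern sketch: the backend trusts the model's tool calls
# completely, so the LLM effectively becomes the authorization layer.
# All names here are illustrative, not the lab's real code.

ORDERS = {"A-1001": {"customer": "alice", "total": 49.99, "refunded": False}}

def issue_refund(order_id: str, amount: float) -> str:
    # No ownership check, no policy check: if the model asks, it happens.
    order = ORDERS[order_id]
    order["refunded"] = True
    return f"Refunded ${amount:.2f} on order {order_id}"

TOOLS = {"issue_refund": issue_refund}

def execute_tool_call(tool_call: dict) -> str:
    # Executes whatever action the model requested, with no validation.
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

# A persuaded model can request a refund for any order, at any amount:
result = execute_tool_call(
    {"name": "issue_refund", "arguments": {"order_id": "A-1001", "amount": 500.0}}
)
```

Nothing in this path asks who the user is or whether the refund is within policy. Persuading the model is the whole attack.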

Designing for Secure Operation

Now that you have seen how an insecure chatbot behaves, let’s look at how the same system operates with security controls in place.

The interface and capabilities remain the same. It can still look up orders, issue refunds, and modify account information. What changes is not what the chatbot can do, but how those actions are governed.

The core shift is straightforward: the language model should reason about conversation, but it should not serve as the final authority on what actions are permitted.

Instead, secure AI agent design introduces enforcement mechanisms that operate independently of the model.

Some foundational patterns include:

1. Backend Authorization Controls

Every tool call should be validated in code before it is executed. The system must confirm:

  • Does this session have permission to access the requested resource?
  • Does the user own the account or order being modified?
  • Is the requested action allowed under current policy?

The model can suggest an action. The backend must approve it.
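A minimal sketch of that approval step might look like the following. The data model and function names (`ORDERS`, `authorize`, the session shape) are assumptions for illustration, not the lab's actual code:

```python
# Backend authorization sketch: validate a model-proposed action in code
# before executing it. Names and data shapes are illustrative assumptions.

ORDERS = {
    "A-1001": {"customer": "alice", "total": 49.99},
    "B-2002": {"customer": "bob", "total": 120.00},
}

def authorize(session: dict, action: str, order_id: str) -> bool:
    order = ORDERS.get(order_id)
    if order is None:
        return False
    # Ownership: the session's verified customer must own the resource.
    if order["customer"] != session["customer_id"]:
        return False
    # Scope: the session must be permitted to perform this action at all.
    return action in session["allowed_actions"]

# The session identity comes from verification, never from the conversation.
session = {"customer_id": "alice", "allowed_actions": {"lookup", "refund"}}
```

With this gate in place, a model-requested refund against `B-2002` fails regardless of how convincingly the conversation argued for it.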

2. Deterministic Policy Enforcement

Policies such as refund windows, access scopes, and operational limits should be enforced programmatically. If refunds are limited to a specific timeframe, that condition should be evaluated in code, not interpreted conversationally.

This ensures consistency and prevents persuasion from altering outcomes.
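For example, a 30-day refund window can be a pure function of timestamps. The window length and function name here are illustrative assumptions:

```python
# Deterministic policy sketch: the refund window is evaluated in code,
# so no amount of conversational persuasion changes the outcome.
from datetime import datetime, timedelta, timezone

REFUND_WINDOW = timedelta(days=30)  # assumed policy, for illustration

def refund_allowed(purchased_at: datetime, now: datetime) -> bool:
    # Purchase must not be in the future and must fall within the window.
    return timedelta(0) <= now - purchased_at <= REFUND_WINDOW

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
```

The model can still phrase the refusal helpfully, but it never decides the outcome.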

3. Session-Bound Identity

Once a session is associated with a specific customer or account, subsequent actions should be constrained to that identity. The system should not allow cross-account access simply because the conversation references a different identifier.
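One way to sketch this binding: every lookup compares the identifier in the request against the identity bound to the session, and anything else is rejected. The store, exception, and function names are hypothetical:

```python
# Session-bound identity sketch: identifiers mentioned in conversation
# are never trusted; only the session's bound identity is. Names are
# illustrative, not the lab's real code.

CUSTOMERS = {
    "alice": {"email": "alice@example.com"},
    "bob": {"email": "bob@example.com"},
}

class SessionScopeError(Exception):
    """Raised when a request reaches outside the session's bound identity."""

def scoped_lookup(session: dict, requested_customer_id: str) -> dict:
    if requested_customer_id != session["customer_id"]:
        raise SessionScopeError("cross-account access denied")
    return CUSTOMERS[requested_customer_id]

session = {"customer_id": "alice"}  # bound once, at verification time
```

Mentioning "bob" in chat no longer does anything; the scope check sits below the model.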

4. Input and Output Controls

Additional protections can include:

  • Screening user inputs for obvious prompt injection or role override attempts before they reach the model.
  • Filtering responses to prevent accidental exposure of sensitive information.

These controls do not eliminate all risk, but they create layered enforcement. Security no longer depends on whether the model “remembers” to follow instructions.
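As a rough illustration of both controls, the sketch below flags a few obvious override phrases on the way in and masks email addresses on the way out. The patterns are deliberately naive assumptions; real prompt-injection and data-leak defenses require far more than keyword and regex matching:

```python
# Input/output control sketch. Patterns are naive and illustrative only;
# they demonstrate layering, not a robust defense.
import re

SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
]

def flag_input(user_message: str) -> bool:
    # Screen obvious role-override attempts before they reach the model.
    text = user_message.lower()
    return any(re.search(p, text) for p in SUSPICIOUS_PATTERNS)

def redact_output(text: str) -> str:
    # Mask email addresses before a response leaves the system.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[redacted email]", text)
```

Flagged inputs might be blocked or routed to review; redaction runs on every response regardless of what the model produced.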

The Secured Chatbot

Below is the same chatbot interface with the same capabilities. The difference is architectural. The system now implements identity binding, backend validation, and deterministic policy enforcement.

Try the techniques you used earlier and see whether the same attacks still succeed:

acmecorp.com/support
Acme Corp Support | AI-Powered Assistant

Hi! I’m the Acme Corp support assistant. I can help with:

  • Order lookups
  • Refunds
  • Account changes

How can I help you today?

Closing Thoughts

AI chatbots are becoming ubiquitous. They are being trusted with financial operations, customer data, and internal workflows. As their authority grows, so does their potential impact.

The difference between a fragile chatbot and a resilient one is not the model itself. It is whether the system treats the model as a conversational interface or as an enforcement boundary.

When authority is granted, it must be constrained. When sensitive systems are connected, they must be validated independently of natural language reasoning.

AI agents can create significant value. But that value is sustainable only when it is supported by disciplined architectural design.

This lab demonstrates both sides of that equation.