Safe and Controlled PII Handling in Agentic Systems
Agentic AI systems are designed to combine three capabilities: access to private data, exposure to untrusted content, and the ability to communicate externally. Security researcher Simon Willison calls this the "lethal trifecta" – and he's right. Any system that combines all three is trivially exploitable for data theft. The problem isn't theoretical. It's architectural.
EchoLeak (CVE-2025-32711, CVSS 9.3) coerced Microsoft 365 Copilot into exfiltrating Outlook, Teams, OneDrive, and SharePoint data through a single crafted email – zero clicks required. ShadowLeak hit OpenAI's Deep Research agent with a 100% success rate, extracting names and addresses from Gmail. ForcedLeak (CVSS 9.4) let attackers embed instructions in a Salesforce web form that exfiltrated customer PII and payment details using an expired whitelisted domain purchasable for $5. Prompt injection – ranked #1 in the OWASP LLM Top 10 for 2025 – remains fundamentally unsolved. The LLM vendors are not going to save us.
Why Guardrails Won’t Save You
The industry’s first instinct was to add guardrails – lexical filters, semantic classifiers, ML-based detection layers. Practitioners who shipped these systems report running three passes (lexical, semantic, and ML-based classification) and still acknowledge it’s not airtight. The emerging consensus on Hacker News and among builders working in production is blunt: if your PII protection depends on prompt engineering, you don’t have PII protection.
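To make the limitation concrete, here is a minimal sketch of what the lexical pass in such a guardrail stack typically looks like – the patterns and function names are illustrative, not taken from any specific product:

```python
import re

# Illustrative lexical-pass guardrail: regex patterns for a few PII categories.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def lexical_pass(text: str) -> list[str]:
    """Return the PII categories whose patterns match the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

# Catches the obvious cases...
print(lexical_pass("Invoice for jane.doe@example.com, card 4111 1111 1111 1111"))
# ...but a name and street address written in prose ("Dr. O'Neill, 4 Elm St")
# slip straight through, which is why practitioners layer semantic and ML
# passes on top -- and still call the result not airtight.
```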
The uncomfortable truth for enterprise deployments: your biggest PII risk probably isn’t a sophisticated prompt injection attack. It’s the consultant at 11pm pasting client data into ChatGPT. IBM’s 2025 data shows 40% of files uploaded to GenAI tools contain PII or PCI data, with shadow AI breaches costing an average of $4.63 million. Guardrails treat the symptom. They don’t address the root cause: the LLM should never see raw PII in the first place.
The Privacy Vault Pattern
The architectural shift gaining traction among practitioners is what the community calls the “privacy vault” – and it’s straightforward in concept. You put a vault between your data and your LLM. Real PII goes in, tokens come out. The agent only ever sees tokens like CUST_NAME_045 or CC_TOKEN_123. The same real entity always maps to the same token, so the LLM can still reason about relationships and patterns, but it never touches the raw data. When you need to rehydrate for the human user, only an authorized service with short-lived credentials can call the vault to swap tokens back. The LLM provider literally never sees a real name, number, or address.
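A minimal sketch of the pattern follows. The class, token format, and authorization flag are hypothetical; a production vault would be a separate, access-controlled service with encrypted storage and short-lived credentials, but the data flow is the same: tokenize before the model sees anything, rehydrate only on the authorized path back to the human.

```python
import secrets

class PrivacyVault:
    """Toy vault: real PII in, deterministic tokens out (illustrative only)."""

    def __init__(self):
        self._token_by_value = {}   # real PII value -> token
        self._value_by_token = {}   # token -> real PII value

    def tokenize(self, value: str, kind: str) -> str:
        """Deterministic: the same real entity always maps to the same token."""
        if value not in self._token_by_value:
            token = f"{kind}_{secrets.token_hex(4).upper()}"
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str, caller_authorized: bool) -> str:
        """Only an authorized service may rehydrate tokens back into raw PII."""
        if not caller_authorized:
            raise PermissionError("caller is not allowed to rehydrate PII")
        return self._value_by_token[token]

vault = PrivacyVault()
masked_prompt = f"Summarize open tickets for {vault.tokenize('Jane Doe', 'CUST_NAME')}"
# The LLM only ever sees masked_prompt. Rehydration happens after the model
# call, in the service that holds the vault credentials -- never in the model.
```

Because tokenization is deterministic, the agent can still notice that the same `CUST_NAME_…` token appears across a conversation and reason about that relationship without ever holding the underlying name.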
We implemented this pattern in SIMA – Sigma Software’s agentic corporate navigator. SIMA handles outward-facing requests where users ask questions and receive full context, but administrators reviewing conversation history for quality assurance or troubleshooting never see raw PII. It’s masked in the system’s audit trail.
SIMA has processed over 78,000 requests from 3,725 unique users across 1,100+ documents. The system is agentic – it doesn’t just answer questions, it executes workflows, creates tickets, schedules meetings, and integrates via MCP with services that expose compatible APIs.
Without adequate PII handling built into the architecture from the start, none of the observability options – logging, tracing, admin evaluation – would be safe to use with production data.
The Infrastructure Wasn’t Built for This
Even with privacy vaults in place, the next operational challenge is monitoring and access control. A thread on Hacker News this month asked: “How are you monitoring AI agents in production?” The top-voted reply was brutal: “Most observability tools in this space are dashcams. They show you what happened after you already got robbed.” The tooling exists – OpenTelemetry, Langfuse, Braintrust – but the deeper problem is that the infrastructure agents connect to was never designed for non-human actors.
The emerging practitioner pattern is what one builder called “default deny with time-boxed permissions” – nothing runs without explicit approval, access is granted for minutes not forever, and everything is logged. The most practical advice came from a commenter in the healthcare space: “The LLM does not act on production. It builds scripts. You run those scripts on cloned data. Consider that it will fail each time it’s able to. Even with all these precautions, you still get 100x productivity.”
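A minimal sketch of "default deny with time-boxed permissions", under assumed names (the grant store, action names, and agent ID are hypothetical): nothing runs without an explicit, expiring approval, and every decision is logged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Grant:
    agent_id: str
    action: str
    expires_at: datetime

_grants: list[Grant] = []
_audit: list[str] = []

def approve(agent_id: str, action: str, minutes: int) -> None:
    """A human grants access for minutes, not forever."""
    _grants.append(Grant(agent_id, action,
                         datetime.now(timezone.utc) + timedelta(minutes=minutes)))

def is_allowed(agent_id: str, action: str) -> bool:
    """Default deny: allowed only if a non-expired grant exists. Always logged."""
    now = datetime.now(timezone.utc)
    allowed = any(g.agent_id == agent_id and g.action == action and g.expires_at > now
                  for g in _grants)
    _audit.append(f"{now.isoformat()} {agent_id} {action} -> {'ALLOW' if allowed else 'DENY'}")
    return allowed

approve("sima-agent", "create_ticket", minutes=15)
assert is_allowed("sima-agent", "create_ticket")       # within the 15-minute window
assert not is_allowed("sima-agent", "delete_records")  # never approved -> denied
```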
That last point is the one most organizations miss. You don’t have to choose between agent autonomy and control. You can give the agent a sandbox, treat its outputs as proposals not actions, and still get enormous value. The control layer isn’t a tax on productivity. It’s what makes the productivity safe to scale.
What Deterministic Controls Unlock
When PII handling is built into the architecture – not bolted on as a guardrail – the economics of AI deployment shift. SIMA is going commercial. We’re offering it to clients now, including pilots and customizations tailored to their systems. That would not be possible if PII protection depended on prompt engineering or if compliance required full on-prem infrastructure.
The boring infrastructure patterns from traditional security – least privilege, network isolation, immutable audit logs, tokenized data flows – matter more here than any fancy AI-native solution. Vaults, allow-listed tool calls, typed parameter validation, sandboxed execution, outbound request proxying: these are deterministic controls that operate independently of the model. They don’t ask the LLM to protect itself. They make it structurally impossible for the LLM to leak what it never had access to.
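As a sketch of what "allow-listed tool calls with typed parameter validation" can look like in practice (tool names and schemas below are hypothetical, not SIMA's actual API): the model can only request tools from a fixed registry, and arguments are checked against a declared schema before anything is dispatched.

```python
from typing import Callable

# Registry of allow-listed tools: name -> (function, expected parameter types).
TOOL_REGISTRY: dict[str, tuple[Callable, dict[str, type]]] = {}

def register_tool(name: str, func: Callable, schema: dict[str, type]) -> None:
    TOOL_REGISTRY[name] = (func, schema)

def dispatch(tool_name: str, args: dict) -> object:
    """Deterministic control: reject anything outside the allow-list or schema."""
    if tool_name not in TOOL_REGISTRY:
        raise PermissionError(f"tool '{tool_name}' is not allow-listed")
    func, schema = TOOL_REGISTRY[tool_name]
    if set(args) != set(schema):
        raise ValueError("unexpected or missing parameters")
    for key, expected_type in schema.items():
        if not isinstance(args[key], expected_type):
            raise TypeError(f"parameter '{key}' must be {expected_type.__name__}")
    return func(**args)

def create_ticket(title: str, priority: int) -> str:
    return f"ticket created: {title} (P{priority})"

register_tool("create_ticket", create_ticket, {"title": str, "priority": int})
dispatch("create_ticket", {"title": "Rotate API keys", "priority": 2})  # allowed
# dispatch("run_shell", {"cmd": "rm -rf /"}) would raise PermissionError:
# the tool was never registered, so the model cannot reach it.
```

None of this asks the model to behave. The registry, the schema check, and the permission error fire the same way no matter what the prompt says.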
Agentic systems can operate at scale in regulated industries – healthcare, finance, legal – not because we solved prompt injection, but because we stopped relying on the model to enforce boundaries it was never designed to respect. The data stays in the vault. The agent works with tokens. The human gets the rehydrated answer. The industry spent two years asking how to teach models to be safe. The next phase belongs to companies building systems where safety is a property of the architecture, not a behavior we hope the model learned.
Bohdan Matviiv is a Senior ML Engineer at Sigma Software with 7 years of experience in ML. He develops AI competencies across the company, implementing new tools and approaches in legaltech, telecom, and adtech domains. Bohdan also mentors teams at Sigma Software Labs and Sigma Software University and has hands-on experience building chatbots, document processing systems, and knowledge bases at scale.