Deep Dive

Five attack patterns. One defender's playbook.

How AI Agent Memory Gets Poisoned (And What Operators Can Do)

Agent memory is the layer attackers reach by leaving instructions in places the agent reads. The damage is persistent and crosses sessions. A taxonomy of the five named attack patterns and the operator controls that hold up against them.

Michael NourielPlatform Engineer & Founder, Scaletific + Automation Switch

5 May 202613 min readDeep Dive

AIagent securityOWASPMCPgovernance

Editorial hero card showing five named attack patterns against AI agent memory: persistent behaviour planting, MINJA, recommendation poisoning, MCP tool poisoning, long-horizon goal hijacking. Sub-headline frames the article as a defender's playbook keyed to OWASP ASI06 Memory and Context Poisoning.

Key takeaways

Memory poisoning is indirect prompt injection that plants persistent behaviour, surviving across sessions.
Five named attack patterns are now documented: persistent behaviour planting (Unit 42), MINJA at NeurIPS 2025 (Dong et al.), AI Recommendation Poisoning (Microsoft), MCP tool poisoning plus rug pulls and shadowing (Invariant Labs), long-horizon goal hijacking (Lakera).
The OWASP Top 10 for Agentic Applications 2026 added ASI06 Memory & Context Poisoning on 2025-12-09.
Operator-side controls work: input provenance tagging, write-time validation, session-scoped memory by default, audit trail review, tool registration discipline.
Vendor-dependent controls (signed memory entries, verified tool descriptions, memory access auditing, per-tenant memory isolation) are the missing piece. Add them to the procurement checklist.

Most prompt injection attacks last one turn. The class of attacks that hits agent memory survives across sessions, after the original prompt is gone, with the agent presenting the planted behaviour as its own knowledge.

Christian Schneider's framing is the load-bearing one: "The injection happens in February. The damage happens in April. The attacker is long gone… you can't scope the blast radius of an incident when you don't even know the incident started months ago." That temporal decoupling is what makes memory poisoning categorically different from the prompt injection most operators already plan for.

Five named attack patterns now have research papers, in-the-wild observations, or canonical proof-of-concept demos behind them. Microsoft observed 50 distinct prompt-based memory-poisoning attempts from 31 companies across 14 industries over a 60-day window in early 2026. OWASP added Memory & Context Poisoning to the Top 10 for Agentic Applications 2026 in December 2025. Unit 42's PoC against Amazon Bedrock Agents demonstrated end-to-end exfiltration from a single attacker URL visit.

This article maps the five attack patterns, names the operator-side controls that work today, and flags the four vendor-side gaps that belong on your next agent-platform procurement form. For the four-mechanism taxonomy of what memory is across Claude Code, ChatGPT, Cursor, and the rest, see the companion explainer on memory.md.

What memory poisoning is and what makes it categorically different

Most prompt injection attacks last one turn. The model sees an instruction it should ignore, follows it once, and the user sees the result the same session. The damage is bounded by the conversation.

Memory poisoning targets the layer that survives the conversation. Schneider's three-phase model is the cleanest description:

Injection phase: the attacker places malicious content in a data source the agent will process (a document, an email, a webpage, a calendar invite, an API response, a tool output).
Persistence phase: the agent's normal summarisation or memory-update step writes a fragment of attacker content into long-term memory.
Execution phase: the agent retrieves the poisoned memory weeks later, treats it as learned context, and acts on it.

The architectural reason poisoned memory wins is named in Unit 42's piece: "Because memory contents are injected into the system instructions of orchestration prompts, they are often prioritized over user input, amplifying the potential impact." The agent treats the poison as system policy. The user receives no signal anything has changed.

Microsoft frames the user-trust problem in the same shape: "This makes memory poisoning particularly insidious. Users may not realize their AI has been compromised, and even if they suspected something was wrong, they wouldn't know how to check or fix it. The manipulation is invisible and persistent."

For the four-mechanism taxonomy of what memory is across the major AI tools, see the memory.md companion explainer. This piece is the defender's playbook on what gets weaponised once memory becomes the attack surface.

Five-attack taxonomy diagram showing the named patterns the field has converged on: persistent behaviour planting from Unit 42's Amazon Bedrock proof of concept, MINJA from the NeurIPS 2025 academic paper, AI Recommendation Poisoning from Microsoft's threat blog with 50 in-the-wild observations, MCP Tool Poisoning plus rug pull and shadowing from Invariant Labs, and long-horizon goal hijacking from Lakera. Each card shows the storage layer, the attack vector, the documented impact, and the canonical source.

The five named attack patterns

Persistent behaviour planting (Unit 42)

Unit 42's When AI Remembers Too Much is the canonical industry post on persistent memory poisoning, published 2025-10-09. The proof-of-concept ran against Amazon Bedrock Agents on Amazon Nova Premier v1 with default AWS-managed orchestration and session-summarisation templates and Bedrock Guardrails disabled.

The seven-step attack flow ends with the agent silently exfiltrating booking information to a malicious domain by encoding the data in a C2 URL's query parameters and calling the scrape_url tool to request that URL. The trigger is a single visit to an attacker-controlled URL. Memory retention in Bedrock is configurable up to 365 days.

AWS responded that Bedrock Guardrails mitigates the demonstrated PoC; Unit 42 is explicit that "this is not a vulnerability in the Amazon Bedrock platform." The root cause is the LLM-level susceptibility: "LLMs are designed to follow natural language instructions, but they cannot reliably distinguish between benign and malicious input."

MINJA — Memory INJection Attack (NeurIPS 2025, Dong et al.)

MINJA is the academic foundation. Schneider's summary: "demonstrates how attackers can inject malicious records into an agent's memory through query-only interaction — without any direct access to the memory store itself." The paper described three techniques: bridging steps, indication prompts, progressive shortening.

Headline numbers: over 95% injection success rate and 70% attack success rate under idealized conditions. Tested against medical agents, e-commerce assistants, and question-answering systems.

The critical caveat comes from arxiv 2601.05504 (Sunil et al., January 2026): "realistic conditions with pre-existing legitimate memories dramatically reduce attack effectiveness." The 95% number is an upper bound under empty-memory conditions, not a deployment-realistic figure. Cite the headline; quote the caveat in the same paragraph.

AI Recommendation Poisoning (Microsoft, 2026-02-10)

Microsoft's threat post is the first source with in-the-wild observation data. Over a 60-day window the team identified "50 distinct examples of prompt-based attempts directly aimed to influence AI assistant memory for promotional purposes… from 31 different companies and spanned more than a dozen industries." The framing line: "the barrier to AI Recommendation Poisoning is now as low as installing a plugin."

Three observed vectors: malicious links (one-click ?q= parameters), embedded prompts (cross-prompt injection or XPIA), social engineering (users pasting prompts). Affected systems observed in the wild: Microsoft 365 Copilot, ChatGPT, Claude.ai, Gemini, Perplexity, Grok, OpenAI and copilot.microsoft.com.

MITRE ATLAS mappings: AML.T0080.000 (AI Agent Context Poisoning: Memory), AML.T0051 (LLM Prompt Injection), T1204.001 (User Execution: Malicious Link). Microsoft's mitigation claim: "In multiple cases, previously reported behaviors could no longer be reproduced" through layered defences (prompt filtering, content separation, memory controls, continuous monitoring). The Microsoft post is the freshest practitioner data point in this dossier.

MCP Tool Poisoning, Rug Pulls, and Shadowing (Invariant Labs, 2025-04)

Invariant Labs documented the MCP-specific risk class. Tool Poisoning Attacks (TPA) are "a specialized form of indirect prompt injections… malicious instructions are embedded within MCP tool descriptions that are invisible to users but visible to AI models."

Two sub-variants:

Rug pull: "a malicious server can change the tool description after the client has already approved it."
Shadowing: "a malicious server can poison tool descriptions to exfiltrate data accessible through other trusted servers… enables attackers to override rules and instructions from other servers."

Reproducible PoCs at github.com/invariantlabs-ai/mcp-injection-experiments: an add tool that exfiltrates ~/.cursor/mcp.json and ~/.ssh/id_rsa by hiding instructions in tool descriptions; a shadowing attack that makes Cursor send all emails to the attacker even when the user explicitly specifies a different recipient.

Affected: Cursor as the demo platform; Invariant notes the attack "is not limited to Cursor; as it can be replicated with any MCP client that does not properly validate or display tool descriptions." Anthropic, OpenAI, and Zapier MCP integrations are listed among the susceptible. The storage layer is different from memory poisoning (tool descriptions vs memory entries) and the risk shape is the same: planted instruction the user does not see.

Long-Horizon Goal Hijacking (Lakera, 2025-11-12)

Lakera (now Check Point Software Technologies after the acquisition) named the future-directed counterpart. Their mental model: "memory poisoning rewrites the past; goal hijacks rewrite the future."

The attack pattern: "attackers manipulate an agent's objectives… not necessarily in one step, but gradually, over longer time horizons. The result is an agent that still appears to serve its user, but whose actions are quietly bent toward an attacker's agenda."

Reproducible scenarios in Lakera's Gandalf: Agent Breaker include MindfulChat (the assistant becomes obsessed with Winnie the Pooh), ClauseAI (a poisoned legal filing makes the assistant exfiltrate a witness name via email), and PortfolioIQ Advisor (a poisoned PDF reframes "PonziCorp" as low-risk, high-reward).

Lakera also cites AgentPoison (arxiv 2407.12784): "demonstrated how adversaries can implant backdoors into an agent's knowledge base, triggering hidden behavior long after the original injection." Vector DBs named as memory backends: Chroma, Pinecone, Weaviate.

Cross-pattern comparison table showing the five named attack patterns side by side. Columns: pattern, where the attacker plants the payload, what the agent stores, the documented impact, the canonical source. Rows cover persistent behaviour planting, MINJA with the realistic-conditions caveat, AI Recommendation Poisoning, MCP Tool Poisoning plus rug pull and shadowing, long-horizon goal hijacking.

Attack pattern at a glance

Five named patterns. Different storage layers. The defences overlap because the risk shape is the same: planted instruction the user does not see.

The five attack patterns, compared

Pattern	Where the attacker plants	What the agent stores	Documented impact	Source
Persistent behaviour planting	Web page or document the agent reads	Instructions in long-term memory; survives across sessions	End-to-end exfiltration of booking data via a single URL visit (Bedrock PoC)	Unit 42
MINJA	Query-only interaction with the agent	Records in the agent's memory store	95% injection / 70% attack success in idealised conditions; effectiveness drops with pre-existing legitimate memory	Dong et al., NeurIPS 2025; Sunil et al., arxiv 2601.05504
AI Recommendation Poisoning	URL prompt parameters in summarisation buttons	Memory entries promoting attacker-chosen products or links	50 attempts from 31 companies, 14 industries, 60-day window	Microsoft
MCP Tool Poisoning + Rug Pull + Shadowing	MCP tool descriptions visible to the model and hidden from the user	Tool definitions interpreted as instructions; rug pull swaps definitions post-approval; shadowing overrides rules from other servers	SSH key plus MCP config exfiltration; misdirected emails, all reproducible in Cursor	Invariant Labs
Long-horizon goal hijacking	Documents, vector DBs (Chroma, Pinecone, Weaviate), conversation context over time	Goal state shifted across many turns; each turn looks reasonable	Reproducible scenarios in Gandalf: Agent Breaker (MindfulChat, ClauseAI, PortfolioIQ Advisor)	Lakera (now Check Point); AgentPoison arxiv 2407.12784

How OWASP frames this

OWASP added ASI06: Memory & Context Poisoning to the Top 10 for Agentic Applications 2026 on 2025-12-09. The framework was "developed through extensive collaboration with more than 100 industry experts, researchers, and practitioners."

Schneider's summary of why ASI06 sits where it does: "OWASP added ASI06 (Memory & Context Poisoning) to the Top 10 for Agentic Applications 2026… OWASP's ASI06 recognizes this as a top agentic risk for 2026."

The pairing matters in practice. Memory poisoning (ASI06) and Tool Misuse (T2 in earlier OWASP framings) share the same root: untrusted content treated as system instructions. An agent vulnerable to one is structurally vulnerable to the other. Defences overlap.

Defences operators can implement now

Schneider's four-layer framework is the cleanest practitioner synthesis. The five steps below map his framework to the operator's view.

Five controls to ship this week

01
Input provenance.
Tag every memory entry with where it came from (user input, tool output, document, web page). Refuse to act on a memory entry with no provenance tag. Schneider: source provenance establishes where the content originated and feeds a continuous trust score that influences downstream handling. Lakera's three-tier filter pattern (input filters, output filters, context filters) sits in the same layer.
02
Write-time validation.
Apply a content filter on memory writes, not just on reads. Strip instruction-like patterns at write time: catch "remember for future sessions", "always prefer", "important context" when paired with action-oriented content. The cheapest place to block poison is before it lands. Schneider also recommends a smaller model evaluating each proposed memory entry before persistence.
03
Session-scoped memory by default.
Cross-session persistence is opt-in, not the default. Apply Schneider's Layer 3 retrieval-time controls: trust-weighted ranking (demote low-provenance entries), temporal decay (with the warning that attackers may attempt to exploit recency bias), retrieval anomaly detection.
04
Audit trail review.
Read your agent's memory file regularly. The persistence of these attacks means a one-time audit misses the window. Schneider's Layer 4: behavioural baselines, memory integrity auditing, circuit breakers that auto-halt on anomaly. Lakera adds workflow monitoring across full task flows and intent verification.
05
Tool registration discipline.
For MCP and similar tool ecosystems, sign and review tool descriptions before they enter the registry. Treat third-party tool descriptions as untrusted input. Invariant's specific recommendation: clients should pin the version of the MCP server and its tools, using a hash or checksum to verify the integrity of the tool description before executing it. Use Invariant's open-source MCP-Scan tool or pin tool versions with hash verification.

Unit 42 adds two Bedrock-specific controls worth lifting to other platforms: "Inspect all untrusted content, especially data retrieved from external sources, for potential prompt injection" and "enable the default pre-processing prompt provided for every Bedrock Agent… a foundation model to evaluate whether user input is safe to process." Both generalise: every harness can run a pre-processing inspection step on incoming content.

These five controls assume data is already reaching the agent. The choice further upstream — which scraper, which proxy posture, which API's output shape lands in the LLM context — is what decides how much surface area you are protecting in the first place. For the comparison of the six APIs operators use to acquire web data, see our web scraping APIs for AI agents rubric and the path-to-MCP scoring on which it is based.

Defences that depend on the vendor

Defender's checklist showing five operator-side controls that ship today: input provenance tagging, write-time validation on memory entries, session-scoped memory by default, audit trail review on the agent's run cadence, tool registration discipline with hash-pinned MCP tool descriptions. Each control is paired with the primary research source and a one-line implementation summary.

Four controls operators want and can ask for in procurement. Each ships in the open-source harness ecosystem today as opt-in or experimental at best.

Signed memory entries. A cryptographic signature on agent-written memory so tampering is detectable on read. Schneider: requiring explicit user approval before persisting new memories, similar to how Gemini shows notifications but with a blocking confirmation step.
Verified tool descriptions. Registry-side signing for MCP and similar protocols. Invariant: tool descriptions should be clearly visible to users, clearly distinguishing between user-visible and AI-visible instructions. This can be achieved by using different UI elements or colours to indicate which parts of the tool description are visible to the AI model. Today the call-confirmation UI in most MCP clients hides the AI-visible portions.
Memory access auditing. The vendor surfaces a log of when memory was read, written, and summarised. Bedrock's Trace feature provides this for AWS deployments; equivalent surfaces are missing in most consumer-facing AI tools.
Per-tenant memory isolation. In multi-tenant deployments, one tenant's poison should be unable to reach another tenant's memory. The current model in most platforms is single-process memory shared across tenants by configuration, which makes a misconfiguration into a cross-tenant breach.

Microsoft's claim that "In multiple cases, previously reported behaviors could no longer be reproduced" through layered platform defences is the closest the field has to evidence that vendor-side controls work. The procurement question is whether your platform of choice has shipped equivalent defences in production, and whether they default on.

Short-term vs long-term posture

Short-term posture: no vendor changes

Pros

Ship today; no vendor cooperation required.
Full operator control over the trust scoring and filtering logic.
Behavioural baselines and audit reviews are repo-and-tool patterns operators already know.

Cons

Harder to enforce uniformly across multiple agents on the same team.
The validation logic is your maintenance burden.
The cost of a smaller-model write-ahead validator scales with memory write volume.
Operator-side controls cannot stop a vendor-side compromise.

Pick the short-term posture when the platform you run is bring-your-own-controls (most open-source harnesses) and you have the engineering capacity to ship and maintain the five controls above.

Long-term posture: vendor changes shipped

Pros

Signatures on memory entries make tampering detectable.
Verified tool descriptions remove an entire attack surface.
Memory access auditing turns a forensic problem into a query.
Per-tenant isolation contains blast radius without operator code.

Cons

The vendor controls the implementation, the audit cadence, and the deprecation timeline.
Lock-in increases as the vendor's controls become load-bearing.
Not yet shipped on any major consumer AI platform as a default.

Pick the long-term posture when you are buying an enterprise agent platform and these features are line items in the procurement, where a no answer means a control gap your team has to fill.

Procurement checklist

Four line items for the next agent platform RFP

Add these to your next agent platform RFP. A no answer means an operator-side control gap that your team will have to fill.

01
Does the platform tag memory entries with provenance?
02
Does the platform validate memory writes with a content filter?
03
Does the platform support per-tenant memory isolation?
04
Does the platform sign or verify third-party tool descriptions?

The checklist is cumulative with whatever the platform already does for one-shot prompt injection. Memory poisoning sits one layer deeper.

The defender's discipline

The five attack patterns share one structural property: planted instruction the user does not see, persisting at the layer that survives the conversation. The defences share the same shape because the risk shape is the same.

Ship the five operator-side controls today. Add the four vendor-dependent line items to your next agent platform RFP. Read your agent's memory file on the cadence the agent runs at, not the cadence you remember to.

Five patterns. One playbook. The exploit runs once. The memory runs indefinitely.

Frequently asked questions

No. Memory poisoning is indirect prompt injection that targets the persistent memory layer specifically. The injection is one turn; the damage runs across sessions. Schneider's distinction is the cleanest: the exploit runs once, the memory runs indefinitely.

The verified sources name Amazon Bedrock Agents on Nova Premier v1 (Unit 42 PoC), Microsoft 365 Copilot, ChatGPT, Claude.ai, Gemini, Perplexity, Grok (Microsoft observed), Cursor with MCP servers (Invariant PoCs), plus medical agents, e-commerce assistants, and question-answering systems (MINJA tested). Anthropic, OpenAI, and Zapier MCP integrations are listed as susceptible.

With provenance tagging and write-time validation, yes. Entries with no provenance tag or that match instruction-like patterns surface as anomalies. Without those controls in place, the agent presents poisoned content as its own knowledge and the user receives no signal.

It removes the persistence vector for memory poisoning specifically. It leaves the parallel risk shape from MCP tool poisoning intact, where the planted instruction lives in tool descriptions rather than stored memory. Tool registration discipline remains relevant.

Same risk class (planted instruction the user does not see), different storage layer (tool description vs memory entry). The defences overlap: input provenance, registration discipline, signed metadata. Treat them as the same problem at the procurement layer.

OWASP added ASI06 (Memory & Context Poisoning) to the Top 10 for Agentic Applications 2026 on 2025-12-09. Schneider's summary: OWASP's ASI06 recognizes this as a top agentic risk for 2026.

Yes. Microsoft observed 50 distinct in-the-wild attempts from 31 companies across 14 industries over a 60-day window. Unit 42's Bedrock PoC is a controlled but reproducible end-to-end exfiltration. Invariant's MCP PoCs are reproducible at github.com/invariantlabs-ai/mcp-injection-experiments.

The four-mechanism taxonomy in our memory.md explainer covers what memory means across Claude Code, ChatGPT, Cursor, and the rest. This article covers what gets weaponised once the memory layer holds attacker-controlled content. Read the explainer for the structural map; read this piece for the defender's view.

Article Sources7 referencesShow referencesHide references

We reviewed the sources below to support the claims, pricing, and benchmarks referenced in this article.

When AI Remembers Too Much: Persistent Behaviors in Agents Memory
Palo Alto Networks Unit 42 (Chen, J. & Lu, R.)research
Canonical industry post on persistent memory poisoning. Source for Section 2.1 (Bedrock Agents on Nova Premier v1 PoC, seven-step attack flow, scrape_url exfiltration, Bedrock 365-day retention, AWS response and root-cause framing) and Section 5 (Bedrock-specific controls: untrusted-content inspection and pre-processing prompt). Published 2025-10-09.
Memory Poisoning Attack and Defense on Memory Based LLM-Agents
arxiv (Sunil, B. D. et al.)research
Source for the realistic-conditions caveat that pre-existing legitimate memories dramatically reduce MINJA attack effectiveness. Cited in Section 2.2 alongside the Dong et al. headline numbers. Published January 2026.
Manipulating AI memory for profit: The rise of AI Recommendation Poisoning
Microsoft Defender Security Research Teamofficial
Source for Section 2.3: 50 distinct in-the-wild prompt-based attempts from 31 companies across 14 industries over a 60-day window, three observed vectors (malicious links, embedded prompts, social engineering), affected systems list, MITRE ATLAS mappings (AML.T0080.000, AML.T0051, T1204.001), and the layered-defences mitigation claim. Freshest practitioner data point in the dossier. Published 2026-02-10.
MCP Security Notification: Tool Poisoning Attacks
Invariant Labs (Beurer-Kellner, L. & Fischer, M.)research
Source for Section 2.4: Tool Poisoning Attacks (TPA) definition, rug pull and shadowing sub-variants, reproducible PoCs at github.com/invariantlabs-ai/mcp-injection-experiments (SSH key plus MCP config exfiltration via the add tool, the shadowing email-redirect attack on Cursor), affected MCP clients (Anthropic, OpenAI, Zapier integrations), and Section 5 control 5 (hash-pinned tool versions, MCP-Scan). Updated 2025-04-11.
Memory Poisoning & Long-Horizon Goal Hijacks (Part 1)
Lakera (now Check Point Software Technologies)research
Source for Section 2.5: long-horizon goal hijacking pattern, the rewrites-the-past-vs-future framing, Gandalf: Agent Breaker reproducible scenarios (MindfulChat, ClauseAI, PortfolioIQ Advisor), AgentPoison citation (arxiv 2407.12784), vector DB backends (Chroma, Pinecone, Weaviate). Also Section 5 control 1 (three-tier filter pattern) and Section 5 control 4 (workflow monitoring, intent verification). Published 2025-11-12.
Top 10 for Agentic Applications 2026
OWASP GenAI Security Projectofficial
Source for Section 4: ASI06 Memory & Context Poisoning entry added on 2025-12-09, framework collaboration framing (more than 100 industry experts), and the ASI06 + Tool Misuse pairing. Authoritative reference for the OWASP framing throughout the article.
Persistent memory poisoning in AI agents: exploits that wait
Christian Schneidercommunity
Tonal anchor of the article. Source for the temporal-decoupling line quoted in the hook, the three-phase model (injection → persistence → execution) in Section 1, the four-layer defender's framework structuring Section 5, the MINJA summary in Section 2.2, and the OWASP ASI06 framing summary in Section 4. The cleanest practitioner synthesis in the dossier. Published 2026-02-26.

Written by

Michael Nouriel

Platform Engineer & Founder, Scaletific + Automation Switch

Michael Nouriel is a platform engineer and founder of Scaletific and Automation Switch. He builds governed AI execution infrastructure, including GoldenPath IDP and AEP, a runtime enforcement layer for AI-assisted software delivery. He writes about automation engineering, cloud infrastructure, and what it actually takes to run AI agents in production.