How to Respond to a Discovery Request for AI Agent Actions
By Notary Team
How to respond to a discovery request for AI agent actions is now a board-level legal operations problem, not a niche technical edge case. If your product makes pricing, underwriting, hiring, care-navigation, or fraud decisions with agent workflows, you should assume a Rule 34 request is a "when" question, not an "if" question.
Most teams still approach discovery for AI agent actions like an incident pull from observability tooling. They start with search, then export whatever logs they can find, then try to explain gaps in a cover letter. That sequence feels practical, but it is legally fragile.
The contrarian point is simple: in discovery, speed does not save you if integrity fails. A fast response built on mutable logs is often worse than a slower response built on defensible evidence. Opposing counsel can work with your timeline. They will attack your chain of custody.
This guide lays out a practical method you can execute under pressure, with legal and technical checkpoints that hold up when the other side is skeptical.
How to respond to a discovery request for AI agent actions: start with legal hold, not search
The first operational mistake is opening Datadog before issuing preservation instructions. Under the Federal Rules of Civil Procedure, your duty to preserve relevant ESI is triggered once litigation is reasonably anticipated. If you start gathering before freezing retention and access, you risk spoliation arguments under Rule 37(e).
Treat the discovery request as a preservation event the moment it lands:
- Identify custodians and systems in scope: legal, platform engineering, security operations, model operations, and the product owner for the workflow named in the request.
- Issue a litigation hold that explicitly includes AI agent records, prompts, tool-call arguments, model outputs, and related policy configs.
- Suspend auto-delete settings and any sampling or noise-reduction filters in observability systems that might silently drop fields.
- Restrict write and admin access to evidence stores for the duration of collection.
If your first move is keyword search without a hold, you may preserve an incomplete corpus and create unfixable integrity questions. Courts do not require perfection. They do expect reasonable, repeatable preservation steps.
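If your evidence store supports write-once controls, the hold can be enforced in code rather than by policy memo. Here is a minimal sketch, assuming evidence objects live in an S3 bucket with Object Lock enabled; the bucket and prefix names are hypothetical:

```python
import boto3

# Hypothetical evidence bucket and the prefix covering the workflow in scope.
BUCKET = "agent-evidence-prod"
PREFIX = "workflows/underwriting/"

s3 = boto3.client("s3")

# Apply an S3 Object Lock legal hold to every object in scope, so lifecycle
# rules and admin deletes cannot remove them while collection runs.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        s3.put_object_legal_hold(
            Bucket=BUCKET,
            Key=obj["Key"],
            LegalHold={"Status": "ON"},
        )
        print(f"Legal hold ON: {obj['Key']}")
```

The point is not this specific API. It is that preservation happens before search, and that the preservation action itself is logged and repeatable.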
Map the request to a precise evidence scope
Discovery requests for AI agent actions are usually broader than they look. "All records related to the denial decision" might include model prompts, retrieved context, feature flags, human override actions, and external API responses that informed the final output.
Build a scope map before extraction:
- Decision objects: Every final decision or recommendation the agent produced.
- Execution traces: Prompt, system instructions, retrieval payloads, tool calls, and model outputs.
- Control context: Model version, policy version, guardrail configuration, and release metadata at execution time.
- Human interaction records: Escalations, approvals, overrides, and post-decision edits.
- Infrastructure context: Timestamp authority records, signing metadata, and integrity verification artifacts.
This is where many teams under-collect. They produce chat transcripts but omit the tool call that actually changed a customer record. Or they include outputs but not the policy version that constrained them. Opposing experts look for these seams because seams are where doubt enters.
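One way to keep the scope map honest is to make it a reviewable artifact rather than a whiteboard exercise. A minimal sketch, with illustrative system names that will differ in your environment:

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceScope:
    """Scope map for one request item, co-owned by legal and engineering."""
    request_item: str
    decision_objects: list[str] = field(default_factory=list)    # final decisions/recommendations
    execution_traces: list[str] = field(default_factory=list)    # prompts, tool calls, outputs
    control_context: list[str] = field(default_factory=list)     # model/policy/guardrail versions
    human_interactions: list[str] = field(default_factory=list)  # escalations, overrides, edits
    infrastructure: list[str] = field(default_factory=list)      # timestamp and signing artifacts
    known_gaps: list[str] = field(default_factory=list)          # documented blind spots

scope = EvidenceScope(
    request_item="Request 1(a)",
    decision_objects=["decisions_db.denials"],
    execution_traces=["agent_trace_store", "tool_call_log"],
    control_context=["model_registry", "policy_config_repo"],
    human_interactions=["review_queue_events"],
    infrastructure=["signing_manifest_store"],
    known_gaps=["pre-2024 traces lack tool-call arguments"],
)
```

A filled-in gap list here becomes your completeness statement later, so nothing surfaces for the first time under challenge.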
Why "we have logs" fails under challenge
When lawyers ask whether you can authenticate records, they are pointing to Federal Rule of Evidence 901. They need evidence sufficient to support a finding that the item is what you claim it is. If your answer depends on trusting operator access controls alone, you are exposed.
Common failure patterns in AI discovery responses:
- Mutable records: Logs can be edited, reindexed, or deleted by admins.
- Retention mismatch: Relevant events aged out because retention was set for cost, not legal risk.
- Clock ambiguity: Timestamps come from app servers without trusted timestamp attestation.
- Schema drift: OpenAI and Anthropic traces captured in different shapes, preventing end-to-end reconstruction.
- Missing provenance: No reliable link between the produced record and the runtime event that generated it.
A defensible response typically needs cryptographic signing, append-only or tamper-evident sequencing, and verifiable timestamps, often RFC 3161-backed, plus a reproducible export path. Without those, your production may still be accepted, but your credibility cost rises quickly.
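To make "tamper-evident sequencing" concrete, here is a minimal hash-chain sketch. A real deployment would add per-record signatures and RFC 3161 timestamp tokens from a trusted authority, which this sketch omits:

```python
import hashlib
import json

GENESIS = "0" * 64

def chain_records(records: list[dict], prev_hash: str = GENESIS) -> list[dict]:
    """Link each event to its predecessor: editing, inserting, or deleting
    any record breaks every hash that follows it."""
    chained = []
    for record in records:
        payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        chained.append({"record": record, "prev_hash": prev_hash, "hash": digest})
        prev_hash = digest
    return chained

def verify_chain(chained: list[dict]) -> bool:
    """Recompute the chain from the genesis value and flag any break."""
    prev_hash = GENESIS
    for entry in chained:
        payload = json.dumps(entry["record"], sort_keys=True, separators=(",", ":"))
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

The verification function matters as much as the chaining: it is what an opposing expert, or your own counsel, runs to confirm nothing moved after capture.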
Build a four-layer production package
If you need a repeatable answer for how to respond to a discovery request for AI agent actions, package production into four layers instead of a flat CSV dump.
Layer 1: Factual event corpus
Produce the raw event set for the requested period and entities:
- Inputs and outputs for each relevant agent action
- Tool call arguments and responses
- Model and policy versions
- User and account identifiers, redacted where required by protective order
This is the factual substrate. No interpretation here.
Layer 2: Integrity artifacts
Attach proof that events were not altered after capture:
- Record signatures
- Hash-chain or Merkle proofs
- RFC 3161 timestamp tokens or equivalent trusted timestamp receipts
- Verification manifest with public keys and validation instructions
This layer is the difference between "records" and "evidence."
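A manifest that ships with the package might look like the sketch below. The field names are illustrative, assuming signed records and a hash chain like the one sketched earlier:

```python
import hashlib
from pathlib import Path

def build_manifest(export_dir: str, public_key_pem: str, chain_head: str) -> dict:
    """Assemble a verification manifest: file hashes, the signing key,
    and plain-language validation steps a reviewer can follow."""
    file_hashes = {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(export_dir).glob("*.jsonl"))
    }
    return {
        "export_files_sha256": file_hashes,
        "signing_public_key_pem": public_key_pem,
        "hash_chain_head": chain_head,
        "timestamping": "RFC 3161 tokens embedded per record",
        "validation_steps": [
            "Recompute SHA-256 of each export file; compare to export_files_sha256.",
            "Verify each record signature against signing_public_key_pem.",
            "Recompute the hash chain; compare its head to hash_chain_head.",
            "Validate RFC 3161 tokens against the timestamp authority certificate.",
        ],
    }
```

Write the result to a MANIFEST.json that travels with the production, so verification never depends on access to your internal systems.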
Layer 3: Chain-of-custody narrative
Include a concise affidavit-ready narrative:
- Where events were captured
- How they were signed
- Where they were stored
- Who had access
- How export was performed
- How completeness was checked
Keep this technical but plain. A judge and a jury may both read it.
Layer 4: Request-to-record index
Map each request item to produced artifacts:
- Request 1(a) -> files X, Y, Z
- Request 1(b) -> files M, N
- Exceptions or objections with basis
This reduces motion practice because opposing counsel can see where each ask was answered.
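Even a simple machine-readable index beats prose buried in a cover letter, because both sides can diff it against the delivered files. An illustrative sketch:

```python
# Request-to-record index: every request item maps to produced artifacts,
# with explicit exceptions instead of silent omissions.
index = {
    "Request 1(a)": {
        "artifacts": ["decisions_2024Q1.jsonl", "traces_2024Q1.jsonl", "MANIFEST.json"],
        "exceptions": [],
    },
    "Request 1(b)": {
        "artifacts": ["overrides_2024Q1.jsonl", "policy_versions_2024Q1.jsonl"],
        "exceptions": ["Pre-2024 override records not retained; basis in scope memorandum."],
    },
}

for item, entry in index.items():
    status = "complete" if not entry["exceptions"] else "with noted exceptions"
    print(f"{item}: {len(entry['artifacts'])} artifacts ({status})")
```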
How to respond to a discovery request for AI agent actions across multiple model providers
Multi-provider environments are now normal. One product flow might touch OpenAI for reasoning, Anthropic for summarization, and a domain model in Vertex for classification. If your production logic assumes one provider, your discovery response will be incomplete.
Use a normalization-first approach:
- Convert provider-native traces into a unified schema before legal review.
- Preserve provider-native originals as source artifacts.
- Cross-link normalized records back to source IDs so verification is bi-directional.
- Record transformation code versions used for normalization.
This mirrors mature eDiscovery practice where native files and processed outputs are both preserved. It prevents the "you transformed this and lost context" challenge.
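A normalization step can be small and auditable. The provider field names below reflect common response shapes but are assumptions to verify against the traces you actually capture; the unified schema is illustrative:

```python
def normalize_trace(provider: str, raw: dict) -> dict:
    """Map a provider-native trace into one schema, keeping the source ID
    so verification can run in both directions."""
    if provider == "openai":
        # Typical chat-completions shape: choices[0].message.content
        output = raw["choices"][0]["message"]["content"]
    elif provider == "anthropic":
        # Typical messages-API shape: content is a list of typed blocks
        output = "".join(b["text"] for b in raw["content"] if b.get("type") == "text")
    else:
        raise ValueError(f"unknown provider: {provider}")
    return {
        "source_provider": provider,
        "source_id": raw["id"],          # back-link to the preserved native record
        "model": raw.get("model"),
        "output_text": output,
        "normalizer_version": "normalize_trace@v3",  # transformation code version, per record
    }
```

Note the two fields that answer challenges before they arrive: source_id makes verification bi-directional, and normalizer_version records exactly which transformation code touched the evidence.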
A practical checklist for counsel and engineering lead review:
- Do we have full prompt and output capture for each provider path?
- Do we have tool call visibility at the same granularity for each provider?
- Can we trace one decision end-to-end across provider boundaries?
- Can we prove no records were dropped during normalization?
If any answer is no, note the limitation explicitly rather than hoping it is missed. Courts punish concealment harder than imperfection.
Standards that strengthen admissibility
If your legal team asks what makes a package durable in court, anchor the answer to specific standards rather than vendor adjectives.
- FRE 901 (authentication): You need testimony or process evidence showing the records are what you claim.
- FRE 902(13) and 902(14): Certified records generated by electronic processes and certified data copied from electronic devices can reduce live witness burden when your certification package is strong.
- FRCP 26(g): Counsel certifies discovery responses are complete and reasonable after inquiry. Weak data lineage makes that certification harder.
- FRCP 37(e): If ESI that should have been preserved is lost, courts can impose curative measures, and in severe cases adverse-inference sanctions.
For regulated teams, statutory controls also matter during discovery posture. The HIPAA Security Rule (45 CFR 164.312(b)) requires audit controls for information systems containing ePHI. SOC 2 CC7.2 expects logging and monitoring controls with evidence of operation. EU AI Act Article 12 requires logging for high-risk systems to enable traceability. These are not identical legal tests, but together they shape what "reasonable" looks like when your production is challenged.
Common objections and how to preempt them
Discovery fights around AI records usually converge on three objections.
Objection 1: "Your logs are incomplete." Preempt by documenting coverage rates, provider-by-provider, workflow-by-workflow. Include an explicit completeness statement that identifies known blind spots.
Objection 2: "Your records could have been edited." Preempt with signature verification output, key management documentation, and a short explanation of who cannot alter records without detection.
Objection 3: "Your export process is not reproducible." Preempt with a runbook hash, tool versions, query manifests, and a second-operator validation record showing the same query produced the same result set.
This is where preparation pays off. A one-page reproducibility appendix often saves weeks of motion practice.
One operational detail many teams miss is counsel access to technical verification tooling. If only engineering can run signature checks, legal cannot independently validate production quality before delivery. Give legal a documented verifier workflow they can run themselves on a sample set.
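A verifier for counsel can be a single script against the shipped manifest, assuming the MANIFEST.json format sketched earlier. It needs no access to engineering systems:

```python
import hashlib
import json
import sys
from pathlib import Path

def verify_export(export_dir: str) -> int:
    """Counsel-runnable check: recompute each export file's SHA-256 and
    compare it against the manifest, reporting results in plain language."""
    manifest = json.loads((Path(export_dir) / "MANIFEST.json").read_text())
    failures = 0
    for name, expected in manifest["export_files_sha256"].items():
        actual = hashlib.sha256((Path(export_dir) / name).read_bytes()).hexdigest()
        if actual == expected:
            print(f"OK    {name}")
        else:
            print(f"FAIL  {name}: hash mismatch, possible alteration")
            failures += 1
    return failures

if __name__ == "__main__":
    sys.exit(1 if verify_export(sys.argv[1]) else 0)
```

Signature and timestamp checks extend the same pattern. What matters is that legal can run the check on a sample set before anything leaves the building.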
Timing, deadlines, and negotiation strategy
Rule 34 timelines can feel brutal when your evidence architecture is immature. Teams often panic and produce too early. A better approach is to separate immediate obligations from full production quality.
Within 24 hours:
- Acknowledge receipt
- Confirm preservation measures are in place
- Propose a meet-and-confer focused on format, scope, and phased production
Within 3 to 5 business days:
- Deliver a scope memorandum listing systems and expected data classes
- Surface known limitations transparently: retention gaps, missing historical provider logs, unresolved identity joins
- Offer phased production dates with verification milestones
This posture signals good faith and control. It also buys room to avoid producing brittle exports that later need correction.
Monday morning plan: run a discovery readiness drill
If you want to be ready before the request arrives, run a 90-minute drill this week.
Step 1 (15 min): Pick one high-risk workflow such as lending, claims, hiring, or medical triage.
Step 2 (20 min): Draft a mock request: "Produce all AI agent actions tied to decision class X from Jan 1 to Mar 31, including prompts, outputs, tool calls, and policy configs."
Step 3 (25 min): Have legal and platform jointly map data sources and identify custody gaps.
Step 4 (20 min): Execute a sample export for one day of data and run integrity verification.
Step 5 (10 min): Record three blockers and assign owners with due dates.
By lunch, you will know whether your current stack can survive a real request. Most teams discover one critical gap immediately, usually retention or cross-provider trace completeness. That is exactly the point of the drill.
Where Notary fits
Notary is built for this exact workflow: capture agent actions across providers, sign records at ingestion, preserve tamper-evident history, and export framework-ready evidence packs with chain-of-custody materials. It is not a replacement for your observability stack. It is the evidentiary layer your observability stack was never designed to be.
If you are formalizing your process for how to respond to a discovery request for AI agent actions, start with your runbook and controls mapping, then evaluate whether your current systems can produce a defensible package without manual reconstruction. The Notary docs outline the architecture, and the evidence packs page shows what production-ready exports look like in practice.