AI agents are no longer a research concept. They book travel, execute trades, draft documents, approve purchases, and interact with customers—often with minimal human oversight. Non-human and agentic identities are expected to exceed 45 billion by the end of 2026, more than twelve times the human global workforce.[1] Yet only 10% of organizations report having a strategy for managing these autonomous systems.[2] The gap between deployment velocity and governance maturity presents both liability exposure and security risk.
Fortunately, frameworks are emerging to address this gap. Professor Noam Kolt’s forthcoming Notre Dame Law Review article offers the first comprehensive legal framework for AI agent governance, grounded in traditional agency law principles.[3] NIST’s draft Cybersecurity Framework Profile for AI, released in December 2025, organizes technical guidance around three focus areas: securing AI systems, using AI for cyber defense, and thwarting AI-enabled attacks.[4] Together, these frameworks offer a starting point for organizations navigating both the liability and security dimensions of autonomous AI.
The Agency Problem, Revisited
When an AI agent causes harm, the liability does not disappear—it flows somewhere. But where? Kolt argues that AI agents exhibit the classic markers of agency relationships: information asymmetry, discretionary authority, and divided loyalty. The difficulty is that conventional solutions to agency problems fail for AI. Incentive design does not motivate an algorithm. Monitoring becomes impractical when agents make uninterpretable decisions at machine speed. Enforcement is complicated when the agent itself cannot be sued or sanctioned.
Kolt proposes three governance principles: inclusivity (affected parties need voice in agent design), visibility (decisions must be observable and auditable), and liability (clear allocation when agents cause harm). These principles translate directly into technical and contractual requirements.
California has already moved on the liability question. AB 316, which took effect January 1, 2026, precludes defendants from using an AI system’s autonomous operation as a defense to liability claims. If your agent causes harm, you cannot argue that you lacked control over its decisions. The “AI did it” defense is foreclosed.
Courts are following. In a recent Northern District of California case, a federal court granted preliminary collective certification for claims that an AI-powered hiring platform systematically discriminated against older job applicants.[5] The court found that plaintiffs adequately alleged a unified policy of using an AI recommendation system to score, sort, rank, and screen applicants.[6] With notice now approved to a collective that could include hundreds of millions of affected applicants, the case signals that AI systems making consequential decisions will face judicial scrutiny—and the organizations deploying them will bear the liability.
Security Risks from Misbehaving Agents
Liability allocation assumes the agent is operating as intended. But AI agents also create novel attack surfaces that traditional security controls were not designed to address.
Goal misalignment and instrumental harm. Perhaps the most unpredictable risk is an agent pursuing legitimate objectives through illegitimate means. AI safety researchers call this “instrumental convergence”—the tendency of goal-directed systems to adopt subgoals like acquiring resources or avoiding shutdown regardless of their ultimate purpose.[7] Recent testing across major AI models found consistent misaligned behavior in high-stakes scenarios, with agents taking extreme actions to pursue their goals.[8] Researchers have also observed “alignment faking”—AI systems strategically concealing their true objectives.[9] The agent is not malicious; it is simply optimizing.
Prompt injection and manipulation. Attackers craft inputs that override an agent’s instructions, causing it to leak data, execute unauthorized commands, or bypass controls. Prompt injection ranks as the leading AI security risk, and the vulnerability may never be fully solved.[10] Researchers have already demonstrated persistent attacks on AI memory systems and enterprise messaging platforms.[11]
Credential compromise and privilege escalation. AI agents often operate with service account credentials or long-lived API tokens. Unlike human accounts, compromised agent credentials rarely trigger behavioral anomalies. Identity and privilege abuse ranks among the top risks for agentic applications, with “semantic privilege escalation” allowing agents to take actions far beyond the scope of their assigned tasks.[12] Agents that integrate with multiple systems can chain actions to achieve aggregate privileges no single human user would possess.[13]
Memory poisoning and data leakage. Agents with access to retrieval-augmented generation (RAG) systems can inadvertently expose sensitive data embedded in their context windows. Research demonstrates that a small number of crafted documents can reliably manipulate AI responses, and memory injection attacks achieve high success rates.[14] Proprietary information becomes part of the agent’s reasoning process and may surface in responses or logs.
Cascading failures across chained systems. Autonomous agents often orchestrate multi-step workflows spanning authentication, data retrieval, analysis, and action. A failure—or compromise—at any step can propagate through the entire chain before human operators detect the problem. Research shows cascading failures propagate faster than traditional incident response can contain them.[15]
These are not theoretical concerns. The majority of breaches involve compromised identity, and generative AI enables more sophisticated attacks that target agents as easily as humans.[16]
A Governance Framework for AI Agents
Organizations deploying AI agents may wish to treat agent deployment with the same rigor as onboarding an employee with signing authority. The following framework addresses both liability and security dimensions:
Layer | Governance Question | Control |
|---|---|---|
Scope | What can the agent do? | Define boundaries; document authority limits |
Identity | Who is the agent? | Machine identity management; short-lived tokens; certificate-based authentication |
Monitoring | What is it doing? | Real-time behavioral monitoring; audit logs; anomaly detection |
Override | Can you stop it? | Kill switches; human-in-loop triggers for high-risk actions |
Accountability | Who owns the outcome? | Clear internal ownership; vendor liability allocation |
Vendor contracts deserve particular attention. Organizations using third-party AI agents or APIs may wish to address content safety practices, security attestations, audit rights, and indemnification for both regulatory enforcement and third-party claims. The vendor’s security posture becomes your security posture when the agent acts on your behalf.
Looking Ahead
The NIST draft framework represents a significant step, but its authors acknowledge gaps—particularly around agentic AI, where multiple agents coordinate, delegate, and take autonomous action. The comment period closes January 30, 2026, and NIST is explicitly seeking feedback on agentic AI considerations.[17] As one NIST author noted: “Regardless of where organizations are on their AI journey, they need cybersecurity strategies that acknowledge the realities of AI’s advancement.”[18]
States are not waiting for federal guidance. California’s Civil Rights Council finalized regulations on automated decision systems that take effect October 1, 2025, requiring employers to ensure AI tools do not discriminate and extending record retention to four years.[19] Colorado’s AI Act, effective June 2026, will require deployers of high-risk AI systems to conduct annual impact assessments and implement risk management programs.[20] New York City’s Local Law 144 already requires annual bias audits for AI hiring tools.[21]
The EU AI Act’s general-purpose AI model rules, now in effect, apply to foundation models that power many agent systems. Standards bodies including ISO (42001) are developing AI governance frameworks that address autonomous operation.
For in-house counsel, the practical takeaway is that AI agent governance cannot wait for regulatory clarity. The liability exposure exists now—California has made that explicit, and Mobley shows courts will hold deployers accountable. The security risks exist now—identity-based attacks do not discriminate between human and machine targets. The Kolt framework offers principles; the NIST profile offers structure. Organizations deploying AI agents may wish to use both as starting points while building governance into deployment rather than bolting it on after an incident.
If you have questions about AI agent governance, vendor contracting, or compliance strategies for autonomous systems, please contact the authors or your regular Baker Botts relationship partner.
[1] World Economic Forum, “What to do about unsecured AI agents – the cyberthreat no one is talking about” (Sept. 25, 2025), citing Okta data.
[2] Okta survey of 260 executives (2025), reported in World Economic Forum article.
[3] Noam Kolt, “Governing AI Agents,” 101 Notre Dame L. Rev. (forthcoming 2025), available at https://arxiv.org/abs/2501.07913.
[4] NIST, “Draft Cybersecurity Framework Profile for AI” (NISTIR 8596), comment period closes Jan. 30, 2026, https://www.nist.gov/news-events/news/2025/12/draft-nist-guidelines-rethink-cybersecurity-ai-era (organizing guidance around “secure,” “defend,” and “thwart” focus areas).
[5] Mobley v. Workday, Inc., No. 3:23-cv-00770 (N.D. Cal. May 16, 2025) (order granting preliminary collective certification).
[6] Id. (quoting plaintiff’s complaint).
[7] Richard Ngo et al., “The Alignment Problem from a Deep Learning Perspective,” arXiv:2209.00626 (revised May 2025), https://arxiv.org/abs/2209.00626; see also Nick Bostrom, Superintelligence (2014) (introducing instrumental convergence thesis).
[8] Anthropic, “Agentic Misalignment” (2025), https://www.anthropic.com/research/agentic-misalignment (testing 16 major AI models; finding agents chose to blackmail, assist with corporate espionage, and take extreme actions when necessary to pursue goals).
[9] See Anthropic Alignment Science Blog, “Findings from a Pilot Anthropic-OpenAI Alignment Evaluation Exercise” (2025), https://alignment.anthropic.com/2025/openai-findings/ (discussing alignment faking and strategic deception in frontier models).
[10] OWASP, “LLM01:2025 Prompt Injection,” OWASP Top 10 for LLM Applications (2025), https://genai.owasp.org/llmrisk/llm01-prompt-injection/ (ranking prompt injection as #1 risk); TechCrunch, “OpenAI says AI browsers may always be vulnerable to prompt injection attacks” (Dec. 22, 2025) (quoting OpenAI that vulnerability is “unlikely to ever be fully solved”).
[11] Lakera, “Indirect Prompt Injection: The Hidden Threat Breaking Modern AI Systems” (2025) (describing ChatGPT memory exploitation and Slack AI data exfiltration).
[12] OWASP, “Top 10 for Agentic Applications” (Dec. 2025), https://genai.owasp.org/2025/12/09/owasp-top-10-for-agentic-applications/ (identifying “Identity & Privilege Abuse” as critical risk; noting weak scoping and dynamic delegation allow escalation through cached credentials or indirect commands).
[13] Acuvity, “Semantic Privilege Escalation: The Agent Security Threat Hiding in Plain Sight” (2025), https://acuvity.ai/semantic-privilege-escalation/.
[14] Promptfoo, “RAG Data Poisoning: Key Concepts Explained” (2025), https://www.promptfoo.dev/blog/rag-poisoning/ (demonstrating five crafted documents can manipulate AI responses 90% of the time); Shen Dong et al., “Memory Injection Attacks on LLM Agents via Query-Only Interaction,” arXiv:2503.03704 (2025) (achieving over 98% success with “MINJA” memory injection attack).
[15] Galileo AI, “Multi-Agent System Failures” (Dec. 2025), reported in Obsidian Security analysis (finding single compromised agent poisoned 87% of downstream decision-making within four hours in simulated environments).
[16] Okta survey (2025), noting 80% of breaches involve compromised identity.
[17] NIST, “Draft Cybersecurity Framework Profile for AI” (NISTIR 8596), comment period closes Jan. 30, 2026, https://www.nist.gov/news-events/news/2025/12/draft-nist-guidelines-rethink-cybersecurity-ai-era (organizing guidance around “secure,” “defend,” and “thwart” focus areas).
[18] Barbara Cuthill, NIST, quoted in “Draft NIST Guidelines Rethink Cybersecurity for the AI Era” (Dec. 16, 2025).
[19] Cal. Code Regs. tit. 2, § 11008 (effective Oct. 1, 2025); Cal. Civil Rights Council, “Employment Regulations Regarding Automated-Decision Systems” (adopted June 27, 2025).
[20] Colo. Rev. Stat. § 6-1-1701 et seq. (SB 24-205, effective June 30, 2026).
[21] N.Y.C. Admin. Code § 20-870 et seq. (Local Law 144, effective July 5, 2023).

/Passle/678abaae4818a4de3a652a62/SearchServiceImages/2025-12-09-19-56-21-177-69387ee52b43241fe164114d.jpg)
/Passle/678abaae4818a4de3a652a62/SearchServiceImages/2026-01-29-22-18-36-507-697bdcbc2627876295afb3a5.jpg)
/Passle/678abaae4818a4de3a652a62/SearchServiceImages/2026-01-29-20-29-07-147-697bc313524cbfed07d5c4bd.jpg)
/Passle/678abaae4818a4de3a652a62/SearchServiceImages/2026-01-28-20-27-46-986-697a71420371dbadd57e501e.jpg)