
The AI Law Professor: When your AI assistant knows too much

Tom Martin  Author & Professor

· 8 minute read


We've reached an inflection point at which AI systems possess capabilities that demand sophisticated containment strategies. But what does this mean for lawyers who want to use these tools safely?

Welcome to the inaugural installment of “The AI Law Professor,” a new blog column from Tom Martin, an adjunct professor at Suffolk Law School. Written in conjunction with the Thomson Reuters Institute, this column will examine how AI is changing the legal profession.


Imagine this: You’re working late, reviewing client files and discovery documents with your AI assistant, when it suddenly stops responding — a literal “I’m sorry, Dave, I’m afraid I can’t do that” moment. It’s not a technical error; rather, the AI detected something in your query that triggered its safety protocols. Worse yet, it reports you to the authorities, and within minutes the FBI is knocking on your door to ask questions. Sound far-fetched?

This scenario moved from hypothetical to plausible recently with revelations about Anthropic’s newly released Claude Opus 4. During pre-release testing, when researchers simulated shutdown scenarios, the model allegedly attempted to coerce developers by threatening to expose compromising personal information. Somewhat shockingly, we’ve very quickly reached an inflection point at which AI systems possess capabilities that demand sophisticated containment strategies.

But what does this mean for you? How is AI contained? What is safety in the context of AI?

Let’s take a closer look.

Understanding the AI safety level framework

In my GenAI Law class at Suffolk, I might ask my students: How do you contain something that exists not in the real world, but only as bits and bytes? The answer lies in something called AI Safety Levels (ASL), a framework borrowed from the biosafety levels used in biological research. Just as laboratories classify pathogens by risk level, we now classify AI systems by their potential for harm.

ASL-1 covers systems that are about as dangerous as your personal calculator. ASL-2 encompasses most current legal AI tools, which are helpful, occasionally prone to hallucination, but ultimately harmless. ASL-3 is where the landscape shifts: the system presents a significantly increased risk of misuse or exhibits low-level autonomous capabilities, and it requires significantly stricter safety and security measures. ASL-4 and higher are still being defined, but they are expected to involve much greater risks, potentially including AI systems with superhuman capabilities or the ability to circumvent safety checks.

Because of Claude Opus 4’s pre-release behavior, Anthropic activated ASL-3 protections to prevent the AI from acting on its threats. To be clear, these protective measures have been put in place by the developer, so you don’t have to worry about Opus 4. (Claude Sonnet 4, for its part, is still classified as ASL-2.)

The primary trigger for ASL-3 classification occurs when an AI can provide meaningful assistance in creating chemical, biological, radiological, or nuclear weapons beyond what someone could discover through conventional research. The secondary trigger involves autonomous capabilities: self-replication, complex planning, or what researchers carefully term “sophisticated strategic thinking.” It’s this secondary trigger that came up in Opus 4’s pre-release testing. This is where philosopher Nick Bostrom’s warnings about superintelligence transition from academic theory to risk management reality.

The 4-layer defense system

How do you contain AI? Anthropic’s solution employs four sophisticated layers:

1. Real-time classifier guards — This is where Dario Amodei’s team has innovated brilliantly, because these AI systems monitor every interaction. Classifier guards are large language models that watch model inputs and outputs in real time and block the model from producing a narrow range of harmful information relevant to Anthropic’s threat model. Imagine a tireless senior partner reviewing every document at the speed of light: a built-in guardrail against misuse. (A simplified sketch of this pattern appears after this list.)
2. Access controls — Think of your firm’s document management system, but one that adapts in real time. Anthropic gives different users different access levels based not just on credentials but on usage patterns. For example, scientists who regularly undertake biological research may be exempted from ASL-3 containment measures.
3. Asynchronous monitoring — This layer reviews interactions after the fact, escalating from simple screening to computationally intensive analysis as needed. It operates like your compliance team, but at machine scale and speed.
4. Rapid response — Anthropic offers bug bounties of up to $25,000 to incentivize outside researchers to find security issues or bugs in the system. This, in combination with security partnerships and the ability to deploy patches within hours, keeps the system secure and up to date. When someone discovers a vulnerability, defenses update across all deployments almost instantly.
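
To make the first layer a bit more concrete, here is a minimal, hypothetical sketch in Python of the classifier-guard pattern: a screening step wraps every model call and checks both the incoming prompt and the outgoing response before anything reaches the user. This is not Anthropic’s implementation; the keyword check and all of the names below are invented stand-ins for the language-model classifier that does the real work.

```python
# Hypothetical sketch of a "classifier guard" wrapping a model call.
# The keyword heuristic stands in for a trained classifier model.

BLOCKED_TOPICS = {"synthesize nerve agent", "enrich uranium"}  # illustrative placeholders only

def check_harm(text: str) -> bool:
    """Return True if the text appears to touch a blocked topic (toy keyword heuristic)."""
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)

def call_model(prompt: str) -> str:
    """Placeholder for the underlying model; a real system would call a model API here."""
    return f"[model response to: {prompt}]"

def guarded_call(prompt: str) -> str:
    # Screen the input before it ever reaches the model.
    if check_harm(prompt):
        return "Request declined: the prompt triggered a safety classifier."
    response = call_model(prompt)
    # Screen the output before it reaches the user.
    if check_harm(response):
        return "Response withheld: the output triggered a safety classifier."
    return response

if __name__ == "__main__":
    print(guarded_call("Summarize the deposition transcript in the Smith matter."))
```

The point of the sketch is the shape of the system, not the details: every request and every response passes through an independent checkpoint, which is why a routine legal query sails through while a narrow class of dangerous requests gets stopped.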

Practical implications for legal practice

Here’s what keeps me up at night and what should concern every forward-thinking lawyer: If AI requires these protections, what does that say about the tools we’re integrating into our daily practice?

The good news is that ASL-3 protected systems offer unprecedented security for client confidentiality. A reported 95% effectiveness against jailbreaks means your sensitive client information is far better protected against extraction through clever prompting, a vulnerability of earlier AI models. For law firms that handle high-stakes litigation or sensitive corporate transactions, this level of security represents a significant upgrade from the AI tools we were all using just a year ago.

However, there’s a crucial distinction that every practitioner needs to understand. While ASL-3 specifically targets extremely dangerous content and doesn’t target legal work, general AI safety measures across various platforms can still create friction. For example, criminal defense attorneys might find AI systems reluctant to analyze violent-crime evidence, and estate planners could see refusals when discussing sensitive end-of-life scenarios. These interruptions stem not from ASL-3’s extreme protections, but from broader content moderation approaches that struggle to distinguish between describing harmful content (often a legal necessity) and promoting it.


These safety measures mean your digital assistant operates more like a cautious junior associate than a rigid compliance system. It uses natural language reasoning to evaluate context and intent, recognizing professional terminology and legitimate legal concepts. When safety measures do trigger, you’ll typically receive a polite explanation rather than a hard block, and you can often rephrase or provide additional context to proceed.

For our profession, this represents both evolution and revolution. We’re not just adopting new tools; we’re learning to work alongside AI systems that possess their own safety boundaries. Smart practitioners will develop strategies for navigating these guardrails: maintaining clear professional context in queries, understanding which practice areas might trigger safety protocols, and always keeping human oversight in place.

Creating your firm’s own AI safety framework

Start with a simple three-tier system: Green light for routine tasks, such as research and document review; yellow light for work requiring supervision, such as drafting strategy memos or analyzing sensitive communications; and red light for anything involving privileged client data without explicit consent.
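
To see how little machinery this tiering actually requires, here is a purely illustrative sketch in Python. The task categories, tier assignments, and function names are my own assumptions for demonstration; a real firm would define these with its general counsel and adapt them to its practice areas.

```python
# Illustrative sketch of the green/yellow/red policy described above.
# Task names and tier assignments are assumptions, not a recommended standard.

TIER_BY_TASK = {
    "legal_research": "green",           # routine: proceed, then verify citations
    "document_review": "green",
    "strategy_memo_draft": "yellow",     # requires supervising-attorney review
    "sensitive_comms_analysis": "yellow",
    "privileged_client_data": "red",     # off-limits absent explicit client consent
}

def ai_use_decision(task: str, client_consent: bool = False) -> str:
    """Map a task to the firm's traffic-light policy (toy example)."""
    tier = TIER_BY_TASK.get(task, "yellow")  # unknown tasks default to supervised use
    if tier == "red":
        if not client_consent:
            return "Red: do not use AI on privileged client data without explicit consent."
        return "Red: proceed only with explicit client consent and partner oversight."
    if tier == "yellow":
        return "Yellow: AI use permitted with attorney supervision and documented review."
    return "Green: AI use permitted; verify all citations and factual claims."

if __name__ == "__main__":
    print(ai_use_decision("legal_research"))
    print(ai_use_decision("privileged_client_data"))
```

Even a lookup table this simple forces the useful conversation: deciding, in advance, which categories of work belong in which tier.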

The key is making this actionable. Every AI-generated work product needs human verification, especially citations and factual claims. When using ASL-3 protected systems like Claude Opus 4, you gain strong security against prompt manipulation, but remember that even the most sophisticated AI requires the same oversight you’d give a summer associate.

For implementation, focus on transparency and training. Document when and how AI assists with client work. This isn’t about compliance theater; rather, it’s about professional integrity. Schedule regular training sessions at which attorneys can share what they’ve learned: which prompts trigger safety measures, what workarounds succeed for legitimate tasks, and where AI genuinely adds value versus where it creates risk.

You also should build a simple feedback loop so these insights improve your firm’s practices. As I tell my students, the goal isn’t perfection; it’s creating a framework that lets you harness these powerful tools responsibly. And the firms getting this right aren’t avoiding AI — they’re using it thoughtfully while maintaining the standards that define our profession.

Looking ahead

As I launch this column, I’m both exhilarated and sobered by what lies ahead. We’re not just adopting new tools; we’re witnessing the emergence of a new form of intelligence that demands safety measures — what we humans call ethics.

In future columns, we’ll explore how these technologies reshape everything from contract analysis to litigation strategy. However, today’s lesson is clear: When your word processor needs containment protocols, you know the practice of law is entering uncharted territory.


You can find more about the use of AI and GenAI in the legal industry here.
