
The AI Law Professor: When the new AI model disappoints

Tom Martin  Author & Professor

· 7 minute read


GPT‑5’s rollout demonstrates that as AI becomes more autonomous and reshapes our relationship with it, law firms and other industries should commit to principled governance, not just experimentation, to ensure safe and trustworthy deployment

Key takeaways:

      • GPT‑5 will make a difference — GPT‑5 is a new, unified system that automatically routes your prompt to different reasoning modes, which boosts performance but reduces the manual control that many users have come to rely on.

      • Errors still surface — Benchmarks and marketing promised “PhD‑level” intelligence, yet highly public misfires, such as error‑strewn maps, show the limits that still matter in practice.

      • Disillusionment sets in with GenAI — Gartner’s recent analysis suggests GenAI is sliding into the trough of disillusionment, which makes governance and expectation-setting more important than ever for legal teams.


Welcome back to The AI Law Professor. In the last column, I unpacked what true AI agents would require and argued for four deployment principles that legal teams can use today: transparency, autonomy, reliability, and visibility. This month, I’m applying that lens to GPT‑5’s release, comparing the promise to the performance and asking what the reaction tells us about control, our relationship with these systems, and practical capability in day‑to‑day legal work.

If you learned to drive on a manual transmission, you remember the feeling of control. You listened to the engine, watched the tachometer, chose the gear, and lived with the result. GPT‑5 — the fifth generation of OpenAI’s GPT models, launched in early August — feels like switching the profession’s favorite stick‑shift to an automatic. It is faster and, in many contexts, smarter, but it also decides for itself when to shift.

OpenAI’s announcement describes GPT‑5 as “one unified system” with a real‑time router that chooses between a fast mode and deeper “thinking” modes. In other words, ChatGPT now decides when to sprint and when to grind through a harder problem. Microsoft echoed that description, emphasizing a router that picks “the right tool for the task” across its Copilot stack.
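
To make the router idea concrete, here is a minimal sketch in Python of what such a dispatch layer conceptually does. Everything in it is a hypothetical illustration, including the keyword heuristic, the route_prompt function, and the mode labels; OpenAI has not published how its actual router decides.

    # Hypothetical illustration only; OpenAI's real routing logic is not public.
    def route_prompt(prompt: str) -> str:
        """Send easy asks to a fast mode; escalate hard ones to a reasoning mode."""
        hard_signals = ("analyze", "step by step", "draft", "compare", "prove")
        looks_hard = len(prompt) > 400 or any(s in prompt.lower() for s in hard_signals)
        # "fast" and "thinking" are illustrative labels, not real API model names.
        return "thinking" if looks_hard else "fast"

    print(route_prompt("What year was Marbury v. Madison decided?"))       # fast
    print(route_prompt("Analyze the choice-of-law issues step by step."))  # thinking

The point is not the heuristic itself but where it lives: a decision that used to be yours now happens inside the platform.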

For the many lawyers who are early adopters, that is both a gift and a grief. It streamlines routine use, yet it also transfers a critical bit of craft from human hands to the AI platform.

Expectations meet a colder reality

On launch day, OpenAI CEO Sam Altman leaned into a powerful metaphor, calling GPT‑5 “like having a team of PhD‑level experts in your pocket.” That framing primes all of us to expect near‑expert performance on whatever we ask. Then the internet filled with examples of GPT‑5 inventing state names on a map of the United States and garbling presidential timelines inside graphics.

The mismatch is not trivial. If you tell people they are getting a pocket full of doctorates, flubs on basic geography feel like malpractice. Yet part of the story sits with us. We co‑author the hype as we cherry‑pick astonishing demos and anthropomorphize styles as smart, warm, or trustworthy. We then confuse style or personality with reliability.

Indeed, GPT‑5’s own release notes promise fewer hallucinations and stronger benchmarks in math, coding, and multimodal tasks, which are real and measurable improvements. However, no benchmark guarantees competence across every quirky real-world request, especially ones that combine text and image rendering in a single task.


You can find all of The AI Law Professor’s columns here


During the GPT‑5 rollout, OpenAI initially retired several options in the ChatGPT model picker and auto‑mapped old threads to GPT‑5 equivalents. That move disrupted established workflows and, for paying users, it felt like losing colleagues with distinct work habits and talents. After a backlash, OpenAI relented and restored access to some legacy models.

Yet, this was all more than an interface tweak. It touches on a legal tech question that is older than large language models: How much control are you willing to trade for convenience?

Relationship, control & capability

Lawyers did not just use prior GPT models; they built relationships with them. People trusted the tone and tempo of different models, and they invested in long threads that felt like real conversations. When those options disappeared, the reaction sounded personal, and that was telling. It shows we are already treating these systems as teammates, not tools, which raises the stakes for change management. Even OpenAI’s release notes emphasize that GPT‑5 reduces sycophancy and refines style, implicitly acknowledging that tone does matter.

At the same time, capability is not uniform. GPT‑5’s measurable gains are real, yet they coexist with the brittleness of multimodal text‑in‑image rendering and other edge behaviors. That is not a contradiction — it is a reminder that most capable overall does not mean best for every task. The right lens is comparative advantage, not universal superlatives.

Recent analysis by Gartner suggests generative AI (GenAI) is moving past the peak and sliding toward the trough of disillusionment because many 2024 projects under‑delivered. In fact, fewer than 30% of AI leaders report that their CEOs are happy with the return on AI investment, even though average spending on GenAI initiatives reached $1.9 million in 2024. GPT‑5 landed in the middle of that slide. In that climate, one over‑promised demo or one clumsy deprecation can overshadow a long list of genuine improvements.

For practicing lawyers, there is a practical lesson here. Expect steady progress, not magic. Expect continued rapid change with smaller windows for adaptation. Expect platform strategy to change, which means your governance must be able to flex without sacrificing quality or obligations to clients and courts.

What legal teams can do now

There are several steps that corporate legal teams can take to ease their transition to GPT‑5, including:

      • Select specific versions of AI models — If you rely on a specific behavior, insist on options for model pinning, change windows, and rollback. This may require using OpenAI’s API rather than the more consumer‑friendly ChatGPT, as shown in the first sketch after this list. Also, if you use ChatGPT Business or Enterprise, learn the legacy model access policy and set internal timelines for migration. These details can mean the difference between legally sound analysis and slop.
      • Test like you bill — Keep a simple checklist of the work you give the AI tool, such as e‑discovery summaries, brief outlines, citation checks, transcript cleanups, and RFP drafts. For each item, define what good looks like and score results on accuracy, completeness, and consistency — not on how persuasive the tone feels. (A minimal scoring sketch follows this list.)
      • Separate tone from truth — A model that feels right is not necessarily more reliable, while a model that feels blunt may be more honest about uncertainty. GPT‑5’s training explicitly tries to reduce sycophancy and improve clarity. Treat tone as a configurability issue, not a reliability metric.
      • Keep human control visible — Previously, I argued for four principles that also apply here: transparency, autonomy, reliability, and visibility. The router helps with autonomy and sometimes reliability, but your job is to assert transparency and visibility, especially around what the model cannot see. Build logging and review points that keep humans in the loop, then make those blind spots explicit to the supervising lawyer.
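
On model pinning, here is a minimal sketch using OpenAI’s Python SDK. The dated snapshot name and the prompt are placeholders; confirm which snapshots your account can actually access.

    # A minimal sketch of pinning a dated model snapshot via the OpenAI Python SDK.
    # The snapshot name is illustrative; check the model list available to your account.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",  # a dated snapshot, not a moving alias like "gpt-4o"
        messages=[{"role": "user", "content": "Summarize this deposition excerpt: ..."}],
    )
    print(response.choices[0].message.content)

Pinning a dated snapshot means a provider’s silent upgrades cannot change your results mid‑matter; you migrate on your own timeline instead.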

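And on testing like you bill, the checklist can be as simple as a spreadsheet or a few lines of code. A minimal sketch, with made‑up tasks and scores that a reviewing lawyer would assign by hand:

    # A minimal sketch of a scoring checklist; a human reviewer assigns the scores.
    from statistics import mean

    checklist = {
        "e-discovery summary": {"accuracy": 4, "completeness": 5, "consistency": 4},
        "citation check":      {"accuracy": 2, "completeness": 3, "consistency": 3},
        "brief outline":       {"accuracy": 5, "completeness": 4, "consistency": 5},
    }

    for task, scores in checklist.items():
        print(f"{task}: mean {mean(scores.values()):.1f} / 5 -> {scores}")
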
Right‑sizing expectations

So, was GPT‑5 overhyped? Probably. Are we also complicit in that hype? Also, yes. We want one model that is at once a perfect writer, paralegal, researcher, designer, and cartographer. We treat the best average performer as a sure thing on every task, then we feel betrayed when it stumbles.

A better stance for the legal profession is modesty plus rigor. Take the real gains GPT‑5 delivers, such as stronger coding and reasoning modes and fewer hallucinations on many real‑world prompts. Keep manual control where it matters, and do not let any router, however clever, become a change agent you cannot see or understand.

If GPT‑5 is truly the automatic transmission, you need to keep your hand near the gearshift. Know when to let it shift for you and know when to downshift yourself. That is how you get speed without giving up control.


Next column, we’ll examine how to fashion an AI governance policy that actually works
