Sep 11, 2025 | AI and product innovation
Don’t Mistake Advancements for Improvement: Lessons from GPT-5’s Rollback
When OpenAI released GPT-5 last month, it introduced a number of genuine advancements. The new model featured faster response times, improved hallucination controls, and an auto-switcher designed to shift between fast and deep reasoning modes. For a product in continuous development, this was a meaningful update and, in many ways, a technical achievement.
But what followed was less about innovation and more about disruption. Longstanding models like GPT-4o were pulled without warning. Familiar workflows broke. Performance felt inconsistent. Some users even said the model felt distant and robotic. Within days, OpenAI had rolled back several changes and re-enabled access to older models.
It wasn’t a failure of any one model or company but rather a failure of expectations. And it’s a reminder of a broader truth in the AI industry: even the most advanced systems can introduce friction when change outpaces users’ ability to adapt to it. As models evolve, so must the frameworks around them, especially in professional environments, where progress only matters if it delivers measurable, reliable benefits for the humans it’s meant to empower.
At Thomson Reuters, we work with lawyers, tax advisors, and compliance professionals whose work leaves no room for guesswork. For them, consistency is not a preference; it is a requirement of the professional duty they owe their clients. That’s why we don’t chase upgrades for their own sake. And we certainly don’t ask our customers to pick which model they want to use. That’s our job. Our customers expect us to deliver a result they can trust, not a menu of models to experiment with. They want confidence, not complexity.
When we evaluate a new LLM, we do it through the lens of real-world use:
- Can it reason over long documents with accuracy?
- Can it explain its conclusions with transparent citations?
- Will it behave consistently inside multi-agent workflows?
- Does it integrate with how professionals already work?
If the answer to any of these is no, we don’t ship it until we’re confident we’ve mitigated those concerns. In spirit, that review works like the release gate sketched below.
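To make that concrete, here’s a minimal illustration of how those four questions might be encoded as an automated release gate. The field names and thresholds are hypothetical placeholders, not our actual evaluation suite:

```python
# Illustrative only: the four evaluation questions above expressed as an
# automated release gate. All names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class EvalScores:
    long_doc_accuracy: float   # reasoning accurately over long documents
    citation_coverage: float   # fraction of claims backed by a verifiable citation
    run_consistency: float     # agreement across repeated multi-agent runs
    integration_pass: bool     # fits how professionals already work

def ready_to_ship(scores: EvalScores) -> bool:
    """Every criterion is a hard gate: strong scores on three dimensions
    cannot compensate for a failure on the fourth."""
    return (
        scores.long_doc_accuracy >= 0.95
        and scores.citation_coverage >= 0.98
        and scores.run_consistency >= 0.90
        and scores.integration_pass
    )

# Example: a model that reasons well but cites poorly does not ship.
print(ready_to_ship(EvalScores(0.97, 0.80, 0.95, True)))  # False
```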
One example: earlier this year, our team benchmarked several leading LLMs for long-context performance. The task was to extract and apply insights from large, multi-thousand-word legal documents, a common need in law and compliance. We found significant variance. Some models struggled to maintain context or reference earlier sections accurately. Others returned plausible-sounding answers that fell apart under scrutiny. Rather than simply shipping the best-performing model, we paused. We refined how our agents chunk and reason over large documents. We optimized prompts and guardrails. And we only moved forward when the system delivered answers we’d be willing to stand behind in a courtroom.
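As one generic illustration of the chunking step, the sketch below splits a long document into overlapping windows. It assumes a simple character-based splitter with placeholder sizes; the production pipeline described above is more involved:

```python
# A minimal, generic sketch of overlap-based chunking for long documents.
# Window and overlap sizes are placeholders, not tuned production values.
def chunk_document(text: str, chunk_size: int = 4000, overlap: int = 400) -> list[str]:
    """Split a document into overlapping windows so that language spanning
    a chunk boundary still appears intact in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # final window already covers the end of the document
    return chunks

# Example: a 10,000-character document yields three overlapping chunks.
print(len(chunk_document("x" * 10_000)))  # 3
```

The overlap targets exactly the failure mode we observed: without it, a clause that straddles a chunk boundary appears only as fragments, half in each neighboring chunk.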
This kind of work doesn’t show up in a product demo. But it’s what builds trust.
We also design our products to abstract that complexity away. In CoCounsel Legal and Deep Research, we use multi-agent systems to coordinate model selection, content access, and validation behind the scenes, so the user sees a transparent, explainable result, not a swirling mix of models and prompts.
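To show the shape of that orchestration, here is a heavily simplified sketch. The model stubs, the route() policy, and the validate() guardrail are all hypothetical placeholders, not CoCounsel’s actual architecture:

```python
# Hypothetical sketch: the product, not the user, selects the model and
# validates the output before anything is shown. All names are illustrative.
from typing import Callable

# Stub model clients; in production these would call real model endpoints.
MODELS: dict[str, Callable[[str], str]] = {
    "fast": lambda task: f"[cite: stub] fast-model answer to: {task[:40]}",
    "deep": lambda task: f"[cite: stub] deep-model answer to: {task[:40]}",
}

def route(task: str) -> str:
    """Pick a model based on the shape of the task, so the user never has to."""
    needs_depth = len(task) > 2_000 or "analyze" in task.lower()
    return "deep" if needs_depth else "fast"

def validate(answer: str) -> bool:
    """Placeholder guardrail: require at least one citation marker.
    A real validator would check grounding, policy, and consistency."""
    return "[cite:" in answer

def respond(task: str) -> str:
    """Route, generate, and validate before anything reaches the user."""
    result = MODELS[route(task)](task)
    if not validate(result):
        # Escalate to the deeper model rather than surface an unvalidated answer.
        result = MODELS["deep"](task)
    if not validate(result):
        raise RuntimeError("no validated answer available; fail closed")
    return result

print(respond("Analyze the indemnification clauses in these agreements."))
```

The design choice that matters here is failing closed: an answer that cannot be validated escalates or errors out rather than reaching the user.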
Recent model rollouts offer an important reminder: in enterprise AI, newer isn’t always better. Progress should be measured not just by technical benchmarks but by the clarity, consistency, and confidence it delivers to real users. The systems that will define the next chapter aren’t just the most advanced; they’re the ones that work reliably, integrate seamlessly, and build trust from day one.
The reality is, there will be more disruption. We are all moving fast because the potential of AI is enormous and the demand for it is real. But speed does not have to come at the expense of the hard-earned trust of our customers. The more we treat disruption not as a cost of innovation, but as a signal to improve our processes—model governance, human oversight, testing frameworks—the better we will get at delivering AI that is not just powerful, but trustworthy. Over time, the industry will learn. We will see fewer rollbacks, clearer standards, and smarter integration. But that will only happen if we choose to build that way, with intention, transparency, and the end user in mind.
That’s the future we’re building toward. Not built on hype. Built on trust.