Why consensus is not verification: How to build AI advisors that argue productively

Bryce Engelland  Enterprise Content Lead / Innovation & Technology / Thomson Reuters Institute

· 6 minute read

Because consensus can amplify shared blind spots, AI is most useful as an executive advisor when it is deliberately designed to gather and preserve meaningful disagreement across diverse agents and models, allowing leaders to slow down, verify, and make better decisions.

Key insights:

      • Consensus among AI systems is not the same as correctness — Agreement between AI models often signals shared blind spots, not truth; and AI errors can be highly correlated across instances and even across model families.

      • Productive disagreement must be explicitly designed into AI advisors — Multi‑agent AI systems are most effective when they are intentionally built to preserve meaningful disagreement, not just to synthesize a unified response.

      • The future of AI advisory mirrors long‑standing human decision-making — Modern multi‑agent AI design has a long historical lineage; yet, across all examples, the same principle holds: The best decision systems are engineered for internal conflict.


In this new two‑part blog series, we explore why AI works best as an executive advisor not by delivering consensus answers, but by being intentionally designed to identify, preserve, and productively leverage disagreement. In the first part, we saw why a single AI advisor is structurally vulnerable; now, in this concluding part, we look at what happens when you design disagreement on purpose.

The academic evidence for multi-agent AI systems has been building rapidly, and the most important findings aren’t about the power of agreement. They’re about the danger of it.

In February, Perplexity launched its Model Council, a product that sends every query simultaneously to three frontier AI models (Claude, GPT, and Gemini), then uses a fourth “chair” model to synthesize a unified answer. The product’s value proposition isn’t that three models produce a better answer than one; rather, it’s that divergence between models is treated as a signal. When models converge, that indicates confidence; when they diverge, that indicates the user should slow down.
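The divergence-as-signal idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Perplexity’s implementation: the stand-in advisor functions, the `divergence_signal` helper, and the exact-match comparison of normalized answers are all hypothetical. A real system would need some form of semantic comparison for free-text answers.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical advisor type: any function that maps a question to an answer.
Advisor = Callable[[str], str]


def divergence_signal(question: str, advisors: Dict[str, Advisor]) -> dict:
    """Ask every advisor the same question and report how much they agree."""
    # Normalize answers so trivial differences in case or spacing are ignored.
    answers = {name: ask(question).strip().lower() for name, ask in advisors.items()}
    counts = Counter(answers.values())
    top_answer, top_votes = counts.most_common(1)[0]
    agreement = top_votes / len(answers)
    return {
        "answers": answers,            # keep every individual answer visible
        "top_answer": top_answer,
        "agreement": agreement,        # share of advisors backing the top answer
        "slow_down": agreement < 1.0,  # any divergence is a cue to verify
    }


# Stand-in advisors for illustration; real ones would call separate models.
advisors = {
    "model_a": lambda q: "Approve",
    "model_b": lambda q: "approve",
    "model_c": lambda q: "Delay",
}

result = divergence_signal("Should we proceed with the acquisition?", advisors)
print(result["top_answer"], result["slow_down"])
```

The design choice worth noting is that the individual answers are returned alongside the summary, so the disagreement itself stays visible to the human rather than being folded into a single synthesized response.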

Studies have borne this out. Multi-agent debate consistently improves reasoning accuracy compared to single-model generation, and researchers at the University of Göttingen found that three agents were a strong configuration, with voting protocols outperforming other decision structures. However, perhaps the most important finding cuts against the hype. In a 2026 paper, “Consensus is Not Verification,” the authors demonstrated that AI model errors are highly correlated both within and across model families. When three instances of the same model agree, it doesn’t mean they’re right; rather, it means they may share the same blind spots. Aggregation increases consensus faster than it increases truth.
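A toy calculation makes the intuition concrete. The model below is an assumption made purely for illustration and is not the method used in the paper: three advisors each answer a yes/no question correctly with probability `p`, and with probability `rho` they all share one draw (a common blind spot) instead of answering independently.

```python
def majority_accuracy(p: float, rho: float) -> float:
    """Probability that the majority of three advisors is correct."""
    # Independent case: at least two of three are right.
    independent = 3 * p**2 - 2 * p**3
    # Shared case: all three give the same answer, right with probability p.
    return rho * p + (1 - rho) * independent


def unanimity_rate(p: float, rho: float) -> float:
    """Probability that all three advisors give the same answer."""
    # Independent advisors are unanimous when all are right or all are wrong.
    return rho + (1 - rho) * (p**3 + (1 - p) ** 3)


def accuracy_given_unanimity(p: float, rho: float) -> float:
    """Probability that a unanimous verdict is actually correct."""
    correct_and_unanimous = rho * p + (1 - rho) * p**3
    return correct_and_unanimous / unanimity_rate(p, rho)


for rho in (0.0, 0.5, 1.0):
    print(
        f"rho={rho:.1f}  "
        f"unanimous={unanimity_rate(0.7, rho):.3f}  "
        f"majority_correct={majority_accuracy(0.7, rho):.3f}  "
        f"correct_if_unanimous={accuracy_given_unanimity(0.7, rho):.3f}"
    )
```

Under these assumptions, with `p = 0.7`, fully independent advisors are unanimous only 37% of the time, and when they are, the verdict is right about 93% of the time. Fully correlated advisors are always unanimous, yet right only 70% of the time, no better than one advisor alone. Agreement goes up while the evidential value of that agreement goes down.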




This finding cuts both ways for practitioners like Thomson Reuters enterprise architect Zafar Khan and his two AI advisors, Adrian and Elara, which were built on the same underlying model but differentiated by their analytical frameworks rather than their architecture. The divergence they produce is real and visible. For example, the two advisors’ analysis of a deal undertaken by Eaton Corp. generated genuinely different conclusions because the advisors were oriented toward different priorities.

Yet research suggests that same-model divergence, while effective, has a ceiling. Prompt-driven personas can ask different questions, but they share the same training, the same blind spots, and the same failure modes. Khan is candid about this, noting that his current system is in the “very early” stages and is not a finished product. The value right now, he says, isn’t that Adrian and Elara are equivalent to truly independent minds; it’s that even a first-generation version of structured disagreement can identify insights that a single advisor would miss. It’s a large stride rather than an arrival at the ultimate destination.

The future of AI advisory is in the past

The principle behind this diverging analysis concept isn’t new. Indeed, it might be one of the oldest ideas in institutional design, rediscovered independently by many institutions that had to make decisions under uncertainty. Socrates built a philosophical method around cross-examination; Pope Sixtus V formalized opposition by creating the Devil’s Advocate in 1587; and the RAND Corporation operationalized it during the Cold War with the Delphi Method, using structured anonymous iteration to prevent groupthink.

The through-line across two millennia is simply that the best decision-making systems don’t minimize disagreement; rather, they engineer it.


Today, the developer community uses production-grade code review tools that assign architecture, security, and functionality analysis to separate agents, using majority voting for routine decisions and unanimous consent for irreversible ones. What Khan has built, and what Perplexity, Microsoft’s Agent Framework, and a growing ecosystem of multi-agent tools are now pursuing, are the latest iterations of a simple concept: Internal conflict is not a system failure; it is a design requirement.
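The voting rule described above, majority for routine decisions and unanimity for irreversible ones, can be expressed as a short sketch. The `decide` function and its boolean vote representation are hypothetical and chosen for illustration; they do not reflect any particular tool’s API.

```python
from typing import Sequence


def decide(votes: Sequence[bool], irreversible: bool) -> bool:
    """Approve by majority for routine decisions, by unanimity for irreversible ones."""
    if not votes:
        raise ValueError("at least one vote is required")
    approvals = sum(votes)
    if irreversible:
        # A single dissenting agent blocks a decision that cannot be undone.
        return approvals == len(votes)
    # Routine decisions pass with a strict majority.
    return approvals > len(votes) / 2


# Example: architecture and security agents approve, functionality agent objects.
votes = [True, True, False]
print(decide(votes, irreversible=False))  # True: two of three is a majority
print(decide(votes, irreversible=True))   # False: one dissent blocks approval
```

Raising the bar to unanimity for irreversible decisions gives each dissenting agent a veto exactly where a mistake is most costly, which is one way of treating internal conflict as a requirement rather than a failure.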

The question is no longer “whether”

Khan’s vision for what should sit at the decision table is specific — five AI advisors spanning technology, finance, regulation, workforce, and geopolitical risk. Each applies its own analytical framework, with the human executive responsible for integration and final judgment. The guardrails are three: i) transparency about what data the system uses; ii) verifiability that sources are legitimate; and iii) human accountability at every decision point.

“The race towards AGI [artificial general intelligence] is moving faster,” Khan acknowledges, adding that humans need to remain in the loop so that AI operates in a governed and ethical way.

“I want to show the interaction between human and AI advisor, how they’re thinking through the problem together,” he explains. “Where the human judgment covers the analysis and where it diverges.” In other words, when the AI advisors agree, that’s your green light. When they diverge, that’s the conversation your board should be having.

The future of AI-assisted executive decision-making may look less like a single brilliant oracle and more like a room full of advisors that may often disagree because that’s how the best decisions have always been made. The technology to build that room now exists; however, the question is whether today’s leaders have the discipline to listen when the room argues back.


For more on AI transformation in the professional services market, you can download the Thomson Reuters Institute’s 2026 AI in Professional Services Report.
