AI and metacognition: knowing when to trust a machine (and when not to) is not always obvious


Contemporary artificial intelligence (AI) governance frameworks rest on an assumption that is rarely made explicit: when a human operator receives output from an AI system, they must be able to evaluate it meaningfully. The provisions of the European AI Act relating to high-risk systems require transparency, explainability, and human oversight.

The Act explicitly targets systems used in recruitment and worker evaluation, access to social benefits, credit-granting decisions, border control, the administration of justice, and critical healthcare.

The US AI Action Plan calls for maintaining meaningful human control over AI decisions with significant consequences. The OECD Principles on AI place human-centeredness at the heart of their commitments.

These commitments are necessary but insufficient. They focus on what AI systems must provide to human operators, while leaving entirely unanswered the question of what those operators must be able to do with what they receive. This gap is not accidental. It is a structural blind spot in the current architecture of AI governance.

The implicit model of the human supervisor in most regulatory texts is that of a competent and attentive professional who, faced with precise and legible outputs, formulates informed judgments. This is a plausible assumption in stable, low-stakes, and well-controlled environments, but a fragile one in high-stakes, time-pressured, and technically opaque contexts—precisely the contexts in which AI systems are increasingly being deployed.

Consider the emergency-room triage nurse who receives an AI-generated triage score without always having access to the explanations behind it. The bank advisor who must decide within minutes whether to block an account, based on an automated fraud alert, may be working with a proprietary model they cannot query. The administrative officer who approves the allocation of social housing or an algorithmically prioritized benefit generally cannot explain why one application was ranked above another. The teacher who countersigns an automated exam grade has no access to the criteria that produced the score. In each of these cases, human oversight is formally present, yet substantively impossible.

Metacognitively savvy operators

Metacognition, the ability to monitor and regulate one's own cognitive processes, is the psychological basis for effective supervision. A metacognitively aware operator knows when they understand something, when they are merely guessing, and when their judgment is being influenced by factors they have not consciously registered. This ability cannot be assumed; it varies significantly across individuals and with training and situational pressure.

Research in human-automation interaction has documented a set of failure modes that emerge specifically when humans supervise automated or AI-powered systems. Automation bias, the tendency to favor machine-generated recommendations over one's own judgment, is one of the most robust findings in the field. In a frequently cited 1997 paper, researchers Parasuraman and Riley showed that humans systematically both misuse automation, relying on it where it is unreliable, and disuse it, neglecting it where it would be beneficial: two types of error that reflect a failure of metacognitive calibration rather than a lack of information. For example, in flight simulator experiments cited by these authors, pilots equipped with an automatic warning system shut down an engine in response to a false alarm, a decision they themselves had stated, before the experiment, that they would never make on the basis of an automated alert alone.

The challenge is compounded by the specific characteristics of contemporary AI systems. Kahneman's work on dual-process cognition, popularly known as System 1 and System 2, or thinking fast and slow, sheds light on the mechanism. Faced with an AI system that produces output smoothly and confidently, the human mind tends to engage rapid, intuitive processing (the kind used for familiar, low-risk tasks) rather than the deeper, more deliberate, more logical, and therefore more cognitively demanding analysis the situation calls for.

More specifically, an explanation that merely seems sound triggers different cognitive responses than one that actually is. When an AI system's explanations are fluently worded, numerically precise, and visually formatted as authoritative outputs, they defuse precisely the skepticism that meaningful oversight requires.

Perhaps counterintuitively, providing more explanations does not reliably improve human judgment of AI outputs. In a rigorous experimental study, a research team found that AI-generated explanations did not consistently improve the performance of the human-AI team, and in fact degraded it under several conditions, notably when the explanations were technically accurate but cognitively incompatible with how the operators formed their own judgments.

More specifically, in the sentiment analysis task, the AI explained its judgment by highlighting the words it had identified as positive or negative. Human participants, however, evaluated the tone of a text holistically, taking context and overall coherence into account, a process that highlighting individual words cannot replicate. Here, the AI and the human do not reach their judgments by the same path: the AI picks out local elements (a word, a phrase), whereas the human constructs a holistic judgment (the whole text, its context, its internal coherence). When the explanation provided reflects the machine's logic rather than human reasoning, it does not equip the operator to assess whether the recommendation is reliable; it simply persuades them to follow it. A toy sketch of this mismatch follows.
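To make the mismatch concrete, here is a minimal sketch in Python of a word-level attribution explanation, in the spirit of lexicon-based highlighting. The lexicon, weights, and example sentence are invented for illustration and are not taken from the study itself.

# Toy illustration (hypothetical lexicon and sentence, not the study's system):
# a word-level attribution "explanation" for a sentiment score.

LEXICON = {"bad": -2.0, "moving": +1.0, "quite": +0.3, "honestly": +0.2}

def score_and_explain(text):
    """Score a sentence word by word; return the total and per-word contributions."""
    contributions = []
    for word in text.lower().replace(",", "").split():
        weight = LEXICON.get(word, 0.0)
        if weight != 0.0:
            contributions.append((word, weight))
    total = sum(weight for _, weight in contributions)
    return total, contributions

sentence = "The plot was not bad at all, honestly quite moving"
total, contributions = score_and_explain(sentence)

print(f"predicted sentiment score: {total:+.1f}")  # -0.5, i.e. negative
for word, weight in contributions:
    print(f"  highlighted {word!r}: {weight:+.1f}")

# The model flags 'bad' as strongly negative and predicts a negative tone,
# even though "not bad at all" reads as positive. The highlighted words are
# faithful to the model's local, word-by-word logic, yet they give a human
# reader no way to see that the negation was missed.

The point of the sketch is not the arithmetic but the shape of the explanation: it speaks the machine's language (local word weights) rather than the human's (negation, context, coherence), which is precisely the incompatibility described above.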

Explainability is thus a necessary but insufficient condition for effective supervision. What bridges the gap between an explanation and a sound judgment is metacognitive maturity.

Three implications for AI governance

If metacognitive maturity is a real and variable property of human operators, then governance frameworks that mandate explainability without considering operator metacognition are simply incomplete. Drawing on the scientific literature, including research on explainable AI, human-automation interaction, cognitive science, psychology, and the humanities and social sciences, three implications can be stated:

  • Documentation-centric transparency is insufficient. This is not a hunch; research has shown it for thirty years. Simply documenting and explaining a system's behavior does not guarantee sound human decisions unless the people who will use the explanations and documentation are involved in designing them, and unless the operational context of the business need at a given moment is taken into account. Controlled studies have even shown that "too much explanation" can degrade the performance of the human-AI team by burying relevant information in noise.
  • The metacognitive skills of operators should be considered a component of AI governance. This is a gap that research has begun to identify, although no formal framework has yet been established.

More specifically, regulations like the AI Act require human supervisors to be "competent," but without ever defining what that means; in particular, no framework assesses what researchers call metacognitive competence, the ability to detect flaws in one's own reasoning when faced with an opaque system. This competence stems from training and context, not raw intelligence. An important clarification is necessary here. Discussing the metacognitive competence of operators is not about questioning the value or intelligence of the people who supervise AI systems, nor about ranking humans by their ability to "think well." Metacognition is neither a personality trait nor an indicator of worth. It is a situational skill, sensitive to context, training, cognitive load, and working conditions. An experienced surgeon, for example, may show excellent metacognitive calibration in their own field yet be just as vulnerable as a novice to automation bias when facing an opaque AI system in a context for which they have received no specific training.

  • Metacognitive skills, knowing what one understands, detecting one's own reasoning errors, and regulating one's cognitive strategies, vary among individuals and are not uniformly distributed within the population, creating a structural security risk. This is a hypothesis, grounded in research in educational psychology, that has not yet been studied in the context of AI governance; it may be the next area of research that governments should actively encourage. While organizations with the best material and human resources can meet the requirements of effective oversight, those without them will produce superficial, insufficient compliance, not because their staff are less capable, but because the conditions for developing this situational skill were never in place, generating a false sense of security that is particularly dangerous in critical domains.

Author Bio: Ikram Chraibi Kaadoud is an XAI & Cognitive Science Researcher at Inria
