InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration

Zhongyu Yang1,2,*, Yingfang Yuan2,*, Xuanming Jiang1,†, Baoyi An, Wei Pang
AAAI 2026
1Xi’an Jiyun Technology Co., Ltd.
2BCML, Heriot-Watt University

*Equal contribution

†Corresponding author

Abstract

Hallucination remains a critical challenge for large language models (LLMs), hindering the development of reliable multimodal LLMs (MLLMs). Existing solutions often rely on human intervention or underutilize an agent's ability to mitigate hallucination autonomously. To address these limitations, we draw inspiration from how humans make reliable decisions in the real world: they begin with introspective reasoning to reduce uncertainty and form an initial judgment, then seek external verification from diverse perspectives before reaching a final decision. Motivated by this cognitive paradigm, we propose InEx, a training-free multi-agent framework designed to mitigate hallucination autonomously. InEx introduces internal introspective reasoning, guided by entropy-based uncertainty estimation, to improve the reliability of the decision agent's reasoning process. The agent first generates a response, which is then iteratively verified and refined through external cross-modal collaboration with an editing agent and self-reflection agents, further enhancing reliability and mitigating hallucination. Extensive experiments show that InEx consistently outperforms existing methods, achieving 4%-27% gains on general and hallucination benchmarks while demonstrating strong robustness.
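To make the entropy-based uncertainty signal concrete, below is a minimal sketch of one plausible instantiation: average per-token predictive entropy over the decision agent's generated response, used to decide whether further introspective reasoning is needed. The function names, the use of raw logits, and the threshold value are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def mean_token_entropy(logits: torch.Tensor) -> float:
    """Average predictive entropy over a generated token sequence.

    logits: (seq_len, vocab_size) pre-softmax scores for the tokens
    the decision agent produced.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (seq_len,)
    return entropy.mean().item()

# Hypothetical trigger: introspect only when average entropy is high.
UNCERTAINTY_THRESHOLD = 2.0  # assumed value for illustration

def needs_introspection(logits: torch.Tensor) -> bool:
    return mean_token_entropy(logits) > UNCERTAINTY_THRESHOLD

High mean entropy indicates a diffuse predictive distribution, which is one common proxy for model uncertainty and a natural point at which to trigger introspective reasoning.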

Method


Overview of InEx. In: a decision agent performs introspective reasoning guided by unsupervised uncertainty estimation, producing an initial response grounded in internal uncertainty signals. Ex: the response is then iteratively refined through alternating cross-modal verification and introspective updates, in which self-reflection agents assess consistency with visual and textual evidence and the decision agent revises its output accordingly.
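The alternating verify-and-revise loop in the figure can be read as the following control flow. This is a sketch under stated assumptions: the agent interfaces (decide, reflect_visual, reflect_textual, edit) are hypothetical stand-ins for MLLM calls, and the stopping rule (accept once both reflection agents agree, or after a fixed number of rounds) is our reading of the caption rather than the paper's exact procedure.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Verdict:
    consistent: bool
    feedback: str

# Hypothetical agent interfaces; in practice these wrap MLLM calls.
DecisionAgent = Callable[[str, str], str]        # (question, feedback) -> response
ReflectionAgent = Callable[[str, str], Verdict]  # (response, evidence) -> verdict
EditingAgent = Callable[[str, str], str]         # (response, feedback) -> revision

def inex_loop(question: str,
              visual_evidence: str,
              textual_evidence: str,
              decide: DecisionAgent,
              reflect_visual: ReflectionAgent,
              reflect_textual: ReflectionAgent,
              edit: EditingAgent,
              max_rounds: int = 3) -> str:
    """Alternate cross-modal verification and revision of a draft response."""
    response = decide(question, "")  # In: initial, uncertainty-guided response
    for _ in range(max_rounds):      # Ex: iterative external verification
        verdicts: List[Verdict] = [
            reflect_visual(response, visual_evidence),
            reflect_textual(response, textual_evidence),
        ]
        if all(v.consistent for v in verdicts):
            return response          # accepted: consistent with both modalities
        feedback = " ".join(v.feedback for v in verdicts if not v.consistent)
        response = edit(response, feedback)  # editing agent revises the draft
    return response

Keeping the visual and textual reflection agents separate means verification in one modality cannot mask an inconsistency in the other, which matches the cross-modal design described above.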

BibTeX

@inproceedings{yang2026inex,
title={InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration},
author={Zhongyu Yang and Yingfang Yuan and Xuanming Jiang and Baoyi An and Wei Pang},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}