MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition

EMNLP 2025
1Lanzhou University,
2Imperial College London,
3University of Exeter,
4BCML, Heriot-Watt University

Corresponding author
We introduce MERMAID, a multi-agent framework that reflects, augments, and verifies emotions across modalities.

Abstract

Multimodal large language models (MLLMs) have demonstrated strong performance across diverse multimodal tasks. However, their application to emotion recognition in natural images remains underexplored. In particular, MLLMs struggle with ambiguous emotional expressions and implicit affective cues; handling these is crucial for affective understanding but has been largely overlooked. To address these challenges, we propose MERMAID, a novel multi-agent framework that integrates a multi-perspective self-reflection module, an emotion-guided visual augmentation module, and a cross-modal verification module. These components enable agents to interact across modalities and reinforce subtle emotional semantics, thereby enhancing emotion recognition and supporting autonomous performance. Extensive experiments show that MERMAID outperforms existing methods, achieving absolute accuracy gains of 8.70%–27.90% across diverse benchmarks and exhibiting greater robustness in emotionally diverse scenarios.

BibTeX

@inproceedings{yang-etal-2025-mermaid,
    title = "{MERMAID}: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition",
    author = "Yang, Zhongyu  and
      Song, Junhao  and
      Song, Siyang  and
      Pang, Wei  and
      Yuan, Yingfang",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1252/",
    doi = "10.18653/v1/2025.emnlp-main.1252",
    pages = "24650--24666",
    ISBN = "979-8-89176-332-6"
}