NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

Kai Wu1*, Boyuan Jiang1*, Zhengkai Jiang1, Qingdong He1, Shengzhi Wang2, Chengjie Wang1, Qingwen Liu2
1Tencent Youtu Lab, 2Tongji University

NoiseBoost alleviates hallucination by redistributing attention weights of MLLMs with noise perturbations.

Abstract

Multimodal large language models (MLLMs) build on large language models to provide a powerful mechanism for understanding visual information. However, MLLMs are notorious for hallucinating, especially when generating lengthy, detailed descriptions of images.

Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, which leads to excessive dependence on linguistic tokens while neglecting visual information. In this paper, we propose NoiseBoost, a simple and broadly applicable method for alleviating hallucinations in MLLMs by integrating noise perturbations into the features. The noise perturbation acts as a regularizer, facilitating a balanced distribution of attention weights between visual and linguistic tokens.
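
As a rough illustration of the idea of feature-level noise perturbation, the sketch below adds zero-mean Gaussian noise to a batch of visual token features during training only. It is a minimal example under stated assumptions, not the authors' implementation: the function name `perturb_visual_features`, the Gaussian noise distribution, the `noise_scale` value, and the choice to perturb visual features specifically are all hypothetical and are not specified in the text above.

```python
import numpy as np


def perturb_visual_features(visual_feats, noise_scale=0.1, training=True, rng=None):
    """Add zero-mean Gaussian noise to visual token features.

    Illustrative only: the noise distribution, its scale, and the injection
    point are assumptions, not details taken from the NoiseBoost paper.

    visual_feats: array of shape (batch, num_tokens, hidden_dim).
    """
    # At inference time (or with zero scale) the features pass through unchanged.
    if not training or noise_scale == 0.0:
        return visual_feats
    rng = np.random.default_rng() if rng is None else rng
    # Sample noise with the same shape and dtype as the features.
    noise = rng.standard_normal(visual_feats.shape).astype(visual_feats.dtype)
    return visual_feats + noise_scale * noise
```

In a training loop, a function like this would be applied to the visual features before they are passed to the language model, and disabled at inference by setting `training=False`.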

Despite its simplicity, NoiseBoost consistently enhances the performance of MLLMs across common training strategies, including supervised fine-tuning and reinforcement learning. Further, NoiseBoost pioneers semi-supervised learning for MLLMs, unlocking the power of unlabeled data. Comprehensive experiments demonstrate that NoiseBoost improves dense caption accuracy by 8.1% under human evaluation and achieves comparable results with 50% of the data by mining unlabeled data. The code and data will be made publicly accessible.

Framework

Framework for NoiseBoost on supervised finetuning, reinforcement learning, and semi-supervised learning.

Caption Showcases

Captioning for animals
Captioning for arts
Captioning for people
Captioning for scenery

BibTeX


        @article{wu2024noiseboost,
          title={NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models},
          author={Wu, Kai and Jiang, Boyuan and Jiang, Zhengkai and He, Qingdong and Luo, Donghao and Wang, Shengzhi and Liu, Qingwen and Wang, Chengjie},
          journal={arXiv preprint arXiv:2405.20081},
          year={2024}
        }