Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (2): 17-22. doi: 10.19665/j.issn1001-2400.2019.02.004

Improved method for image caption with global attention mechanism

MA Shulei1,2, ZHANG Guobin2, JIAO Yang1, SHI Guangming1

  1. School of Artificial Intelligence, Xidian University, Xi’an 710071, China
  2. The 27th Research Institute of China Electronic Technology Group Corporation, Zhengzhou 450047, China
  • Received: 2018-09-28 Online: 2019-04-20 Published: 2019-04-20
  • Contact: JIAO Yang
  • About the author: MA Shulei (born 1972), male, researcher, Ph.D. candidate at Xidian University. E-mail: msl172@163.com
  • Funding:
    National Natural Science Foundation of China (61875157); National Natural Science Foundation of China (61301288)

Abstract:

To address the lack of global information in existing attention-based image captioning methods, we propose an improved image captioning method with a global attention mechanism. Building on the visual attention mechanism, the method designs a global feature network that mimics the full process of human perception and enhances the global features of the image. We compare the proposed method with the current best-performing attention-based network under the same dataset and network hyperparameters, and analyze the influence of global information on the generated text. The results show that, on the more challenging Chinese captioning task, the proposed method outperforms the current best model on objective evaluation metrics. In subjective evaluation, the captions it generates describe image content more accurately, are richer and more diverse, and are closer to natural language.

Key words: image caption, attention mechanism, global feature, convolutional neural network, recurrent neural network
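
The abstract does not say how the global feature network is built or how its output is fused with the attended features, so the following is only a minimal sketch of the general idea, not the authors' architecture. It assumes dot-product attention scores, mean pooling over region features as a stand-in for the global feature, and concatenation as the fusion step; the function name `attend_with_global` and the toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_global(regions, hidden):
    """Combine an attention-weighted local feature with a global feature.

    regions: list of region feature vectors (all the same length)
    hidden:  decoder state vector of the same length
    Assumptions (not from the paper): dot-product scoring, mean pooling
    as the global feature, concatenation as the fusion.
    """
    # Attention score of each region against the decoder state.
    scores = [sum(r * h for r, h in zip(region, hidden)) for region in regions]
    weights = softmax(scores)

    dim = len(regions[0])
    # Local context: attention-weighted sum of the region features.
    local = [sum(w * region[d] for w, region in zip(weights, regions))
             for d in range(dim)]
    # Global context: plain average over all regions, independent of attention.
    global_feat = [sum(region[d] for region in regions) / len(regions)
                   for d in range(dim)]
    # Fused context: local part followed by global part.
    return weights, local + global_feat


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend_with_global(regions, hidden)
```

In this sketch the first half of `context` changes with the decoder state while the second half stays fixed for a given image, which is one simple way a decoder could keep access to whole-image information at every step.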

CLC Number:

  • TP37