Journal of Xidian University, 2019, Vol. 46, Issue (2): 152-157. doi: 10.19665/j.issn1001-2400.2019.02.025


CNN Image Caption Generation

LI Yong1,2,3, CHENG Honghong1,2,3, LIANG Xinyan1,2,3, GUO Qian1,2,3, QIAN Yuhua1,2,3

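  1. Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan, Shanxi 030006
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan, Shanxi 030006
  3. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006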
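  • Received: 2018-09-22 Published: 2019-04-20 Released: 2019-04-20
  • About the author: LI Yong (1993-), male, master's student at Shanxi University. E-mail: ly_mrty@163.com
  • Supported by:
    National Natural Science Foundation of China (61672332); National Natural Science Foundation of China (61432011); National Natural Science Foundation of China (U1435212); Shanxi Provincial Department of Education Support Program for Outstanding Young and Middle-aged Innovative Talents in Higher Education Institutions (02150116072021); Shanxi Province Research Project for Returned Overseas Scholars (2017023)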

CNN image caption generation

LI Yong1,2,3,CHENG Honghong1,2,3,LIANG Xinyan1,2,3,GUO Qian1,2,3,QIAN Yuhua1,2,3   

  1. Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
    2. Key Lab. of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
    3. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received:2018-09-22 Online:2019-04-20 Published:2019-04-20

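Abstract:

The image caption generation task requires generating a meaningful sentence that accurately describes the content of an image. Existing work typically uses a convolutional neural network to encode image information and a recurrent neural network to encode text information; because of the "serial nature" of the recurrent neural network, the performance of such models is low. To address this problem, a model is built on convolutional neural networks, using convolutional neural networks of different structures to process the data of both modalities at the same time. Thanks to the "parallel nature" of the convolution operation, the running efficiency of the model improves markedly. Experiments were conducted on two public datasets, and the results also show some improvement on the specified evaluation metrics, demonstrating the effectiveness of the model for the image caption generation task.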

Keywords: multi-modal data, image caption, long short-term memory, neural networks

Abstract:

The image caption generation task requires generating a meaningful sentence that accurately describes the content of an image. Existing research usually uses a convolutional neural network to encode image information and a recurrent neural network to encode text information; the “serial character” of the recurrent neural network results in low performance. To solve this problem, the proposed model is based entirely on convolutional neural networks: convolutional neural networks of different structures process the data of the two modalities simultaneously. Benefiting from the “parallel character” of the convolution operation, the running efficiency of the model is significantly improved. Experiments on two public data sets also show improvements on the specified evaluation metrics, which indicates the effectiveness of the model for the image caption generation task.

Key words: multi-modal data, image caption, long short-term memory, neural networks
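
The contrast the abstract draws between the “serial character” of a recurrent network and the “parallel character” of convolution can be illustrated with a toy sketch. This is not the model from the paper: the function names `recurrent_encode` and `causal_conv1d`, the scalar weights, and the use of a causal (left-padded) 1-D convolution over a sequence of numbers are all assumptions made for illustration only.

```python
# Illustrative sketch only: toy scalar sequence encoders, not the paper's model.
# All names and weights here are hypothetical.

def recurrent_encode(xs, w_in=0.5, w_rec=0.9):
    """Recurrent update: step t cannot start until step t-1 has finished."""
    h = 0.0
    states = []
    for x in xs:
        h = w_in * x + w_rec * h  # depends on the previous hidden state
        states.append(h)
    return states


def causal_conv1d(xs, kernel):
    """Causal 1-D convolution: output t uses only xs[t-k+1 .. t].

    Each output is a function of the inputs alone, so the positions are
    independent of one another and could be evaluated in parallel.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(xs)  # left-pad so no future input is seen
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(xs))
    ]


if __name__ == "__main__":
    print(recurrent_encode([1.0, 1.0], w_in=1.0, w_rec=0.5))
    print(causal_conv1d([1.0, 2.0, 3.0], [0.5, 0.5]))
```

In the recurrent version the loop body reads `h` from the previous iteration, so the steps form a chain. In the convolutional version each list element is computed from a fixed window of inputs, which is the property that allows all positions to be processed at once on parallel hardware.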

CLC number:

  • TP183