CNN image caption generation

Journal of Xidian University, 2019, Vol. 46, Issue 2: 152-157. doi: 10.19665/j.issn1001-2400.2019.02.025

LI Yong^1,2,3, CHENG Honghong^1,2,3, LIANG Xinyan^1,2,3, GUO Qian^1,2,3, QIAN Yuhua^1,2,3

^1. Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
^2. Key Lab. of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
^3. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China

Received: 2018-09-22 Online: 2019-04-20 Published: 2019-04-20
About the author: LI Yong (1993-), male, master's student at Shanxi University. E-mail: ly_mrty@163.com
Funding: National Natural Science Foundation of China (61672332, 61432011, U1435212); Support Program for Young and Middle-aged Top Innovative Talents in Higher Education Institutions of the Shanxi Provincial Department of Education (02150116072021); Shanxi Province Research Project for Returned Overseas Scholars (2017023)

Abstract

The image caption generation task requires generating a meaningful sentence that accurately describes the content of an image. Existing work usually encodes image information with a convolutional neural network and text information with a recurrent neural network; because the recurrent neural network is inherently serial, the resulting models perform poorly. To address this problem, we propose a model built entirely on convolutional neural networks, using convolutional networks of different structures to process the data of the two modalities simultaneously. Thanks to the parallel nature of the convolution operation, the running efficiency of the model is significantly improved. Experiments on two public datasets also show improvements on the specified evaluation metrics, which indicates that the model is effective for the image caption generation task.

Key words: multi-modal data, image caption, long short-term memory, neural networks
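The contrast between the "serial" and "parallel" computation mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' architecture, which this page does not describe; the functions `causal_conv_at`, `causal_conv1d` and `simple_recurrence`, the scalar "tokens" and all weights are hypothetical, and the use of a causal (left-padded) convolution is an assumption made only for illustration.

```python
from typing import List


def causal_conv_at(x: List[float], kernel: List[float], t: int) -> float:
    """Output of a causal 1-D convolution at position t.

    Reads only the inputs x[t-k+1..t] (kernel[-1] weights the current
    token), never another output, so every position is independent.
    """
    k = len(kernel)
    total = 0.0
    for j in range(k):
        i = t - (k - 1) + j
        if i >= 0:  # positions before the start act as zero padding
            total += kernel[j] * x[i]
    return total


def causal_conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """All positions; the order of evaluation does not matter."""
    return [causal_conv_at(x, kernel, t) for t in range(len(x))]


def simple_recurrence(x: List[float], w_in: float, w_rec: float) -> List[float]:
    """Toy linear recurrence h[t] = w_in * x[t] + w_rec * h[t-1].

    Step t cannot start before step t-1 has finished.
    """
    h_prev = 0.0
    out = []
    for x_t in x:
        h_prev = w_in * x_t + w_rec * h_prev
        out.append(h_prev)
    return out


tokens = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5]

conv_out = causal_conv1d(tokens, kernel)
# Evaluating the positions in reverse order gives the same values.
conv_reversed = {t: causal_conv_at(tokens, kernel, t)
                 for t in reversed(range(len(tokens)))}
rnn_out = simple_recurrence(tokens, w_in=1.0, w_rec=0.5)

print(conv_out)  # [0.5, 1.5, 2.5, 3.5]
print(rnn_out)   # [1.0, 2.5, 4.25, 6.125]
```

Because each convolution output depends only on the inputs, the positions can be handed to separate workers during training, whereas the loop in `simple_recurrence` must run step by step.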

CLC number: TP183

LI Yong, CHENG Honghong, LIANG Xinyan, GUO Qian, QIAN Yuhua. CNN image caption generation[J]. Journal of Xidian University, 2019, 46(2): 152-157.

Figures/Tables (7): Figures 1-5, Tables 1-2

References (9)

[1] VINYALS O, TOSHEV A, BENGIO S, et al. Show and Tell: a Neural Image Caption Generator[C]// Proceedings of the 2015 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2015: 3156-3164.
[2] SZEGEDY C, LIU W, JIA Y Q, et al. Going Deeper with Convolutions[C]// Proceedings of the 2015 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2015: 1-9.
[3] KARPATHY A, LI F F. Deep Visual-semantic Alignments for Generating Image Descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 664-676. doi: 10.1109/TPAMI.2016.2598339 pmid: 27514036
[4] XU Qiang, LI Wei, ZHAN Ronghui, et al. Improved Algorithm for SAR Target Recognition Based on the Convolutional Neural Network[J]. Journal of Xidian University, 2018, 45(5): 177-183.
[5] LU J, XIONG C, PARIKH D, et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning[C]// Proceedings of the 2017 30th IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3242-3250.
[6] WANG Y, LIN Z, SHEN X, et al. Skeleton Key: Image Captioning by Skeleton-attribute Decomposition[C]// Proceedings of the 2017 30th IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 7378-7387.
[7] ZHANG Jingang, FANG Yuan, YUAN Hao, et al. Multiple Convolutional Neural Networks for Facial Expression Sequence Recognition[J]. Journal of Xidian University, 2018, 45(1): 150-155. doi: 10.3969/j.issn.1001-2400.2018.01.027
[8] SANTORO A, RAPOSO D, BARRETT D G T, et al. A Simple Neural Network Module for Relational Reasoning[C]// Advances in Neural Information Processing Systems. Vancouver, Canada: Neural Information Processing System Foundation, 2017: 4968-4977.
[9] REN Z, WANG X, ZHANG N, et al. Deep Reinforcement Learning-based Image Captioning with Embedding Reward[C]// Proceedings of the 2017 30th IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1151-1159.

Dataset	Training set size	Validation set size	Test set size
Flickr8k	6000	1000	1000
Flickr30k	28000	1000	1000

Method	Flickr8k BLEU	Flickr8k CIDEr	Flickr30k BLEU	Flickr30k CIDEr
DeepVS	0.579	9.26	0.573	7.94
Google NIC	0.63	9.84	0.663	8.55
Ours	0.664	10.93	0.855	10.85
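For readers unfamiliar with the BLEU column, the sketch below shows the basic idea behind the metric. It is a simplified illustration only: this page does not state which BLEU order or implementation the authors used, so the function `bleu1` (clipped unigram precision times the brevity penalty, with a single reference) and the example sentences are assumptions, not the evaluation code behind the table. Full BLEU combines n-gram precisions of several orders and supports multiple references.

```python
import math
from collections import Counter
from typing import List


def bleu1(candidate: List[str], reference: List[str]) -> float:
    """BLEU-1 against one reference: clipped unigram precision * brevity penalty."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Clip each candidate word count by its count in the reference,
    # so repeating a correct word does not inflate the score.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(candidate)
    # The brevity penalty lowers the score of candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * precision


# Made-up sentences, used only to exercise the function.
reference = "a dog runs on the grass".split()
candidate = "a dog runs on grass".split()

score = bleu1(candidate, reference)        # all 5 words match, but one word is missing
perfect = bleu1(reference, reference)      # identical sentences
repeated = bleu1("the the the".split(), reference)  # clipping limits "the" to one match
```

Here `score` equals the brevity penalty alone, exp(1 - 6/5), because every candidate word appears in the reference; `repeated` is small because only one of the three occurrences of "the" is counted.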
