CNN image caption generation

doi:10.19665/j.issn1001-2400.2019.02.025

Abstract

The image caption generation task requires producing a meaningful sentence that accurately describes the content of an image. Existing work usually uses a convolutional neural network to encode image information and a recurrent neural network to encode text information; the serial nature of the recurrent neural network results in low efficiency. To address this problem, we propose a model based entirely on convolutional neural networks, in which different convolutional neural networks process the data of the two modalities simultaneously. Owing to the parallel nature of the convolution operation, computational efficiency is significantly improved. Experiments on two public data sets, Flickr8k and Flickr30k, also show improvements on the specified evaluation metrics (BLEU and CIDEr), which indicates the effectiveness of the model for the image caption generation task.
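The contrast between serial recurrence and parallel convolution described above can be illustrated with a minimal sketch. It is not the authors' model: the toy linear recurrence, the causal 1-D convolution, and the names `recurrent_scan` and `causal_conv1d` are assumptions chosen only to show why one computation must run step by step while the other can be evaluated at every position independently.

```python
from typing import List, Optional, Sequence


def recurrent_scan(xs: Sequence[float], w_in: float, w_rec: float) -> List[float]:
    """Toy linear recurrence h[t] = w_in * x[t] + w_rec * h[t-1], with h[-1] = 0.

    Step t needs the result of step t-1, so the loop is inherently serial.
    """
    hs = []
    h = 0.0
    for x in xs:
        h = w_in * x + w_rec * h
        hs.append(h)
    return hs


def causal_conv1d(
    xs: Sequence[float],
    kernel: Sequence[float],
    order: Optional[Sequence[int]] = None,
) -> List[float]:
    """Causal 1-D convolution y[t] = sum_k kernel[k] * x[t-k], zero-padded on the left.

    Each y[t] depends only on the inputs, never on another output, so the
    positions can be visited in any order (or all at once on parallel hardware).
    `order` is a hypothetical parameter used here only to demonstrate that.
    """
    positions = range(len(xs)) if order is None else order
    ys = [0.0] * len(xs)
    for t in positions:
        ys[t] = sum(kernel[k] * xs[t - k] for k in range(len(kernel)) if t - k >= 0)
    return ys


if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    print(recurrent_scan(xs, w_in=1.0, w_rec=0.5))
    print(causal_conv1d(xs, kernel=[1, 1]))
    # Visiting positions in reverse gives the same result, unlike the recurrence.
    print(causal_conv1d(xs, kernel=[1, 1], order=[3, 2, 1, 0]))
```

In the convolution every output can be computed without waiting for its neighbours, which is the property the abstract credits for the efficiency gain; the recurrence offers no such freedom.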

Key words: multi-modal data, image caption, long short-term memory, neural networks

CLC Number: TP183

LI Yong, CHENG Honghong, LIANG Xinyan, GUO Qian, QIAN Yuhua. CNN image caption generation[J]. Journal of Xidian University, 2019, 46(2): 152-157.

References

[1]	VINYALS O, TOSHEV A, BENGIO S, et al. Show and Tell: a Neural Image Caption Generator[C]// Proceedings of the 2015 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2015: 3156-3164.
[2]	SZEGEDY C, LIU W, JIA Y Q, et al. Going Deeper with Convolutions[C]// Proceedings of the 2015 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2015: 1-9.
[3]	KARPATHY A, LI F F. Deep Visual-semantic Alignments for Generating Image Descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 664-676. doi: 10.1109/TPAMI.2016.2598339 pmid: 27514036
[4]	XU Qiang, LI Wei, ZHAN Ronghui, et al. Improved Algorithm for SAR Target Recognition Based on the Convolutional Neural Network[J]. Journal of Xidian University, 2018, 45(5): 177-183. (in Chinese)
[5]	LU J, XIONG C, PARIKH D, et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning[C]// Proceedings of the 2017 30th IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3242-3250.
[6]	WANG Y, LIN Z, SHEN X, et al. Skeleton Key: Image Captioning by Skeleton-attribute Decomposition[C]// Proceedings of the 2017 30th IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 7378-7387.
[7]	ZHANG Jingang, FANG Yuan, YUAN Hao, et al. Multiple Convolutional Neural Networks for Facial Expression Sequence Recognition[J]. Journal of Xidian University, 2018, 45(1): 150-155. (in Chinese) doi: 10.3969/j.issn.1001-2400.2018.01.027
[8]	SANTORO A, RAPOSO D, BARRETT D G T, et al. A Simple Neural Network Module for Relational Reasoning[C]// Advances in Neural Information Processing Systems. Vancouver, Canada: Neural Information Processing Systems Foundation, 2017: 4968-4977.
[9]	REN Z, WANG X, ZHANG N, et al. Deep Reinforcement Learning-based Image Captioning with Embedding Reward[C]// Proceedings of the 2017 30th IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1151-1159.

Dataset sizes
Dataset	Training set	Validation set	Test set
Flickr8k	6000	1000	1000
Flickr30k	28000	1000	1000

Method	Flickr8k BLEU	Flickr8k CIDEr	Flickr30k BLEU	Flickr30k CIDEr
DeepVS	0.579	9.26	0.573	7.94
Google NIC	0.63	9.84	0.663	8.55
Ours	0.664	10.93	0.855	10.85
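
For readers unfamiliar with the BLEU column, the following is a minimal sketch of a unigram BLEU score (clipped precision multiplied by the brevity penalty) for one candidate caption against several reference captions. It is an illustration only: the page does not state which n-gram order or evaluation toolkit produced the numbers above, and the function names `clipped_precision`, `brevity_penalty`, and `bleu1` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate: Sequence[str], references: List[Sequence[str]], n: int) -> float:
    """Modified n-gram precision: each candidate n-gram count is clipped to
    the maximum count of that n-gram in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate: Sequence[str], references: List[Sequence[str]]) -> float:
    """1 if the candidate is longer than the closest reference length,
    otherwise exp(1 - r / c); ties in closeness pick the shorter reference."""
    c = len(candidate)
    if c == 0:
        return 0.0
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu1(candidate: Sequence[str], references: List[Sequence[str]]) -> float:
    """Unigram BLEU for a single caption (an assumption; the table may use another order)."""
    return brevity_penalty(candidate, references) * clipped_precision(candidate, references, 1)


if __name__ == "__main__":
    refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
    print(bleu1("the the the the the the the".split(), refs))
    print(bleu1("the cat is on the mat".split(), refs))
```

Clipping is what keeps a degenerate caption that repeats a common word from scoring well: the repeated word is credited at most as many times as it appears in a single reference.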