Text-to-image generation combined with mutual information maximization

doi:10.19665/j.issn1001-2400.2019.05.025

Abstract

Building on Stacked Generative Adversarial Networks (StackGAN), this paper presents a method that addresses the insufficient diversity caused by non-uniform generated samples. It constructs a stacked text-to-image generative adversarial network that incorporates local-global mutual information maximization. First, the global vector is decoupled from the generative model to obtain feature maps at different scales. Next, the correlation between global features and the text description is strengthened by maximizing the mutual information between the feature maps and the global vector. Finally, to make the text-to-image mapping more relevant, the feature map is extracted as local position feature vectors, and their correlation with the text description is strengthened by maximizing the average mutual information between the local position feature vectors and the global vector. Experimental results on the CUB dataset show that the proposed method effectively improves the diversity of generated samples. It also generates samples with higher semantic accuracy, and the generated images are judged more realistic in subjective evaluation.
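
The abstract does not say which mutual information estimator is used. As a minimal sketch only, the following assumes a Jensen-Shannon style lower bound of the kind used in Deep InfoMax-like methods, where a critic scores matched and mismatched (feature, global vector) pairs. The function names and the score inputs are hypothetical and are not taken from the paper.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def jsd_mi_estimate(pos_scores, neg_scores):
    """Jensen-Shannon style mutual information estimate (assumed estimator).

    pos_scores: critic outputs for matched (feature, global vector) pairs.
    neg_scores: critic outputs for mismatched pairs.
    """
    pos_term = sum(-softplus(-s) for s in pos_scores) / len(pos_scores)
    neg_term = sum(softplus(s) for s in neg_scores) / len(neg_scores)
    return pos_term - neg_term


def average_local_mi(pos_scores_per_position, neg_scores_per_position):
    """Average the estimate over all local positions of a feature map."""
    estimates = [
        jsd_mi_estimate(pos, neg)
        for pos, neg in zip(pos_scores_per_position, neg_scores_per_position)
    ]
    return sum(estimates) / len(estimates)


# An uninformative critic (all scores 0) versus a critic that separates pairs.
print(jsd_mi_estimate([0.0], [0.0]))
print(jsd_mi_estimate([20.0], [-20.0]))
```

Under this assumed estimator the value is at most 0 and rises toward 0 as the critic separates matched from mismatched pairs, so maximizing it pushes feature maps and the global vector to share information.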

Key words: image generation, mutual information, generative adversarial networks, local position feature vector

CLC Number: TP183

MO Jianwen, XU Kailiang, LIN Leping, OUYANG Ning. Text-to-image generation combined with mutual information maximization[J]. Journal of Xidian University, 2019, 46(5): 180-188.

Figures/Tables 10

Layer name	Kernel size and output dimension	Output size
Input_1	2N_t	64×64
conv_1+Relu	3×3,32	62×62
conv_2+Relu	3×3,16	60×60
reshape	57 600	1×1
Input_2	228	1×1
concat	57 828	1×1
fc_1	256	1×1
fc_2	256	1×1
fc_3	1	1×1
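
The sizes in the table above can be checked with a short calculation. This is a sketch that assumes stride 1 and no padding for both 3×3 convolutions, which is inferred from the 64 → 62 → 60 progression and is not stated in the table. The function names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def global_branch_shapes(input_size=64, vector_dim=228):
    """Reproduce the sizes listed in the table (assumed stride 1, no padding)."""
    after_conv_1 = conv_output_size(input_size, kernel=3)    # conv_1: 3x3, 32 channels
    after_conv_2 = conv_output_size(after_conv_1, kernel=3)  # conv_2: 3x3, 16 channels
    flattened = 16 * after_conv_2 * after_conv_2              # reshape to a vector
    concatenated = flattened + vector_dim                     # concat with Input_2
    return {
        "conv_1": after_conv_1,
        "conv_2": after_conv_2,
        "reshape": flattened,
        "concat": concatenated,
    }


print(global_branch_shapes())
# {'conv_1': 62, 'conv_2': 60, 'reshape': 57600, 'concat': 57828}
```

The flattened size 16 × 60 × 60 = 57 600 and the concatenated size 57 600 + 228 = 57 828 agree with the reshape and concat rows.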

Layer name	Kernel size and output dimension	Output size
Input_1,Feature map	2N_t	64×64
Input_2,y	228	1×1
expand	228	64×64
concat	292	64×64
conv_1+Relu	1×1,256	64×64
conv_2+Relu	1×1,256	64×64
conv_3	1×1,1	64×64
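
The channel counts in the table above imply a value for the feature-map depth 2N_t. The sketch below works that out and tracks the shapes layer by layer. It assumes the 228-dimensional vector y is tiled to every spatial position before concatenation and that the 1×1 convolutions use stride 1; both are inferences from the table, and the function names are hypothetical.

```python
def infer_feature_channels(concat_channels=292, vector_dim=228):
    """Feature-map channels implied by the concat row: 292 - 228."""
    return concat_channels - vector_dim


def local_branch_shapes(size=64, vector_dim=228, concat_channels=292):
    """List (layer, channels, spatial size) for the per-position branch."""
    feature_channels = infer_feature_channels(concat_channels, vector_dim)
    layers = [
        ("feature_map", feature_channels, size),
        ("expand", vector_dim, size),  # y tiled to every position (assumed)
        ("concat", feature_channels + vector_dim, size),
    ]
    for name, out_channels in [("conv_1", 256), ("conv_2", 256), ("conv_3", 1)]:
        # A 1x1 convolution with stride 1 keeps the spatial size unchanged.
        out_size = (size - 1) // 1 + 1
        layers.append((name, out_channels, out_size))
    return layers


for layer in local_branch_shapes():
    print(layer)
```

Under these assumptions 2N_t = 64, and the final layer yields one score for each of the 64 × 64 positions.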

Method	ρ=0	ρ=1	ρ=3	ρ=5	ρ=10
Our1	3.82 ± 0.06	3.91 ± 0.04	4.11 ± 0.07	3.78 ± 0.05	3.63 ± 0.05
Our2			4.23 ± 0.03	4.20 ± 0.06
