Text-to-Image Generation Combined with Mutual Information Maximization

doi:10.19665/j.issn1001-2400.2019.05.025

Abstract:

Stacked text-to-image generation models produce unevenly distributed samples, which limits sample diversity. To address this problem, a stacked text-to-image generative adversarial network is proposed that builds on Stacked Generative Adversarial Networks (StackGAN) and incorporates local-global mutual information maximization. First, the generator decouples the global vector into feature maps at different scales. Next, the mutual information between the feature maps and the global vector is maximized, which strengthens the correlation between global image features and the text description. Finally, local position feature vectors are extracted from the feature maps, and the average mutual information between these vectors and the global vector is maximized. This strengthens the correlation between local position features and the text description and yields a tighter text-to-image mapping. Experiments on the CUB dataset show that the method effectively improves the diversity of generated samples. In subjective evaluation, the generated samples are also semantically more accurate and closer to natural images.

Key words: image generation, mutual information, generative adversarial networks, local position feature vector
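
The abstract does not say which mutual information estimator is used. As a minimal sketch, assuming a Jensen-Shannon style estimator of the kind used in Deep InfoMax, the combined objective could look like the following. The function names and the idea of a critic that scores matched and mismatched (feature, global vector) pairs are illustrative assumptions, not the authors' implementation.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mean(values):
    return sum(values) / len(values)


def jsd_mi_estimate(pos_scores, neg_scores):
    """Jensen-Shannon style mutual information estimate (assumed estimator).

    pos_scores: critic scores for matched (feature, global vector) pairs.
    neg_scores: critic scores for mismatched pairs.
    """
    pos_term = mean([-softplus(-s) for s in pos_scores])
    neg_term = mean([softplus(s) for s in neg_scores])
    return pos_term - neg_term


def local_global_objective(global_pos, global_neg, local_pos, local_neg):
    """Global term plus a local term averaged over all spatial positions.

    local_pos and local_neg are flat lists with one critic score per
    spatial position, so the mean inside jsd_mi_estimate is the average
    over positions.
    """
    global_term = jsd_mi_estimate(global_pos, global_neg)
    local_term = jsd_mi_estimate(local_pos, local_neg)
    return global_term + local_term
```

Under this assumed estimator, the value is at most zero and rises toward zero as the critic separates matched from mismatched pairs, so training would maximize it alongside the adversarial loss.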

CLC number:

TP183

Cite as: MO Jianwen, XU Kailiang, LIN Leping, OUYANG Ning. Text-to-image generation combined with mutual information maximization[J]. Journal of Xidian University, 2019, 46(5): 180-188.

Layer name	Kernel size and output dimension	Output size
Input_1	2N_t	64×64
conv_1+Relu	3×3,32	62×62
conv_2+Relu	3×3,16	60×60
reshape	57 600	1×1
Input_2	228	1×1
concat	57 828	1×1
fc_1	256	1×1
fc_2	256	1×1
fc_3	1	1×1
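
The sizes in the table above can be checked with a short calculation. This is a sketch that assumes unpadded, stride-1 convolutions (inferred from the 64 to 62 to 60 progression) and that the reshape flattens all 16 channels of conv_2. The helper names and the parameter `y_dim` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Side length of the output of a square convolution.
    return (size + 2 * padding - kernel) // stride + 1


def global_net_shapes(in_size=64, y_dim=228):
    """Trace the sizes listed in the table (assumed: no padding, stride 1)."""
    s1 = conv_out(in_size, 3)   # conv_1: 3x3 kernel, 32 channels
    s2 = conv_out(s1, 3)        # conv_2: 3x3 kernel, 16 channels
    flat = 16 * s2 * s2         # reshape: flatten the 16 channels
    concat = flat + y_dim       # concat with the 228-dimensional Input_2
    return s1, s2, flat, concat
```

Calling `global_net_shapes()` reproduces the table's 62×62 and 60×60 maps, the 57 600-element flattened vector, and the 57 828-element concatenation.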

Layer name	Kernel size and output dimension	Output size
Input_1, feature map	2N_t	64×64
Input_2, y	228	1×1
expand	228	64×64
concat	292	64×64
conv_1+Relu	1×1,256	64×64
conv_2+Relu	1×1,256	64×64
conv_3	1×1,1	64×64
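
The expand, concat, and 1×1 convolution steps in the table above can be sketched in plain Python. The sketch assumes the middle column gives channel counts, so the 292 channels after concat imply 2N_t = 292 − 228 = 64. Bias terms and ReLU are omitted, and the function names are hypothetical.

```python
def expand_and_concat(feature_map, y):
    """Tile y over every spatial position and stack it onto the feature map.

    feature_map: list of C channels, each an H x W nested list.
    y: list of D values (the 228-dimensional Input_2 in the table).
    Returns C + D channels of size H x W.
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    tiled = [[[v] * width for _ in range(height)] for v in y]
    return feature_map + tiled


def conv1x1(x, weights):
    """1x1 convolution: weights[o][c] maps input channel c to output channel o."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(w[c] * x[c][i][j] for c in range(len(x))) for j in range(width)]
         for i in range(height)]
        for w in weights
    ]
```

Because every kernel is 1×1, each spatial position is scored independently, which gives one output value per position in the 64×64 map of conv_3.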

Method	ρ=0	ρ=1	ρ=3	ρ=5	ρ=10
Our1	3.82 ± 0.06	3.91 ± 0.04	4.11 ± 0.07	3.78 ± 0.05	3.63 ± 0.05
Our2			4.23 ± 0.03	4.20 ± 0.06