Journal of Xidian University, 2019, Vol. 46, Issue (5): 180-188. doi: 10.19665/j.issn1001-2400.2019.05.025


Text-to-image generation combined with mutual information maximization

MO Jianwen, XU Kailiang, LIN Leping, OUYANG Ning

  1. School of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin 541004, Guangxi Zhuang Autonomous Region, China
  • Received: 2019-03-05 Online: 2019-10-20 Published: 2019-10-30
  • About the author: MO Jianwen (1972–), male, associate professor, Ph.D., E-mail: mo_jianwen@126.com.
  • Funding:
    National Natural Science Foundation of China (61661017); Guangxi Natural Science Foundation (2016GXNSFAA380149)

Abstract:

Building on Stacked Generative Adversarial Networks (StackGAN), this paper addresses the insufficient diversity caused by the non-uniform distribution of generated samples and proposes a stacked text-to-image generative adversarial network that incorporates local-global mutual information maximization. First, the generator decouples the global vector into feature maps at different scales. Next, the mutual information between these feature maps and the global vector is maximized, strengthening the correlation between global image features and the text description. Finally, the feature maps are converted into local position feature vectors, and the average mutual information between the local position feature vectors and the global vector is maximized, strengthening the correlation between local position features and the text description and yielding a tighter text-to-image mapping. Experiments on the CUB dataset show that the method effectively improves the diversity of generated samples and, in subjective evaluation, generates samples with higher semantic accuracy that are closer to natural images.

Key words: image generation, mutual information, generative adversarial networks, local position feature vector
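The abstract does not say how the mutual information terms are estimated. The sketch below is a minimal illustration that assumes a Deep InfoMax-style Jensen-Shannon estimator with a bilinear critic. That estimator, the critic, every function name, and the weights `alpha` and `beta` are assumptions, not the authors' method. The global term scores each whole feature map against the global vector. The local term applies the same estimator to every spatial position's local feature vector and averages over positions. Negative pairs come from matching a sample with another sample's global vector in the batch.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]
FeatureMap = List[Vector]  # H*W spatial positions, each a C-dim local feature vector


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bilinear_score(u: Vector, w: Matrix, g: Vector) -> float:
    """Hypothetical critic T(u, g) = u^T W g; W has len(u) rows and len(g) columns."""
    return sum(u[i] * w[i][j] * g[j] for i in range(len(u)) for j in range(len(g)))


def jsd_mi_estimate(xs: List[Vector], gs: List[Vector], w: Matrix) -> float:
    """Jensen-Shannon MI estimate over a batch of at least 2 samples (assumed estimator).

    (xs[i], gs[i]) are positive pairs from the joint distribution; (xs[i], gs[j])
    with j != i serve as negatives from the product of marginals.
    The value is at most 0 and rises as the critic separates the two.
    """
    n = len(xs)
    pos = sum(-softplus(-bilinear_score(xs[i], w, gs[i])) for i in range(n)) / n
    neg_terms = [softplus(bilinear_score(xs[i], w, gs[j]))
                 for i in range(n) for j in range(n) if j != i]
    return pos - sum(neg_terms) / len(neg_terms)


def global_mi(fmaps: List[FeatureMap], gs: List[Vector], w_global: Matrix) -> float:
    """Score each whole feature map (flattened to H*W*C values) against its global vector."""
    flat = [[v for position in fmap for v in position] for fmap in fmaps]
    return jsd_mi_estimate(flat, gs, w_global)


def local_mi(fmaps: List[FeatureMap], gs: List[Vector], w_local: Matrix) -> float:
    """Average the estimate over positions: each local position feature vector is
    paired with the global vector of the same sample, using one shared critic."""
    num_positions = len(fmaps[0])
    total = sum(jsd_mi_estimate([fmap[p] for fmap in fmaps], gs, w_local)
                for p in range(num_positions))
    return total / num_positions


def info_loss(fmaps, gs, w_global, w_local, alpha=1.0, beta=1.0) -> float:
    """Term added to the generator loss; minimizing it maximizes both estimates.
    alpha and beta are hypothetical weights, not values from the paper."""
    return -(alpha * global_mi(fmaps, gs, w_global) + beta * local_mi(fmaps, gs, w_local))
```

In a StackGAN-style training loop, `info_loss` would be computed on each stage's feature maps and added to that stage's generator loss. In practice the critics would be trainable networks optimized jointly with the generator rather than the fixed matrices used here for illustration.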

Chinese Library Classification (CLC) number: TP183