A highly efficient framework for large-scale zero-shot image recognition

doi:10.19665/j.issn1001-2400.2022.06.013

Abstract

In large-scale zero-shot image recognition, the large number of image classes makes model training difficult and costly. To address these problems, this paper designs a highly efficient zero-shot learning framework that improves recognition accuracy and generalization ability at a low training cost. The framework defines a joint space and uses an image branch network and a semantic branch network to map the feature vectors of the two modalities into it, where model training and inference are carried out. In the image branch network, a perceptron network maps image feature vectors to the joint space in order to change their distribution. In the semantic branch network, a graph convolutional network maps semantic vectors to the joint space. In addition, a loss function is designed to constrain the joint space so that different classes become more separable in it, which benefits model training. Experimental results on ImageNet show that, on the “2-HOPS” test set, the framework improves accuracy by 1.1% and reduces training time by 57.8% compared with existing methods that require no fine-tuning; compared with existing algorithms that require fine-tuning, it saves 98.4% of the training time without any loss of accuracy. These results show that the framework improves model performance at a low training cost.

Key words: deep learning, knowledge graph, graph neural networks
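The data flow described in the abstract (two branches mapping into one joint space, then matching images to classes there) can be sketched as follows. This is only a minimal illustration under stated assumptions: the one-hidden-layer perceptron, the two-layer graph convolution with the symmetric normalization of Kipf and Welling, the cosine-similarity matching, the toy chain graph, the random weights, and every function name are hypothetical. The paper's actual layer configuration and loss function are not given in this text and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (assumed, not from the paper)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def image_branch(v, w1, w2):
    """Hypothetical perceptron: image features (n, d) -> joint space (n, c)."""
    return relu(v @ w1) @ w2

def semantic_branch(x, a_norm, w1, w2):
    """Hypothetical two-layer GCN: semantic vectors (N, k) -> joint space (N, c)."""
    h = relu(a_norm @ x @ w1)
    return a_norm @ h @ w2

def l2_normalize(z):
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, 1e-12)

def predict(image_emb, class_emb):
    """Assign each image to the class with the highest cosine similarity."""
    sims = l2_normalize(image_emb) @ l2_normalize(class_emb).T
    return sims.argmax(axis=1)

# Toy sizes (assumptions): N classes, d-dim image features,
# k-dim semantic vectors, c-dim joint space.
N, d, k, c, hidden = 5, 16, 8, 4, 12
adj = np.zeros((N, N))
for i in range(N - 1):                           # chain graph standing in for a knowledge graph
    adj[i, i + 1] = adj[i + 1, i] = 1.0
a_norm = normalize_adjacency(adj)

x = rng.normal(size=(N, k))                      # class semantic vectors
v = rng.normal(size=(3, d))                      # three image feature vectors

image_emb = image_branch(v, rng.normal(size=(d, hidden)), rng.normal(size=(hidden, c)))
class_emb = semantic_branch(x, a_norm, rng.normal(size=(k, hidden)), rng.normal(size=(hidden, c)))
pred = predict(image_emb, class_emb)
print(pred.shape)
```

With untrained random weights the predictions are meaningless; the sketch only shows how both modalities end up as vectors of the same dimension, so that unseen classes can be scored without any image of them being seen in training.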

CLC number:

TP183

Cite this article: ZHANG Zehuan, LIU Qiang, GUO Difei. High efficient framework for large-scale zero-shot image recognition[J]. Journal of Xidian University, 2022, 49(6): 103-110.

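Table 1 Main symbols

Symbol	Meaning	Symbol	Meaning
$N$ / $N_s$ / $N_u$	Number of all / seen / unseen classes	$x_i$	Feature vector of class $i$ in the semantic space
$n_s$ / $n_u$	Number of seen-class / unseen-class samples	$d$	Vector dimension of the image feature space
$v_j^s$ / $v_j^u$	Seen-class / unseen-class sample	$k$	Vector dimension of the semantic space
$Y_i^s$ / $Y_i^u$	Seen-class / unseen-class label	$c$	Vector dimension of the joint space
$S_i^s$ / $S_i^u$	Sample set of the $i$-th seen / unseen class	$\Phi(\cdot)$	Perceptron network operation
$|S_i^s|$ / $|S_i^u|$	Number of samples in the sample set of the $i$-th seen / unseen class	$f(\cdot)$	Per-layer operation of the graph convolutional network
$V_i^s$	Feature vector of class $i$ in the image feature space	$G(\cdot)$	Graph convolutional network operation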
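The symbol table lists a sample set per class, its size, and one class-level vector in the image feature space. One common way to obtain such a class-level vector is to average the feature vectors of that class's samples. The sketch below assumes that construction; it is not confirmed by this text, and the function name and toy data are hypothetical.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Per-class mean feature (assumed construction of the class-level vector).

    features: (n, d) sample feature vectors
    labels:   (n,) integer class index of each sample
    Returns the (num_classes, d) means and the per-class sample counts.
    """
    sums = np.zeros((num_classes, features.shape[1]))
    np.add.at(sums, labels, features)            # accumulate features per class
    counts = np.bincount(labels, minlength=num_classes)
    return sums / np.maximum(counts, 1)[:, None], counts

features = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]])
labels = np.array([0, 0, 1])
protos, counts = class_prototypes(features, labels, num_classes=2)
print(protos)   # class 0 -> [2. 3.], class 1 -> [10. 0.]
print(counts)   # [2 1]
```

Under this assumption the training set of the image branch would contain one vector per seen class rather than one per image, which keeps the number of training samples small.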



Operation	Number of parameters	Number of samples
Fine-tuning the convolutional layers	25.5×10⁶	1.2×10⁶
Training the perceptron network	8.4×10⁶	0.001×10⁶

Mapping space	Hit@1 (%)
Image feature space	25.5
Joint space	26.6
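Hit@1 is not defined in this text. Assuming the usual meaning of Hit@k in zero-shot ImageNet evaluation (the percentage of test images whose true class is among the k highest-scoring predictions), it can be computed as in the sketch below. The function name and the toy scores are hypothetical.

```python
import numpy as np

def hit_at_k(scores, labels, k=1):
    """Percentage of rows whose true label is among the k highest-scoring classes."""
    topk = np.argsort(-scores, axis=1)[:, :k]     # indices of the k best classes per row
    hits = (topk == labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Four toy images scored against three classes.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5],
                   [0.6, 0.3, 0.1]])
labels = np.array([1, 1, 2, 0])

print(hit_at_k(scores, labels, k=1))  # 75.0
print(hit_at_k(scores, labels, k=2))  # 100.0
```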

Test set	Model	Fine-tuning	Hit@1 (%)
2-HOPS	ConSE	×	8.3
	SYNC	×	12.5
	EXEM	×	10.5
	GCNZ	×	19.8
	HVELN	×	13.3
	ARGCN-DKG	×	25.5
	DGP	×	24.8
	DGP	√	26.6
	Ours	×	26.6
3-HOPS	DGP	√	6.3
	Ours	×	6.8
ALL	DGP	√	3.0
	Ours	×	3.3
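The 1.1% accuracy gain quoted in the abstract appears to be an absolute difference in Hit@1 between the proposed framework and the best method without fine-tuning (ARGCN-DKG) on 2-HOPS. The short check below recomputes the differences from the table values; the dictionary layout and helper name are hypothetical, and reading the abstract's figure this way is an assumption.

```python
# Hit@1 (%) values copied from the comparison table.
hit1 = {
    "2-HOPS": {"ARGCN-DKG": 25.5, "DGP (fine-tuned)": 26.6, "Ours": 26.6},
    "3-HOPS": {"DGP (fine-tuned)": 6.3, "Ours": 6.8},
    "ALL": {"DGP (fine-tuned)": 3.0, "Ours": 3.3},
}

def gain(test_set, baseline):
    """Absolute Hit@1 difference (percentage points) of the proposed framework over a baseline."""
    return round(hit1[test_set]["Ours"] - hit1[test_set][baseline], 1)

print(gain("2-HOPS", "ARGCN-DKG"))         # 1.1
print(gain("2-HOPS", "DGP (fine-tuned)"))  # 0.0
print(gain("3-HOPS", "DGP (fine-tuned)"))  # 0.5
print(gain("ALL", "DGP (fine-tuned)"))     # 0.3
```

Read as a relative improvement, the 2-HOPS gain over ARGCN-DKG would instead be about 4.3% (1.1 / 25.5), so the abstract's figure is best understood as percentage points.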

Method	Training time	Fine-tuning time	Total time
DGP	0.45	11.10	11.55
Ours	0.19	0	0.19
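The time savings quoted in the abstract are consistent with this table if they are computed as relative reductions against DGP. The check below assumes that reading; the time unit is not stated in this text and cancels out in the ratio. The helper name is hypothetical.

```python
def relative_saving(baseline, ours):
    """Relative time reduction in percent: (baseline - ours) / baseline * 100."""
    return (baseline - ours) / baseline * 100.0

# Values copied from the table (unit not stated in the source).
dgp_train, dgp_finetune = 0.45, 11.10
ours_train, ours_finetune = 0.19, 0.0

# Against DGP without fine-tuning: training time only.
saving_no_finetune = relative_saving(dgp_train, ours_train)
# Against fine-tuned DGP: training plus fine-tuning time.
saving_finetune = relative_saving(dgp_train + dgp_finetune, ours_train + ours_finetune)

print(f"{saving_no_finetune:.1f}%")  # 57.8%
print(f"{saving_finetune:.1f}%")     # 98.4%
```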