Journal of Xidian University, 2022, Vol. 49, Issue (6): 103-110. doi: 10.19665/j.issn1001-2400.2022.06.013

• Computer Science and Technology & Artificial Intelligence •

High efficient framework for large-scale zero-shot image recognition

ZHANG Zehuan1,2, LIU Qiang1,2, GUO Difei3

  1. School of Microelectronics, Tianjin University, Tianjin 300072, China
  2. Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, Tianjin 300072, China
  3. Tianjin Communication & Broadcasting Group Co., Ltd., Tianjin 300140, China
  • Received: 2022-01-07 Online: 2022-12-20 Published: 2023-02-09
  • Corresponding author: LIU Qiang (born 1978), male, professor, Ph.D., E-mail: qiangliu@tju.edu.cn
  • About the authors: ZHANG Zehuan (born 1997), male, master's student at Tianjin University, E-mail: zehuanzhang@tju.edu.cn | GUO Difei (born 1975), male, Ph.D., E-mail: gdfmail@163.com
  • Funding:
    National Natural Science Foundation of China (U21B2031)

Abstract:

In large-scale zero-shot image recognition, the large number of image classes makes models difficult and costly to train. To address these problems, this paper designs an efficient zero-shot learning framework that improves recognition accuracy and generalization ability at a low training cost. The framework defines a joint space: an image branch network and a semantic branch network map the feature vectors of the two modalities into this joint space, where model training and inference are carried out. In the image branch network, a perceptron network maps image feature vectors to the joint space in order to change their distribution. In the semantic branch network, graph convolutional networks map semantic vectors to the joint space. In addition, a loss function is designed to constrain the joint space so that different classes become more distinguishable in it, which facilitates model training. Experimental results on ImageNet show that on the “2-HOPS” test set, compared with existing methods that require no fine-tuning, the proposed framework improves accuracy by 1.1% and reduces training time by 57.8%; compared with existing methods that require fine-tuning, it saves 98.4% of the training time without any loss of accuracy. These results show that the framework improves model performance at a low training cost.

Key words: deep learning, knowledge graph, graph neural networks
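The abstract only outlines the architecture, so the following is a minimal, hypothetical sketch in plain Python of how a two-branch joint-space model of this kind could score images. Several details are assumptions not taken from the paper: image features are precomputed vectors (random here), the perceptron has two layers, the semantic branch is a two-layer GCN with the common symmetric normalization D^-1/2 (A + I) D^-1/2, similarity is temperature-scaled cosine, and the loss is a softmax cross-entropy over seen classes. That loss is only a stand-in; the abstract does not give the form of the authors' loss. All function names, sizes, and the toy class graph are illustrative, and the training loop is omitted.

```python
# Hypothetical sketch, NOT the authors' implementation: layer sizes, the class
# graph, the cosine similarity and the cross-entropy loss are all assumptions.
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_branch(adj_norm: Matrix, word_vecs: Matrix, weights: List[Matrix]) -> Matrix:
    """Semantic branch: stacked GCN layers H <- A_norm H W, ReLU between layers."""
    h = word_vecs
    for k, w in enumerate(weights):
        h = matmul(matmul(adj_norm, h), w)
        if k < len(weights) - 1:
            h = relu(h)
    return h


def mlp_branch(img_feats: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Image branch: two-layer perceptron x -> ReLU(x W1) W2."""
    return matmul(relu(matmul(img_feats, w1)), w2)


def l2_normalize(m: Matrix) -> Matrix:
    out = []
    for row in m:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # leave zero rows as zero
        out.append([v / norm for v in row])
    return out


def cosine_logits(img_emb: Matrix, cls_emb: Matrix, temperature: float = 0.1) -> Matrix:
    """Temperature-scaled cosine similarity of every image to every class."""
    a, b = l2_normalize(img_emb), l2_normalize(cls_emb)
    return [[sum(x * y for x, y in zip(r, c)) / temperature for c in b] for r in a]


def softmax_xent(logits: Matrix, labels: List[int]) -> float:
    """Mean softmax cross-entropy: pulls each image toward its class, away from others."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)


rng = random.Random(0)
n_classes, d_word, d_img, d_hidden, d_joint = 5, 8, 16, 12, 6
# Toy class graph (hypothetical): classes 0-2 are "seen", 3-4 are "unseen".
adj = [[0.0] * n_classes for _ in range(n_classes)]
for i, j in [(0, 1), (0, 2), (1, 3), (2, 4)]:
    adj[i][j] = adj[j][i] = 1.0
adj_norm = normalize_adjacency(adj)

word_vecs = rand_matrix(n_classes, d_word, rng)  # stand-in semantic vectors
gcn_w = [rand_matrix(d_word, d_hidden, rng), rand_matrix(d_hidden, d_joint, rng)]
mlp_w1, mlp_w2 = rand_matrix(d_img, d_hidden, rng), rand_matrix(d_hidden, d_joint, rng)

class_emb = gcn_branch(adj_norm, word_vecs, gcn_w)   # every class, seen or unseen
images = rand_matrix(3, d_img, rng)                  # stand-in CNN features
img_emb = mlp_branch(images, mlp_w1, mlp_w2)
logits = cosine_logits(img_emb, class_emb)

seen = [0, 1, 2]
loss = softmax_xent([[row[c] for c in seen] for row in logits], labels=[0, 1, 2])
preds = [max(range(n_classes), key=lambda c: row[c]) for row in logits]
```

In this sketch the loss is computed over seen classes only, while prediction ranks all classes. Because the GCN propagates information along the class graph, the unseen classes 3 and 4 still receive joint-space embeddings and can be matched against images even though no training image carries their labels.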

CLC number: TP183