Journal of Xidian University ›› 2022, Vol. 49 ›› Issue (4): 109-117. doi: 10.19665/j.issn1001-2400.2022.04.013

• Computer Science and Technology •

Micro-video multi-label classification method based on multi-modal feature encoding

JING Peiguang, LI Yaxin, SU Yuting

  1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
  • Received: 2021-05-07 Online: 2022-08-20 Published: 2022-08-15
  • Contact: Yuting SU E-mail: pgjing@tju.edu.cn; curryxin@tju.edu.cn; ytsu@tju.edu.cn

Abstract:

With the popularization of smartphones and the mobile Internet, micro-videos have developed rapidly as a new form of user-generated content (UGC), and browsing micro-videos has become one of the most popular forms of entertainment. Micro-videos are naturally correlated across modalities and semantics, and making full use of this correlation is the key to micro-video representation learning. To better solve the multi-label classification task, a modal subspace encoding algorithm is proposed, which integrates multi-modal subspace encoding and label semantic relevance learning in a unified framework. The proposed algorithm uses a subspace encoding network to model the consistency and complementarity of the modalities while further reducing redundancy and noise, so that a common and complete representation of the fused modalities is obtained. Furthermore, a graph convolutional network built on a label correlation matrix is used to learn the semantic relevance and representations of the labels, which in turn guide the multi-label classification task. Overall, the proposed algorithm makes full use of feature-level and label-level information to improve classification performance. The reconstruction loss and the multi-label classification loss are formulated as a single objective, and experiments on a public dataset demonstrate the superiority of the proposed algorithm.
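
To illustrate the label-level part of the description above, the following is a minimal sketch and not the authors' implementation. It assumes a binary label graph built from label co-occurrence, a single graph-convolution layer with symmetric normalization, and a joint loss of the form reconstruction + lam × binary cross-entropy. The function names, the co-occurrence rule, the toy data, and the weight `lam` are all hypothetical choices made for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def label_adjacency(label_rows):
    """Binary label graph with self-loops.
    Assumed rule: two labels are linked if they co-occur in any sample."""
    n = len(label_rows[0])
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for row in label_rows:
        active = [k for k, v in enumerate(row) if v]
        for i in active:
            for j in active:
                adj[i][j] = 1.0
    return adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]

def joint_loss(x, x_rec, scores, targets, lam=1.0):
    """Mean squared reconstruction error plus lam times mean binary cross-entropy."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)
    bce = 0.0
    for s, t in zip(scores, targets):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid turns a score into a probability
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return recon + lam * bce / len(scores)

# Toy example (all values hypothetical): 3 samples, 3 labels.
labels = [[1, 1, 0], [0, 1, 1], [1, 1, 0]]
a_hat = normalize(label_adjacency(labels))

label_emb = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]    # initial label embeddings
weight = [[1.0, 0.0], [0.0, 1.0]]                   # layer weights (identity here)
classifiers = gcn_layer(a_hat, label_emb, weight)   # one classifier vector per label

common = [0.8, -0.2]  # stand-in for the fused common representation of one video
scores = [sum(c * z for c, z in zip(row, common)) for row in classifiers]
loss = joint_loss([1.0, 2.0], [0.9, 2.1], scores, labels[0], lam=0.5)
```

In this sketch the graph convolution lets each label's classifier vector absorb information from the labels it co-occurs with, which is one common way label correlations are used to guide multi-label prediction. The actual network depth, correlation definition, and loss weighting used in the paper are not stated in the abstract.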

Key words: micro-video, multi-modal fusion, deep learning, multi-label classification, neural networks

CLC Number: 

  • TP391
