Journal of Xidian University ›› 2024, Vol. 51 ›› Issue (1): 114-124. doi: 10.19665/j.issn1001-2400.20230206

• Computer Science and Technology •


DING Xinmiao, WANG Jiaxing, GUO Wen

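  1. School of Information and Electronic Engineering, Shandong Technology and Business University, Yantai 264005, Shandong, China
  • Received: 2022-10-31 Online: 2023-08-29 Published: 2023-08-29
  • Corresponding author: GUO Wen (b. 1978), male, professor, E-mail: wguo@sdtbu.edu.cn
  • About the authors: DING Xinmiao (b. 1979), female, professor, E-mail: dingxinmiao@126.com;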

Three-dimensional attention-enhanced algorithm for violence scene detection

DING Xinmiao, WANG Jiaxing, GUO Wen

  1. School of Information and Electronic Engineering, Shandong Technology and Business University, Yantai 264005, China
  • Received: 2022-10-31 Online: 2023-08-29 Published: 2023-08-29


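To improve security screening of Internet multimedia content and filter objectionable information effectively, a video violence detection algorithm based on three-dimensional attention enhancement is proposed. The algorithm uses 3D-DenseNet as its backbone network. It first uses P3D to extract low-level spatio-temporal features; next, it introduces the SimAM attention module to compute channel-spatial attention, enhancing the information in key regions of each frame; then a transition layer strengthened with temporal attention is designed to highlight key temporal information. Together these form channel-spatial-temporal three-dimensional attention and improve violent-scene detection performance. Experimental results show that the algorithm reaches accuracies of 98.75% and 100% on Hockey and Movies, two small-scale violence detection datasets with homogeneous content, and 89.25% on RWF-2000, a large-scale dataset with diverse content; its overall performance exceeds that of comparable algorithms, verifying its effectiveness. In the violent-content localization experiment on long videos, the algorithm also achieves better detection results than comparable algorithms on the VSD2014 dataset, demonstrating its generalization ability in violent content detection.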

Key words: violence detection, deep learning, attention mechanism, pattern recognition, P3D, 3D-DenseNet
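
The abstract names P3D as the low-level feature extractor. P3D (Pseudo-3D) blocks replace a full 3×3×3 convolution with a 1×3×3 spatial convolution and a 3×1×1 temporal convolution. The following is a minimal sketch of why that factorization is cheaper, counting weights only. The serial arrangement and the channel widths (64) are assumptions chosen for illustration; the abstract does not state which P3D variant or widths the authors use, and the helper names are hypothetical.

```python
def conv3d_weights(c_in, c_out, kt, kh, kw):
    """Number of weights (bias excluded) in a 3D convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw


def p3d_weights(c_in, c_mid, c_out, k=3):
    """Weights of a serial pseudo-3D pair: 1 x k x k spatial conv, then k x 1 x 1 temporal conv."""
    spatial = conv3d_weights(c_in, c_mid, 1, k, k)    # looks within each frame
    temporal = conv3d_weights(c_mid, c_out, k, 1, 1)  # looks across neighbouring frames
    return spatial + temporal


# Assumed channel widths, for illustration only.
full = conv3d_weights(64, 64, 3, 3, 3)
factored = p3d_weights(64, 64, 64)
print(full, factored, factored / full)
```

With equal channel widths the factored pair needs 12/27 of the weights of the full 3D kernel, which is the usual motivation for pseudo-3D designs.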


To strengthen security analysis of multimedia content on the Web and effectively filter objectionable content, a violent video scene detection algorithm based on three-dimensional attention is proposed. Taking 3D-DenseNet as the backbone network, the algorithm first uses P3D to extract low-level spatio-temporal features. Second, the SimAM attention module is introduced to compute channel-spatial attention, enhancing the features of key regions in each video frame. Then, a transition layer with temporal attention is designed to highlight the features of key frames in the video. Together, these form channel-spatial-temporal attention for better detection of violent scenes. In the violence detection experiments, accuracy reaches 98.75% and 100% on Hockey and Movies, which are small datasets with homogeneous content, and 89.25% on RWF-2000, which is a large dataset with diverse content. The results show that three-dimensional attention effectively improves violence detection performance. In the violent content localization experiment on the VSD2014 dataset, the algorithm's better performance than comparable algorithms further supports its effectiveness and generalization ability.

Key words: violence detection, deep learning, attention mechanism, pattern recognition, P3D, 3D-DenseNet
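
The channel-spatial step in the abstract relies on SimAM, a parameter-free attention module that scores each activation by how much it deviates from the mean of its channel map and gates the activation with a sigmoid of that score. The sketch below follows the commonly published SimAM formulation for a single 2D channel map, written in dependency-free Python for readability. The function name `simam_2d`, the list-of-rows input, and the default `e_lambda` are assumptions for illustration; how the authors apply the module to video feature tensors inside 3D-DenseNet is not specified in the abstract.

```python
import math


def simam_2d(fmap, e_lambda=1e-4):
    """Apply SimAM-style gating to one channel map given as a list of rows.

    The map must contain at least two activations.
    """
    values = [v for row in fmap for v in row]
    n = len(values) - 1                      # number of "other" neurons in the map
    mu = sum(values) / len(values)
    # Squared deviation of each activation from the map mean.
    sq_dev = [[(v - mu) ** 2 for v in row] for row in fmap]
    var = sum(d for row in sq_dev for d in row) / n
    out = []
    for row, dev_row in zip(fmap, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: larger for activations that stand out from the map.
            inv_energy = d / (4.0 * (var + e_lambda)) + 0.5
            gate = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            out_row.append(v * gate)
        out.append(out_row)
    return out


# A map with one distinctive activation in the centre.
fmap = [[1.0, 1.0, 1.0],
        [1.0, 9.0, 1.0],
        [1.0, 1.0, 1.0]]
gated = simam_2d(fmap)
```

In this example the centre activation keeps a larger fraction of its value than its neighbours, which is the intended effect: distinctive regions are emphasized without adding any learnable parameters.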


  • TP311