Multi-scale Fusion Sketch Recognition Model by Dilated Convolution

doi:10.19665/j.issn1001-2400.2021.05.012

Abstract:

Most existing deep-learning-based sketch recognition methods still use ordinary convolution as their main means of sketch feature extraction and ignore the sparsity of sketch objects. To address this, this paper proposes a sketch recognition model that extracts sketch features with dilated convolution. The model fuses dilated and ordinary convolution. Dilated convolution enlarges the receptive field without increasing the number of effective units in the kernel, so the model uses it for the preliminary extraction of the structural features of the sketch. However, dilated convolution samples its input sparsely. As a result, the information it gathers from widely separated positions is uncorrelated, which can degrade the classification result. The model therefore extracts the input features sparsely with dilated convolution and, in parallel, densely with an ordinary convolution that has a receptive field of the same size. It then concatenates the features output by the two branches along the channel dimension. This design exploits the sparse sampling of dilated convolution and also makes full use of the multi-scale information provided by the two convolution types. Experimental results show that the model achieves a recognition accuracy of 72.6% on the TU-Berlin Sketch dataset. This is higher than the accuracy of the mainstream sketch recognition methods it is compared with.

Key words: dilated convolution, multi-scale fusion, sketch recognition, convolutional neural network, receptive field

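The dual-branch idea described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes single-channel input, stride 1 and zero "same" padding, since the layer table keeps height and width unchanged. It uses the 5×5 dilated (d=2) and 9×9 ordinary kernels that the ablation table pairs in DCSNet. The names `conv2d_same`, `dual_branch` and `footprint` are hypothetical. Feeding in a single ink pixel shows the output positions at which each branch responds. Both branches span the same 9×9 window, but the dilated branch touches only 25 of its 81 positions, which is the sparse, grid-like sampling that the dense branch is meant to complement.

```python
from typing import List, Set, Tuple

Image = List[List[float]]  # H x W single-channel feature map


def conv2d_same(img: Image, kernel: Image, dilation: int = 1) -> Image:
    """Stride-1 2-D cross-correlation with zero 'same' padding (assumed).

    A k x k kernel with dilation d reads pixels spaced d apart, so it spans
    d * (k - 1) + 1 pixels per side while still using only k * k weights.
    """
    h, w, k = len(img), len(img[0]), len(kernel)
    half = dilation * (k - 1) // 2  # distance from kernel centre to its edge
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    y, x = i - half + u * dilation, j - half + v * dilation
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += kernel[u][v] * img[y][x]
            out[i][j] = acc
    return out


def dual_branch(img: Image, dilated_kernels: List[Image],
                dense_kernels: List[Image], dilation: int = 2) -> List[Image]:
    """Hypothetical block: sparse (dilated) and dense branches, concatenated on channels."""
    sparse = [conv2d_same(img, ker, dilation) for ker in dilated_kernels]
    dense = [conv2d_same(img, ker, 1) for ker in dense_kernels]
    return sparse + dense  # channel-wise concatenation: C = C_sparse + C_dense


def footprint(fmap: Image) -> Set[Tuple[int, int]]:
    """Output positions with a non-zero response."""
    return {(i, j) for i, row in enumerate(fmap) for j, v in enumerate(row) if v != 0.0}


# A 12x12 "sketch" containing a single ink pixel, and all-ones kernels.
img = [[0.0] * 12 for _ in range(12)]
img[6][6] = 1.0
ones5 = [[1.0] * 5 for _ in range(5)]
ones9 = [[1.0] * 9 for _ in range(9)]
fused = dual_branch(img, [ones5], [ones9], dilation=2)

sparse_fp, dense_fp = footprint(fused[0]), footprint(fused[1])
print(len(fused), len(sparse_fp), len(dense_fp))  # 2 25 81
```

Concatenation keeps the sparse and dense responses as separate channels, so later layers can weight the two kinds of evidence independently. Summing the branches, by contrast, would merge them into a single channel.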
CLC number: TP391

YANG Yunhang, MIN Lianquan. Multi-scale fusion sketch recognition model by dilated convolution[J]. Journal of Xidian University, 2021, 48(5): 92-99.

Figures/Tables: 7 (Fig. 1, Fig. 2, Table 1, Fig. 3, Table 2, Fig. 4, Table 3)

References (16)

[1]	EITZ M, HAYS J, ALEXA M. How Do Humans Sketch Objects?[J]. ACM Transactions on Graphics, 2012, 31(4):1-10.
[2]	WU Lingda, DENG Wei, ZHANG Yougen, et al. Review of Online Sketch Recognition[J]. Application Research of Computers, 2015, 32(6):1601-1607.
[3]	WANG Yudong, GUO Jichang, WANG Tianbao. Algorithm for Foggy-image Pedestrian and Vehicle Detection[J]. Journal of Xidian University, 2020, 47(4):70-77.
[4]	YU Q, YAN Y X, LIU F, et al. Sketch-a-Net:A Deep Neural Network that Beats Humans[J]. International Journal of Computer Vision, 2017, 122(3):411-425. doi: 10.1007/s11263-016-0932-3
[5]	YIN Guisheng, YAN Xue, WANG Yuhua, et al. Sketch Recognition Based on Convolution Neural Network[J]. Journal of Jilin University (Information Science Edition), 2019, 37(4):417-425.
[6]	SERT M, BOYACI E. Sketch Recognition Using Transfer Learning[J]. Multimedia Tools and Applications, 2019, 78(12):17095-17112. doi: 10.1007/s11042-018-7067-1
[7]	AGRAWAL S, SINGH R K, SINGH U P, et al. Biogeography Particle Swarm Optimization Based Counter Propagation Network for Sketchbased Face Recognition[J]. Multimedia Tools and Applications, 2019, 78(8):9801-9825. doi: 10.1007/s11042-018-6542-z
[8]	ZHAO Peng, FENG Chencheng, HAN Li, et al. Sketch Recognition Combining Deep Learning and Semantic Tree[J]. Pattern Recognition and Artificial Intelligence, 2019, 32(4):361-368.
[9]	LIN Jingdong, WU Xinyi, CHAI Yi, et al. Structure Optimization of Convolutional Neural Networks:A Survey[J]. Acta Automatica Sinica, 2020, 46(1):24-37.
[10]	XU P, HUANG Y Y, YUAN T T, et al. SketchMate:Deep Hashing for Million-Scale Human Sketch Retrieval[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition.Piscataway:IEEE, 2018:8090-8098.
[11]	ZHAO Peng, LIU Yang, LIU Huiting, et al. Free-Hand Sketch Recognition Method Based on Deep Convolution-Recurrent Neural Network[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(2):217-224.
[12]	LIU Y J, TANG K, JONEJA A. Sketch-Based Free-Form Shape Modelling with a Fast and Stable Numerical Engine[J]. Computers & Graphics, 2005, 29(5):771-786. doi: 10.1016/j.cag.2005.08.007
[13]	ZIEGLER T, FRITSCHE M, KUHN L, et al. Efficient Smoothing of Dilated Convolutions for Image Segmentation[J/OL]. [2020-12-28]. https://arxiv.org/abs/1903.07992v1.
[14]	SCHNEIDER R G, TUYTELAARS T. Sketch Classification and Classification-Driven Analysis Using Fisher Vectors[J]. ACM Transactions on Graphics, 2014, 33(6):1-9.
[15]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet Classification with Deep Convolutional Neural Networks[J]. Communications of the ACM, 2017, 60(6):84-90. doi: 10.1145/3065386
[16]	ZHANG H, LIU S, ZHANG C Q, et al. SketchNet:Sketch Classification with Web Images[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition.Piscataway:IEEE, 2016:1105-1113.

Operation	Input	Filter size	Stride	Padding	Output
Conv1-DC	224×224×1	5×5	1	0	224×224×32
Conv1-C	224×224×1	9×9	1	0	224×224×32
Pool1(max)	224×224×64	2×2	2	0	112×112×64
Conv2-DC	112×112×64	3×3	1	0	112×112×64
Conv2-C	112×112×64	5×5	1	0	112×112×64
Pool2(avg)	112×112×128	2×2	2	0	56×56×128
Conv3-DC	56×56×128	3×3	1	0	56×56×128
Pool3(max)	56×56×128	2×2	2	0	28×28×128
Conv4-DC	28×28×128	3×3	1	0	28×28×128
Conv4-C	28×28×128	5×5	1	0	28×28×128
Pool4(avg)	28×28×256	2×2	2	0	14×14×256
Conv5	14×14×256	3×3	1	0	14×14×512
Pool5(max)	14×14×512	2×2	2	0	7×7×512
Fc1	7×7×512				2048×1
Fc2	2048×1				1024×1
Fc3	1024×1				250×1
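The channel counts in the layer table can be checked with a short shape trace. This sketch rests on three assumptions. First, "DC" and "C" denote the parallel dilated and ordinary branches, whose outputs are concatenated before each pooling layer as the abstract describes. Second, every convolution preserves height and width, as the table's output column shows. Third, each 2×2, stride-2 pool halves height and width. `STAGES`, `concat_channels`, `pool2x2` and `trace_shapes` are hypothetical names.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)

# Output channels of the parallel conv branches in each stage, read from the
# layer table: (DC, C) where both branches exist, a single entry otherwise.
STAGES: List[Tuple[int, ...]] = [(32, 32), (64, 64), (128,), (128, 128), (512,)]


def concat_channels(shapes: List[Shape]) -> Shape:
    """Concatenate feature maps along the channel axis."""
    h, w = shapes[0][:2]
    if any(s[:2] != (h, w) for s in shapes):
        raise ValueError("branches must share height and width")
    return (h, w, sum(s[2] for s in shapes))


def pool2x2(shape: Shape) -> Shape:
    """2x2 pooling with stride 2 halves height and width."""
    h, w, c = shape
    return (h // 2, w // 2, c)


def trace_shapes(x: Shape) -> List[Shape]:
    """Shape after each Pool layer, starting from the input shape x."""
    trace = []
    for branches in STAGES:
        outs = [(x[0], x[1], c) for c in branches]  # convs keep H x W
        x = pool2x2(concat_channels(outs))
        trace.append(x)
    return trace


trace = trace_shapes((224, 224, 1))
for i, s in enumerate(trace, 1):
    print(f"Pool{i} output: {s}")

h, w, c = trace[-1]
fc_sizes = [h * w * c, 2048, 1024, 250]  # Fc1 input, then Fc1..Fc3 outputs
fc_weights = sum(a * b for a, b in zip(fc_sizes, fc_sizes[1:]))  # biases excluded
print(fc_sizes[0], fc_weights)  # 25088 53733376
```

The trace reproduces the Pool1 to Pool5 outputs listed in the table. It also shows that, under these assumptions, the three fully connected layers hold roughly 53.7 million weights, far more than the convolutional stages.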

Method	Recognition accuracy/%	Method	Recognition accuracy/%
HOG-SVM	56.0	Sketch-Net	70.4
SIFT-Fisher Vector	61.5	DCSN	70.5
AlexNet	65.9	deep-CRNN-sketch	71.8
Sketch-a-Net	69.6	DCSNet	72.6

Kernel parameters (k: kernel size, d: dilation rate)	Recognition accuracy/%
k=5,d=1	68.9
k=9,d=1	70.3
k=5,d=2	71.5
(k=5,d=2)+(k=9,d=1)=DCSNet	72.6
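The ablation rows can be compared by how far each kernel reaches and how many weights it uses. By the standard relation, a k×k kernel with dilation rate d spans d(k-1)+1 pixels per side. A minimal sketch follows; the helper name `effective_kernel` is hypothetical.

```python
def effective_kernel(k: int, d: int) -> int:
    """Side length spanned by a k x k kernel with dilation rate d."""
    return d * (k - 1) + 1


# (k, d) settings from the ablation table; weights are counted per
# input/output channel pair, biases excluded.
for k, d in [(5, 1), (9, 1), (5, 2)]:
    span = effective_kernel(k, d)
    print(f"k={k}, d={d}: spans {span}x{span} pixels with {k * k} weights")
```

The k=5, d=2 kernel therefore matches the 9×9 ordinary kernel in reach with under a third of its weights (25 vs. 81). This is why the two can be paired as equal-receptive-field sparse and dense branches. The same relation gives 2(3-1)+1 = 5 for a 3×3 kernel at d=2. That value matches the 5×5 ordinary kernels paired with 3×3 dilated ones in Conv2 and Conv4 of the layer table. The dilation rate of those layers is not stated, so d=2 there is an inference.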