Research on Blind Roads and Obstacle Recognition Algorithm Based on Deep Learning

doi:10.16180/j.cnki.issn1007-7820.2024.03.010

Abstract

Blind roads and the obstacles on them are important factors affecting the travel safety of blind people. Existing algorithms handle blind road segmentation and blind road obstacle detection separately, which is inefficient and computationally expensive. To address these problems, this study proposes a multi-task recognition algorithm based on deep learning. The algorithm extracts shared features with a backbone network, fuses them through SPP (Spatial Pyramid Pooling) and FPN (Feature Pyramid Networks) networks, and then passes the fused features to a segmentation network and a detection network, which perform blind road segmentation and blind road obstacle detection, respectively. To make the blind road segmentation smoother, a correction loss function is introduced. To improve the recall of obstacle detection, the NMS (Non Maximum Suppression) in the detection network is replaced with Soft-NMS. Experimental results show that, for segmentation, the algorithm reaches an MIoU (Mean Intersection over Union) of 93.52% and an MPA (Mean Pixel Accuracy) of 95.29%; for detection, it reaches an mAP (mean Average Precision), mAP@0.5, and mAP@0.75 of 75.58%, 91.58%, and 74.82%, respectively. Compared with using the SegFormer network for blind road segmentation and the RetinaNet network for blind road obstacle detection, the algorithm improves accuracy while increasing speed by 73.72%, reaching 18.52 FPS (Frames Per Second). It also shows some improvement in speed and accuracy over the other algorithms compared.

Key words: blind road segmentation, blind road obstacle detection, object detection, image segmentation, feature fusion, Transformer, multi-task learning, deep learning
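The abstract states that the detection network's NMS is replaced with Soft-NMS to raise recall. The following is a minimal sketch of the Gaussian variant of Soft-NMS in plain Python, not the authors' implementation. The function names, the `(x1, y1, x2, y2)` box layout, and the values of `sigma` and `score_thresh` are assumptions chosen for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS.

    Instead of deleting boxes that overlap the current best box, their
    scores are decayed by exp(-iou^2 / sigma). Returns a list of
    (box_index, rescored_confidence) in selection order.
    sigma and score_thresh are illustrative defaults (assumptions).
    """
    candidates = [[i, float(s)] for i, s in enumerate(scores)]
    kept = []
    while candidates:
        # Select the remaining box with the highest (possibly decayed) score.
        best_pos = max(range(len(candidates)), key=lambda k: candidates[k][1])
        best_idx, best_score = candidates.pop(best_pos)
        if best_score < score_thresh:
            break
        kept.append((best_idx, best_score))
        # Decay the scores of the remaining boxes according to their overlap.
        for cand in candidates:
            overlap = iou(boxes[best_idx], boxes[cand[0]])
            cand[1] *= math.exp(-(overlap ** 2) / sigma)
        # Drop boxes whose score has decayed below the threshold.
        candidates = [c for c in candidates if c[1] >= score_thresh]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

In this example the second box overlaps the first heavily. Hard NMS with an IoU threshold of 0.5 would discard it, whereas Soft-NMS keeps it with a reduced score, which is how the method can recover closely spaced obstacles.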

CLC number: TP391

MA Wenjie, ZHANG Xuanxiong. Research on Blind Roads and Obstacle Recognition Algorithm Based on Deep Learning[J]. Electronic Science and Technology, 2024, 37(3): 75-83.

Figures/Tables: 16 (Figures 1-13, Tables 1-3)

Stage	Layer	Parameters
Stage 1	Overlapping Patch Embedding	K₁=7;S₁=4;P₁=3;C₁=64
Stage 1	Transformer Encoder	R₁=8;N₁=1;E₁=8;L₁=3
Stage 2	Overlapping Patch Embedding	K₂=3;S₂=2;P₂=1;C₂=128
Stage 2	Transformer Encoder	R₂=4;N₂=2;E₂=8;L₂=3
Stage 3	Overlapping Patch Embedding	K₃=3;S₃=2;P₃=1;C₃=320
Stage 3	Transformer Encoder	R₃=2;N₃=5;E₃=4;L₃=18
Stage 4	Overlapping Patch Embedding	K₄=3;S₄=2;P₄=1;C₄=512
Stage 4	Transformer Encoder	R₄=1;N₄=8;E₄=4;L₄=3
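
The patch-embedding rows of the table above determine how the feature map shrinks from stage to stage. The sketch below works this out, assuming that K, S, P, and C denote kernel size, stride, padding, and output channels, as in SegFormer's overlapping patch embedding; the table itself does not define the symbols here. The 512×512 input size is also an assumption, used only as an example.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a strided convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# (K, S, P, C) per stage, copied from the table above.
STAGES = [
    (7, 4, 3, 64),
    (3, 2, 1, 128),
    (3, 2, 1, 320),
    (3, 2, 1, 512),
]


def stage_shapes(height, width):
    """Return (channels, height, width) of the feature map after each stage."""
    shapes = []
    for kernel, stride, padding, channels in STAGES:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
        shapes.append((channels, height, width))
    return shapes


shapes = stage_shapes(512, 512)
# shapes == [(64, 128, 128), (128, 64, 64), (320, 32, 32), (512, 16, 16)]
```

Under these assumptions the four stages yield feature maps at 1/4, 1/8, 1/16, and 1/32 of the input resolution, which is the kind of multi-scale pyramid an FPN-style fusion step consumes.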

Method	MIoU/%	MPA/%
FCN	83.83	86.46
Unet	85.33	87.17
DeepLab V3+	88.57	89.92
SegNet	89.46	89.23
SegFormer	91.78	93.73
Ours	93.25	95.29
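
The two segmentation metrics in the table above can be computed from a per-class confusion matrix. The following is a minimal sketch using the standard definitions of MIoU and MPA. The helper names and the toy labels are hypothetical, and the paper's own evaluation code may differ in details such as how absent classes are handled.

```python
def confusion_matrix(truth, pred, num_classes):
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def miou_mpa(cm):
    """Mean IoU and mean pixel accuracy over the classes that occur."""
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]
        gt_pixels = sum(cm[c])                       # pixels whose true class is c
        pred_pixels = sum(cm[r][c] for r in range(n))  # pixels predicted as c
        union = gt_pixels + pred_pixels - tp
        if union > 0:
            ious.append(tp / union)
        if gt_pixels > 0:
            accs.append(tp / gt_pixels)
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Toy flattened label maps: 0 = background, 1 = blind road (hypothetical data).
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 0]
miou, mpa = miou_mpa(confusion_matrix(truth, pred, num_classes=2))
```

In the toy example each class has 3 correctly labelled pixels out of 4, so MPA is 0.75, while the union of truth and prediction for each class covers 5 pixels, so MIoU is 0.6. MIoU is the stricter of the two because it also penalizes false positives.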

Method	mAP@0.5 /%	mAP@0.75 /%	mAP /%
Faster R-CNN	80.45	61.39	60.52
SSD	84.36	62.21	64.45
YOLOv3	85.69	67.13	65.34
RetinaNet (Backbone=ResNet50)	87.43	68.47	69.67
RetinaNet (Backbone=Transformer)	89.15	71.23	71.56
Ours	91.58	74.82	75.85
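
The columns mAP@0.5 and mAP@0.75 differ only in the IoU threshold at which a predicted box counts as a correct detection. The sketch below illustrates that matching rule with hypothetical boxes in `(x1, y1, x2, y2)` format. The unsuffixed mAP column is commonly the COCO-style average over thresholds from 0.5 to 0.95, but this excerpt does not state which definition the paper uses.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, iou_threshold):
    """A detection matches a ground-truth box if their IoU meets the threshold."""
    return box_iou(pred_box, gt_box) >= iou_threshold


gt_box = (0, 0, 10, 10)      # hypothetical ground-truth obstacle
pred_box = (2, 0, 12, 10)    # hypothetical prediction, shifted by 2 pixels

loose = is_true_positive(pred_box, gt_box, 0.5)    # counted at mAP@0.5
strict = is_true_positive(pred_box, gt_box, 0.75)  # not counted at mAP@0.75
```

Here the IoU is 80/120, about 0.67, so the same prediction is a hit at the 0.5 threshold and a miss at 0.75. This is why every method in the table scores lower on mAP@0.75 than on mAP@0.5.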