VSLAM for Indoor Dynamic Scenes

Electronic Science and Technology, 2022, Vol. 35, Issue 4: 14-19. doi: 10.16180/j.cnki.issn1007-7820.2022.04.003

Hongjun SAN¹, Wanglin WANG¹, Jiupeng CHEN¹, Feiya XIE², Yangyang XU¹, Jia CHEN¹

1. Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
2. No.78098 Unit of PLA, Meishan, Sichuan 620031, China

Received: 2021-05-07 Online: 2022-04-15 Published: 2022-04-15
About the authors: Hongjun SAN (1976-), male, PhD, associate professor. Research interest: parallel robots. | Wanglin WANG (1998-), male, master's student. Research interest: visual SLAM. | Jiupeng CHEN (1993-), male, PhD, lecturer. Research interests: robotics technology and applications.
Supported by: National Key R&D Projects (2017YFC1702503); Major Special Project of Yunnan Provincial S&T Department (202002AC080001)

Abstract:

Traditional VSLAM algorithms assume a static scene. In indoor dynamic scenes their localization accuracy degrades, and the 3D sparse point cloud map suffers from problems such as mismatched dynamic feature points. This study improves the ORB-SLAM2 framework by combining it with Mask R-CNN for semantic segmentation of images: feature points located on dynamic objects are removed, the camera pose is optimized, and a static 3D sparse point cloud map is obtained. Experimental results on the public TUM dataset show that ORB-SLAM2 combined with Mask R-CNN effectively improves the pose estimation accuracy of intelligent mobile robots. The root mean square error of the absolute trajectory is reduced by up to 96.3%, the root mean square error of the relative translation is reduced by up to 41.2%, and the relative rotation error also improves markedly. Compared with ORB-SLAM2, the proposed method builds a more accurate 3D sparse point cloud map, free from interference by feature points on dynamic objects.

Key words: VSLAM, indoor dynamic scene, Mask R-CNN, semantic segmentation, accuracy of pose estimation, ORB-SLAM2, TUM dataset, 3D sparse point cloud map
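The abstract describes removing feature points that lie on dynamic objects found by semantic segmentation. The following is a minimal sketch of that filtering step only, assuming the segmentation result is already available as a per-pixel boolean mask. The function name `filter_static_keypoints`, the toy mask, and the keypoint coordinates are hypothetical and are not taken from the paper or from ORB-SLAM2.

```python
def filter_static_keypoints(keypoints, dynamic_mask):
    """Return the keypoints that do not fall on a pixel marked as dynamic.

    keypoints: iterable of (x, y) pixel coordinates (x = column, y = row).
    dynamic_mask: list of rows of booleans, True where a dynamic object
    (for example a person) was segmented. Assumed input, not a real API.
    """
    height = len(dynamic_mask)
    width = len(dynamic_mask[0])
    static = []
    for x, y in keypoints:
        # Round to the nearest pixel and clamp to the image bounds.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        if not dynamic_mask[row][col]:
            static.append((x, y))
    return static


# Hypothetical 6x8 image whose mask covers rows 1..4, columns 2..4.
mask = [[False] * 8 for _ in range(6)]
for r in range(1, 5):
    for c in range(2, 5):
        mask[r][c] = True

kps = [(0.0, 0.0), (3.0, 2.0), (7.0, 5.0), (2.4, 1.2)]
static_kps = filter_static_keypoints(kps, mask)
print(static_kps)
```

Only the keypoints outside the masked region survive, and in a pipeline like the one described they would be the ones passed on to matching and pose optimization. How the mask is produced and which object classes count as dynamic are outside the scope of this sketch.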

CLC Number: TP242.6

Hongjun SAN, Wanglin WANG, Jiupeng CHEN, Feiya XIE, Yangyang XU, Jia CHEN. VSLAM for Indoor Dynamic Scenes[J]. Electronic Science and Technology, 2022, 35(4): 14-19.

Figures/Tables: 11 (Figures 1-7, Tables 1-4)

Dataset	ORB-SLAM2 RMSE/m	ORB-SLAM2 mean/m	ORB-SLAM2 median/m	ORB-SLAM2 std. dev./m	Proposed RMSE/m	Proposed mean/m	Proposed median/m	Proposed std. dev./m
s_r	0.028	0.022	0.017	0.017	0.020	0.017	0.014	0.011
s_s	0.024	0.020	0.018	0.012	0.023	0.021	0.017	0.010
w_r	0.155	0.128	0.126	0.088	0.069	0.060	0.047	0.035
w_s	0.374	0.362	0.355	0.093	0.021	0.018	0.013	0.010
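The four statistics reported for each sequence (RMSE, mean, median, standard deviation) can be computed from per-frame translational errors. The sketch below assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same coordinate frame, and it uses the population standard deviation. Both are assumptions, and the function name and sample positions are hypothetical. It is not the evaluation tool used by the authors.

```python
import math
import statistics


def trajectory_error_stats(estimated, ground_truth):
    """Summarize per-frame position errors between two aligned trajectories.

    estimated, ground_truth: equal-length lists of (x, y, z) positions in
    metres, already associated frame by frame (assumed, not done here).
    """
    errors = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    rmse = math.sqrt(sum(err ** 2 for err in errors) / len(errors))
    return {
        "rmse": rmse,
        "mean": statistics.fmean(errors),
        "median": statistics.median(errors),
        "std": statistics.pstdev(errors),  # population standard deviation
    }


# Hypothetical four-frame trajectory, in metres.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.03, 0.0), (2.0, 0.0, 0.04), (3.05, 0.0, 0.0)]
stats = trajectory_error_stats(est, gt)
print(stats)
```

With the population standard deviation, the squared RMSE equals the squared mean plus the squared standard deviation, which is why the RMSE is never smaller than the mean error.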

Dataset	RMSE improvement/%	Mean improvement/%	Median improvement/%	Std. dev. improvement/%
s_r	2.9	2.3	17.6	35.3
s_s	4.2	-5.0	5.6	16.7
w_r	55.5	53.1	62.7	60.2
w_s	94.4	95.0	96.3	89.2

Dataset	ORB-SLAM2 RMSE/m	ORB-SLAM2 mean/m	ORB-SLAM2 median/m	ORB-SLAM2 std. dev./m	Proposed RMSE/m	Proposed mean/m	Proposed median/m	Proposed std. dev./m
s_r	0.023	0.018	0.015	0.015	0.016	0.012	0.008	0.011
s_s	0.033	0.025	0.020	0.022	0.020	0.017	0.014	0.012
w_r	0.051	0.032	0.019	0.040	0.030	0.025	0.023	0.017
w_s	0.030	0.018	0.009	0.024	0.027	0.018	0.006	0.019

Dataset	RMSE improvement/%	Mean improvement/%	Median improvement/%	Std. dev. improvement/%
s_r	30.4	33.3	46.7	26.7
s_s	39.4	32.0	30.0	45.5
w_r	41.2	21.9	-21.1	57.5
w_s	10.0	0.0	33.3	20.8
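The percentages in the table directly above are consistent with the relative reduction of each statistic with respect to ORB-SLAM2, that is, (baseline - proposed) / baseline × 100. The sketch below checks this for the RMSE column, using the values from the preceding table in metres. The function name `relative_improvement` is hypothetical, and the formula is inferred from the tabulated numbers rather than stated on this page.

```python
def relative_improvement(baseline, proposed):
    """Percentage reduction of an error value relative to the baseline."""
    return (baseline - proposed) / baseline * 100.0


# (ORB-SLAM2 RMSE, proposed-method RMSE) in metres, from the preceding table.
rmse_pairs = {
    "s_r": (0.023, 0.016),
    "s_s": (0.033, 0.020),
    "w_r": (0.051, 0.030),
    "w_s": (0.030, 0.027),
}

improvements = {
    name: round(relative_improvement(base, prop), 1)
    for name, (base, prop) in rmse_pairs.items()
}
print(improvements)
```

A negative result, such as the -21.1 in the median column for w_r, means the proposed method's value is larger than the baseline's for that statistic.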
