A Review of Research on Decision-Making Method of Autonomous Vehicle Based on Reinforcement Learning

doi:10.16180/j.cnki.issn1007-7820.2021.05.012

Abstract:

The behavioral decision-making system integrates environmental and ego-vehicle information so that an autonomous vehicle produces safe and reasonable driving behavior; it is the core of autonomous driving. Reinforcement learning algorithms adopt a self-supervised style of learning: through interaction with the environment, the vehicle's decision-making system continually improves its own policy and autonomously learns an optimal decision model, which offers a direction for building effective decision-making systems. This paper summarizes recent research progress on reinforcement-learning-based decision-making methods in three respects: improving decision accuracy, improving decision breadth, and dealing with uncertain factors. Gains in decision accuracy rely mainly on the introduction of deep learning techniques with strong representation ability. Gains in decision breadth come from hierarchical abstraction techniques, which decompose tasks to alleviate the curse of dimensionality. Uncertain factors are taken into account through partially observable Markov decision processes to improve driving safety.

Key words: autonomous driving, reinforcement learning, behavioral decision-making, self-supervised learning, policy improvement, decision accuracy, decision breadth, uncertain factors
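
The abstract notes that uncertain factors are handled through partially observable Markov decision processes. As a rough illustration only, and not a method taken from the surveyed work, the sketch below shows the belief update that sits at the heart of a POMDP: the ego vehicle cannot observe another driver's intention directly, so it maintains a probability distribution over it and revises that distribution after each observation. The state names (`yield`, `go`), the observation names (`slowing`, `steady`), and all probabilities are hypothetical values chosen for the example.

```python
def belief_update(belief, transition, observation_model, observation):
    """One Bayes-filter step: b'(s') is proportional to O(o | s') * sum_s T(s' | s) * b(s)."""
    states = list(belief)
    # Prediction: push the current belief through the transition model.
    predicted = {
        s_next: sum(transition[s][s_next] * belief[s] for s in states)
        for s_next in states
    }
    # Correction: weight each predicted state by the observation likelihood.
    unnormalized = {
        s: observation_model[s][observation] * predicted[s] for s in states
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical toy model: will the other driver yield or go?
belief = {"yield": 0.5, "go": 0.5}
transition = {
    "yield": {"yield": 0.9, "go": 0.1},
    "go": {"yield": 0.1, "go": 0.9},
}
observation_model = {
    "yield": {"slowing": 0.8, "steady": 0.2},
    "go": {"slowing": 0.3, "steady": 0.7},
}

posterior = belief_update(belief, transition, observation_model, "slowing")
print(posterior)
```

With these assumed numbers, seeing the other vehicle slow down shifts the belief toward `yield`. A POMDP planner would then choose actions against this belief rather than against a single assumed state.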

CLC number:

TP242.6

ZHANG Jiapeng, LI Lin, ZHU Ye. A Review of Research on Decision-Making Method of Autonomous Vehicle Based on Reinforcement Learning[J]. Electronic Science and Technology, 2021, 34(5): 66-71.
