Electronic Science and Technology ›› 2021, Vol. 34 ›› Issue (5): 66-71. doi: 10.16180/j.cnki.issn1007-7820.2021.05.012


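Research Progress on Reinforcement-Learning-Based Behavior Decision-Making Methods for Autonomous Vehicles

ZHANG Jiapeng, LI Lin, ZHU Ye

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200000
  • Received: 2020-02-17 Online: 2021-05-15 Published: 2021-05-24
  • About the authors: ZHANG Jiapeng (b. 1995), male, master's student. Research interests: reinforcement learning, autonomous driving. | LI Lin (b. 1983), female, Ph.D., associate professor. Research interests: multi-agent systems, consensus control, robust control.
  • Funding:
    National Natural Science Foundation of China (61673277)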

A Review of Research on Decision-Making Method of Autonomous Vehicle Based on Reinforcement Learning

ZHANG Jiapeng, LI Lin, ZHU Ye

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200000, China
  • Received: 2020-02-17 Online: 2021-05-15 Published: 2021-05-24
  • Supported by:
    National Natural Science Foundation of China (61673277)

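Abstract (translated from the Chinese):

The behavior decision-making system integrates information about the environment and the ego vehicle so that an autonomous vehicle produces safe and reasonable driving behavior; it is the core of autonomous driving. Reinforcement learning algorithms adopt a self-supervised style of learning in which the decision-making system of an autonomous vehicle, by interacting with its environment and continually improving its own policy, autonomously learns an optimal decision model, which offers a direction for building effective decision-making systems. This paper summarizes recent progress in reinforcement-learning-based behavior decision-making methods with respect to improving decision accuracy, improving decision breadth, and handling uncertain factors. Gains in decision accuracy rely mainly on introducing deep learning techniques with strong representational power. Gains in decision breadth come from hierarchical abstraction techniques, which decompose tasks to alleviate the curse of dimensionality. Uncertain factors are taken into account through partially observable Markov decision processes in order to improve driving safety.

Key words: autonomous driving, reinforcement learning, behavior decision-making, self-supervised learning, policy improvement, decision accuracy, decision breadth, uncertain factors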

Abstract:

The decision-making system integrates information about the environment and the ego vehicle so that the autonomous vehicle produces safe and reasonable driving behavior; it is the core technology for realizing autonomous driving. Reinforcement learning adopts a self-supervised learning approach, in which the decision-making system of an autonomous vehicle learns the optimal decision model autonomously by continuously improving its policy while interacting with the environment. This provides a direction for building an effective decision-making system. This study summarizes recent research progress on reinforcement-learning-based decision-making methods in terms of improving decision accuracy, improving decision breadth, and dealing with uncertain factors. The improvement of decision accuracy mainly depends on the introduction of deep learning with strong representation ability. The improvement of decision breadth benefits from hierarchical abstraction technology, which decomposes complex tasks to alleviate the curse of dimensionality. Uncertainty is taken into account through the partially observable Markov decision process to improve driving safety.

Key words: autonomous driving, reinforcement learning, decision-making, self-supervised learning, policy improvement, decision accuracy, decision breadth, uncertainty

CLC Number:

  • TP242.6
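
The abstract notes that uncertainty is handled through the partially observable Markov decision process. As a minimal sketch of the idea, and not the method of any work surveyed in the article, the following Python snippet performs one belief update over the hidden intention of another vehicle. The state names, observation names, and all probabilities are hypothetical values chosen for illustration, and the ego vehicle's action is omitted for brevity (a full POMDP update would also condition on it).

```python
def belief_update(belief, transition, observation_prob, obs):
    """One Bayes-filter step: predict with the transition model,
    then correct with the likelihood of the received observation."""
    states = list(belief)
    # Prediction: b'(s2) = sum over s of T(s2 | s) * b(s)
    predicted = {
        s2: sum(transition[s][s2] * belief[s] for s in states)
        for s2 in states
    }
    # Correction: weight each predicted state by O(obs | s2)
    unnormalized = {
        s2: observation_prob[s2][obs] * predicted[s2] for s2 in states
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: p / total for s2, p in unnormalized.items()}


# Hypothetical model: will the other vehicle yield or go?
prior = {"yield": 0.5, "go": 0.5}
transition = {
    "yield": {"yield": 0.9, "go": 0.1},  # intentions mostly persist
    "go": {"yield": 0.1, "go": 0.9},
}
observation_prob = {
    "yield": {"slows": 0.8, "keeps_speed": 0.2},
    "go": {"slows": 0.3, "keeps_speed": 0.7},
}

posterior = belief_update(prior, transition, observation_prob, "slows")
print(posterior)
```

With these assumed numbers, observing that the other vehicle slows raises the belief in "yield" from 0.5 to roughly 0.73. A decision policy can then act on the belief rather than on a single assumed intention.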