Method for robot obstacle avoidance based on the improved dueling network

doi:10.19665/j.issn1001-2400.2019.01.008

Journal of Xidian University, 2019, Vol. 46, Issue (1): 46-50.


ZHOU Yi^1,2, CHEN Bo^1,2

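^1. National Key Laboratory of Radar Signal Processing, Xidian University, Xi'an, Shaanxi 710071, China
^2. Collaborative Innovation Center of Information Sensing and Understanding, Xidian University, Xi'an, Shaanxi 710071, China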

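Received: 2018-04-17  Online: 2019-02-20  Published: 2019-03-05
About the author: ZHOU Yi (1993-), male, master's student at Xidian University, E-mail: zy1993923@sina.com
Funding:
National Natural Science Foundation of China (61771361); National Science Fund for Distinguished Young Scholars (61525105)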


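Abstract:

Traditional reinforcement learning methods for motion planning, and for robot obstacle avoidance in particular, are prone to overestimation and adapt poorly to complex environments. To address these shortcomings, a new algorithm model based on deep reinforcement learning is proposed to improve robot obstacle avoidance performance. The model combines the dueling neural network architecture with Q-learning, a traditional reinforcement learning algorithm, and uses two independently trained dueling networks to process environmental data and predict action values. The output layer produces a state value and action advantage values separately, and the two are combined into the final action values. The model can handle relatively high-dimensional data, which lets it adapt to complex and changing environments, and it outputs advantageous actions for the robot to choose from in order to obtain a higher cumulative reward. Experimental results show that the new model effectively improves robot obstacle avoidance performance.

Key words: robot obstacle avoidance, deep reinforcement learning, dueling network, independent training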
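The abstract says that two independently trained dueling networks predict action values, and that a state value and action advantages are output separately and then combined, but it does not give the combination rule or how the two networks interact during training. The following is a minimal NumPy sketch under two stated assumptions: the aggregation Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a') from Wang et al. (2016), and a double-Q-learning-style update in which a randomly chosen network is trained, picks the greedy next action, and has that action scored by the other network. The names `DuelingNet` and `double_q_update`, the layer sizes, the learning rate, and the choice to update only the output streams are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


class DuelingNet:
    """Hypothetical minimal dueling Q-network: a shared ReLU layer feeds a
    state-value stream V(s) and an action-advantage stream A(s, a)."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed):
        rng = np.random.default_rng(seed)  # separate seed -> independent weights
        self.W1 = rng.normal(0.0, 0.1, (n_inputs, n_hidden))   # shared layer
        self.Wv = rng.normal(0.0, 0.1, (n_hidden, 1))          # value stream
        self.Wa = rng.normal(0.0, 0.1, (n_hidden, n_actions))  # advantage stream

    def features(self, s):
        # Shared hidden representation h(s) of the state vector s
        return np.maximum(0.0, s @ self.W1)

    def q_values(self, s):
        h = self.features(s)
        v = h @ self.Wv  # state value, shape (1,)
        a = h @ self.Wa  # advantages, shape (n_actions,)
        # Aggregation from Wang et al. (2016): Q = V + (A - mean(A))
        return v + (a - a.mean())

    def sgd_step(self, s, action, target, lr):
        """One SGD step on 0.5 * (Q(s, action) - target)^2. Only the two output
        streams are updated; the shared layer stays fixed to keep this short."""
        h = self.features(s)
        err = self.q_values(s)[action] - target
        n = self.Wa.shape[1]
        # Mean subtraction gives dQ[action]/dA[j] = 1{j == action} - 1/n
        d_adv = np.eye(n)[action] - 1.0 / n
        self.Wv -= lr * err * h[:, None]
        self.Wa -= lr * err * np.outer(h, d_adv)
        return err


def double_q_update(net_a, net_b, s, action, r, s_next, done, rng,
                    gamma=0.99, lr=0.01):
    """Assumed double-Q-learning scheme: a randomly chosen network is trained;
    it selects the greedy next action and the other network evaluates it."""
    if rng.random() < 0.5:
        learner, evaluator = net_a, net_b
    else:
        learner, evaluator = net_b, net_a
    if done:
        target = r  # no bootstrapping from a terminal state (e.g. a collision)
    else:
        a_star = int(np.argmax(learner.q_values(s_next)))
        target = r + gamma * evaluator.q_values(s_next)[a_star]
    return learner.sgd_step(s, action, target, lr)
```

In use, each observed transition (s, action, r, s_next, done) would be passed to `double_q_update(net_a, net_b, ...)` with a shared `np.random.default_rng(...)` generator. Decoupling action selection from action evaluation in this way is the standard remedy for the overestimation bias of the max operator in Q-learning, which is the weakness the abstract attributes to traditional methods; whether the paper uses exactly this scheme is not stated in this excerpt.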


CLC number: TP242.6

ZHOU Yi, CHEN Bo. Method for robot obstacle avoidance based on the improved dueling network[J]. Journal of Xidian University, 2019, 46(1): 46-50.

Figures/Tables (7): Fig. 1, Fig. 2, Fig. 3, Fig. 4, Table 1, Fig. 5, Fig. 6


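Table 1
Algorithm model	Average cumulative reward	Maximum reward	Average steps
Q-learning	-20.917	41.5	49
Deep Q-network (DQN)	441.958	3600.0	190
Independently trained dueling network	742.004	3668.0	294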
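The table's three columns are not defined in this excerpt. Assuming (an assumption, not stated in the source) that an episode's cumulative reward is the undiscounted sum of its step rewards, that "maximum reward" is the largest such episode total, and that "steps" counts actions per episode, the statistics could be computed from episode logs as in this sketch; the `summarize` function and its list-of-reward-lists input format are hypothetical.

```python
from statistics import mean


def summarize(episodes):
    """episodes: hypothetical log format, one list of step rewards per episode."""
    # Assumed definition: undiscounted sum of step rewards in each episode
    returns = [sum(rewards) for rewards in episodes]
    return {
        "avg_cumulative_reward": mean(returns),
        "max_reward": max(returns),  # assumed: best single-episode total
        "avg_steps": mean(len(rewards) for rewards in episodes),
    }
```

For example, three toy episodes with reward lists [1.0, 2.0, -1.0], [5.0] and [0.5, 0.5] give episode totals of 2.0, 5.0 and 1.0, hence an average cumulative reward of 8/3 (about 2.667), a maximum of 5.0 and an average of 2 steps. These toy numbers are illustrative only and are unrelated to the reported results.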
