Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (1): 46-50. doi: 10.19665/j.issn1001-2400.2019.01.008


Method for robot obstacle avoidance based on the improved dueling network

ZHOU Yi1,2, CHEN Bo1,2

  1. National Key Lab. of Radar Signal Processing, Xidian Univ., Xi’an 710071, China
  2. Collaborative Innovation Center of Information Sensing and Understanding, Xidian Univ., Xi’an 710071, China
  • Received: 2018-04-17 Online: 2019-02-20 Published: 2019-03-05
  • About the author: ZHOU Yi (born 1993), male, master's student at Xidian University, E-mail: zy1993923@sina.com
  • Supported by:
    National Natural Science Foundation of China (61771361); National Science Fund for Distinguished Young Scholars of the National Natural Science Foundation of China (61525105)

Abstract:

Traditional reinforcement learning methods for motion planning, and for robot obstacle avoidance in particular, tend to overestimate action values and adapt poorly to complex environments. To address these shortcomings, a new model based on deep reinforcement learning is proposed to improve the obstacle avoidance performance of robots. The model combines the dueling network architecture with Q-learning, a traditional reinforcement learning algorithm, and uses two independently trained dueling networks to process environmental data and predict action values. The output layer produces the state value and the action advantage separately, and the two are combined to give the final action value. The model can process relatively high-dimensional data, which lets it adapt to complex and changing environments, and it outputs advantageous actions for the robot to choose from so as to obtain a higher cumulative reward. Experimental results show that the new model effectively improves robot obstacle avoidance performance.

Key words: robot obstacle avoidance, deep reinforcement learning, dueling networks, independent training
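
The combination of state value and action advantage described in the abstract can be illustrated with a minimal sketch. The abstract does not give the aggregation formula or say how the two independently trained networks interact, so the following are assumptions rather than the authors' method: the mean-subtracted aggregation Q(s, a) = V(s) + A(s, a) - mean_a A(s, a) commonly used with dueling networks, and a double-Q-style target in which one network selects the next action and the other evaluates it. The function names, the discount factor `gamma`, and the toy numbers are hypothetical.

```python
import numpy as np


def dueling_q(state_value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Assumed aggregation (not stated in the abstract):
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    """
    advantages = np.asarray(advantages, dtype=float)
    return state_value + advantages - advantages.mean()


def double_q_target(reward, q_next_a, q_next_b, gamma=0.9, done=False):
    """One-step learning target built from two networks' next-state values.

    Assumed scheme: network A picks the greedy next action and
    network B supplies the value of that action.
    """
    if done:
        # Terminal transition: no bootstrapped future value.
        return float(reward)
    best_action = int(np.argmax(q_next_a))
    return float(reward + gamma * q_next_b[best_action])


# Toy numbers standing in for the output heads of two dueling networks.
q_a = dueling_q(1.0, [0.5, -0.5, 0.0])   # hypothetical network A
q_b = dueling_q(0.8, [0.2, 0.1, -0.3])   # hypothetical network B
target = double_q_target(reward=1.0, q_next_a=q_a, q_next_b=q_b)
```

Subtracting the mean advantage keeps the state value and the advantages from absorbing each other's offset, and evaluating the selected action with a second network is one common way to reduce the overestimation that the abstract mentions.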

CLC number:

  • TP242.6