Aircraft reinforcement learning multi-mode control in orbit

doi:10.19665/j.issn1001-2400.2020.02.011

Abstract

To improve the reliability of aircraft control systems during long-term in-orbit flight, a multi-mode control scheme based on reinforcement learning is proposed. The system consists of a sensor module, a control module and an execution module. The sensor module feeds the flight data sensed by the aircraft to the control module in real time. These data are of two kinds: multidimensional, structured floating-point data with historical correlation that can be used directly for flight control, and physical quantities unique to a particular sensor. The control module uses a real-time parallel decision mechanism and is divided into an input layer, a feature extraction layer and a fully connected layer. The execution module receives the driving data output by the control module in real time, which comprise the optimal state value used for decision-making and the action output value used for evaluation. The system decides which specific execution modules to use according to the optimal return value used for decision-making, and the output of a selected execution module is determined by the action output value used for evaluation. Under multi-mode input and output conditions, the system enables the aircraft to sustain long-term in-orbit operation with a response time of 15 ms and a performance per watt of 5.23 GOP/s/W.

Key words: aircraft, control system, multi-mode, reinforcement learning
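The abstract describes the data path only at block level. The pure-Python sketch below shows one way the pieces could fit together in a single control cycle. Every name in it (`ControlModule`, `dense`, `relu`, `step`) is hypothetical; the single ReLU dense layer merely stands in for the feature extraction layer, whose actual form is not given here; and picking the execution module with the largest decision value (greedy argmax) is an assumed selection rule, not the authors' stated one.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Fully connected layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]


@dataclass
class ControlModule:
    """Hypothetical control module: input layer -> feature extraction -> fully connected heads."""
    feat_w: Matrix          # feature extraction layer (assumed: one dense ReLU layer)
    feat_b: List[float]
    decision_w: Matrix      # decision head: one value per execution module
    decision_b: List[float]
    action_w: Matrix        # evaluation head: one action output per execution module
    action_b: List[float]

    def forward(self, history: Sequence[float],
                sensor_specific: Sequence[float]) -> Tuple[List[float], List[float]]:
        # Input layer: concatenate the history-correlated structured data
        # with the sensor-specific physical quantities.
        x = list(history) + list(sensor_specific)
        h = relu(dense(x, self.feat_w, self.feat_b))
        decision_values = dense(h, self.decision_w, self.decision_b)
        action_outputs = dense(h, self.action_w, self.action_b)
        return decision_values, action_outputs


def step(module: ControlModule, history: Sequence[float],
         sensor_specific: Sequence[float]) -> Tuple[int, float]:
    """One control cycle: select an execution module, then return its drive value."""
    decision_values, action_outputs = module.forward(history, sensor_specific)
    # Assumed selection rule: greedy argmax over the decision values.
    chosen = max(range(len(decision_values)), key=lambda k: decision_values[k])
    # The selected execution module is driven by its own action output value.
    return chosen, action_outputs[chosen]


if __name__ == "__main__":
    # Toy weights: 3 inputs -> 2 features -> 3 execution modules.
    module = ControlModule(
        feat_w=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], feat_b=[0.0, 0.0],
        decision_w=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], decision_b=[0.0, 0.0, 0.0],
        action_w=[[0.1, 0.0], [0.0, 0.3], [1.0, 1.0]], action_b=[0.0, 0.0, 0.0],
    )
    print(step(module, [1.0, 2.0], [0.5]))  # prints (1, 0.6)
```

Keeping the decision head and the action head separate mirrors the split in the abstract: one output chooses which execution module acts, the other sets how strongly the chosen module acts.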

CLC number: TN911.22

ZHANG Ying, WEI Minfeng, WANG Shihui, TAO Leiyan, CAO Jian, ZHANG Xing. Aircraft reinforcement learning multi-mode control in orbit[J]. Journal of Xidian University, 2020, 47(2): 75-82.

Figures/Tables (6): Fig. 1, Fig. 2, Fig. 3, Fig. 4, Fig. 5, Table 1

References (23)

[1]	YANG Y C, GAO Z C. A New Method for Control Allocation of Aircraft Flight Control System[J]. IEEE Transactions on Automatic Control (Early Access), 2019, DOI: 10.1109/TAC.2019.2918122.
[2]	ORLANDO C, ESPOSITO A, ALAIMO A. An Alternative Tuning Scheme for Simple Adaptive Flight Control System[J]. Journal of Physics: Conference Series, 2019, 1215(1): 012015.
[3]	MA Z, LI H, GU Y, et al. Flight and Hover Control System Design for a Mini-quadrotor Based on Multi-sensors[J]. International Journal of Control, Automation, and Systems, 2019, 17(2): 486-499.
[4]	THEIL S, AMMANN N, ANDERT F, et al. ATON (Autonomous Terrain-based Optical Navigation) for Exploration Missions: Recent Flight Test Results[J]. CEAS Space Journal, 2018, 10(3): 325-341.
[5]	ZHANG Ying, ZHANG Xing, CAO Jian, et al. Processor Free Time Forecasting Based on Convolutional Neural Network[C]// Proceedings of the 37th Chinese Control Conference (F). Beijing: Technical Committee on Control Theory, Chinese Association of Automation, 2018: 9331-9336.
[6]	LI Y, ZHANG Y, XIE W C. Joint Transmit-receive Subarray Syconfproc Optimization for Hybrid MIMO Phased-array Radar[C]// 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics. Piscataway: IEEE, 2017, DOI: 10.1109/CISP-BMEI.2017.8302148.
[7]	ZHANG Y, TAO L Y, WANG S H, et al. A redundant fault-tolerant aviation control system based on deep neural network[C]// Proceedings of 2019 IEEE 1st International Conference on Civil Aviation Safety and Information Technology. Piscataway: IEEE, 2019: 475-480.
[8]	ZHANG Y, CAO J, TAO L Y, et al. An Improved Deep Q-learning for Intelligent Transmitter Control System[C]// Lecture Notes in Electrical Engineering: 594. Heidelberg: Springer Verlag, 2020: 344-351.
[9]	ZHANG Ying, QI Hongxiang, LI Yi, et al. A Calibration Method of Strapdown Inertial Navigation System[J]. Missiles and Space Vehicles, 2018, 12(s2): 57-60. (in Chinese)
[10]	ZHANG Ying, QI Hongxiang, LI Yi, et al. A Quaternion Method for Describing Rigid Body Attitude[J]. Missiles and Space Vehicles, 2018, 12(s2): 61-65. (in Chinese)
[11]	CHANG Y X, ZHANG Y, LIAO L W, et al. IP Softcore for a Bubbling Convolutional Accelerator in a Neural Network[J]. Electronics World, 2019, 125(1993): 34-37.
[12]	ZHANG Q, CAO J, ZHANG Y, et al. FPGA Implementation of Quantized Convolutional Neural Networks[C]// Proceedings of 2019 IEEE International Conference on Communication Technology. Piscataway: IEEE, 2019: 1605-1610.
[13]	ZHANG Wenzhu, SHAO Li’na. Dynamic Spectrum Allocation Algorithm for Heterogeneous Radio Networks Based on Reinforcement Learning[J]. Journal of Xidian University, 2011, 38(4): 32-37. (in Chinese)
[14]	MA Zhuoran, MA Jianfeng, MIAO Yinbing, et al. State Transition-based Access Control Model in the UAV Network[J]. Journal of Xidian University, 2018, 45(6): 44-50. (in Chinese)
[15]	MIROWSKI P, PASCANU R, VIOLA F, et al. Learning to Navigate in Complex Environments[C]// Conference Track Proceedings of the 2017 5th International Conference on Learning Representations. San Diego: International Conference on Learning Representations, 2019: 149804.
[16]	YAHYA A, LI A, KALAKRISHNAN M, et al. Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search[C]// Proceedings of the 2017 IEEE International Conference on Intelligent Robots and Systems. Piscataway: IEEE, 2017: 79-86.
[17]	DUAN Y, CHEN X, HOUTHOOFT R, et al. Benchmarking Deep Reinforcement Learning for Continuous Control[C]// Proceedings of the 2016 33rd International Conference on Machine Learning. Lille: International Machine Learning Society, 2016: 2001-2014.
[18]	LEE A X, LEVINE S, ABBEEL P. Learning Visual Servoing with Deep Features and Fitted Q-iteration[C]// Conference Track Proceedings of the 2017 5th International Conference on Learning Representations. San Diego: International Conference on Learning Representations, 2019: 149804.
[19]	MAHLER J, LIANG J, NIYAZ S, et al. Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics[J]. Robotics: Science and Systems, 2017, 13(1): 136707.
[20]	PEREZ-D’ARPINO C, SHAH J A. C-LEARN: Learning Geometric Constraints from Demonstrations for Multi-step Manipulation in Shared Autonomy[C]// Proceedings of the 2017 IEEE International Conference on Robotics and Automation. Piscataway: IEEE, 2017: 4058-4065.
[21]	ZHANG C, LI P, SUN G Y, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks[C]// Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. New York: ACM, 2015: 161-170.
[22]	SUDA N, CHANDRA V, DASIKA G, et al. Throughput-optimized OpenCL-based FPGA Accelerator for Large-scale Convolutional Neural Networks[C]// Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. New York: ACM, 2016: 16-25.
[23]	GSCHWEND D. ZynqNet: An FPGA-accelerated Embedded Convolutional Neural Network[D]. Zürich: ETH Zurich, 2016.

Table 1
Item	Main server (CPU)	Main server (GPU)	FPGA2015^[21]	FPGA2016^[22]	ZynqNet^[23]	This work
Power/W	69.00	142.00	18.61	25.8	12	1.65
Efficiency/(GOP/s/W)	103.31×10^-6	64.70×10^-6	3.31	4.57	0.56	5.23
Ratio to this work (×)	1.97×10^-5	1.24×10^-5	0.63	0.87	0.11	1.00
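The last row of Table 1 appears to normalize each platform's efficiency by the 5.23 GOP/s/W of this work. A short check in Python, assuming that interpretation (the dictionary keys and the function name `ratio_to_reference` are labels chosen here, not taken from the paper):

```python
from typing import Dict

# Efficiency figures from Table 1, in GOP/s/W.
EFFICIENCY: Dict[str, float] = {
    "CPU": 103.31e-6,
    "GPU": 64.70e-6,
    "FPGA2015": 3.31,
    "FPGA2016": 4.57,
    "ZynqNet": 0.56,
    "This work": 5.23,
}


def ratio_to_reference(efficiency: Dict[str, float],
                       reference: str = "This work") -> Dict[str, float]:
    """Divide every platform's efficiency by the reference platform's efficiency."""
    base = efficiency[reference]
    return {name: value / base for name, value in efficiency.items()}


if __name__ == "__main__":
    for name, ratio in ratio_to_reference(EFFICIENCY).items():
        print(f"{name:<10} {ratio:.3g}")
```

The quotients agree with the table's last row to within rounding; the CPU entry evaluates to about 1.975×10^-5, listed in the table as 1.97×10^-5.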