Electronic Science and Technology ›› 2023, Vol. 36 ›› Issue (8): 43-48. doi: 10.16180/j.cnki.issn1007-7820.2023.08.007


  • About the authors: HU Yongyang (1998-), male, master's student. Research interests: integrated circuit design and testing. | LI Miao (1992-), female, engineer. Research interests: artificial intelligence, digital circuit design. | SONG Yukun (1975-), male, associate researcher. Research interests: multi-core system design, VLSI implementation of digital signal processing.

Structured Compression and Acceleration of Network Based on Tiny-YOLOv3

HU Yongyang1,2, LI Miao1, MENG Fankai1,2, ZHANG Feng1, MENG Yiwei1,3, SONG Yukun2

  1. National ASIC Design Engineering Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    2. School of Microelectronics, Hefei University of Technology, Hefei 230009, China
    3. School of Information Engineering, Capital Normal University, Beijing 100048, China
  • Received: 2022-03-21 Online: 2023-08-15 Published: 2023-08-14
  • Supported by:
    National Key R&D Program of China (2018YFB2202604)


Abstract:

In specific application scenarios, the Tiny-YOLOv3 (You Only Look Once v3) network suffers from high resource overhead and slow running speed when deployed on embedded platforms. This study proposes a structured compression scheme that combines pruning with quantization, and builds a convolutional layer acceleration system for the compressed network. The compression scheme uses sparsity-inducing training and channel pruning to reduce the amount of computation in the network, and uses fixed-point quantization of activations and power-of-two quantization of weights to reduce the parameter storage of the convolutional layers. In the acceleration system, the programmable logic part implements a convolutional layer accelerator core designed with a parallel-plus-pipeline approach, while the processing system part handles scheduling of the acceleration system. Experimental results show that the mean average precision of Tiny-YOLOv3 after structured compression is 0.46, and the parameter compression ratio reaches 5%. When the acceleration system is deployed on a Xilinx ZYNQ chip, the hardware runs stably at a 250 MHz clock frequency, and the convolution operation unit delivers 36 GOPS of computing power. In addition, the overall power consumption of the acceleration platform is 2.6 W, and the hardware design economizes on hardware resources.
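The power-of-two weight quantization mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the exponent range are assumptions. Each weight is rounded in the log domain to the nearest signed power of two, so that hardware can replace multiplications with bit shifts.

```python
import numpy as np

def quantize_power_of_two(w, min_exp=-8, max_exp=0):
    """Round each weight to the nearest signed power of two.

    Hypothetical sketch: the exponent range [min_exp, max_exp]
    is an assumption, not taken from the paper.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    # Round the exponent in the log domain; guard against log2(0).
    exp = np.round(np.log2(np.maximum(mag, 1e-12)))
    exp = np.clip(exp, min_exp, max_exp)
    q = sign * np.exp2(exp)
    q[mag == 0] = 0.0  # zero weights stay exactly zero
    return q

# Example: 0.3 maps to 0.25 (2^-2), -0.7 maps to -0.5 (-2^-1).
print(quantize_power_of_two(np.array([0.3, -0.7, 0.0])))
```

With weights restricted to powers of two, each convolution multiply reduces to a shift of the fixed-point activation, which is one way such a design can save DSP resources on an FPGA.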

Key words: object detection network, Tiny-YOLOv3, neural network compression, structured pruning, quantization, hardware acceleration, pipeline, ZYNQ

CLC number: TP391