Journal of Xidian University ›› 2023, Vol. 50 ›› Issue (6): 148-160. doi: 10.19665/j.issn1001-2400.20230308

• Information and Communication Engineering & Computer Science and Technology •

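Research on the fast implementation method of Winograd transposed convolution

LI Zhao¹, HUANG Chengcheng¹, HE Yizhi¹, SU Xiaojie²

  1. School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, Shandong, China
  2. School of Automation, Chongqing University, Chongqing 400044, China
  • Received: 2022-11-04 Online: 2023-12-20 Published: 2024-01-22
  • About the authors: LI Zhao (born 1983), male, associate professor, E-mail: lizhao_buaa@126.com; HUANG Chengcheng (born 1997), male, master's student at Shandong University of Technology, E-mail: sdut_hcc@163.com; HE Yizhi (born 1995), male, master's student at Shandong University of Technology, E-mail: 1127345146@qq.com; SU Xiaojie (born 1985), male, professor, E-mail: suxiaojie@cqu.edu.cn
  • Funding:
    National Key Research and Development Program of China (2022YFE0107300); Youth Innovation Team Development Program of Shandong Higher Education Institutions (2019KJN048)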

Abstract:

The Winograd transposed convolution algorithm is a widely used convolution acceleration method on field-programmable gate arrays (FPGAs). It avoids the zero-padding problem of transposed convolution by grouping the data and then performing Winograd convolution on each group. However, this method must group both the input feature map and the convolution kernel, and must then reorganize the partial results into a complete output feature map; the intricate element-coordinate calculations involved increase design complexity. To address these problems, a method for computing Winograd transposed convolution with a unified transformation matrix is proposed. It uses a unified transformation matrix in place of grouping the input feature map and convolution kernel, and effectively resolves the problems of overlapping summation, zero padding, kernel flipping, decomposition, and reorganization. Guided by this method and combining data reuse, double buffering, and pipelining, a transposed convolution accelerator is designed on an FPGA. The Gaussian-Poisson generative adversarial network is selected for experimental verification, and the design is compared comprehensively with mainstream transposed convolution designs. Experimental results show that the proposed method effectively reduces resource consumption and power consumption, and that the effective performance of the accelerator is about 1.13x to 23.92x higher than that of existing transposed convolution methods.
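
To make the terms in the abstract concrete, the following is a minimal one-dimensional NumPy sketch of the conventional approaches the paper sets out to replace; it does not reproduce the proposed unified transformation matrix, which the abstract does not specify. It computes the same stride-`s` transposed convolution three ways: by overlapping summation, by zero insertion followed by a flipped-kernel sliding window, and by grouping the kernel into `s` sub-kernels whose dense outputs are interleaved (optionally using the standard Winograd F(2,3) matrices for 3-tap sub-kernels). All function names are hypothetical, and the 1D setting is an assumption made for brevity; the paper works on 2D feature maps.

```python
import numpy as np

def tconv_scatter(x, w, stride):
    """Reference: each input sample adds a scaled kernel copy (overlapping summation)."""
    k = len(w)
    y = np.zeros((len(x) - 1) * stride + k)
    for i, xi in enumerate(x):
        y[i * stride : i * stride + k] += xi * w
    return y

def tconv_zero_insert(x, w, stride):
    """Insert stride-1 zeros between samples, pad by k-1, slide the flipped kernel."""
    k = len(w)
    up = np.zeros((len(x) - 1) * stride + 1)
    up[::stride] = x                      # most entries of `up` are zeros
    padded = np.pad(up, k - 1)            # zero padding on both sides
    flipped = w[::-1]                     # kernel flipping
    return np.array([padded[j : j + k] @ flipped
                     for j in range(len(padded) - k + 1)])

# Standard Winograd F(2,3) matrices: 2 outputs of a 3-tap filter with 4 multiplies.
BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]])
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

def conv_full_f23(x, g):
    """Full 1D convolution of x with a 3-tap kernel g, tiled with Winograd F(2,3)."""
    n_out = len(x) + 2
    n_tiles = (n_out + 1) // 2
    padded = np.zeros(2 * n_tiles + 2)
    padded[2 : 2 + len(x)] = x
    u = G @ g[::-1]                       # transformed (flipped) kernel, computed once
    out = np.empty(2 * n_tiles)
    for t in range(n_tiles):
        d = padded[2 * t : 2 * t + 4]     # 4-sample input tile, overlapping by 2
        out[2 * t : 2 * t + 2] = AT @ (u * (BT @ d))
    return out[:n_out]

def tconv_grouped(x, w, stride, conv=np.convolve):
    """Group the kernel by output phase, convolve densely, then interleave the results."""
    y = np.zeros((len(x) - 1) * stride + len(w))
    for r in range(min(stride, len(w))):
        y[r::stride] = conv(x, w[r::stride])   # reorganization step
    return y

x = np.arange(1.0, 8.0)
w = np.array([1.0, -2.0, 3.0, 0.5, 4.0, -1.0])   # 6 taps -> two 3-tap sub-kernels at stride 2
ref = tconv_scatter(x, w, 2)
checks = [np.allclose(ref, tconv_zero_insert(x, w, 2)),
          np.allclose(ref, tconv_grouped(x, w, 2)),
          np.allclose(ref, tconv_grouped(x, w, 2, conv=conv_full_f23))]
print(checks)
```

The grouped variant never multiplies by inserted zeros, which is the benefit the abstract attributes to grouping. Its cost is the index bookkeeping in `w[r::stride]` and `y[r::stride]`, which becomes considerably more involved in two dimensions.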

Key words: unified transformation matrix, Winograd transposed convolution, field programmable gate array, accelerator

CLC number:

  • TP18