A GPGPU cache bypassing system for 2D and 3D convolution

doi:10.19665/j.issn1001-2400.2023.02.010

Abstract


As the core computing platform for convolutional neural networks, the general-purpose graphics processing unit (GPGPU) determines, through its performance on two-dimensional (2D) and three-dimensional (3D) convolution, how well neural networks can be applied to real-time target recognition and detection. However, limited by its inherent cache system design, the current GPGPU architecture cannot accelerate 2D and 3D convolution efficiently. To address this problem, a dynamic L1D cache bypassing design is proposed. First, a new data structure is defined that dynamically reflects the cache access characteristics of an instruction, and a memory-access-feature record table is built on it to record the execution status of different memory accesses. Second, a warp scheduling strategy with a priority thread block is adopted to speed up the sampling of the memory access state. Next, the L1D cache bypassing decision for memory accesses under different program counters (PCs) is derived from the sampling results. Finally, accesses to some low-locality data bypass the L1D cache. As a result, L1D cache space is reserved for data with high locality, the memory access stall cycles of 2D and 3D convolution are reduced, and their memory access efficiency is improved. Experimental results show that, compared with the original design, the L1D cache bypassing design improves performance by 2.16% for 2D convolution and by 19.79% for 3D convolution, which demonstrates the effectiveness and practicality of the design.
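The per-PC sampling and bypass decision described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the names `AccessRecord` and `BypassTable`, the fixed sampling window, and the hit-rate threshold are all hypothetical, and the priority-thread-block warp scheduling that accelerates sampling is not modeled.

```python
from dataclasses import dataclass


@dataclass
class AccessRecord:
    """Hypothetical per-PC entry of a memory-access-feature record table."""
    accesses: int = 0
    hits: int = 0

    @property
    def hit_rate(self) -> float:
        # An entry with no samples is treated as having a hit rate of 0.
        return self.hits / self.accesses if self.accesses else 0.0


class BypassTable:
    """Toy model: sample L1D outcomes per load PC, then fix a bypass decision.

    The window size and threshold are assumptions chosen for illustration;
    the abstract does not state the values or the exact decision rule.
    """

    def __init__(self, sample_accesses: int = 64, hit_rate_threshold: float = 0.25):
        self.sample_accesses = sample_accesses
        self.hit_rate_threshold = hit_rate_threshold
        self.records: dict[int, AccessRecord] = {}
        self.decisions: dict[int, bool] = {}

    def observe(self, pc: int, hit: bool) -> None:
        """Record one sampled L1D access issued by the instruction at `pc`."""
        if pc in self.decisions:
            return  # sampling for this PC has already finished
        rec = self.records.setdefault(pc, AccessRecord())
        rec.accesses += 1
        rec.hits += int(hit)
        if rec.accesses >= self.sample_accesses:
            # Low observed locality: later accesses from this PC bypass L1D.
            self.decisions[pc] = rec.hit_rate < self.hit_rate_threshold

    def should_bypass(self, pc: int) -> bool:
        """PCs that have not finished sampling keep using the cache."""
        return self.decisions.get(pc, False)


table = BypassTable(sample_accesses=4, hit_rate_threshold=0.5)
for hit in (False, False, True, False):
    table.observe(0x100, hit)  # streaming-like load: 1 hit in 4 samples
for hit in (True, True, False, True):
    table.observe(0x200, hit)  # reused data: 3 hits in 4 samples
```

In this toy run the load at `0x100` ends sampling with a hit rate below the threshold and is marked for bypassing, while the load at `0x200` keeps its L1D allocation, which mirrors the stated goal of reserving L1D space for high-locality data.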

Key words: convolution, GPGPU, memory system, cache bypassing


JIA Shiwei, ZHANG Yuming, QIN Xiang, SUN Chenglu, TIAN Ze. GPGPU cache bypassing system for 2D and 3D convolution[J]. Journal of Xidian University, 2023, 50(2): 92-100.

