J4, 2015, Vol. 42, Issue (3): 135-140+191. doi: 10.3969/j.issn.1001-2400.2015.03.023

• Research Paper •

Implementation and optimization of the wideband matched filter on the GPU

ZHOU Hang; CAI Zhiming; WANG Ximin

  1. (School of Electronic Engineering, Naval Univ. of Engineering, Wuhan 430033, Hubei, China)
  • Received: 2014-05-04 Revised: 2014-05-28 Online: 2015-06-20 Published: 2015-07-27
  • Contact: ZHOU Hang
  • About the author: ZHOU Hang (born 1985), male, Ph.D. candidate at the Naval University of Engineering. E-mail: zhh06@163.com
  • Supported by: National Natural Science Foundation of China (51009146)

Abstract:

Fine estimation of the wideband ambiguity function, which has a sharp main ridge, requires a dense search over the time-scale plane and therefore well-optimized software on high-performance hardware. From the perspective of wideband correlation, the matched filter based on the continuous wavelet transform (CWT) and its fast algorithm based on the fast Fourier transform (FFT) are derived, and their computational complexity is analyzed. A configurable software implementation of the wideband matched filter on the graphics processing unit (GPU) is then proposed, together with an optimization method that combines theoretical prediction with measured function timings. Kernel performance is improved by optimizing the thread-block dimensions and binding texture memory, and Compute Unified Device Architecture (CUDA) libraries are used to reduce the latency of the FFT and of the maximum search. In performance tests, the GPU implementation achieves a speedup of up to 3.3 times over the implementation on an 8-core CPU, and its processing latency meets the real-time requirement of wideband matched filtering.

Key words: signal processing, parallel computing, graphics processing unit (GPU), program optimization, continuous wavelet transform (CWT)
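
The processing chain summarized in the abstract (a bank of time-scaled replicas, FFT-based correlation for each scale, then a maximum search over the time-scale surface) can be illustrated with a minimal CPU sketch in NumPy. This is not the authors' GPU code: the function names `make_lfm`, `scaled_replica`, and `wideband_matched_filter`, the linear-interpolation resampling, the scale convention (a scale above 1 compresses the waveform), and the linear FM test waveform are all assumptions made for illustration. The `np.fft` and `np.argmax` calls stand in for the steps that the abstract says are handled by CUDA libraries.

```python
import numpy as np


def make_lfm(n, f0, f1):
    """Hypothetical test waveform: real linear FM sweep from f0 to f1 (cycles/sample)."""
    t = np.arange(n)
    phase = 2.0 * np.pi * (f0 * t + 0.5 * (f1 - f0) / n * t ** 2)
    return np.cos(phase)


def scaled_replica(replica, scale):
    """Time-scale the replica by `scale` using linear interpolation.

    Assumed convention: output sample k is sqrt(scale) * replica(scale * k),
    so scale > 1 shortens the waveform and the energy stays roughly constant.
    """
    n = len(replica)
    out_len = int(np.floor((n - 1) / scale)) + 1
    positions = np.arange(out_len) * scale  # sample positions on the original grid
    return np.sqrt(scale) * np.interp(positions, np.arange(n), replica)


def wideband_matched_filter(received, replica, scales):
    """Correlate `received` with every scaled replica and locate the peak.

    Returns (surface, best_scale, best_delay), where surface[i, k] is the
    correlation of the i-th scaled replica with the received signal at delay k.
    """
    refs = [scaled_replica(replica, s) for s in scales]
    longest = max(len(r) for r in refs)
    # Zero-pad to a power of two so circular correlation equals linear correlation.
    nfft = 1 << (len(received) + longest - 2).bit_length()
    spectrum = np.fft.rfft(received, nfft)  # computed once, reused for all scales

    surface = np.empty((len(scales), len(received)))
    for i, ref in enumerate(refs):
        # Cross-correlation theorem: corr = IFFT(FFT(x) * conj(FFT(ref))).
        corr = np.fft.irfft(spectrum * np.conj(np.fft.rfft(ref, nfft)), nfft)
        surface[i] = corr[:len(received)]  # keep the non-negative delays only

    # Maximum search over the whole time-scale surface.
    i_best, d_best = np.unravel_index(np.argmax(np.abs(surface)), surface.shape)
    return surface, float(scales[i_best]), int(d_best)


if __name__ == "__main__":
    replica = make_lfm(1024, 0.05, 0.25)
    scales = np.array([0.98, 0.99, 1.00, 1.01, 1.02])
    received = np.zeros(4096)
    echo = scaled_replica(replica, 1.02)      # noise-free echo, scale 1.02
    received[300:300 + len(echo)] = echo      # placed at a delay of 300 samples
    _, s_hat, d_hat = wideband_matched_filter(received, replica, scales)
    print("estimated scale:", s_hat, "estimated delay:", d_hat)
```

The spectrum of the received signal is computed once and reused for every scale, so the cost per scale is one forward FFT of the replica, one pointwise product, and one inverse FFT. The scales are independent of one another, which is what makes the search amenable to parallel execution on a GPU.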
