Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (5): 162-170. doi: 10.19665/j.issn1001-2400.2019.05.023

Speech enhancement with phase corrected by signal-to-noise ratio information and time-frequency features

贾海蓉,王卫梅,吉慧芳   

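  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China
  • Received: 2019-06-12 Online: 2019-10-20 Published: 2019-10-30
  • About the author: JIA Hairong (born 1977), female, Ph.D., associate professor. E-mail: helenjia722@163.com
  • Supported by:
    National Natural Science Foundation of China (61371193); Natural Science Foundation of Shanxi Province (201701D121058)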

Speech enhancement based on the modified phase using signal-to-noise ratio information and time-frequency characteristics

JIA Hairong, WANG Weimei, JI Huifang

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
  • Received: 2019-06-12 Online: 2019-10-20 Published: 2019-10-30

Abstract:

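In harmonic-model-based phase-spectrum speech enhancement algorithms, the phase is reconstructed only in voiced segments, which causes speech distortion and perceptual discontinuity. To address this, a new method is proposed that improves phase reconstruction using signal-to-noise ratio (SNR) information and time-frequency features. First, time-frequency features related to phase distortion are introduced and a decision threshold is computed. Next, SNR information is used to compute the phase deviation between noisy and clean speech, and comparing the two quantities yields a further estimate of the speech phase in both unvoiced and voiced segments, which effectively improves speech coherence. Finally, the reconstructed phase is combined with the amplitude estimate from an improved binary hypothesis model to perform speech enhancement. Experiments on different utterances under different noise backgrounds show that the phase difference of the new algorithm is closer to that of the original signal. Compared with the comparison algorithm, the SNR of the enhanced speech improves by 2.39 dB on average and the perceptual speech quality score improves by 0.12 on average, effectively reducing speech distortion and improving speech intelligibility.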

Key words: phase reconstruction, SNR information, time-frequency characteristics, decision threshold, phase deviation

Abstract:

Harmonic model-based phase spectrum speech enhancement algorithms reconstruct the phase only in voiced segments, which leads to speech distortion and auditory discontinuity. To address this problem, a new method is proposed that improves phase reconstruction by using signal-to-noise ratio (SNR) information and time-frequency features. First, time-frequency characteristics related to phase distortion are introduced and a decision threshold is calculated. Then the phase deviation between noisy speech and clean speech is calculated from the SNR information. Comparing the phase deviation with the decision threshold gives a further estimate of the phase in both voiced and unvoiced segments, which effectively improves the coherence of the speech. Finally, the reconstructed phase is combined with the amplitude estimate of an improved binary hypothesis model to perform speech enhancement. Experiments on different utterances in different noise backgrounds show that the phase difference obtained with the new algorithm is closer to that of the original signal. Compared with the comparison algorithm, the SNR of the enhanced speech increases by 2.39 dB on average and the perceptual evaluation of speech quality (PESQ) score increases by 0.12 on average, which effectively reduces speech distortion and improves speech intelligibility.
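
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes the standard geometric bound for an additive model Y = S + N, in which the noisy phase can differ from the clean phase by at most arcsin(|N|/|S|) when |N| ≤ |S|. The function names, the `threshold` argument, and the `model_phase` input are hypothetical. In the paper the threshold is derived from time-frequency features, whereas here it is simply passed in.

```python
import cmath
import math


def max_phase_deviation(snr_db: float) -> float:
    """Upper bound (radians) on |noisy phase - clean phase| in one time-frequency bin.

    Assumes Y = S + N with local SNR = 20*log10(|S|/|N|). When |N| <= |S| the
    deviation is at most asin(|N|/|S|); otherwise it can be anything up to pi.
    """
    ratio = 10.0 ** (-snr_db / 20.0)  # |N| / |S|
    return math.asin(ratio) if ratio <= 1.0 else math.pi


def select_phase(noisy_phase: float, model_phase: float,
                 snr_db: float, threshold: float) -> float:
    """Keep the noisy phase where its possible deviation stays within the
    (hypothetical) decision threshold; otherwise use a reconstructed phase."""
    if max_phase_deviation(snr_db) <= threshold:
        return noisy_phase
    return model_phase


def enhance_bin(amplitude_estimate: float, phase: float) -> complex:
    """Combine an amplitude estimate with the chosen phase into one complex bin."""
    return cmath.rect(amplitude_estimate, phase)


# Example: a high-SNR bin keeps its noisy phase, a low-SNR bin does not.
high_snr = select_phase(noisy_phase=0.3, model_phase=0.5, snr_db=20.0, threshold=0.2)
low_snr = select_phase(noisy_phase=0.3, model_phase=0.5, snr_db=0.0, threshold=0.2)
```

Applying the decision per bin rather than per voiced frame is what would let such a scheme cover unvoiced segments as well, which is the gap the abstract describes in voiced-only reconstruction.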

Key words: phase reconstruction, SNR information, time-frequency characteristics, decision threshold, phase deviation

CLC Number:

  • TN912.35