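Speech enhancement based on the modified phase using signal-to-noise ratio information and time-frequency characteristics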

doi:10.19665/j.issn1001-2400.2019.05.023

Abstract:

Harmonic-model-based phase-spectrum speech enhancement algorithms reconstruct the phase only in voiced segments, which causes speech distortion and audible discontinuities. To address this problem, a new method is proposed that improves phase reconstruction using signal-to-noise ratio (SNR) information and time-frequency features. First, time-frequency features related to phase distortion are introduced and a decision threshold is computed from them. Then the phase deviation between the noisy speech and the clean speech is estimated from SNR information. Comparing these two quantities yields phase estimates for both unvoiced and voiced segments, which improves the continuity of the speech. Finally, the reconstructed phase is combined with the amplitude estimate of an improved binary hypothesis model to perform speech enhancement. Experiments on different utterances in different noise backgrounds show that the phase deviation of the new algorithm is closer to that of the original signal. Compared with the baseline algorithm of Ref. [4], the SNR of the enhanced speech increases by 2.39 dB on average and the perceptual evaluation of speech quality (PESQ) score increases by 0.12 on average, reducing speech distortion and improving speech intelligibility.
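The abstract names the ingredients (an SNR-derived phase deviation, a threshold from time-frequency features, and recombination with an amplitude estimate) but not their exact form. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm. It uses one standard geometric fact: if a noisy STFT coefficient is Y = S + N with |N| < |S|, the phase of Y can differ from that of S by at most arcsin(|N|/|S|) = arcsin(1/sqrt(xi)), where xi is the local SNR. The threshold `tau`, the keep-or-replace rule in `choose_phase`, and all function names are hypothetical.

```python
import cmath
import math


def max_phase_deviation(local_snr: float) -> float:
    """Worst-case |angle(Y) - angle(S)| for a noisy STFT coefficient Y = S + N.

    local_snr is xi = |S|^2 / |N|^2 (linear, not dB). When |N| < |S| the noisy
    coefficient lies inside a cone of half-angle arcsin(|N| / |S|) around S,
    i.e. arcsin(1 / sqrt(xi)). When xi <= 1 the noise can rotate the phase
    arbitrarily, so the bound is pi.
    """
    if local_snr <= 1.0:
        return math.pi
    return math.asin(1.0 / math.sqrt(local_snr))


def choose_phase(noisy_phase: float, reconstructed_phase: float,
                 local_snr: float, tau: float) -> float:
    """Hypothetical per-bin rule: trust the noisy phase only if its worst-case
    deviation is within the threshold tau (radians); otherwise fall back to a
    reconstructed phase supplied by some other estimator."""
    if max_phase_deviation(local_snr) <= tau:
        return noisy_phase
    return reconstructed_phase


def combine(magnitude: float, phase: float) -> complex:
    """Rebuild a complex STFT coefficient from an estimated magnitude and phase."""
    return cmath.rect(magnitude, phase)


if __name__ == "__main__":
    tau = math.radians(20.0)  # hypothetical threshold, not taken from the paper
    for snr_db in (0.0, 6.0, 20.0):
        xi = 10.0 ** (snr_db / 10.0)
        bound_deg = math.degrees(max_phase_deviation(xi))
        phase = choose_phase(0.3, 1.2, xi, tau)  # made-up phases in radians
        print(f"{snr_db:4.1f} dB: bound {bound_deg:6.2f} deg, chosen phase {phase}")
```

In the paper the magnitude comes from an improved binary hypothesis model; here `combine` simply accepts whatever magnitude estimate is available, and an inverse STFT over all bins would then give the time-domain signal.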

Key words: phase reconstruction, SNR information, time-frequency characteristics, decision threshold, phase deviation

CLC number: TN912.35

JIA Hairong, WANG Weimei, JI Huifang. Speech enhancement based on the modified phase using signal-to-noise ratio information and time-frequency characteristics[J]. Journal of Xidian University, 2019, 46(5): 162-170.

Figures and tables: 8 (Figures 1 to 6; Tables 1 and 2)

References (17)

[1]	ZHENG N J, ZHANG X L. Phase-aware Speech Enhancement Based on Deep Neural Networks[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2019, 27(1): 63-76.
[2]	WAKABAYASHI Y, FUKUMORI T, NAKAYAMA M, et al. Single-channel Speech Enhancement with Phase Reconstruction Based on Phase Distortion Averaging[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018, 26(9): 1559-1569.
[3]	BARYSENKA S Y, VOROBIOV V I, MOWLAEE P. Single-channel Speech Enhancement Using Inter-component Phase Relations[J]. Speech Communication, 2018, 99: 144-160.
[4]	KRAWCZYK M, GERKMANN T. STFT Phase Reconstruction in Voiced Speech for an Improved Single-channel Speech Enhancement[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2014, 22(12): 1931-1940.
[5]	KULMER J, MOWLAEE P. Phase Estimation in Single Channel Speech Enhancement Using Phase Decomposition[J]. IEEE Signal Processing Letters, 2015, 22(5): 598-602.
[6]	MOWLAEE P, SAEIDI R. Time-frequency Constraints for Phase Estimation in Single-channel Speech Enhancement[C]// Proceedings of the 2014 14th International Workshop on Acoustic Signal Enhancement. Piscataway: IEEE, 2014: 337-341.
[7]	WAKABAYASHI Y, FUKUMORI T, NAKAYAMA M, et al. Single-channel Speech Enhancement with Phase Reconstruction Based on Phase Distortion Averaging[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018, 26(9): 1559-1569.
[8]	WANG Dong, JIA Hairong. Speech Enhancement Using Improved Phase Spectrum Compensation[J]. Journal of Xidian University, 2017, 44(3): 83-88. (in Chinese)
[9]	KANDAGATLA R K, SUBBAIAH P V. Speech Enhancement Using MMSE Estimation of Amplitude and Complex Speech Spectral Coefficients under Phase-uncertainty[J]. Speech Communication, 2018, 96: 10-27.
[10]	MOWLAEE P, KULMER J. Harmonic Phase Estimation in Single-channel Speech Enhancement Using Phase Decomposition and SNR Information[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015, 23(9): 1521-1532.
[11]	KRAWCZYK-BECKER M, GERKMANN T. An Evaluation of the Perceptual Quality of Phase-aware Single-channel Speech Enhancement[J]. Journal of the Acoustical Society of America, 2016, 140(4): 364-369.
[12]	KRAWCZYK-BECKER M, GERKMANN T. On MMSE-based Estimation of Amplitude and Complex Speech Spectral Coefficients Under Phase-uncertainty[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2016, 24(12): 2251-2262.
[13]	MOWLAEE P, SAEIDI R, STYLIANOU Y. Advances in Phase-aware Signal Processing in Speech Communication[J]. Speech Communication, 2015, 81: 2-29.
[14]	GONZALEZ S, BROOKES M. PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2014, 22(2): 518-530.
[15]	GERKMANN T. Bayesian Estimation of Clean Speech Spectral Coefficients Given a Priori Knowledge of the Phase[J]. IEEE Transactions on Signal Processing, 2014, 62(16): 4199-4208.
[16]	VARY P. Noise Suppression by Spectral Magnitude Estimation - Mechanism and Theoretical Limits[J]. Signal Processing, 1985, 8(4): 387-400.
[17]	PLAPOUS C, MARRO C, MAUUARY L, et al. A Two-step Noise Reduction Technique[C]// Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2004: I289-I292.

Table 1  SNR of the enhanced speech (dB)
Noise type	Input SNR/dB	Ref. [4] SP01	Ref. [4] SP15	Proposed SP01	Proposed SP15
White noise	0	7.7692	4.7413	8.2813	6.6161
	5	10.9203	5.2112	11.7749	8.0194
	10	12.9470	10.4186	15.3428	14.2552
	15	14.8041	15.3014	18.4642	18.2615
Pink noise	0	7.7294	4.1475	8.0674	6.3913
	5	10.5005	5.9592	11.6328	8.7592
	10	13.3217	10.0196	14.7645	13.7947
	15	15.0630	15.8768	18.1276	19.3986
F16 noise	0	7.5699	2.2990	7.6268	6.0169
	5	10.4930	5.4097	11.1768	8.2092
	10	13.0429	10.6806	14.5357	14.4447
	15	15.0049	15.5405	18.2353	19.9922
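Assuming the table above (Table 1) reports the output SNR of the enhanced speech in dB, the 2.39 dB average gain quoted in the abstract can be reproduced as the mean of the 24 per-condition differences (3 noise types, 4 input SNRs, 2 utterances) between the proposed algorithm and Ref. [4]. The names `TABLE1` and `mean_gain` are illustrative only.

```python
# Values transcribed from Table 1 (assumed to be output SNR in dB).
# Each row: (Ref. [4] SP01, Ref. [4] SP15, proposed SP01, proposed SP15)
TABLE1 = {
    ("White", 0): (7.7692, 4.7413, 8.2813, 6.6161),
    ("White", 5): (10.9203, 5.2112, 11.7749, 8.0194),
    ("White", 10): (12.9470, 10.4186, 15.3428, 14.2552),
    ("White", 15): (14.8041, 15.3014, 18.4642, 18.2615),
    ("Pink", 0): (7.7294, 4.1475, 8.0674, 6.3913),
    ("Pink", 5): (10.5005, 5.9592, 11.6328, 8.7592),
    ("Pink", 10): (13.3217, 10.0196, 14.7645, 13.7947),
    ("Pink", 15): (15.0630, 15.8768, 18.1276, 19.3986),
    ("F16", 0): (7.5699, 2.2990, 7.6268, 6.0169),
    ("F16", 5): (10.4930, 5.4097, 11.1768, 8.2092),
    ("F16", 10): (13.0429, 10.6806, 14.5357, 14.4447),
    ("F16", 15): (15.0049, 15.5405, 18.2353, 19.9922),
}


def mean_gain(table: dict) -> float:
    """Average of (proposed - Ref. [4]) over every noise type, input SNR and utterance."""
    gains = []
    for ref_sp01, ref_sp15, new_sp01, new_sp15 in table.values():
        gains.append(new_sp01 - ref_sp01)
        gains.append(new_sp15 - ref_sp15)
    return sum(gains) / len(gains)


if __name__ == "__main__":
    # prints "Mean SNR gain over Ref. [4]: 2.39 dB"
    print(f"Mean SNR gain over Ref. [4]: {mean_gain(TABLE1):.2f} dB")
```

The gain is uneven across conditions: it is smallest for SP01 at 0 dB input SNR and largest for SP15 at 15 dB in F16 noise.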

Table 2  PESQ scores
Noise type	Input SNR/dB	Noisy SP01	Noisy SP15	Ref. [4] SP01	Ref. [4] SP15	Proposed SP01	Proposed SP15
White noise	0	1.4564	1.3067	1.9079	1.7587	1.9416	1.7923
	5	1.7481	1.6190	2.2883	2.0897	2.3019	2.1314
	10	2.0649	1.9805	2.6325	2.2853	2.6758	2.4064
	15	2.4078	2.3711	2.7530	2.6541	2.8714	2.9032
Pink noise	0	1.6199	1.4019	2.0038	1.8053	2.0873	1.8621
	5	1.9546	1.7537	2.3198	2.1393	2.4240	2.2804
	10	2.2874	2.1331	2.6320	2.3673	2.7088	2.5182
	15	2.5874	2.5629	2.7664	2.5020	2.9639	2.7418
F16 noise	0	1.8803	1.4299	2.2299	1.7101	2.2433	1.9344
	5	2.1562	1.7943	2.5168	2.2528	2.5855	2.3589
	10	2.4621	2.1843	2.6921	2.3984	2.8431	2.5667
	15	2.6569	2.6002	2.8486	2.5174	3.0166	2.7250
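Assuming the table above (Table 2) reports PESQ scores, the same averaging reproduces the 0.12 PESQ gain over Ref. [4] stated in the abstract. Only the Ref. [4] and proposed columns are needed for that comparison; `TABLE2` and `mean_gain` are illustrative names.

```python
# PESQ values transcribed from Table 2 (noisy-input columns omitted).
# Each row: (Ref. [4] SP01, Ref. [4] SP15, proposed SP01, proposed SP15)
TABLE2 = {
    ("White", 0): (1.9079, 1.7587, 1.9416, 1.7923),
    ("White", 5): (2.2883, 2.0897, 2.3019, 2.1314),
    ("White", 10): (2.6325, 2.2853, 2.6758, 2.4064),
    ("White", 15): (2.7530, 2.6541, 2.8714, 2.9032),
    ("Pink", 0): (2.0038, 1.8053, 2.0873, 1.8621),
    ("Pink", 5): (2.3198, 2.1393, 2.4240, 2.2804),
    ("Pink", 10): (2.6320, 2.3673, 2.7088, 2.5182),
    ("Pink", 15): (2.7664, 2.5020, 2.9639, 2.7418),
    ("F16", 0): (2.2299, 1.7101, 2.2433, 1.9344),
    ("F16", 5): (2.5168, 2.2528, 2.5855, 2.3589),
    ("F16", 10): (2.6921, 2.3984, 2.8431, 2.5667),
    ("F16", 15): (2.8486, 2.5174, 3.0166, 2.7250),
}


def mean_gain(table: dict) -> float:
    """Average of (proposed - Ref. [4]) PESQ over all 24 conditions."""
    gains = []
    for ref_sp01, ref_sp15, new_sp01, new_sp15 in table.values():
        gains.append(new_sp01 - ref_sp01)
        gains.append(new_sp15 - ref_sp15)
    return sum(gains) / len(gains)


if __name__ == "__main__":
    # prints "Mean PESQ gain over Ref. [4]: 0.12"
    print(f"Mean PESQ gain over Ref. [4]: {mean_gain(TABLE2):.2f}")
```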