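Speech enhancement based on the modified phase using signal-to-noise ratio information and time-frequency characteristics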

doi:10.19665/j.issn1001-2400.2019.05.023

Abstract

Harmonic-model-based phase-spectrum speech enhancement algorithms can reconstruct the phase of voiced segments only, which causes speech distortion and auditory discontinuity. To address this problem, a new method is proposed that improves phase reconstruction by using signal-to-noise ratio (SNR) information and time-frequency features. First, time-frequency features related to phase distortion are introduced and a decision threshold is calculated. Then, the phase deviation between noisy speech and clean speech is calculated from the SNR information. Comparing the two yields phase estimates for both voiced and unvoiced speech, which effectively improves the coherence of the speech. Finally, the reconstructed phase is combined with the amplitude estimate of an improved binary hypothesis model to perform speech enhancement. Experiments on different speech samples under different noise backgrounds show that the phase deviation of the new algorithm is closer to that of the original signal. Compared with the reference algorithm, the SNR of the enhanced speech increases by 2.39 dB on average and the perceptual evaluation of speech quality (PESQ) score increases by 0.12 on average, so the method effectively reduces speech distortion and improves speech intelligibility.

Key words: phase reconstruction, SNR information, time-frequency characteristics, decision threshold, phase deviation
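
The abstract ties the phase deviation between noisy and clean speech to SNR information. As a minimal sketch of why SNR constrains that deviation (a general geometric bound, not the estimator used in the paper): if a noisy spectral bin is Y = S + N with |N| < |S|, the phase of Y can differ from the phase of S by at most arcsin(|N|/|S|). The function names `max_phase_deviation` and `phase_deviation` are hypothetical and introduced only for illustration.

```python
import cmath
import math


def max_phase_deviation(snr_db: float) -> float:
    """Upper bound (radians) on |angle(Y) - angle(S)| for Y = S + N.

    snr_db is the local SNR of one spectral bin, 20*log10(|S|/|N|).
    This is a geometric bound only; it is an illustrative assumption,
    not the phase-deviation estimator described in the paper.
    """
    ratio = 10.0 ** (-snr_db / 20.0)  # amplitude ratio |N| / |S|
    if ratio >= 1.0:
        return math.pi  # noise may dominate, so the phase is unconstrained
    return math.asin(ratio)


def phase_deviation(clean: complex, noise: complex) -> float:
    """Wrapped phase difference between the noisy bin and the clean bin."""
    noisy = clean + noise
    return cmath.phase(noisy * clean.conjugate())


# Sweep the noise phase at a fixed amplitude ratio and confirm that the
# observed deviation never exceeds the bound.
clean_bin = 1.0 + 0.0j
bound = max_phase_deviation(20.0 * math.log10(2.0))  # |N|/|S| = 0.5
worst = max(
    abs(phase_deviation(clean_bin, cmath.rect(0.5, math.radians(k))))
    for k in range(360)
)
print(f"bound = {math.degrees(bound):.2f} deg, worst observed = {math.degrees(worst):.2f} deg")
```

The bound tightens as SNR rises, which is consistent with the abstract's use of SNR information to decide how far the noisy phase can be trusted.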

CLC Number: TN912.35

JIA Hairong, WANG Weimei, JI Huifang. Speech enhancement based on the modified phase using signal-to-noise ratio information and time-frequency characteristics[J]. Journal of Xidian University, 2019, 46(5): 162-170.

References

[1]	ZHENG N J, ZHANG X L. Phase-aware Speech Enhancement Based on Deep Neural Networks[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2019, 27(1): 63-76.
[2]	WAKABAYASHI Y, FUKUMORI T, NAKAYAMA M, et al. Single-channel Speech Enhancement with Phase Reconstruction Based on Phase Distortion Averaging[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018, 26(9): 1559-1569.
[3]	BARYSENKA S Y, VOROBIOV V I, MOWLAEE P. Single-channel Speech Enhancement Using Inter-component Phase Relations[J]. Speech Communication, 2018, 99: 144-160.
[4]	KRAWCZYK M, GERKMANN T. STFT Phase Reconstruction in Voiced Speech for an Improved Single-channel Speech Enhancement[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2014, 22(12): 1931-1940.
[5]	KULMER J, MOWLAEE P. Phase Estimation in Single Channel Speech Enhancement Using Phase Decomposition[J]. IEEE Signal Processing Letters, 2015, 22(5): 598-602.
[6]	MOWLAEE P, SAEIDI R. Time-frequency Constraints for Phase Estimation in Single-channel Speech Enhancement[C]// Proceedings of the 2014 14th International Workshop on Acoustic Signal Enhancement. Piscataway: IEEE, 2014: 337-341.
[7]	WAKABAYASHI Y, FUKUMORI T, NAKAYAMA M, et al. Single-channel Speech Enhancement with Phase Reconstruction Based on Phase Distortion Averaging[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018, 26(9): 1559-1569.
[8]	WANG Dong, JIA Hairong. Speech Enhancement Using Improved Phase Spectrum Compensation[J]. Journal of Xidian University, 2017, 44(3): 83-88. (in Chinese)
[9]	KANDAGATLA R K, SUBBAIAH P V. Speech Enhancement Using MMSE Estimation of Amplitude and Complex Speech Spectral Coefficients under Phase-uncertainty[J]. Speech Communication, 2018, 96: 10-27.
[10]	MOWLAEE P, KULMER J. Harmonic Phase Estimation in Single-channel Speech Enhancement Using Phase Decomposition and SNR Information[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015, 23(9): 1521-1532.
[11]	KRAWCZYK-BECKER M, GERKMANN T. An Evaluation of the Perceptual Quality of Phase-aware Single-channel Speech Enhancement[J]. Journal of the Acoustical Society of America, 2016, 140(4): 364-369.
[12]	KRAWCZYK-BECKER M, GERKMANN T. On MMSE-based Estimation of Amplitude and Complex Speech Spectral Coefficients Under Phase-uncertainty[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2016, 24(12): 2251-2262.
[13]	MOWLAEE P, SAEIDI R, STYLIANOU Y. Advances in Phase-aware Signal Processing in Speech Communication[J]. Speech Communication, 2015, 81: 2-29.
[14]	GONZALEZ S, BROOKES M. PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2014, 22(2): 518-530.
[15]	GERKMANN T. Bayesian Estimation of Clean Speech Spectral Coefficients Given a Priori Knowledge of the Phase[J]. IEEE Transactions on Signal Processing, 2014, 62(16): 4199-4208.
[16]	VARY P. Noise Suppression by Spectral Magnitude Estimation-mechanism and Theoretical Limits[J]. Signal Processing, 1985, 8(4): 387-400.
[17]	PLAPOUS C, MARRO C, MAUUARY L, et al. A Two-step Noise Reduction Technique[C]// Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2004: I289-I292.

Noise type	Input SNR/dB	Ref. [4] algorithm, SP01 speech	Ref. [4] algorithm, SP15 speech	Proposed improved algorithm, SP01 speech	Proposed improved algorithm, SP15 speech
White noise	0	7.7692	4.7413	8.2813	6.6161
	5	10.9203	5.2112	11.7749	8.0194
	10	12.9470	10.4186	15.3428	14.2552
	15	14.8041	15.3014	18.4642	18.2615
Pink noise	0	7.7294	4.1475	8.0674	6.3913
	5	10.5005	5.9592	11.6328	8.7592
	10	13.3217	10.0196	14.7645	13.7947
	15	15.0630	15.8768	18.1276	19.3986
F16 noise	0	7.5699	2.2990	7.6268	6.0169
	5	10.4930	5.4097	11.1768	8.2092
	10	13.0429	10.6806	14.5357	14.4447
	15	15.0049	15.5405	18.2353	19.9922

Noise type	Input SNR/dB	Noisy speech, SP01	Noisy speech, SP15	Ref. [4] algorithm, SP01	Ref. [4] algorithm, SP15	Proposed improved algorithm, SP01	Proposed improved algorithm, SP15
White noise	0	1.4564	1.3067	1.9079	1.7587	1.9416	1.7923
	5	1.7481	1.6190	2.2883	2.0897	2.3019	2.1314
	10	2.0649	1.9805	2.6325	2.2853	2.6758	2.4064
	15	2.4078	2.3711	2.7530	2.6541	2.8714	2.9032
Pink noise	0	1.6199	1.4019	2.0038	1.8053	2.0873	1.8621
	5	1.9546	1.7537	2.3198	2.1393	2.4240	2.2804
	10	2.2874	2.1331	2.6320	2.3673	2.7088	2.5182
	15	2.5874	2.5629	2.7664	2.5020	2.9639	2.7418
F16 noise	0	1.8803	1.4299	2.2299	1.7101	2.2433	1.9344
	5	2.1562	1.7943	2.5168	2.2528	2.5855	2.3589
	10	2.4621	2.1843	2.6921	2.3984	2.8431	2.5667
	15	2.6569	2.6002	2.8486	2.5174	3.0166	2.7250
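
The two tables above carry no captions on this page. The following sketch checks them against the averages quoted in the abstract, assuming the first table lists the output SNR (dB) of the enhanced speech and the second lists PESQ scores. The variable and function names are hypothetical; the numbers are copied from the tables, and the noisy-speech columns of the second table are omitted because only the two algorithms are compared.

```python
from statistics import mean

# Each row: (noise type, input SNR in dB,
#            Ref. [4] SP01, Ref. [4] SP15, proposed SP01, proposed SP15)
SNR_TABLE = [
    ("White", 0, 7.7692, 4.7413, 8.2813, 6.6161),
    ("White", 5, 10.9203, 5.2112, 11.7749, 8.0194),
    ("White", 10, 12.9470, 10.4186, 15.3428, 14.2552),
    ("White", 15, 14.8041, 15.3014, 18.4642, 18.2615),
    ("Pink", 0, 7.7294, 4.1475, 8.0674, 6.3913),
    ("Pink", 5, 10.5005, 5.9592, 11.6328, 8.7592),
    ("Pink", 10, 13.3217, 10.0196, 14.7645, 13.7947),
    ("Pink", 15, 15.0630, 15.8768, 18.1276, 19.3986),
    ("F16", 0, 7.5699, 2.2990, 7.6268, 6.0169),
    ("F16", 5, 10.4930, 5.4097, 11.1768, 8.2092),
    ("F16", 10, 13.0429, 10.6806, 14.5357, 14.4447),
    ("F16", 15, 15.0049, 15.5405, 18.2353, 19.9922),
]

PESQ_TABLE = [
    ("White", 0, 1.9079, 1.7587, 1.9416, 1.7923),
    ("White", 5, 2.2883, 2.0897, 2.3019, 2.1314),
    ("White", 10, 2.6325, 2.2853, 2.6758, 2.4064),
    ("White", 15, 2.7530, 2.6541, 2.8714, 2.9032),
    ("Pink", 0, 2.0038, 1.8053, 2.0873, 1.8621),
    ("Pink", 5, 2.3198, 2.1393, 2.4240, 2.2804),
    ("Pink", 10, 2.6320, 2.3673, 2.7088, 2.5182),
    ("Pink", 15, 2.7664, 2.5020, 2.9639, 2.7418),
    ("F16", 0, 2.2299, 1.7101, 2.2433, 1.9344),
    ("F16", 5, 2.5168, 2.2528, 2.5855, 2.3589),
    ("F16", 10, 2.6921, 2.3984, 2.8431, 2.5667),
    ("F16", 15, 2.8486, 2.5174, 3.0166, 2.7250),
]


def average_gain(rows):
    """Mean of (proposed - Ref. [4]) over both utterances and all rows."""
    gains = []
    for _noise, _snr_in, ref01, ref15, new01, new15 in rows:
        gains.append(new01 - ref01)
        gains.append(new15 - ref15)
    return mean(gains)


snr_gain = average_gain(SNR_TABLE)
pesq_gain = average_gain(PESQ_TABLE)
print(f"average SNR gain: {snr_gain:.2f} dB")
print(f"average PESQ gain: {pesq_gain:.2f}")
```

Averaged over the 24 comparisons in each table, the gains round to 2.39 dB and 0.12, matching the figures quoted in the abstract, which supports reading the tables as output SNR and PESQ respectively.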
