CFCC feature extraction for fusion of the power-law nonlinearity function and spectral subtraction

doi:10.19665/j.issn1001-2400.2019.01.014

Abstract

This paper presents an improved speech feature extraction algorithm that raises speech recognition accuracy in noisy environments. A new cochlear filter cepstral coefficient (NCFCC) is extracted using a power-law nonlinear function, which simulates the auditory characteristics of the human ear. Spectral subtraction is then introduced at the feature extraction front end to enhance the signal. The new feature and its first-order difference are combined into a mixed feature parameter, and principal component analysis (PCA) is applied to reduce the dimension of this mixed feature. The final feature is used in a speaker-independent, isolated-word, small-vocabulary speech recognition system. Experimental results show that, compared with the traditional cochlear filter cepstral coefficient (CFCC) feature, the coefficients extracted with the power-law nonlinear function significantly improve recognition accuracy. The mixed feature parameter achieves better recognition performance than a single feature. With PCA applied to the mixed feature set, recognition accuracy reaches 88.10% at a signal-to-noise ratio (SNR) of 0 dB.
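
The processing order described in the abstract (enhance the signal, apply the auditory nonlinearity, append the first-order difference) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the exponent of 1/15, the spectral floor of 0.01 and the toy input values are all assumptions made for illustration, and the cochlear filterbank, the cepstral transform and the PCA step are left out to keep the example short.

```python
def spectral_subtract(noisy_power, noise_power, floor=0.01):
    """Subtract a noise power estimate per bin, keeping a small spectral floor."""
    # The floor (assumed value) keeps every bin non-negative after subtraction.
    return [max(p - n, floor * p) for p, n in zip(noisy_power, noise_power)]


def power_law(energies, exponent=1.0 / 15.0):
    """Power-law nonlinearity; the exponent is an assumed value, not from the paper."""
    return [e ** exponent for e in energies]


def first_order_difference(frames):
    """Central first-order difference over time, repeating the edge frames."""
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, last)]
        deltas.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return deltas


def mixed_features(noisy_frames, noise_power):
    """Enhance, compress, then append the first-order difference to each frame."""
    static = [power_law(spectral_subtract(f, noise_power)) for f in noisy_frames]
    deltas = first_order_difference(static)
    return [s + d for s, d in zip(static, deltas)]


# Toy input: 3 frames x 2 bands of power values (made-up numbers).
frames = [[4.0, 9.0], [5.0, 8.0], [6.0, 7.0]]
noise = [1.0, 2.0]
features = mixed_features(frames, noise)
```

Each output frame holds the static values followed by their differences, so the feature dimension doubles before any dimension reduction is applied.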

Key words: speech recognition, power-law nonlinearity function, cochlear filter cepstral coefficients, spectral subtraction

CLC Number: TN912.34

BAI Jing, SHI Yanyan, XUE Peiyun, GUO Qianyan. CFCC feature extraction for fusion of the power-law nonlinearity function and spectral subtraction[J]. Journal of Xidian University, 2019, 46(1): 86-92.

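Vocabulary size	Function simulating human auditory characteristics	SNR 0 dB	SNR 5 dB	SNR 10 dB	SNR 15 dB	SNR 20 dB	Average recognition rate/%
10 words	Cube root	64.76	63.81	64.76	67.62	68.10	65.81
10 words	Logarithm	73.14	79.84	82.81	87.14	86.19	81.82
10 words	Power-law nonlinearity function	75.24	80.48	83.33	88.10	88.57	83.14
20 words	Cube root	59.43	59.61	61.69	61.64	62.24	60.92
20 words	Logarithm	66.10	73.42	78.07	80.45	83.99	76.41
20 words	Power-law nonlinearity function	67.23	73.83	79.96	81.10	84.38	77.30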

Experiment	Feature parameter	SNR 0 dB	SNR 5 dB	SNR 10 dB	SNR 15 dB	SNR 20 dB	Average recognition rate/%
Experiment 1	NCFCC	75.24	80.48	83.33	88.10	88.57	83.14
Experiment 2	FFPRLS	78.57	82.86	86.94	88.57	89.52	85.29
Experiment 3	FFPSS	79.48	85.71	88.10	89.18	89.93	86.48
Experiment 4	FFPSS+Δ	80.95	86.19	90.48	90.95	90.00	87.71
Experiment 5	FFPSS+Δ+PCA	88.10	89.52	90.48	92.38	92.38	90.57
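
As a quick arithmetic check on the table above, the sketch below recomputes each experiment's average recognition rate from its five SNR columns, along with the gain over the NCFCC baseline at 0 dB. The numbers are copied from the table. The variable and function names are illustrative only.

```python
# Recognition rates (%) at SNR = 0, 5, 10, 15, 20 dB, copied from the table above.
rates = {
    "NCFCC": [75.24, 80.48, 83.33, 88.10, 88.57],
    "FFPRLS": [78.57, 82.86, 86.94, 88.57, 89.52],
    "FFPSS": [79.48, 85.71, 88.10, 89.18, 89.93],
    "FFPSS+Δ": [80.95, 86.19, 90.48, 90.95, 90.00],
    "FFPSS+Δ+PCA": [88.10, 89.52, 90.48, 92.38, 92.38],
}


def average(values):
    """Arithmetic mean of a list of recognition rates."""
    return sum(values) / len(values)


# Average over the five SNR conditions, rounded to two decimals as in the table.
averages = {name: round(average(v), 2) for name, v in rates.items()}

# Absolute gain (percentage points) over the NCFCC baseline at 0 dB.
baseline_0db = rates["NCFCC"][0]
gain_at_0db = {name: round(v[0] - baseline_0db, 2) for name, v in rates.items()}

for name in rates:
    print(name, averages[name], gain_at_0db[name])
```

The recomputed averages agree with the last column of the table, and the 0 dB gain makes it easy to compare how much each added stage contributes at the lowest SNR.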

Experiment	Feature parameter	SNR 0 dB	SNR 5 dB	SNR 10 dB	SNR 15 dB	SNR 20 dB	Average recognition rate/%
Experiment 1	NCFCC	67.23	73.83	79.96	81.10	84.38	77.30
Experiment 2	FFPRLS	68.83	77.18	82.18	82.87	84.73	79.16
Experiment 3	FFPSS	74.26	79.70	82.52	83.96	84.76	81.04
Experiment 4	FFPSS+Δ	74.86	80.30	82.71	86.03	87.34	82.25
Experiment 5	FFPSS+Δ+PCA	78.81	81.67	87.38	88.81	91.67	85.67
