Dysarthria recognition combining speech fusion feature and random forest

doi:10.3969/j.issn.1001-2400.2018.03.026

Abstract

This paper proposes a speech recognition method that combines a speech fusion feature with a random forest to distinguish normal voices from voices with dysarthria. The work aims to analyze how the pronunciation of pathological speakers differs from that of normal speakers, and to provide doctors with scientific and objective evidence for diagnosis and treatment. The method uses the pathological voice database developed by the University of Toronto as the corpus, extracts five types of prosodic features together with Mel Frequency Cepstrum Coefficients (MFCC), and calculates their statistical features, which together form the fusion feature. A random forest is then used as the classifier. The results show that the fusion feature improves recognition performance significantly compared with any single type of feature, and that, combined with the random forest, it reaches a classification accuracy of 99.21% for male speakers, 98.97% for female speakers, and 98.00% overall. The study also finds that patients pronounce short words more accurately than sentences.
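The fusion-feature step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract neither names the five prosodic features nor says which statistics are computed, so the five functionals used here (mean, standard deviation, minimum, maximum, range) and the feature names in the example are assumptions. Each frame-level trajectory, whatever its length, is reduced to the same fixed set of statistics, and the results are concatenated into one vector per utterance that a classifier can consume.

```python
import math
from typing import Dict, List, Sequence

# ASSUMPTION: the abstract does not list the statistics used in the paper;
# these five functionals are illustrative only.
FUNCTIONALS = ("mean", "std", "min", "max", "range")


def functionals(track: Sequence[float]) -> List[float]:
    """Summarize one variable-length trajectory by fixed-length statistics."""
    if not track:
        raise ValueError("empty feature trajectory")
    n = len(track)
    mean = sum(track) / n
    # Population standard deviation over all frames.
    std = math.sqrt(sum((x - mean) ** 2 for x in track) / n)
    lo, hi = min(track), max(track)
    return [mean, std, lo, hi, hi - lo]


def fusion_feature(prosodic: Dict[str, Sequence[float]],
                   mfcc: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate statistics of prosodic tracks and of each MFCC coefficient.

    prosodic: hypothetical mapping from a prosodic feature name to its
              per-frame values.
    mfcc:     per-frame MFCC vectors, shape (n_frames, n_coeffs).
    """
    vector: List[float] = []
    # Sort names so the vector layout is identical for every utterance.
    for name in sorted(prosodic):
        vector.extend(functionals(prosodic[name]))
    n_coeffs = len(mfcc[0])
    for k in range(n_coeffs):
        # Column k is the trajectory of the k-th cepstral coefficient over time.
        vector.extend(functionals([frame[k] for frame in mfcc]))
    return vector


if __name__ == "__main__":
    pros = {"pitch": [120.0, 130.0, 125.0], "energy": [0.5, 0.7, 0.6]}
    mfcc = [[1.0, -2.0], [3.0, -4.0], [2.0, -3.0]]
    print(len(fusion_feature(pros, mfcc)))
```

With two hypothetical prosodic tracks ("pitch", "energy") and a 2-coefficient MFCC matrix, the result has 4 trajectories × 5 statistics = 20 dimensions. Vectors built this way could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`; summarizing with statistics is what lets utterances of different durations share one fixed-length representation.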

Key words: prosodic feature, Mel frequency cepstrum coefficient, fusion feature, random forest, dysarthria recognition

LI Dong, ZHANG Xueying, DUAN Shufei, YAN Mimi. Dysarthria recognition combining speech fusion feature and random forest[J]. Journal of Xidian University, 2018, 45(3): 149-155.
