Dysarthria recognition combining speech fusion feature and random forest

doi:10.3969/j.issn.1001-2400.2018.03.026

Journal of Xidian University

LI Dong; ZHANG Xueying; DUAN Shufei; YAN Mimi

(College of Information Engineering, Taiyuan Univ. of Technology, Taiyuan 030024, China)

Received: 2017-07-26 Online: 2018-06-20 Published: 2018-07-18
Corresponding author: ZHANG Xueying (1964-), female, professor, Ph.D., E-mail: tyzhangxy@163.com
About the author: LI Dong (1991-), male, master's student at Taiyuan University of Technology, E-mail: lidongtyut@163.com
Funding:
National Natural Science Foundation of China (61371193); Shanxi Province Applied Basic Research Youth Fund (201601D202045)

Abstract

This paper proposes a speech recognition method that combines a speech fusion feature with a random forest to classify normal speech and dysarthric speech. The work aims to analyze the pronunciation differences between pathological and normal speakers and to give doctors scientific, objective evidence for diagnosis and treatment. First, using the pathological speech database developed by the University of Toronto as the corpus, the method extracts five types of prosodic features and the Mel frequency cepstrum coefficients (MFCC); it then calculates their statistical features, which together compose the fusion feature. Finally, a random forest is used as the classifier. The results show that, compared with any single type of feature, the proposed fusion feature significantly improves recognition performance: combined with the random forest, the classification accuracy reaches 99.21% for male voices, 98.97% for female voices, and 98.00% overall. The study also finds that patients pronounce short words more accurately than sentences.

Key words: prosodic feature, Mel frequency cepstrum coefficient, fusion feature, random forest, dysarthria recognition
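
The pipeline summarized in the abstract (frame-level features, their statistics concatenated into one fusion vector, then an ensemble of randomized trees) can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation: the abstract does not say which statistics are used, so mean, standard deviation, minimum and maximum are assumed here; the frame-level tracks are synthetic numbers with no acoustic meaning rather than real prosodic or MFCC tracks; and the classifier is a toy ensemble of bagged one-split trees (decision stumps) with random feature subsets, a depth-1 simplification standing in for a full random forest. All function names are hypothetical.

```python
import random
import statistics
from collections import Counter

# Assumed statistic set; the abstract does not list the statistics actually used.
STATS = (statistics.fmean, statistics.pstdev, min, max)

def fuse(tracks):
    """Concatenate summary statistics of every frame-level track into one vector."""
    return [stat(track) for track in tracks for stat in STATS]

def fit_stump(X, y, features):
    """Return the (feature, threshold, left_label) split with the fewest errors."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for left in (0, 1):
                errors = sum((left if row[f] <= thr else 1 - left) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, thr, left)
    if best is None:  # every sampled value identical: always predict the majority
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), majority)
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each restricted to a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # features offered to each stump
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        features = rng.sample(range(n_features), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict(forest, x):
    """Majority vote over all stumps."""
    votes = Counter(left if x[f] <= thr else 1 - left for f, thr, left in forest)
    return votes.most_common(1)[0][0]

def toy_utterance(rng, label, n_tracks=3, n_frames=50):
    """Synthetic frame-level tracks; the two classes differ in level and spread."""
    offset, scale = (0.0, 1.0) if label == 0 else (10.0, 5.0)
    return [[offset + scale * rng.random() for _ in range(n_frames)]
            for _ in range(n_tracks)]

rng = random.Random(42)
labels = [i % 2 for i in range(40)]
X = [fuse(toy_utterance(rng, label)) for label in labels]
forest = fit_forest(X, labels)

test_labels = [0, 1, 0, 1]
predictions = [predict(forest, fuse(toy_utterance(rng, label)))
               for label in test_labels]
print(predictions)
```

In practice the tracks would come from a feature extractor run on recorded speech, and the toy ensemble would be replaced by a full random forest implementation such as scikit-learn's `RandomForestClassifier`; the point of the sketch is only that utterances of different lengths are reduced to fixed-length fusion vectors before classification.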

LI Dong; ZHANG Xueying; DUAN Shufei; YAN Mimi. Dysarthria recognition combining speech fusion feature and random forest[J]. Journal of Xidian University, 2018, 45(3): 149-155.
