Journal of Xidian University

• Research Paper •

Dysarthria recognition combining speech fusion feature and random forest

LI Dong;ZHANG Xueying;DUAN Shufei;YAN Mimi

  1. (College of Information Engineering, Taiyuan Univ. of Technology, Taiyuan 030024, China)
  • Received:2017-07-26 Online:2018-06-20 Published:2018-07-18
  • Corresponding author: ZHANG Xueying (born 1964), female, professor, Ph.D., E-mail: tyzhangxy@163.com
  • About the author: LI Dong (born 1991), male, master's student at Taiyuan University of Technology, E-mail: lidongtyut@163.com
  • Supported by:

    National Natural Science Foundation of China (61371193); Shanxi Province Applied Basic Research Youth Fund (201601D202045)

Abstract:

To analyze the pronunciation differences between pathological and healthy speakers, this paper proposes a speech recognition method that combines a speech fusion feature with random forest to classify normal speech and dysarthric speech, with the aim of providing doctors with scientific and objective evidence for diagnosis and treatment. First, using the pathological speech database developed by the University of Toronto as the corpus, the method extracts five types of prosodic features and Mel frequency cepstrum coefficients (MFCC), then calculates their statistical features, which together compose the fusion feature. Finally, a random forest is used as the classifier. The results show that, compared with any single type of feature, the proposed fusion feature significantly improves recognition performance. Combined with the random forest classifier, the classification accuracy reaches 99.21% for male voices, 98.97% for female voices, and 98.00% overall. The study also finds that patients pronounce short phrases more accurately than sentences.

Key words: prosodic feature, Mel frequency cepstrum coefficient, fusion feature, random forest, dysarthria recognition
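
The fusion step described in the abstract (frame-level features summarized by statistics and concatenated into one vector per utterance) might look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not say which statistics were computed or name the five prosodic features, so the set of statistics (mean, standard deviation, minimum, maximum), the feature names, and the toy values are all hypothetical.

```python
from statistics import mean, pstdev

# Assumed set of statistics; the abstract does not say which ones were used.
STATS = {
    "mean": mean,
    "std": pstdev,
    "min": min,
    "max": max,
}


def summarize(track):
    """Collapse one frame-level feature track into fixed-length statistics."""
    return [fn(track) for fn in STATS.values()]


def fuse(prosodic_tracks, mfcc_tracks):
    """Concatenate statistics of prosodic and MFCC tracks into one vector.

    Both arguments map a (hypothetical) feature name to a list of
    per-frame values for a single utterance.
    """
    fused = []
    for name in sorted(prosodic_tracks):  # fixed order keeps vectors comparable
        fused.extend(summarize(prosodic_tracks[name]))
    for name in sorted(mfcc_tracks):
        fused.extend(summarize(mfcc_tracks[name]))
    return fused


# Toy frame-level values, made up for illustration only.
prosodic = {"energy": [0.2, 0.4, 0.6], "pitch_hz": [110.0, 120.0, 130.0]}
mfcc = {"mfcc_1": [1.0, 3.0], "mfcc_2": [-2.0, 2.0]}

vector = fuse(prosodic, mfcc)
print(len(vector))  # 4 tracks x 4 statistics = 16 values
```

Because every utterance yields a vector of the same length regardless of its duration, such vectors could be stacked into a matrix and passed to a random forest implementation, for example scikit-learn's `RandomForestClassifier`, with one label per utterance (normal or dysarthric).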
