Regularized discriminative segmental feature transform method

doi:10.3969/j.issn.1001-2400.2016.02.018

Journal of Xidian University ›› 2016, Vol. 43 ›› Issue (2): 102-107. doi: 10.3969/j.issn.1001-2400.2016.02.018


CHEN Bin, ZHANG Lianhai, QU Dan, LI Bicheng

(Institute of Information System Engineering, PLA Information Engineering University, Zhengzhou 450001, Henan, China)

Received: 2014-12-04 Online: 2016-04-20 Published: 2016-05-27
Corresponding author: CHEN Bin
About the author: CHEN Bin (born 1987), male, Ph.D. candidate at PLA Information Engineering University. E-mail: chenbin873335@163.com
Funding:
Supported by the National Natural Science Foundation of China (61175017, 61403415) and the National High Technology Research and Development Program of China (863 Program) (2012AA011603)

Abstract:

To address the limited stability of frame-based feature transforms, a segment-based discriminative feature transform method is proposed, in which the feature transform matrix of each speech segment is determined by regularization. The method treats the feature transform as a parameter selection problem under limited data. In the training stage, feature transform matrices are trained with the tied-state region dependent linear transform (RDLT) method, and together they form an over-complete dictionary. In the testing stage, the speech is segmented by forced alignment, a regularization term is added to the likelihood objective function, and the resulting problem is solved with the fast iterative shrinkage-thresholding algorithm, which selects the best subset of transform matrices from the dictionary and estimates their combination coefficients. Experimental results show that, with combined L₁ and L₂ regularization, the recognition rate improves by 1.30% over the tied-state RDLT method when the acoustic model is trained with the maximum likelihood criterion, and by 1.66% after discriminative training of the model.

Key words: feature transform, speech recognition, region dependent, regularization, discriminative training
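The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that a regularized likelihood objective is solved with the fast iterative shrinkage-thresholding algorithm, so the sketch assumes a least-squares surrogate objective, 0.5‖Ac − b‖² + λ₁‖c‖₁ + 0.5λ₂‖c‖², in place of the likelihood. The function names (`soft_threshold`, `fista_elastic_net`, `combine_transforms`) and the parameters `lam1`, `lam2`, and `n_iter` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fista_elastic_net(A, b, lam1, lam2, n_iter=500):
    """FISTA for 0.5*||A c - b||^2 + lam1*||c||_1 + 0.5*lam2*||c||^2.

    Assumption: this least-squares objective is only a stand-in for the
    regularized likelihood objective mentioned in the abstract.
    """
    # Lipschitz constant of the gradient of the smooth part
    # (squared spectral norm of A plus the L2 weight).
    L = np.linalg.norm(A, 2) ** 2 + lam2
    c = np.zeros(A.shape[1])
    z = c.copy()  # extrapolated point
    t = 1.0       # momentum parameter
    for _ in range(n_iter):
        grad = A.T @ (A @ z - b) + lam2 * z
        c_next = soft_threshold(z - grad / L, lam1 / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = c_next + ((t - 1.0) / t_next) * (c_next - c)
        c, t = c_next, t_next
    return c


def combine_transforms(dictionary, coeffs):
    """Weighted sum of the K matrices in a (K, d_out, d_in) dictionary."""
    return np.tensordot(coeffs, dictionary, axes=1)
```

In such a setup, the nonzero entries of the returned coefficient vector would mark the selected subset of dictionary matrices, and `combine_transforms` would build the single transform applied to all frames of a segment. The L₁ term drives most coefficients to exactly zero, while the L₂ term keeps the solution stable when dictionary entries are highly correlated.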

CLC Number: TN912.3

CHEN Bin, ZHANG Lianhai, QU Dan, LI Bicheng. Regularized discriminative segmental feature transform method[J]. Journal of Xidian University, 2016, 43(2): 102-107.

