J4 ›› 2012, Vol. 39 ›› Issue (5): 107-112. doi: 10.3969/j.issn.1001-2400.2012.05.019

• Research Paper •

Feature selection for conditional random fields using membrane particle swarm optimization

DOU Zengfa; GAO Lin

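  1. (School of Computer Science, Xidian University, Xi'an, Shaanxi 710071)
  • Received: 2011-10-19 Online: 2012-10-20 Published: 2012-12-13
  • Corresponding author: DOU Zengfa
  • About the author: DOU Zengfa (born 1979), male, senior engineer, Ph.D. candidate at Xidian University. E-mail: jssdzf@126.com
  • Supported by:

    Key Program of the National Natural Science Foundation of China (60933009); Specialized Research Fund for the Doctoral Program of Higher Education (200807010013); National Natural Science Foundation of China (60970065)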

Feature selection in conditional random fields using a membrane particle swarm optimizer

DOU Zengfa; GAO Lin

  1. (School of Computer Science and Technology, Xidian Univ., Xi'an  710071, China)
  • Received: 2011-10-19 Online: 2012-10-20 Published: 2012-12-13
  • Contact: DOU Zengfa

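Abstract (Chinese version):

A new feature selection method based on membrane particle swarm optimization is proposed. The method exploits the hierarchical structure and message-passing mechanism of a membrane system and deploys the particle swarm optimization algorithm in each region as a regional sub-algorithm. Unlike the conventional particle swarm optimization algorithm, the method decomposes the search velocity into a local search velocity and a global search velocity. All outer regions of the membrane system use the local search velocity to search for local optima; the innermost region uses the global search velocity to search for the global optimum. Every outer region passes its best solution to the adjacent inner region, every inner region passes its worst solution to the adjacent outer region, and the innermost region passes its worst solution to its adjacent outer region. The algorithm converges when solution passing between regions has stopped for a period of time or when the number of iterations reaches the limit, and the best solution in the innermost region is taken as the final solution. With the maximum likelihood estimation function of the conditional random field model as the objective function, membrane particle swarm optimization is used to compute the weight coefficient of each feature, and features whose weight coefficients are below a threshold are then removed. Experimental results show that, for gene name recognition in biological text, selecting conditional random field features with this method eliminates the interference of redundant features and yields higher accuracy.

Keywords: membrane system, particle swarm optimization, biomedical text, feature selection, conditional random fields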
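
The abstract states that the search velocity is decomposed into a local and a global part but gives no formulas. One possible reading, written against the textbook particle swarm update, is sketched below. The symbols ($\omega$, $c_1$, $c_2$, $r_1$, $r_2$, $p_i$, $g$, $\lambda_k$, $\theta$) and the split itself are assumptions made for illustration, not the authors' published equations; using the magnitude of the weight in the selection rule is likewise an assumption, since the abstract only says weights below a threshold are removed.

```latex
\begin{align*}
% Textbook PSO update for particle i with position x_i and velocity v_i
v_i^{t+1} &= \omega v_i^{t} + c_1 r_1 \bigl(p_i - x_i^{t}\bigr) + c_2 r_2 \bigl(g - x_i^{t}\bigr),
  & x_i^{t+1} &= x_i^{t} + v_i^{t+1} \\
% Assumed "local" velocity for outer regions: attraction to the personal best p_i only
v_{i,\mathrm{local}}^{t+1} &= \omega v_i^{t} + c_1 r_1 \bigl(p_i - x_i^{t}\bigr) \\
% Assumed "global" velocity for the innermost region: attraction to the region best g only
v_{i,\mathrm{global}}^{t+1} &= \omega v_i^{t} + c_2 r_2 \bigl(g - x_i^{t}\bigr) \\
% Assumed selection rule: keep feature k when its weight magnitude reaches the threshold
S &= \{\, k : |\lambda_k| \ge \theta \,\}
\end{align*}
```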

Abstract:

To remove redundant features from conditional random fields used to recognize gene names in the literature, a novel feature selection method based on a membrane-system particle swarm optimizer is proposed. Using the hierarchy and message-passing mechanism of the membrane system, the particle swarm optimizer is deployed to every region as a sub-algorithm. Following the structure of the membrane system, the original particle swarm optimizer is split into two parts, a local optimizer and a global optimizer. The local optimizer is assigned to all outer regions to search for local best solutions, and the global optimizer is assigned to the innermost region to search for the global best solution. Each outer region sends its best solution to its adjacent inner region, each inner region sends its worst solution to its adjacent outer region, and the innermost region sends only its worst solution to its adjacent outer region. When communication between regions has stopped for a specified duration, or the number of iterations reaches its limit, the algorithm terminates and outputs the best solution in the innermost region. We use the maximum likelihood estimation function (the log-likelihood) of the conditional random field as the objective function, compute the weights of all feature functions with the membrane particle swarm optimizer, and delete the feature functions whose weights fall below a specified threshold. Experimental results show that, for gene name recognition in the literature, selecting feature functions with the proposed algorithm reduces the interference from redundant features and improves the accuracy of conditional random fields.

Key words: membrane system, particle swarm optimizer, biomedical literature, feature selection, conditional random fields
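
The procedure in the abstract (nested regions, best solutions passed inward, worst solutions passed outward, termination when communication stops, then thresholding of weights) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the function names `membrane_pso` and `select_features`, the parameter values, the swap-based message passing, the reading of "local" and "global" search as the cognitive and social velocity terms, and the use of weight magnitude in the threshold test. A toy quadratic objective stands in for the conditional random field log-likelihood, which the abstract does not spell out.

```python
import random


def make_particle(rng, dim, objective, span=1.0):
    """Random position and velocity; the personal best starts at the position."""
    x = [rng.uniform(-span, span) for _ in range(dim)]
    v = [rng.uniform(-span, span) for _ in range(dim)]
    return {"x": x, "v": v, "best_x": x[:], "best_f": objective(x)}


def step(p, attractor, coeff, w, rng, objective):
    """Move one particle toward `attractor` and refresh its personal best."""
    for d in range(len(p["x"])):
        p["v"][d] = w * p["v"][d] + coeff * rng.random() * (attractor[d] - p["x"][d])
        p["x"][d] += p["v"][d]
    f = objective(p["x"])
    if f > p["best_f"]:  # the objective is maximised
        p["best_x"], p["best_f"] = p["x"][:], f


def membrane_pso(objective, dim, n_regions=3, swarm_size=6, max_iters=300,
                 patience=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sketch of a nested-membrane PSO; regions[0] is outermost, regions[-1] innermost."""
    rng = random.Random(seed)
    regions = [[make_particle(rng, dim, objective) for _ in range(swarm_size)]
               for _ in range(n_regions)]
    quiet = 0  # consecutive iterations without any exchange between regions
    for _ in range(max_iters):
        for k, region in enumerate(regions):
            if k == n_regions - 1:
                # Innermost region: assumed "global" velocity, pull toward the region best.
                gbest = max(region, key=lambda q: q["best_f"])["best_x"]
                for p in region:
                    step(p, gbest, c2, w, rng, objective)
            else:
                # Outer regions: assumed "local" velocity, pull toward the personal best.
                for p in region:
                    step(p, p["best_x"], c1, w, rng, objective)
        moved = False
        for k in range(n_regions - 1):
            outer, inner = regions[k], regions[k + 1]
            i = max(range(swarm_size), key=lambda idx: outer[idx]["best_f"])
            j = min(range(swarm_size), key=lambda idx: inner[idx]["best_f"])
            # Best of the outer region goes inward, worst of the inner region goes outward.
            if outer[i]["best_f"] > inner[j]["best_f"]:
                outer[i], inner[j] = inner[j], outer[i]
                moved = True
        quiet = 0 if moved else quiet + 1
        if quiet >= patience:  # communication has stopped for a while
            break
    best = max(regions[-1], key=lambda q: q["best_f"])
    return best["best_x"], best["best_f"]


def select_features(weights, threshold):
    """Indices of features whose weight magnitude reaches the threshold (assumed rule)."""
    return [k for k, wk in enumerate(weights) if abs(wk) >= threshold]


if __name__ == "__main__":
    # Toy stand-in for a log-likelihood: peaked at a hypothetical weight vector.
    target = [1.0, 0.0, -0.8, 0.0]
    toy = lambda ws: -sum((a - b) * (a - b) for a, b in zip(ws, target))
    weights, fitness = membrane_pso(toy, dim=len(target))
    print(weights, fitness, select_features(weights, threshold=0.1))
```

Swapping particles between adjacent regions keeps every swarm at a fixed size; copying solutions instead would be an equally plausible reading of the message-passing step.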

CLC Number:

  • TP301