J4 ›› 2015, Vol. 42 ›› Issue (2): 58-64+121. doi: 10.3969/j.issn.1001-2400.2015.02.010

• Research Article •

Enhancing one-class learning algorithms using clustering stability analysis

刘家辰;苗启广;宋建锋;曹莹   

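  1. (School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China)
  • Received: 2013-11-20 Revised: 2013-12-19 Publication date: 2015-04-20 Release date: 2015-04-14
  • Corresponding author: LIU Jiachen
  • About the author: LIU Jiachen (b. 1988), male, Ph.D. candidate at Xidian University. E-mail: jcliu@stu.xidian.edu.cn.
  • Funding:
    National Natural Science Foundation of China (61472302, 61272280, 41271447, 61272195); Program for New Century Excellent Talents in University, Ministry of Education (NCET-12-0919); Fundamental Research Funds for the Central Universities (K5051203020, K5051303016, K5051303018, BDY081422, K50513100006); Natural Science Foundation of Shaanxi Province (2014JM8310); Xi'an Science and Technology Bureau (CXY1341(6), CXY1440(1))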

Enhanced one-class learning based on clustering stability analysis

LIU Jiachen;MIAO Qiguang;SONG Jianfeng;CAO Ying   

  1. (School of Computer Science and Technology, Xidian Univ., Xi'an 710071, China)
  • Received:2013-11-20 Revised:2013-12-19 Online:2015-04-20 Published:2015-04-14
  • Contact: LIU Jiachen

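Abstract: Traditional one-class learning models describe multi-modal or multi-density data poorly. To address this problem, ensemble clustering and clustering stability analysis are introduced into one-class learning. First, determining the number of clusters and determining the cluster distribution are unified in a single enhanced one-class learning framework. Next, with the clusters serving as positive and negative classes for one another, a separate one-class classification model is built for each cluster. Finally, the decision boundaries of these models are fused using the maximum fusion volume method. Taking the classic support vector data description (SVDD) as an example, an ensemble-clustering-based stable SVDD algorithm, ECS-SVDD, is designed. Experimental results on standard UCI datasets and a real-world malware behavior dataset show that ECS-SVDD outperforms a single SVDD and comparable one-class learning methods. The method extends directly to other one-class learning algorithms of the minimum-enclosing-volume-set type, enhancing their ability to handle multi-modal and multi-density data.

Keywords: one-class learning, outlier analysis, cluster analysis, clustering stability, support vector data description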

Abstract: Conventional one-class learning models perform poorly when the data are multi-modal or multi-density. To address this problem, ensemble clustering and clustering stability analysis are introduced into one-class learning. First, determining the number of clusters and determining their distributions are unified in a single enhancing framework. Then multiple one-class learning models are constructed to describe the individual clusters of the target class. Finally, these models are fused using the maximum fusion volume method. Using the classic support vector data description (SVDD) as an instance of a one-class learning algorithm, an ensemble-clustering-based stable SVDD, ECS-SVDD, is proposed. Experimental results on UCI benchmark datasets and a real-world malware detection dataset show that ECS-SVDD outperforms a single SVDD and several related one-class learning algorithms. The proposed method can also be applied to other one-class learning algorithms that follow the minimum volume set scheme, improving their ability to handle multi-modal and multi-density data.

Key words: one-class learning, outlier analysis, cluster analysis, cluster stability, support vector data description
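
The three stages named in the abstract (choosing the number of clusters by stability, describing each cluster with its own one-class model, and fusing the models) can be illustrated with a minimal Python sketch. It is not the ECS-SVDD algorithm itself, whose details the abstract does not give. Everything specific here is an assumption made for illustration: the function names (`kmeans_centers`, `stability`, `fit`, `accepts`), the candidate cluster counts, the 80% subsample rate, the use of a single k-means run instead of a true ensemble consensus, a centroid-centered ball as a crude stand-in for SVDD, and reading "maximum fusion" as taking the maximum score over the per-cluster models.

```python
import math
import random
from itertools import combinations

def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(group) for coord in zip(*group))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))

def kmeans_centers(points, k, rng, iters=20):
    """Lloyd's k-means with farthest-first initialisation; returns k centers."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the previous center if a cluster happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def stability(points, k, rng, runs=6, frac=0.8):
    """Mean pairwise Rand index of k-means partitions fitted on random subsamples."""
    labelings = []
    for _ in range(runs):
        sample = rng.sample(points, int(frac * len(points)))
        centers = kmeans_centers(sample, k, rng)
        labelings.append([nearest(p, centers) for p in points])
    scores = [rand_index(a, b) for a, b in combinations(labelings, 2)]
    return sum(scores) / len(scores)

def fit(points, candidates=(2, 3, 4), seed=0):
    """Pick the most stable k, then describe each cluster by a (center, radius) ball."""
    rng = random.Random(seed)
    best_k = max(candidates, key=lambda k: stability(points, k, rng))
    centers = kmeans_centers(points, best_k, rng)
    balls = []
    for j in range(best_k):
        members = [p for p in points if nearest(p, centers) == j]
        if members:
            c = mean(members)
            # Assumed stand-in for SVDD: the smallest centroid-centered ball
            # that contains every member of the cluster.
            balls.append((c, max(math.dist(p, c) for p in members)))
    return balls

def accepts(balls, x):
    """Fuse the per-cluster models by maximum score, i.e. accept x if any ball contains it."""
    return max(r - math.dist(x, c) for c, r in balls) >= 0
```

Calling `fit(points)` on a list of coordinate tuples returns the per-cluster balls, and `accepts(balls, x)` then tests whether a new point `x` falls inside the fused description. In a fuller implementation each ball would be replaced by a kernel SVDD trained on the corresponding cluster.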

CLC number:

  • TP181