J4 ›› 2013, Vol. 40 ›› Issue (6): 116-124. doi: 10.3969/j.issn.1001-2400.2013.06.021

• Research Paper •

A new AdaBoost algorithm for malware behavior detection

CAO Ying; LIU Jiachen; MIAO Qiguang; GAO Lin

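  1. (School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China)
  • Received: 2012-08-08 Publication date: 2013-12-20 Release date: 2014-01-10
  • Corresponding author: CAO Ying
  • About the author: CAO Ying (born 1987), female, Ph.D. candidate at Xidian University, E-mail: yingcao@stu.xidian.edu.cn.
  • Funding:

    National Natural Science Foundation of China (61072109, 61272280, 41271447, 61272195); Program for New Century Excellent Talents in University, Ministry of Education (NCET-12-0919); Fundamental Research Funds for the Central Universities (K5051203020, K5051203001, K5051303018)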

Improved behavior-based malware detection algorithm with AdaBoost

CAO Ying; LIU Jiachen; MIAO Qiguang; GAO Lin

  1. (School of Computer Science and Technology, Xidian Univ., Xi'an 710071, China)
  • Received: 2012-08-08 Online: 2013-12-20 Published: 2014-01-10
  • Contact: CAO Ying

Abstract (Chinese edition):

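A new method for abstracting program behavior is proposed. Its data sources are the API calls issued while the program executes, network packet information, and the file structure features produced by static analysis. The API sequences undergo a low-aggregation dependency analysis, the network packet information and static analysis results are converted into discrete-valued features, and all of these are embedded together in a high-dimensional feature space. On this basis, with decision trees as base classifiers, a malware behavior detection algorithm built on an improved AdaBoost.M1 is designed to address the tendency of AdaBoost.M1 to overfit noisy data. The algorithm adopts a new loss function that lowers the probability of noisy samples entering the training set of the next base classifier, which improves its robustness to noise. In addition, it generates a vote vector for each base classifier, rather than a single vote weight, to distinguish how well that classifier handles samples of different classes.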

Key words: malware, behavior abstraction, classification, decision tree, AdaBoost, loss function
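The embedding step described in the abstract above, where abstract behaviors and discrete-valued features share one high-dimensional space, could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the encoding, so a simple binary presence vector is used here, and the feature tokens (`api:...`, `net:...`, `static:...`) and the function names `build_vocabulary` and `embed` are hypothetical. The dependency analysis that produces the API behaviors is not shown.

```python
from typing import Dict, Iterable, List


def build_vocabulary(samples: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Assign one dimension to every distinct feature token seen in training."""
    vocab: Dict[str, int] = {}
    for features in samples:
        for token in features:
            # setdefault keeps the first index assigned to a token.
            vocab.setdefault(token, len(vocab))
    return vocab


def embed(features: Iterable[str], vocab: Dict[str, int]) -> List[int]:
    """Map one program's feature tokens to a binary presence vector."""
    vector = [0] * len(vocab)
    for token in features:
        index = vocab.get(token)
        if index is not None:  # tokens unseen in training are ignored
            vector[index] = 1
    return vector


# Hypothetical tokens: an abstracted API behavior, a packet feature,
# and a static-analysis feature, all treated as discrete values.
training_programs = [
    ["api:CreateFile->WriteFile", "net:dst_port=25", "static:packed=true"],
    ["api:CreateFile->WriteFile", "static:packed=false"],
]
vocabulary = build_vocabulary(training_programs)
vectors = [embed(program, vocabulary) for program in training_programs]
```

A binary vector is the simplest choice; counts or weighted values would fit the same interface if the features warranted them.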

Abstract:

We present a new algorithm for abstracting features of a program from its API calls, network packets, and static analysis characteristics. API calls are aggregated by a low-level data dependence analysis to form abstract behaviors. Network packets and static analysis characteristics are used directly as discrete-valued features. All of these abstract features are then embedded in a high-dimensional vector space. We further design a new behavior-based malware classification algorithm that improves on AdaBoost with decision trees as base classifiers. First, the new algorithm optimizes an anti-noise loss function that lowers the probability that noisy samples are used to train the next classifier, and thus improves the noise robustness of AdaBoost. Second, to improve performance on multi-class classification problems, a vote vector is adopted to combine the base classifiers; it distinguishes how accurately a classifier classifies samples from different classes.

Key words: malware, behavior abstraction, classification, decision tree, AdaBoost, loss function
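To make the vote-vector idea concrete, the sketch below contrasts the single vote weight of standard AdaBoost.M1 with a per-class vote vector. The weight update in `adaboost_m1_round` is the standard AdaBoost.M1 rule, not the anti-noise loss of the paper, which the abstract does not define. The per-class formula in `vote_vector` (log-odds of the weighted precision on each predicted class) is an assumption chosen for illustration, and all function names and class labels are hypothetical.

```python
import math
from collections import defaultdict


def adaboost_m1_round(weights, y_true, y_pred):
    """One standard AdaBoost.M1 round; weights are assumed to sum to 1.

    Returns (new_weights, scalar_vote)."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0 < err < 0.5:
        raise ValueError("AdaBoost.M1 requires 0 < weighted error < 0.5")
    beta = err / (1.0 - err)
    # Correctly classified samples are down-weighted by beta, then renormalized.
    new_w = [w * beta if t == p else w
             for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return [w / total for w in new_w], math.log(1.0 / beta)


def vote_vector(weights, y_true, y_pred, classes, eps=1e-3):
    """Assumed per-class votes: log-odds of weighted precision per predicted class."""
    votes = {}
    for c in classes:
        hit = sum(w for w, t, p in zip(weights, y_true, y_pred)
                  if p == c and t == c)
        miss = sum(w for w, t, p in zip(weights, y_true, y_pred)
                   if p == c and t != c)
        if hit + miss == 0:
            votes[c] = 0.0  # classifier never predicts c: no say on c
            continue
        precision = min(max(hit / (hit + miss), eps), 1.0 - eps)
        votes[c] = max(0.0, math.log(precision / (1.0 - precision)))
    return votes


def combine(vote_vectors, predictions):
    """Each base classifier votes for its predicted label with its weight for that label."""
    scores = defaultdict(float)
    for votes, label in zip(vote_vectors, predictions):
        scores[label] += votes[label]
    return max(scores, key=scores.get)


# Hypothetical three-class toy data with uniform initial weights.
y_true = ["benign", "benign", "trojan", "trojan", "worm", "worm"]
y_pred = ["benign", "benign", "trojan", "worm", "worm", "worm"]
w0 = [1.0 / 6] * 6
w1, scalar_vote = adaboost_m1_round(w0, y_true, y_pred)
votes = vote_vector(w0, y_true, y_pred, ["benign", "trojan", "worm"])
```

With a single weight, this classifier's "worm" predictions count as much as its "benign" predictions. The vote vector gives "worm" a smaller weight because one of its three "worm" predictions was wrong.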

CLC number:

  • TP339