Improved behavior-based malware detection algorithm with AdaBoost

doi:10.3969/j.issn.1001-2400.2013.06.021

J4, 2013, Vol. 40, Issue (6): 116-124.

CAO Ying; LIU Jiachen; MIAO Qiguang; GAO Lin

(School of Computer Science and Technology, Xidian Univ., Xi'an 710071, China)

Received: 2012-08-08 Online: 2013-12-20 Published: 2014-01-10
Corresponding author: CAO Ying
About the author: CAO Ying (1987-), female, Ph.D. candidate at Xidian University, E-mail: yingcao@stu.xidian.edu.cn.
Funding: Supported by the National Natural Science Foundation of China (61072109, 61272280, 41271447, 61272195); the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-12-0919); and the Fundamental Research Funds for the Central Universities (K5051203020, K5051203001, K5051303018)

Abstract:

We present a new method for abstracting program behavior. It draws on three data sources: the API calls a program issues at run time, the network packets it sends and receives, and file-structure features produced by static analysis. API call sequences are aggregated by a data dependence analysis with a low degree of aggregation to form abstract behaviors, while network packet information and static analysis results are converted into discrete-valued features. All of these features are then embedded together in a high-dimensional vector space. On this basis, and using decision trees as base classifiers, we design a behavior-based malware detection algorithm that improves on AdaBoost.M1, which is prone to overfitting noisy data. First, the algorithm optimizes a new anti-noise loss function that lowers the probability that noisy samples enter the training set of the next base classifier, which improves the noise robustness of AdaBoost. Second, to improve performance on multi-class classification, each base classifier is given a vote vector instead of a single voting weight, so that the ensemble can distinguish how accurately a classifier classifies samples from different classes.

Key words: malware, behavior abstraction, classification, decision tree, AdaBoost, loss function
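The abstract names two changes to AdaBoost.M1 but gives neither the new loss function nor the exact vote-vector formula, so the Python sketch below is only an illustration under stated assumptions, not the authors' algorithm. It keeps the standard AdaBoost.M1 reweighting (multiply the weights of correctly classified samples by beta = err / (1 - err), stop when err reaches 1/2), uses decision stumps (depth-1 decision trees) in place of full decision trees, and gives each base classifier a hypothetical per-class vote vector: the vote for class k is log((1 - e_k) / e_k), where e_k is the weighted error among the samples that classifier assigns to k. The anti-noise loss is not modeled. The function names `fit_stump`, `adaboost_m1_vote_vector` and `predict`, and the clipping constant `eps`, are illustrative choices.

```python
import math
from collections import Counter


def fit_stump(X, y, w):
    """Fit a multi-class decision stump (a depth-1 decision tree) that
    minimizes the weighted training error under sample weights w."""
    best = None
    for j in range(len(X[0])):
        for theta in sorted({row[j] for row in X}):
            left, right = Counter(), Counter()
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= theta else right)[label] += wi
            # Each side predicts its weighted-majority class (None if empty).
            cl = max(left, key=left.get) if left else None
            cr = max(right, key=right.get) if right else None
            err = (sum(left.values()) - (left[cl] if cl is not None else 0.0)
                   + sum(right.values()) - (right[cr] if cr is not None else 0.0))
            if best is None or err < best[0]:
                best = (err, j, theta, cl, cr)
    _, j, theta, cl, cr = best
    fallback = cl if cl is not None else cr

    def stump(row):
        label = cl if row[j] <= theta else cr
        return label if label is not None else fallback

    return stump


def adaboost_m1_vote_vector(X, y, rounds=10, eps=1e-10):
    """AdaBoost.M1 reweighting with a hypothetical per-class vote vector."""
    n = len(X)
    classes = sorted(set(y))
    w = [1.0 / n] * n
    ensemble = []  # list of (base classifier, {predicted class: vote weight})
    for _ in range(rounds):
        h = fit_stump(X, y, w)
        preds = [h(row) for row in X]
        err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
        if err >= 0.5:  # AdaBoost.M1 stops when a base learner is no better than 1/2
            break
        # Hypothetical vote vector: one weight per class this classifier predicts.
        votes = {}
        for k in classes:
            mass = sum(wi for wi, p in zip(w, preds) if p == k)
            if mass == 0:
                continue  # this classifier never predicts k, so it casts no vote for k
            wrong = sum(wi for wi, p, t in zip(w, preds, y) if p == k and t != k)
            e_k = min(max(wrong / mass, eps), 1 - eps)
            votes[k] = max(0.0, math.log((1 - e_k) / e_k))
        ensemble.append((h, votes))
        if err == 0:
            break  # perfect fit: beta = 0 would zero every weight
        beta = err / (1 - err)
        # Standard AdaBoost.M1: shrink the weights of correctly classified samples.
        w = [wi * (beta if p == t else 1.0) for wi, p, t in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble, classes


def predict(ensemble, classes, row):
    """Each base classifier adds its class-specific vote to the class it predicts."""
    score = {k: 0.0 for k in classes}
    for h, votes in ensemble:
        k = h(row)
        score[k] += votes.get(k, 0.0)
    return max(classes, key=score.get)
```

With a single scalar weight, a base classifier that is reliable only when it predicts one class still votes at full strength for every class it outputs. A per-class vector lets it vote strongly for the class it separates well and weakly, or not at all, for the others, which is the distinction the abstract describes.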

CLC number: TP339

Cite this article: CAO Ying, LIU Jiachen, MIAO Qiguang, GAO Lin. Improved behavior-based malware detection algorithm with AdaBoost[J]. J4, 2013, 40(6): 116-124.
