New nearest neighbor affinity similarity function based on separation and compactness between samples

J4, 2014, Vol. 41, Issue (3): 123-130. doi: 10.3969/j.issn.1001-2400.2014.03.018

LI Juan^1,2; WANG Yuping^1

(1. School of Computer Science and Technology, Xidian Univ., Xi'an 710071, China;
2. School of Distance Education, Shaanxi Normal Univ., Xi'an 710062, China)

Received: 2013-03-13 Online: 2014-06-20 Published: 2014-07-10
Corresponding author: LI Juan
About the author: LI Juan (born 1979), female, lecturer, doctoral candidate at Xidian University. E-mail: ally_2004@126.com
Funding: National Natural Science Foundation of China (61272119)

Abstract

Traditional distance and similarity measures do not account for the influence of an individual sample on the whole sample set. To address this, the paper proposes a similarity improvement strategy for the k-nearest neighbor (KNN) algorithm. First, a new affinity distance function is introduced that focuses on the compactness and separation of each individual sample with respect to the whole sample set. Second, building on this affinity distance function, a new affinity similarity function based on compactness and separation is proposed and used as the similarity measure in KNN. Finally, the proposed affinity similarity function is compared with classical distance and similarity functions through theoretical analysis and 5-fold cross-validation experiments on eighteen numerical UCI (University of California, Irvine) datasets. The experiments indicate that the proposed method is an effective similarity strategy and that, combined with an efficient indexing algorithm, it can reduce classification time on large-scale datasets.

Key words: machine learning, nearest neighbors, affinity similarity, separation, compactness
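
The evaluation setup described in the abstract (KNN with a replaceable similarity measure, scored by 5-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the affinity similarity formula, so `euclidean_similarity` is a hypothetical stand-in, and the names `knn_predict` and `cross_validate` as well as the interleaved fold assignment are likewise assumptions made for this sketch.

```python
import math
from collections import Counter


def euclidean_similarity(a, b):
    """Stand-in similarity (assumption): NOT the paper's affinity similarity."""
    return 1.0 / (1.0 + math.dist(a, b))


def knn_predict(train_x, train_y, query, k, similarity):
    """Return the majority label among the k most similar training samples."""
    # Rank training indices by descending similarity to the query.
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: similarity(train_x[i], query),
        reverse=True,
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_validate(xs, ys, k=3, folds=5, similarity=euclidean_similarity):
    """Mean accuracy over `folds` folds; sample i goes to fold i % folds."""
    accuracies = []
    for f in range(folds):
        test_idx = [i for i in range(len(xs)) if i % folds == f]
        train_idx = [i for i in range(len(xs)) if i % folds != f]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], k, similarity) == ys[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds
```

Any function that takes two samples and returns a larger value for more similar pairs can be passed as `similarity`, so a different measure could be compared against the Euclidean stand-in without changing the rest of the loop.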

LI Juan, WANG Yuping. New nearest neighbor affinity similarity function based on separation and compactness between samples[J]. J4, 2014, 41(3): 123-130.

