Chinese text semantic representation for text classification

doi:10.3969/j.issn.1001-2400.2013.02.015

Abstract

Text representations based on word-frequency statistics are often unsatisfactory because they ignore the semantic relationships between words and treat words as independent features. This paper proposes a new Chinese text semantic representation model that takes into account both the contextual semantics and the background information of the words in a text. The method captures the semantic relationships between words using Wikipedia as a knowledge base. Words with strong semantic relationships are combined into a word-package, represented by a graph node whose weight is the sum of the number and the frequency of the words it contains. The contextual relationship between words in different word-packages is represented by a directed edge whose weight is the maximum weight of its adjacent nodes. The model retains the contextual information of each word to a large extent while strengthening the semantic relationships between words. Experimental results on Chinese text classification show that the proposed model expresses the content of a text accurately and improves classification performance. Compared with Support Vector Machines, Text Semantic Graph-based Classification improves efficiency by 7.8%, reduces the error rate by one third, and is more stable.
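
The graph construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the `toy_relatedness` lookup table stands in for the Wikipedia-based relatedness measure, the `threshold` parameter and the union-find grouping are one plausible way to form word-packages, "contextual relationship" is read as adjacency in the token sequence, and the node weight is read as (number of distinct words in the package) + (total frequency of those words).

```python
from collections import Counter
from itertools import combinations

# Hypothetical relatedness scores; the paper derives these from Wikipedia.
TOY_SCORES = {frozenset(("football", "league")): 0.9}


def toy_relatedness(a, b):
    """Return an illustrative relatedness score in [0, 1] for two words."""
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def build_semantic_graph(tokens, relatedness, threshold=0.5):
    """Build a toy text semantic graph from an already segmented token list."""
    freq = Counter(tokens)
    words = list(freq)  # distinct words, in order of first occurrence

    # Union-find: words whose relatedness reaches the threshold share a package.
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if relatedness(a, b) >= threshold:
            parent[find(b)] = find(a)

    # Each word-package becomes one graph node, keyed by its representative word.
    packages = {}
    for w in words:
        packages.setdefault(find(w), []).append(w)

    # Assumed reading: weight = number of words + sum of their frequencies.
    node_weight = {
        root: len(members) + sum(freq[m] for m in members)
        for root, members in packages.items()
    }

    # Directed edge between the packages of adjacent tokens (assumed notion of
    # context), weighted by the larger of the two node weights.
    edges = {}
    for left, right in zip(tokens, tokens[1:]):
        src, dst = find(left), find(right)
        if src != dst:
            edges[(src, dst)] = max(node_weight[src], node_weight[dst])

    return packages, node_weight, edges


tokens = ["football", "league", "announces", "schedule", "football"]
packages, node_weight, edges = build_semantic_graph(tokens, toy_relatedness)
print(packages)
print(node_weight)
print(edges)
```

In this toy input, "football" and "league" fall into one word-package with weight 5 (2 words plus a total frequency of 3), and every edge touching that package inherits the weight 5. A real system would replace `toy_relatedness` with a knowledge-base measure and would segment Chinese text before building the graph.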

Key words: classification, knowledge representation, similarity, text semantic graph

CLC Number: TP181

SONG Shengli, WANG Shaolong, CHEN Ping. Chinese text semantic representation for text classification [J]. Journal of Xidian University, 2013, 40(2): 89-97+129.

