Research on Web Text Clustering Based on Semantics

Posted on:2015-03-10

Degree:Master

Type:Thesis

Country:China

Candidate:X Chen

Full Text:PDF

GTID:2268330428966818

Subject:Computer Science and Technology

Abstract/Summary:

With the development and maturing of information technology, especially Internet technology, more and more information is available to people. Faced with such vast amounts of information, people demand fast, accurate, and comprehensive access to it, yet the information itself is redundant and chaotic. Effectively acquiring, analyzing, and managing information is among the most pressing issues in the information processing field and has become increasingly important to researchers. Web text clustering has therefore become an important research direction in information retrieval. The traditional text clustering method based on the vector space model suffers from high-dimensional, sparse text feature vectors, which makes breakthroughs and innovation in this direction difficult. Existing semantics-based text clustering methods mostly target traditional text and lack analysis of Chinese Web text clustering, so they perform poorly when applied to Chinese Web text.

This paper analyzes the current state of research on Chinese text clustering methods. On that basis, and to address the characteristics of Web text, such as fast updates, short length, and non-standard wording, it applies an analysis method based on HowNet semantics to Web text clustering. First, building on an understanding of the structure of HowNet, the paper improves the word similarity calculation method so that it conforms better to semantic conventions. Then, after analyzing the difficulties of Web text clustering algorithms, it introduces HowNet semantic similarity calculation into the traditional Fuzzy C-Means algorithm. The resulting algorithm is an improvement on the K-Means algorithm and uses a semantic similarity threshold to control the number of clustering iterations.

Based on this algorithm, a microblog topic discovery system was designed and implemented. The system automatically fetches the daily updated posts from Sina Weibo. Posts that fall into the same cluster are considered to be discussing the same topic, which realizes Weibo topic discovery. Finally, an evaluation of the algorithm and an experimental analysis of the system's functions show that the algorithm clearly improves on traditional Web text clustering and that the system built on it meets the expected requirements well.
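The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea: fuzzy clustering driven by a pairwise similarity matrix, with a similarity threshold used as a stopping rule. Everything specific here is an assumption for illustration: the function name `fuzzy_c_medoids`, the medoid-based update (chosen because semantic similarity gives no vector mean), the farthest-first initialisation, the parameter `sim_threshold`, and the rule "stop once the average similarity of items to their best medoid reaches the threshold". The toy matrix stands in for HowNet-based similarities, which are not computed here.

```python
from typing import List, Sequence, Tuple


def fuzzy_c_medoids(
    sim: Sequence[Sequence[float]],
    c: int,
    m: float = 2.0,
    sim_threshold: float = 0.8,  # hypothetical stopping threshold
    max_iter: int = 50,
) -> Tuple[List[int], List[List[float]]]:
    """Fuzzy clustering over a precomputed similarity matrix (values in [0, 1])."""
    n = len(sim)
    eps = 1e-9  # avoids division by zero when an item is itself a medoid
    dist = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]

    # Farthest-first initialisation: start at item 0, then repeatedly add
    # the item farthest from all medoids chosen so far.
    medoids = [0]
    while len(medoids) < c:
        nxt = max(
            (j for j in range(n) if j not in medoids),
            key=lambda j: min(dist[q][j] for q in medoids),
        )
        medoids.append(nxt)

    memberships: List[List[float]] = []
    for _ in range(max_iter):
        # Membership update: closer medoids receive higher fuzzy membership.
        memberships = []
        for j in range(n):
            inv = [(1.0 / (dist[q][j] + eps)) ** (1.0 / (m - 1.0)) for q in medoids]
            total = sum(inv)
            memberships.append([v / total for v in inv])

        # Assumed stopping rule: average similarity to the best medoid.
        cohesion = sum(max(sim[q][j] for q in medoids) for j in range(n)) / n
        if cohesion >= sim_threshold:
            break

        # Medoid update: pick the item minimising the weighted distance,
        # never reusing an item already chosen for another cluster.
        new_medoids: List[int] = []
        for i in range(c):
            weights = [memberships[j][i] ** m for j in range(n)]
            candidates = [q for q in range(n) if q not in new_medoids]
            best = min(
                candidates,
                key=lambda q: sum(weights[j] * dist[q][j] for j in range(n)),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids

    labels = [max(range(c), key=lambda i: row[i]) for row in memberships]
    return labels, memberships


# Toy similarity matrix: items 0-2 are mutually similar, as are items 3-5.
toy_sim = [
    [1.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1) for j in range(6)]
    for i in range(6)
]
labels, memberships = fuzzy_c_medoids(toy_sim, c=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In a topic discovery setting, each label would group posts treated as discussing the same topic, while the membership rows indicate how strongly each post belongs to each cluster.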

Keywords/Search Tags:

HowNet, Fuzzy C-Means, text clustering, semantic similarity

Related items

1	Research On Document Clustering Based On Semantic Similarity Of Hownet
2	Clustering Algorithm Research Of Short Text Based On Semantic Similarity
3	Research On Text Clustering Based On Hownet
4	Search Of Group Intelligent Text Clustering Methods Based On Semantic Similarity
5	Research On The Scale Free Graph K-medoids Cluster Algorithm
6	Research On Chinese Spam Filtering Based On Semantic Body And Text Clustering
7	Research And Implementation Of Text Similarity Computing Based On HowNet Sememe Space
8	Research On Ontology-Based Semantic Text Categorization
9	Study On The Chinese Text Clustering Algorithm Based On Semantic Similarity
10	Research On Text Similarity Measure Method Of Combining New Word Analysis And Semantic Analysis