Research And Application Of Web Text Information Clustering Algorithm

Posted on: 2010-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: S K Zhang
Full Text: PDF
GTID: 2178360275973455
Subject: Communication and Information System
Abstract/Summary:
With the rapid development and widespread adoption of the Internet, information resources on the Web have grown enormously, and they contain both useful information and some reactionary or superstitious content. Searching this huge body of information quickly and efficiently, so that public opinion on the network can be analyzed and predicted accurately, has therefore become an important and urgent problem. Data mining technology emerged to address it. Built on database technology and integrating results from several disciplines, such as statistics, machine learning and fuzzy logic, data mining studies how to extract valuable information or patterns that are usually implicit and previously unknown in a database. Clustering analysis is a key data mining technique: by comparing similarities and differences, it reveals the inner characteristics and distribution of the data.

After systematically reviewing the development of Web information retrieval, data mining and clustering algorithms, this dissertation summarizes the existing problems of general clustering algorithms. We attempt to design a clustering algorithm that performs well on Chinese Web text clustering and to implement a Web text information clustering system. The main contributions and innovations of this thesis are as follows:

(1) The advantages and disadvantages of several common families of clustering algorithms are analyzed, including partitioning, hierarchical, density-based, grid-based and model-based methods. By evaluating the efficiency of these algorithms, we analyze their shortcomings when applied to Web information clustering.
(2) Several key technologies of Chinese network information clustering are studied, including word segmentation, text representation, dimensionality reduction, term weighting and similarity calculation.
(3) We focus mainly on the suffix tree clustering algorithm and improve it using a binary search tree for Chinese network information clustering.
(4) On the basis of experimental results, K-means, Suffix Tree Clustering and the improved Suffix Tree Clustering are compared in terms of accuracy and time complexity.
(5) We design and implement a Chinese network information clustering system based on the improved Suffix Tree Clustering algorithm, and experimental results show that it is efficient and feasible.
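As an illustration of the text representation, term weighting and similarity calculation named in contribution (2), the following is a minimal sketch only. It assumes the documents have already been segmented into word lists and uses TF-IDF weights with cosine similarity, a common choice that this abstract does not confirm as the thesis's own method. The function names `tf_idf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn segmented documents (lists of tokens) into sparse TF-IDF vectors.

    Assumption: plain term frequency times log(N / document frequency);
    the thesis may use a different weighting scheme.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in docs:
        term_freq = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)
```

A clustering algorithm such as K-means could then group documents by these pairwise similarities. Suffix Tree Clustering instead groups documents that share phrases, so it does not depend on this vector representation in the same way.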
Keywords/Search Tags: Data Mining, Network Information Retrieval, Clustering Algorithm, Suffix Tree Clustering