Research On The Algorithm For Text Clustering Based On Feature Words

Posted on: 2010-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: K Cai
Full Text: PDF
GTID: 2178360275956359
Subject: Applied Mathematics
Abstract/Summary:
With the rapid development of computer networks, managing and analyzing text information has become a pressing concern. The volume of text is growing exponentially and appears in many different forms, which makes information searching, filtering, and management difficult. High-quality, high-speed text clustering can organize large amounts of text into a number of meaningful clusters, which supports navigation and browsing and improves search performance. Research on text clustering has therefore become an important part of text data mining.

Text clustering is one of the core techniques of data mining. Its aim is to divide a text collection into clusters such that the contents within one cluster are as similar as possible and the contents of different clusters are highly dissimilar. This thesis focuses on three problems in text clustering: the "curse of dimensionality", the optimization of clustering initialization, and the text clustering algorithm itself. Each key task is addressed as follows.

First, the core techniques of the text clustering process are analyzed and the weight calculation methods for feature items are studied. The position information of feature items is used to improve the classical TF-IDF weight calculation, yielding the P-TF-IDF weight calculation method. Experiments with the k-means algorithm and clustering validity measures such as the F1-measure validate the effectiveness of the improved P-TF-IDF weighting.

Second, a feature dimension reduction method called the topN method is proposed to address the "curse of dimensionality" encountered during clustering. Clustering validity measures are again used to show that the topN method is effective for text clustering.

Finally, combining the P-TF-IDF weight calculation method with the topN method, this thesis proposes, on the basis of partitional text clustering, an algorithm for text clustering based on feature words. Comparative experiments on test data and analysis against other algorithms show that the proposed algorithm performs favorably.
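
The abstract does not give the P-TF-IDF formula or the exact definition of topN, so the following Python sketch is only an illustration under stated assumptions, not the thesis's implementation. It assumes that a term occurring in the title counts more than one in the body (the `POSITION_WEIGHT` values are hypothetical), that IDF is the classical log(N / df), and that topN keeps the N highest-weighted terms of each document and takes their union as the reduced vocabulary. The function names `p_tf_idf` and `top_n_features` are also hypothetical.

```python
import math
from collections import Counter

# Assumed position weights; the abstract does not state the actual values.
POSITION_WEIGHT = {"title": 3.0, "body": 1.0}


def p_tf_idf(docs):
    """Position-weighted TF-IDF (assumed form).

    docs: list of dicts mapping a position name to a list of tokens.
    Returns one {term: weight} dict per document.
    """
    weighted_tf = []
    for doc in docs:
        tf = Counter()
        for position, tokens in doc.items():
            for tok in tokens:
                # Each occurrence contributes its position weight, not 1.
                tf[tok] += POSITION_WEIGHT[position]
        weighted_tf.append(tf)

    n_docs = len(docs)
    df = Counter()  # number of documents containing each term
    for tf in weighted_tf:
        df.update(tf.keys())

    return [
        {t: w * math.log(n_docs / df[t]) for t, w in tf.items()}
        for tf in weighted_tf
    ]


def top_n_features(weights, n):
    """Union of the n highest-weighted terms per document (assumed reading of topN)."""
    vocab = set()
    for w in weights:
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))
        vocab.update(t for t, _ in ranked[:n])
    return sorted(vocab)


docs = [
    {"title": ["text", "clustering"], "body": ["feature", "words", "text"]},
    {"title": ["image", "retrieval"], "body": ["feature", "vectors", "image"]},
]
weights = p_tf_idf(docs)
vocab = top_n_features(weights, n=2)
print(vocab)
```

In this toy input, "feature" appears in both documents and so receives zero weight, while title terms dominate each document's ranking. The reduced vocabulary could then define the document vectors passed to a partitional clusterer such as k-means.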
Keywords/Search Tags:Text Mining, Text Clustering, Feature Dimension Reduction, Feature Words