
Research On The Optimization Of TextRank Keyword Extraction Algorithm And SOM Text Clustering Model

Posted on: 2017-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Chen
GTID: 2308330485499330
Subject: Computer application technology
Abstract/Summary:
With the rapid development of internet information technology, text clustering has become a research focus because it supports retrieval over the vast amount of text on the network. Keyword extraction and the clustering algorithm both play an important role in text clustering. To improve the clustering effect, this thesis addresses two aspects:

1. An improved TextRank keyword extraction algorithm is proposed for text preprocessing. Term mutual information, computed over a sliding window, is added to the TextRank graph model as the edge weight, which optimizes the score distribution of the candidate words. The single-document term frequency (TF) is then introduced as a vertex weight in TextRank's iterative weight formula. The term frequency adjusts the probability of "jumping" to a word, which to a certain extent solves the problem of equal-probability "jumping". The experimental results show that the precision, recall and F1-measure of the proposed algorithm are improved, and that the efficiency of the iterative computation is enhanced by 20%. The extracted keywords are more representative of the text features, which benefits the subsequent text clustering.

2. Bayesian regularization theory is introduced into the Self-Organizing Map (SOM) text clustering algorithm. During weight adjustment, a penalty term that reflects the complexity of the network weights is added to the weight adjustment formula, thereby avoiding overfitting. Bayesian inference is used to obtain the optimal hyperparameters in the weight adjustment formula, so that the distribution of the network weights becomes more consistent with the probability distribution of the input data during iterative training, which improves the text clustering effect.
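As a rough illustration of the penalized weight adjustment described in the second aspect, the sketch below trains a small one-dimensional SOM in plain Python. The abstract does not give the actual formula, so everything here is an assumption: the penalty is taken to be an L2 (weight decay) term, the hyperparameters `alpha` and `beta` are fixed by hand instead of being obtained by Bayesian inference, and the function names `train_regularized_som` and `assign_cluster` are hypothetical.

```python
import math
import random


def train_regularized_som(data, n_units=4, epochs=20, lr=0.5, sigma=1.0,
                          alpha=0.01, beta=1.0, seed=0):
    """Train a 1-D SOM whose update includes an assumed L2 penalty term."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]

    for _ in range(epochs):
        for x in data:
            bmu = assign_cluster(x, weights)
            for j in range(n_units):
                # Gaussian neighbourhood on the 1-D grid of units.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(dim):
                    # Standard SOM pull toward the input, scaled by beta.
                    fit = beta * h * (x[k] - weights[j][k])
                    # Assumed penalty gradient: shrinks large weights.
                    penalty = alpha * weights[j][k]
                    weights[j][k] += lr * (fit - penalty)
    return weights


def assign_cluster(x, weights):
    """Return the index of the unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[j])),
    )


if __name__ == "__main__":
    points = [[0.1, 0.2], [0.0, 0.1], [0.9, 0.8], [1.0, 0.9]]
    som = train_regularized_som(points)
    print([assign_cluster(p, som) for p in points])
```

Setting `alpha=0` recovers the ordinary SOM update, which makes it easy to compare the two variants on the same data. A fuller treatment would re-estimate `alpha` and `beta` during training, which this sketch leaves out.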
The experimental results on UCI and Chinese text datasets show that, compared with the traditional SOM algorithm, the clustering cohesion of the proposed algorithm improves by an average of 1.5 times and the clustering accuracy also improves, giving a better overall clustering effect.
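The TextRank modification in the first aspect can be sketched in a few lines of Python. This is only one plausible reading of the abstract, not the thesis's formula: edge weights are assumed to be positive pointwise mutual information estimated from sliding-window co-occurrence counts, and the "jump" term of the iteration is assumed to be proportional to term frequency instead of uniform. The function name `tf_weighted_textrank` and all parameter defaults are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_weighted_textrank(tokens, window=3, damping=0.85, iterations=50):
    """Score words using PMI-weighted edges and a TF-proportional jump."""
    tf = Counter(tokens)
    total = len(tokens)

    # Count unordered co-occurrences inside a window of `window` tokens.
    pair_counts = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                pair_counts[frozenset((word, other))] += 1
    n_pairs = sum(pair_counts.values())

    # Edge weight: positive pointwise mutual information (assumed estimator).
    weights = defaultdict(dict)
    for pair, count in pair_counts.items():
        a, b = tuple(pair)
        pmi = math.log((count / n_pairs) / ((tf[a] / total) * (tf[b] / total)))
        if pmi > 0:
            weights[a][b] = pmi
            weights[b][a] = pmi

    # Jump distribution proportional to term frequency, not uniform.
    jump = {w: tf[w] / total for w in tf}
    scores = dict(jump)
    for _ in range(iterations):
        new_scores = {}
        for w in tf:
            # Each neighbour v passes on its score in proportion to the edge weight.
            incoming = sum(
                weights[v][w] / sum(weights[v].values()) * scores[v]
                for v in weights[w]
            )
            new_scores[w] = (1 - damping) * jump[w] + damping * incoming
        scores = new_scores
    # Words without edges keep only their jump share, so scores need not sum to 1.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    text = "text clustering uses keyword extraction and keyword extraction uses text graph"
    for word, score in tf_weighted_textrank(text.split())[:3]:
        print(word, round(score, 4))
```

In practice the token list would first be filtered to candidate words (for example nouns and adjectives after stop-word removal), which this sketch leaves out.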
Keywords/Search Tags: Text clustering, TextRank algorithm, Self-Organized Mapping, Bayesian regularization