Research On The Optimization Of TextRank Keyword Extraction Algorithm And SOM Text Clustering Model

Posted on:2017-03-06

Degree:Master

Type:Thesis

Country:China

Candidate:W Z Chen

GTID:2308330485499330

Subject:Computer application technology

Abstract/Summary:

With the rapid development of Internet information technology, text clustering has become a research focus as a way to meet the retrieval demands of the vast amount of text on the network. Keyword extraction and the clustering algorithm both play an important role in the text clustering process. To improve the clustering result, this thesis carries out research in two aspects:

1. An improved TextRank keyword extraction algorithm is proposed for text preprocessing. Term mutual information computed over a sliding window is added to the TextRank graph model as the edge weight, which optimizes the score distribution of the candidate words. The single-document term frequency (TF) is then introduced into TextRank's weight iteration formula as a vertex weight. The term frequency adjusts the probability of a word "jumping", which to some extent solves the problem of equal-probability "jumping". The experimental results show that the proposed algorithm improves precision, recall, and F1-measure, and that the efficiency of the iterative calculation improves by 20%. The extracted keywords are more representative of the text's features, which benefits the subsequent text clustering.

2. Bayesian regularization theory is introduced into the Self-Organizing Map (SOM) text clustering algorithm. During weight adjustment, a penalty term that reflects the complexity of the network weights is added to the weight adjustment formula in order to avoid overfitting. Bayesian inference is used to obtain the optimal hyperparameters in the weight adjustment formula, so that the distribution of the network weights becomes more consistent with the probability distribution of the input data during iterative training, improving the text clustering result. The experimental results on UCI and Chinese text datasets show that, compared with the traditional SOM algorithm, the clustering cohesion of the proposed algorithm improves by an average of 1.5 times and the clustering accuracy also improves.
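
The keyword extraction idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything below is an assumption made for illustration: the function name textrank_keywords, the use of positive pointwise mutual information as the edge weight, and the replacement of TextRank's uniform (1 - d) jump term with a jump proportional to term frequency. The input is assumed to be a list of already segmented, stop-word-filtered tokens.

```python
import math
from collections import Counter
from itertools import combinations


def textrank_keywords(tokens, window=3, d=0.85, iters=50, tol=1e-6):
    """Rank words using PMI edge weights and a TF-biased jump term.

    Illustrative only: this is one plausible reading of the abstract,
    not the thesis's actual formula.
    """
    # Slide a fixed-size window over the tokens; count in how many
    # windows each word occurs and each unordered pair co-occurs.
    n_windows = max(1, len(tokens) - window + 1)
    windows = [tokens[i:i + window] for i in range(n_windows)]
    word_win = Counter()
    pair_win = Counter()
    for win in windows:
        uniq = sorted(set(win))
        word_win.update(uniq)
        pair_win.update(combinations(uniq, 2))

    # Edge weight (assumption): positive pointwise mutual information,
    # PMI(a, b) = log(p(a, b) / (p(a) * p(b))), kept only when > 0.
    weights = {}
    for (a, b), c in pair_win.items():
        pmi = math.log(c * n_windows / (word_win[a] * word_win[b]))
        if pmi > 0:
            weights.setdefault(a, {})[b] = pmi
            weights.setdefault(b, {})[a] = pmi

    # Vertex weight: normalized term frequency within this document.
    tf = Counter(tokens)
    total = sum(tf.values())
    jump = {w: tf[w] / total for w in tf}

    out_sum = {w: sum(nbrs.values()) for w, nbrs in weights.items()}
    score = dict(jump)
    for _ in range(iters):
        new = {}
        for w in tf:
            # Score received from neighbors, split by their edge weights.
            rank = sum(weights[v][w] / out_sum[v] * score[v]
                       for v in weights.get(w, {}))
            # The jump lands on a word in proportion to its TF instead
            # of with equal probability.
            new[w] = (1 - d) * jump[w] + d * rank
        delta = sum(abs(new[w] - score[w]) for w in tf)
        score = new
        if delta < tol:
            break
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)
```

Calling textrank_keywords(tokens)[:5] on a tokenized document would return the five highest-scoring (word, score) pairs. Words that share no positive-PMI edge with any other word keep only their TF-based jump score in this sketch.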

Keywords/Search Tags:

Text clustering, TextRank algorithm, Self-Organizing Map, Bayesian regularization

Related items

1	Research On Short Text Clustering Based On CSUAP And TextRank Algorithm
2	Research On Chinese Text Summary Extraction Algorithm Based On TextRank
3	Research On American Think Tank Text From The Perspective Of Keywords
4	Significant Study Of Text Clustering Model Based On Machine Learning
5	Research And Implementation Of Text Clustering Algorithm Based On Non-parametric Bayesian Model
6	Optimize SOM Algorithm To Apply In Text Clustering
7	Research On Short Text Automatic Summarization Algorithm Based On TextRank And Word2Vec
8	A Deep Dictionary Learning Model Based On Tensor
9	Design And Implementation Of Automatic Summarization System Based On Textrank Algorithm
10	Research On Text Mining Based Web Information Retrieval