The Research Of Text Clustering And Keywords Extraction Based On Complex Network Theory

Posted on:2012-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:F H Xie

Full Text:PDF

GTID:2218330335475790

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of information technology, the volume of text data is growing at a remarkable rate. How to quickly find useful information in large text collections, and how to manage and use that information properly, has become an urgent problem. Appropriate use of data mining technology can help solve it.

Text clustering and keyword extraction are important areas of text mining research. Text clustering divides a collection of documents into several clusters such that the texts within a cluster are more similar to each other than to the texts in other clusters. As an unsupervised machine learning method, text clustering does not require a training set, nor does it need to know the number of clusters in advance, which makes it flexible and practical. Keyword extraction is an important text information processing technique and a foundation for tasks such as automatic categorization, automatic clustering, and automatic summary generation.

This thesis introduces the background of text mining and keyword extraction, along with their research significance, the current state of research, and the relevant theory. It reviews classical work from China and abroad and proposes a new text clustering method and a new keyword extraction method. The main work covers the following two aspects:

1. A text clustering method based on community partitioning in complex networks is proposed. First, a new algorithm for detecting community structure in a weighted complex network is proposed. To partition the weighted network into groups, the algorithm repeatedly looks for density sets and performs appropriate operations. Second, the algorithm is applied to cluster text documents represented by the vector space model. A weighted complex network is constructed from the similarity between each pair of documents, calculated with the cosine function, and the community structure of this network is then detected by the proposed algorithm. Finally, experiments on samples from the Reuters-21578 data set show that the proposed algorithm produces good clustering results.

2. The characteristics and shortcomings of existing keyword extraction algorithms based on complex networks are analyzed, and a new keyword extraction algorithm based on a weighted complex network is proposed. First, a weighted complex network model is constructed from the relationships between the feature words of a text. Second, the weighted clustering coefficient and betweenness are introduced to calculate a multi-feature value for each node. Finally, keywords are extracted according to the multi-feature value. Experimental results show that the extracted keywords contribute strongly to the subject of the text, and that the extraction accuracy is better than that of the existing algorithms.
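
The network construction step in the first aspect (documents as vector space model vectors, edges weighted by cosine similarity) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names `tf_vector`, `cosine`, and `build_similarity_network`, the `threshold` parameter, and the use of raw term frequencies as vector weights are all hypothetical choices, and the sample sentences are made up. The community detection step based on density sets is not shown, because the abstract does not specify it.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed weighting; the thesis may differ)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_similarity_network(docs, threshold=0.0):
    """Return {(i, j): weight} for document pairs whose similarity exceeds threshold."""
    vectors = [tf_vector(d) for d in docs]
    edges = {}
    for i, j in combinations(range(len(vectors)), 2):
        w = cosine(vectors[i], vectors[j])
        if w > threshold:  # hypothetical cut-off to keep the network sparse
            edges[(i, j)] = w
    return edges


# Made-up sample documents, for illustration only.
docs = [
    "oil prices rise as crude supply falls",
    "crude oil supply falls and prices rise",
    "central bank raises interest rates",
]
network = build_similarity_network(docs)
```

In this sketch each document is a node and each entry of `network` is a weighted edge; a community detection algorithm would then be run on that weighted graph to obtain the clusters.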

Keywords/Search Tags:

text clustering, keyword extraction, weighted complex network, density set, multi-feature value

Related items

1	Automatic Extraction Of Keywords And Text Summarization In Text Mining
2	The Applied Research Of Complex Networks In Processing Of Web News Information
3	Text Correlation Research Based On Subspace Clustering
4	Text Keyword Extraction Analysis Platform Based On The Complex Network
5	The Research And Application Of Clustering Algorithm Based On Density
6	Research On The Telecommunication Complaint Text Clustering Based On Improved CFSFDP Algorithm
7	Research On Recommendation Of Network Public Opinion Hotspot Keywords Based On Clustering
8	Research On Multi-strategy Keywords Extraction And Quick Text Classification
9	Research And Implementation Of Text Mining Technology Based On Public Security Information
10	Research On Clustering Algorithm Based On Density Peak And Its Application In Text Clustering