Chinese Text Clustering Algorithm Based On Suffix Tree Research

Posted on:2006-09-21

Degree:Master

Type:Thesis

Country:China

Candidate:L H Lu

Full Text:PDF

GTID:2208360182456267

Subject:Computer application technology

Abstract/Summary:

Text mining is the discovery of implicit, useful, and interesting patterns and knowledge in large collections of text documents or corpora. Text mining techniques make it possible to process large stores of text in bulk, and such processing offers considerable potential in fields such as information retrieval. This thesis deals with text clustering, which is an important approach to text mining and an important branch of data mining, and it places its emphasis on Chinese text clustering based on the suffix tree. As a data structure, the suffix tree was first proposed to support string matching and queries, for instance finding the longest repeated substring, matching similar strings, and comparing strings. Suffix tree clustering (STC) is a method that treats a text as a string of phrases rather than as a collection of individual words. It can therefore use the phrases that documents share to achieve better clustering. STC has already been applied successfully to English text clustering in some areas. This thesis is devoted to applying STC to Chinese text clustering. It draws on the techniques and theories of data mining and focuses in particular on Chinese text clustering.
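The phrase-sharing idea behind STC can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes each Chinese document is tokenized into single characters (no word segmentation), it enumerates the prefixes of every suffix directly instead of building a real generalized suffix tree, and the names `base_clusters`, `score`, `max_len`, and `min_docs` are hypothetical. The score is a deliberately simplified stand-in for the weighting used in published STC work.

```python
from collections import defaultdict


def base_clusters(docs, max_len=4, min_docs=2):
    """Map each shared phrase to the set of documents that contain it.

    A real STC implementation reads these groups off the nodes of a
    generalized suffix tree. This sketch enumerates the prefixes of every
    suffix directly, which is simpler but slower, and it does not merge
    phrases the way tree nodes do.
    """
    phrase_docs = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        for start in range(len(tokens)):  # every suffix of the document
            longest = min(start + max_len, len(tokens))
            for end in range(start + 1, longest + 1):  # every prefix of that suffix
                phrase_docs[tuple(tokens[start:end])].add(doc_id)
    # Keep only phrases shared by at least `min_docs` documents.
    return {p: ids for p, ids in phrase_docs.items() if len(ids) >= min_docs}


def score(phrase, doc_ids):
    """Simplified score (an assumption): documents covered times phrase length."""
    return len(doc_ids) * len(phrase)


# Hypothetical example documents, tokenized character by character.
docs = [list("文本聚类算法研究"), list("中文文本聚类系统"), list("后缀树数据结构")]
clusters = base_clusters(docs)
best = max(clusters, key=lambda p: score(p, clusters[p]))
print("".join(best), sorted(clusters[best]))
```

In a full STC pipeline, base clusters like these would then be merged when their document sets overlap heavily; that step is omitted here.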
The thesis covers the following main aspects: (1) Research on text clustering algorithms, especially the k-means algorithm and its application to Chinese texts. (2) A study of Chinese text clustering models suited to the characteristics of Chinese texts. (3) An in-depth study and test of the feasibility of applying the suffix tree technique to Chinese text clustering. (4) The design and implementation of a Chinese text clustering system that supports clustering with both the k-means and STC algorithms. (5) Experiments on several groups of Chinese text data sets, with comparisons between the k-means and STC algorithms, which yield some valuable results that are explained theoretically and demonstrated. The problems that occurred in the experiments are discussed, and a direction for future research is presented.

Lu Lihua (Computer Application Technology), directed by Prof. Gao Maoting...
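For comparison with STC, the k-means side of such a system can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the thesis: documents are represented by counts of overlapping character bigrams (one common way to avoid a word segmenter), the initial centroids are chosen by the caller through the hypothetical `seeds` parameter, and all function names are invented for this example.

```python
from collections import Counter


def bigram_vector(text):
    """Count overlapping character bigrams of a string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def sq_distance(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    keys = set(a) | set(b)
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts key by key
    return {k: c / len(vectors) for k, c in total.items()}


def kmeans(vectors, seeds, max_iter=20):
    """Lloyd's algorithm; `seeds` are indices of documents used as initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = mean_vector(members)
    return labels


# Hypothetical example documents.
docs = ["文本聚类算法", "中文文本聚类", "后缀树结构", "广义后缀树"]
print(kmeans([bigram_vector(d) for d in docs], seeds=[0, 2]))
```

Unlike STC, k-means needs the number of clusters fixed in advance (here, the length of `seeds`) and its result depends on the initial centroids, which is one reason the two algorithms are worth comparing experimentally.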

Keywords/Search Tags:

Text Mining, Text Clustering, K-means, STC

Related items

1	Text Clustering Based On K-means Algorithm And Realization
2	Chinese Text Clustering Algorithm Based On Suffix Tree Research
3	Design And Implementation Of Distributed Text Clustering System Based On K-means
4	K-NN, K-means And The Application In Text Mining
5	Based On The Text Of The K-means Clustering Analysis
6	Clustering And Its Application In Text Mining
7	An Improved K-Means Algorithm And Its Application In Bidding Data Analysis
8	K-means Text Clustering Algorithm Based On Double Genetic Algorithm In Text Mining
9	Research On Web Text Mining
10	The Research Of Clustering Analysis's Application In Web Text Mining