The Study On Feature Selection Algorithm In Chinese Text Clustering

Posted on:2007-03-15

Degree:Master

Type:Thesis

Country:China

Candidate:J Gong

GTID:2178360212973117

Subject:Computer application technology

Abstract/Summary:

In recent years, tremendous volumes of text documents have become available on the Internet, in digital libraries, from news sources and on company-wide intranets. This has led to increased interest in developing methods that help users effectively navigate, summarize and organize this information. Fast, high-quality document clustering algorithms play an important role towards this goal: they have been shown both to provide a navigation/browsing mechanism, by organizing large amounts of information into a small number of meaningful clusters, and to greatly improve retrieval performance via cluster-weighting. Text clustering is now one of the most important topics in data mining. Research on Chinese text clustering is still at an early stage, and many problems remain open; this thesis studies several of them. The specific work is as follows.

Firstly, we improve on the existing method for calculating term values and propose a method for computing the weight of a text feature item based on multi-factor weighting. This method considers not only the frequency of a word but also its semantic information in the text.

Secondly, we summarize the shortcomings of existing feature selection methods and propose a feature selection method based on term dedication. Experiments show that this method improves the accuracy of text clustering, thereby improving overall clustering performance and achieving effective dimensionality reduction.

Thirdly, we study text clustering algorithms. The k-means algorithm is a simple and efficient text clustering algorithm, but it can get trapped in a local minimum when poor initial cluster centers are selected, so that the result is a locally optimal solution rather than the globally optimal one. We therefore propose a modified k-means algorithm that increases the stability and improves the quality of the clustering result.

Finally, Chapter V presents a series of experiments.
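
The sensitivity of k-means to its initial centers can be illustrated with a short sketch. The abstract does not describe how the modified algorithm works, so the code below is not the thesis's method: it is plain k-means (Lloyd's iteration) on toy 2-D points, combined with random restarts as one generic, commonly used way to reduce the effect of a bad initialization. The function names (`kmeans`, `kmeans_restarts`), the seed-based initialization and the sample data are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, max_iter=100):
    """Plain k-means from a seeded random initialization.

    Returns (centers, sse), where sse is the sum of squared distances
    from each point to its nearest center.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(dim))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


def kmeans_restarts(points, k, n_restarts=10):
    """Run k-means from several seeds and keep the lowest-SSE result."""
    runs = (kmeans(points, k, seed) for seed in range(n_restarts))
    return min(runs, key=lambda result: result[1])


# Hypothetical toy data: three small, well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

for seed in range(3):
    print("seed", seed, "SSE", kmeans(points, 3, seed)[1])
print("best of 10 restarts, SSE", kmeans_restarts(points, 3)[1])
```

Different seeds may converge to different SSE values, which is the local-minimum behavior the abstract refers to. Restarts never do worse than any single run they include, but they multiply the running time by the number of restarts.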

Keywords/Search Tags:

Chinese Text, Text Clustering, Feature Selection, Vector Space Model (VSM)

Related items

1	On Research For Chinese Automatic Text Categorization Technology Based On VSM Model And Feature Selection
2	Research On Chinese Text Categorization Algorithms Based On Technology Text
3	Automatic Classification Research On Chinese Web Document Orientation
4	Chinese Text Data Classification
5	Research Of Text Clustering Technology Based On Colony Intelligence
6	Research On Data Mining Technologies Applied To Web Chinese Text
7	The Research And Implementation Of Chinese Text Categorization System
8	Extraction Of Chi-square Features In Chinese Text Classification And Improvement Of TF-IDF Weight
9	Design And Implementation Of Text Clustering Based On Vector Space Model
10	Text Classification Method Based On Unsupervised Clustering And Naive Bayesian Classifier