
The Study On Feature Selection Algorithm In Chinese Text Clustering

Posted on: 2007-03-15 | Degree: Master | Type: Thesis
Country: China | Candidate: J Gong
GTID: 2178360212973117 | Subject: Computer application technology
Abstract/Summary:
In recent years, tremendous volumes of text documents have become available on the Internet and in digital libraries, news sources, and company-wide intranets. This has led to increased interest in developing methods that help users effectively navigate, summarize, and organize this information. Fast, high-quality document clustering algorithms play an important role toward this goal: they have been shown both to provide a navigation/browsing mechanism, by organizing large amounts of information into a small number of meaningful clusters, and to greatly improve retrieval performance, for example via cluster weighting. Text clustering is now one of the most important topics in data mining. Research on Chinese text clustering is still at an early stage, and many problems remain open; this thesis studies several of them. The specific work is as follows:
Firstly, we improve the existing methods for calculating term weights and propose a method for computing the weights of text feature items based on multiple weighting factors. The method considers not only how often a word appears but also the semantic information it carries in the text.
Secondly, we summarize the shortcomings of existing feature selection methods and propose a feature selection method based on term dedication. Experiments show that this method improves clustering accuracy, and therefore the overall clustering performance, while effectively reducing dimensionality.
Thirdly, we study text clustering algorithms. The k-means algorithm is simple and efficient for text clustering, but when poor initial cluster centers are selected it can become trapped in a local minimum, yielding a locally optimal solution rather than the global optimum.
Therefore, we propose a modified k-means algorithm that increases the stability of clustering and improves its results. Finally, Chapter V presents a series of experiments.
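The abstract does not list the factors used in its multi-factor term weighting. As an illustration only, the sketch below multiplies a standard tf-idf weight by two hypothetical factors, a title boost and a part-of-speech weight, standing in for the "semantic information" a term carries. `TITLE_BOOST`, `POS_FACTOR`, and `multi_factor_weights` are assumptions, not the thesis's method, and documents are assumed to be already segmented into words.

```python
import math
from collections import Counter

# Hypothetical factors: the thesis does not specify which factors it uses or their values.
TITLE_BOOST = 2.0                              # assumed multiplier for terms that also appear in the title
POS_FACTOR = {"n": 1.0, "v": 0.8, "a": 0.6}    # assumed weights for noun, verb, adjective
DEFAULT_POS_FACTOR = 0.5                       # assumed weight for any other or unknown tag


def multi_factor_weights(docs, titles, pos_tags):
    """docs: list of token lists (segmented words); titles: list of token sets;
    pos_tags: dict mapping word -> POS tag. Returns one L2-normalized {term: weight} per doc."""
    n_docs = len(docs)
    df = Counter()                             # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc, title in zip(docs, titles):
        tf = Counter(doc)                      # raw term frequency in this document
        vec = {}
        for term, count in tf.items():
            # tf-idf with a +0.01 smoothing inside the log (one common variant),
            # so a term present in every document keeps a small positive weight
            w = count * math.log(n_docs / df[term] + 0.01)
            if term in title:
                w *= TITLE_BOOST
            w *= POS_FACTOR.get(pos_tags.get(term, ""), DEFAULT_POS_FACTOR)
            vec[term] = w
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors
```

For example, `multi_factor_weights([["文本", "聚类", "文本"]], [{"聚类"}], {"文本": "n", "聚类": "n"})` returns one normalized vector per document. Normalizing to unit length makes the weights directly usable for cosine similarity in the vector space model.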
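The abstract does not define its term-dedication criterion. One well-known unsupervised measure of this kind is term contribution (TC), which scores a term by how much it adds to the similarity between pairs of documents: TC(t) = sum over i != j of w(t, d_i) * w(t, d_j). The sketch below assumes that reading; `term_contribution` and `select_features` are hypothetical names, and the input is any per-document weight dictionary such as tf-idf.

```python
from collections import defaultdict


def term_contribution(weight_vectors):
    """weight_vectors: list of {term: weight} dicts, one per document.
    TC(t) = sum over ordered pairs i != j of w(t, d_i) * w(t, d_j),
    computed in linear time as (sum_i w)^2 - sum_i w^2."""
    total = defaultdict(float)
    total_sq = defaultdict(float)
    for vec in weight_vectors:
        for term, w in vec.items():
            total[term] += w
            total_sq[term] += w * w
    return {t: total[t] ** 2 - total_sq[t] for t in total}


def select_features(weight_vectors, k):
    """Keep the k terms with the highest TC (ties broken alphabetically)
    and project every document onto those terms."""
    tc = term_contribution(weight_vectors)
    kept = sorted(tc, key=lambda t: (-tc[t], t))[:k]
    kept_set = set(kept)
    reduced = [{t: w for t, w in vec.items() if t in kept_set} for vec in weight_vectors]
    return kept, reduced
```

A term that occurs in only one document gets TC = 0, since it contributes nothing to any document pair's similarity; TC therefore favors terms that can actually pull documents together into clusters, which is what makes it suitable for unsupervised dimensionality reduction.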
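The abstract does not describe how the modified k-means chooses its initial centers. The sketch below swaps in one common remedy for poor random seeding, deterministic max-min (farthest-first) initialization, followed by standard Lloyd iterations with squared Euclidean distance. It is not the thesis's algorithm; `farthest_first_init` and `kmeans` are hypothetical names, and documents are assumed to be dense numeric vectors.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Max-min seeding: start from the point farthest from the data mean, then
    repeatedly add the point whose distance to its nearest chosen center is largest."""
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    centers = [max(points, key=lambda p: _dist2(p, mean))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(points, k, max_iter=100):
    """Standard k-means (Lloyd iterations) seeded deterministically."""
    centers = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Because the seeding is deterministic, repeated runs on the same data give the same partition, which addresses the stability concern; the trade-off is that farthest-first seeding can pick outliers as initial centers, so it is often combined with outlier filtering.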
Keywords/Search Tags: Chinese Text, Text Clustering, Feature Selection, Vector Space Model (VSM)