
Research For Feature Selection Algorithm Based On Text Clustering

Posted on: 2013-05-23 | Degree: Master | Type: Thesis
Country: China | Candidate: D H Fan
GTID: 2248330392451230 | Subject: Computer application technology
Abstract/Summary:
In recent years, an enormous number of documents has become available from electronic publications, e-mail, and websites. This volume of documents also makes searching, filtering, and managing information difficult. The management and analysis of massive text data has therefore become very important.

Text clustering is now a very important direction in text data mining. To improve the recognition rate, however, a very large number of original features is usually extracted, and the original feature space may reach thousands of dimensions or more. Many of these features are redundant, which leads to the curse of dimensionality. In addition, existing clustering algorithms focus on improving efficiency and neglect the handling of fuzzy points, which lowers the accuracy of the clustering results.

This thesis analyzes the curse-of-dimensionality problem in text clustering and studies existing text clustering algorithms. The main work is as follows:

Firstly, we summarize existing feature selection methods and similarity measures and propose a feature selection method based on word co-occurrence. The method reduces the redundancy of the selected features, and with it the dimensionality of the feature space. This improves the accuracy of text clustering and the overall performance of the clustering algorithm.

Secondly, we study the more popular text mining algorithms in detail and analyze their advantages and disadvantages. We then propose an improved algorithm that handles fuzzy points in order to improve the clustering results.

Thirdly, we carry out a series of experiments and analyze the results, which demonstrate the validity of the improved algorithm.

Finally, we summarize this work and discuss directions for future research.
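The abstract does not say how word co-occurrence is used to select features. The Python code below is a minimal sketch of one plausible reading, not the thesis's method. It ranks terms by document frequency. It then skips any candidate whose set of containing documents overlaps too strongly with that of a term already selected. Overlap is measured with the Jaccard coefficient, and the cutoff is a hypothetical `redundancy_threshold`. The goal is that two terms which almost always co-occur, and are therefore largely redundant, do not both survive. The function name `select_features`, the ranking criterion, and the threshold are all assumptions.

```python
from collections import defaultdict


def select_features(docs, k, redundancy_threshold=0.8):
    """Hypothetical co-occurrence-based feature selection (illustrative sketch).

    docs: list of token lists, one per document.
    Returns up to k terms, skipping terms that co-occur too strongly
    with a term that has already been selected.
    """
    # Map each term to the set of document indices that contain it.
    postings = defaultdict(set)
    for i, tokens in enumerate(docs):
        for term in set(tokens):
            postings[term].add(i)

    # Rank candidates by document frequency (descending); break ties alphabetically.
    ranked = sorted(postings, key=lambda t: (-len(postings[t]), t))

    selected = []
    for term in ranked:
        if len(selected) == k:
            break
        docs_t = postings[term]
        # Jaccard co-occurrence with each already selected term:
        # |docs containing both| / |docs containing either|.
        redundant = any(
            len(docs_t & postings[s]) / len(docs_t | postings[s]) >= redundancy_threshold
            for s in selected
        )
        if not redundant:
            selected.append(term)
    return selected
```

Consider a toy corpus in which "neural" and "network" always appear together. Only one of the two is kept, so the freed slot goes to a term that adds new information. The corpus is not large enough to show an actual gain in clustering accuracy.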
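The abstract also does not define "fuzzy points" or explain how the improved algorithm handles them. The sketch below assumes one common interpretation and is not the thesis's algorithm. Documents are clustered by k-means using cosine similarity. A document counts as fuzzy when its similarities to the best and second-best centroids differ by less than a hypothetical `margin`. Fuzzy documents are left out of centroid updates, so they cannot drag centroids toward a cluster boundary. At the end they are still assigned to their nearest cluster, but they are also reported separately for further handling. All names and parameters (`cosine`, `assign`, `kmeans_with_fuzzy_points`, `margin`, `max_iter`) are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def assign(vectors, centroids, margin):
    """Label each vector with its most similar centroid and flag fuzzy vectors."""
    labels, fuzzy = [], set()
    for i, v in enumerate(vectors):
        # (similarity, centroid index) pairs, most similar first.
        sims = sorted(((cosine(v, c), j) for j, c in enumerate(centroids)), reverse=True)
        labels.append(sims[0][1])
        # Hypothetical fuzzy rule: best and runner-up centroids are almost equally similar.
        if len(sims) > 1 and sims[0][0] - sims[1][0] < margin:
            fuzzy.add(i)
    return labels, fuzzy


def kmeans_with_fuzzy_points(vectors, init_centroids, margin=0.1, max_iter=20):
    """k-means with cosine similarity where fuzzy points do not move centroids (sketch)."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(max_iter):
        labels, fuzzy = assign(vectors, centroids, margin)
        new_centroids = []
        for j, old in enumerate(centroids):
            # Only confident (non-fuzzy) members contribute to the centroid mean.
            members = [vectors[i] for i, lab in enumerate(labels) if lab == j and i not in fuzzy]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # no confident members: keep previous centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # Final assignment against the final centroids; fuzzy points receive their
    # nearest cluster but are also returned separately for further handling.
    labels, fuzzy = assign(vectors, centroids, margin)
    return labels, sorted(fuzzy), centroids
```

Take two clusters seeded at [1, 0] and [0, 1]. The vector [0.5, 0.5] is equally similar to both centroids, so it is reported as fuzzy and does not shift either centroid. Excluding boundary points from the mean is only one possible design. Other options, such as reassigning fuzzy points in a separate pass or giving them fractional memberships, would be equally consistent with the abstract's wording.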
Keywords/Search Tags: Feature selection, Text clustering, Similarity measure, Word co-occurrence