Research For Feature Selection Algorithm Based On Text Clustering

Posted on:2013-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:D H Fan

Full Text:PDF

GTID:2248330392451230

Subject:Computer application technology

Abstract/Summary:

In recent years, an enormous number of documents have become available from electronic publications, e-mail, and Web sites. At the same time, this volume makes information difficult to search, filter, and manage. Managing and analyzing massive text data is therefore very important.

Text clustering has become an important direction in text data mining. However, efforts to improve the recognition rate usually produce a huge number of original features, which may reach thousands of dimensions or even more. Many of these features are redundant, causing the curse of dimensionality. Meanwhile, existing clustering algorithms emphasize efficiency and neglect the handling of fuzzy points, which leads to poor clustering accuracy.

This thesis analyzes the curse of dimensionality in text clustering and examines existing text clustering algorithms. The main work is as follows:

Firstly, we summarize the existing feature selection methods and similarity measures and propose a feature selection method based on word co-occurrence. The method improves text clustering accuracy and reduces redundancy among the selected features, thereby reducing the dimensionality and enhancing the overall performance of the clustering algorithm.

Secondly, we study the more popular text mining algorithms, describe them in detail, analyze their advantages and disadvantages, and then propose an improved algorithm for handling fuzzy points to improve the clustering results.

Thirdly, we carry out a series of experiments and analyze the results, which demonstrate the validity of the improved algorithm.

Finally, we summarize the work and discuss directions for future research.
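
The abstract does not spell out how word co-occurrence is used for feature selection, so the following Python sketch is only one plausible illustration, not the thesis's algorithm. It assumes documents are already tokenized, ranks terms by document frequency, and skips any term whose co-occurrence (Jaccard overlap of the documents containing both terms) with an already selected term is too high. The function name `select_features`, its parameters, and the sample documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def select_features(docs, max_features=3, redundancy_threshold=0.8):
    """Greedy sketch: keep frequent terms, skip terms that almost always
    co-occur with a term that has already been selected."""
    doc_freq = Counter()   # number of documents containing each term
    cooccur = Counter()    # number of documents containing both terms of a pair
    for tokens in docs:
        terms = sorted(set(tokens))
        doc_freq.update(terms)
        cooccur.update(combinations(terms, 2))  # pairs are (a, b) with a < b

    def jaccard(a, b):
        # Share of documents containing both terms among those containing either.
        pair = (a, b) if a < b else (b, a)
        both = cooccur[pair]
        return both / (doc_freq[a] + doc_freq[b] - both)

    selected = []
    # Rank by document frequency; break ties alphabetically for determinism.
    for term in sorted(doc_freq, key=lambda t: (-doc_freq[t], t)):
        if all(jaccard(term, s) < redundancy_threshold for s in selected):
            selected.append(term)
        if len(selected) == max_features:
            break
    return selected


# Hypothetical toy corpus: "text" and "clustering" always appear together.
docs = [
    ["text", "clustering", "feature"],
    ["text", "clustering", "similarity"],
    ["feature", "selection", "word"],
    ["text", "clustering", "word"],
]
print(select_features(docs))  # prints ['clustering', 'feature', 'word']
```

In this toy corpus "text" is dropped because it appears in exactly the same documents as "clustering", which is the kind of redundancy the abstract says feature selection should reduce. The frequency ranking and the 0.8 threshold are arbitrary choices made for the illustration.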

Keywords/Search Tags:

Feature selection, Text clustering, Similarity measure, Word co-occurrence

Related items

1	Chinese Text Clustering Based On Text Similarity
2	Research On Local Feature Selection Of Chinese Text
3	The Research And Application Of Clustering Feature Selection Methods
4	The Description Of Text's Feature Based On Semanteme Concept
5	A Study On Similarity Of Student's Homework Text Under Certain Condition
6	Research And Improvement On Text Classfication Based On Word Embedding
7	Study On Similarity-based Text Clustering Algorithm And It's Application
8	Study On Chinese Text Similarity Computing Based On Word Segmentation
9	Research On The Key Techniques Of Chinese Text Clustering
10	An Automatic Similarity Detection Engine Between Sacred Texts Using Text Mining and Similarity Measure