Research On Feature Selection Of Text Mining Using Cloud Model

Posted on: 2013-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Wan
Full Text: PDF
GTID: 2248330371992595
Subject: Computer application technology
Abstract/Summary:
With the arrival of the information age and the widespread adoption of Internet applications, people's lives have changed greatly: they now devote more time and energy to Internet media than to newspapers and other publications. Internet media has become an important means of obtaining information. However, quickly and precisely locating the content we are interested in within the vast amount of data on the Internet remains a problem.

Text mining is a technology that uses computers to extract valuable information from text. Text categorization and text clustering are two of its most important methods. Studies have shown that traditional feature selection methods in text categorization and text clustering do not examine the distribution of term frequencies across the text collection, and therefore ignore how well terms discriminate among categories. To remedy this problem, this thesis introduces the cloud model into feature selection. The main contributions are summarized as follows.

Firstly, text categorization and text clustering are described concisely. We explore and discuss existing feature selection methods in depth, and analyze and compare them in detail.

Secondly, in text categorization, we use the cloud model to measure the importance of a term by its relevance and discrimination. Terms are mapped into cloud droplets; a relevance-cloud describes the droplets' distribution within one category, and a discrimination-cloud describes their distribution among all categories. We then build two filters, the relevance-cloud-filter and the discrimination-cloud-filter, to select text features. In the experiments, we use a naive Bayes classifier and an SVM classifier to verify the effectiveness of this method.

Thirdly, in text clustering, we also map terms into cloud droplets and condense them into a clustering-document-cloud. We then build a clustering-document-cloud-filter to choose terms that discriminate among documents with no category label. In the experiments, we use K-means to verify the effectiveness of this method.

In general, drawing on the cloud model's ability to represent uncertainty by combining randomness and fuzziness, this thesis presents a preliminary study of feature selection in text mining and achieves some results.
Keywords/Search Tags: cloud model, feature selection, text categorization, text clustering
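
The abstract does not say how the clouds are computed from the droplets. As a rough illustration only, the sketch below uses the standard backward normal cloud generator, which estimates the three numerical characteristics of a cloud (expectation Ex, entropy En, hyper-entropy He) from a set of droplet values. Treating a term's relative frequency in each document of a category as one droplet is an assumption made here for illustration, and the function name `backward_cloud` is hypothetical; neither is taken from the thesis.

```python
import math
from statistics import fmean


def backward_cloud(droplets):
    """Estimate (Ex, En, He) of a normal cloud from droplet values.

    `droplets` is a list of numbers, assumed here to be one term's
    relative frequencies in the documents of a single category.
    """
    n = len(droplets)
    if n < 2:
        raise ValueError("need at least two droplets")

    # Ex: expectation, the sample mean of the droplets.
    ex = fmean(droplets)

    # En: entropy, from the first-order absolute central moment.
    en = math.sqrt(math.pi / 2) * fmean([abs(x - ex) for x in droplets])

    # Sample variance with the (n - 1) denominator.
    var = sum((x - ex) ** 2 for x in droplets) / (n - 1)

    # He: hyper-entropy. abs() guards against a negative difference,
    # which can occur on small samples.
    he = math.sqrt(abs(var - en ** 2))
    return ex, en, he


# Hypothetical droplets: one term's relative frequency in five documents.
ex, en, he = backward_cloud([0.1, 0.2, 0.3, 0.4, 0.5])
```

Under this reading, a large Ex with a small En would suggest a term that occurs often and evenly within a category. How the thesis turns such characteristics into its relevance and discrimination filters is not specified in the abstract.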