Research On Feature Selection Of Text Mining Using Cloud Model

Posted on:2013-10-04

Degree:Master

Type:Thesis

Country:China

Candidate:J Wan

Full Text:PDF

GTID:2248330371992595

Subject:Computer application technology

Abstract/Summary:

With the arrival of the information age and the spread of Internet applications, people's lives have changed greatly: they now spend more of their time and energy on Internet media than on newspapers and other publications. Internet media have become an important means of obtaining information. However, quickly and precisely locating the content we are interested in among the vast amount of data on the Internet remains a problem.

Text mining is a technology that uses computers to extract valuable information from text. Text categorization and text clustering are two of its most important methods. Studies have shown that traditional feature selection methods for text categorization and text clustering do not examine the distribution of term frequencies across the text collection, and therefore ignore how well terms discriminate among categories. To remedy this problem, this thesis introduces the cloud model into feature selection. The main contributions are summarized as follows.

First, text categorization and text clustering are reviewed concisely. Existing feature selection methods are examined in depth, with a detailed analysis and comparison of them.

Second, for text categorization, the cloud model is used to measure the importance of a term by its relevance and discrimination. Terms are mapped to cloud droplets; a relevance-cloud describes the droplets' distribution within one category, and a discrimination-cloud describes their distribution among all categories. Two filters, named relevance-cloud-filter and discrimination-cloud-filter, are then built to select text features. In the experiments, a naive Bayes classifier and an SVM classifier are used to verify the effectiveness of the method.

Third, for text clustering, terms are likewise mapped to cloud droplets and condensed into a clustering-document-cloud. A clustering-document-cloud-filter is then built to choose terms that discriminate among documents that carry no category label. In the experiments, K-means is used to verify the effectiveness of the method.

Overall, drawing on the cloud model's treatment of uncertainty, which combines randomness and fuzziness, the thesis presents a preliminary study of feature selection in text mining and achieves some results.
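The abstract does not give the formulas behind its clouds or filters, so the following is only a minimal sketch of the general cloud-model machinery it relies on, not the thesis's own method. It assumes the commonly used normal cloud model with three digital characteristics (expectation Ex, entropy En, hyper-entropy He), a moment-based backward cloud generator that estimates them from samples, and a forward generator that produces cloud droplets. The function names `backward_cloud` and `forward_cloud` and the sample term frequencies are hypothetical.

```python
import math
import random


def backward_cloud(samples):
    """Estimate (Ex, En, He) from numeric samples (moment-based backward generator)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    ex = sum(samples) / n
    # Entropy from the first-order absolute central moment.
    abs_dev = sum(abs(x - ex) for x in samples) / n
    en = math.sqrt(math.pi / 2) * abs_dev
    # Sample variance; abs() guards against a negative difference on small
    # samples (a common workaround, assumed here rather than taken from the thesis).
    var = sum((x - ex) ** 2 for x in samples) / (n - 1)
    he = math.sqrt(abs(var - en ** 2))
    return ex, en, he


def forward_cloud(ex, en, he, count, seed=0):
    """Generate `count` droplets (x, membership) of a normal cloud."""
    rng = random.Random(seed)
    droplets = []
    for _ in range(count):
        # Randomness: perturb the entropy by the hyper-entropy.
        en_prime = abs(rng.gauss(en, he))
        x = rng.gauss(ex, en_prime)
        # Fuzziness: membership degree of the droplet in the concept.
        if en_prime == 0.0:
            mu = 1.0
        else:
            mu = math.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))
        droplets.append((x, mu))
    return droplets


# Hypothetical per-document frequencies of two terms within one category.
steady_term = [3, 4, 3, 5, 4]
bursty_term = [0, 9, 0, 0, 1]

for name, freqs in (("steady", steady_term), ("bursty", bursty_term)):
    ex, en, he = backward_cloud(freqs)
    print(f"{name}: Ex={ex:.3f} En={en:.3f} He={he:.3f}")
```

A term whose frequencies are spread evenly across the documents of a category yields a small En relative to Ex, while a term concentrated in a few documents yields a large one. How the thesis turns such characteristics into its relevance and discrimination filters is not stated in the abstract.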

Keywords/Search Tags:

cloud model, feature selection, text categorization, text clustering

Related items

1	The Research Of Text Representation And Feature Selection In Text Categorization
2	Text Categorization And Feature Dimension Reduction Research
3	Research On Key Problems In Text Mining Based On Cloud Method
4	Research On Chinese Text Categorization Algorithms Based On Technology Text
5	A Study On Text Categorization Based On Machine Learning
6	Research On Text Categorization Based On LDA And SVM
7	Theoretical Analysis And Algorithm Study On Feature Selection For Text Categorization
8	The Research And Implementation Of Chinese Text Categorization System
9	Research On Text Feature Selection Algorithm And Its Application In Micro-Blog
10	KNN Text Classification Algorithm Based On The Semantics Of The Center