
Study On Feature Extraction Based On Maximizing The Distance Between Classes

Posted on: 2015-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2298330431998677
Subject: Computer application technology
Abstract/Summary:
With the popularization of the Internet and the extensive use of online electronic texts, the volume of text has grown explosively. Text classification, an important technology for organizing and managing large amounts of text information, has been widely used in many fields and has become a hot topic in data mining.

However, the high dimensionality of text data is a major obstacle to text classification performance. A huge feature space not only increases computational complexity but also degrades classification performance and generalization ability, resulting in the "over-learning" (overfitting) phenomenon. Therefore, dimension reduction has become an essential step in text classification.

Generally speaking, feature selection and feature extraction are the two common methods for reducing feature dimension. Feature selection constructs a rating function to score the features and then selects features according to their ranking, which is a simple and straightforward method. By comparison, feature extraction generates a new set of features by combining the original features and then selects among the new features, which can address the problems of synonymy and polysemy.

Because feature selection neglects synonymy and polysemy, it yields low text classification performance. Therefore, this thesis presents a feature extraction method based on maximizing the distance between classes: it constructs an objective function that maximizes the distance between documents of different categories in order to obtain a mapping matrix. Evaluation on the Chinese text corpus released by Fudan University demonstrates the effectiveness of our model. It also shows that our model significantly outperforms the chi-square statistic-based model currently used in the research field.
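The rating-function style of feature selection described above, with the chi-square statistic named as the baseline, can be illustrated with a minimal sketch. The abstract does not give the exact scoring details, so the following are assumptions made for illustration only: the standard contingency-table form of the chi-square statistic, document-level term presence (not term frequency), and taking the maximum score over classes as a term's rating. The function names `chi_square_scores` and `select_features` and the toy corpus are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Rate each term by its largest chi-square statistic over all classes.

    docs   -- list of token lists, one per document
    labels -- list of class labels, same length as docs
    Assumption: max-over-classes aggregation; the thesis may use another rule.
    """
    n = len(docs)
    class_sizes = Counter(labels)
    term_df = Counter()        # number of documents containing the term
    term_class_df = Counter()  # number of documents of a class containing the term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_df[term] += 1
            term_class_df[(term, label)] += 1

    scores = {}
    for term, df in term_df.items():
        best = 0.0
        for label, size in class_sizes.items():
            a = term_class_df[(term, label)]  # in class, contains term
            b = df - a                        # other classes, contains term
            c = size - a                      # in class, lacks term
            d = n - a - b - c                 # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-rated terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical): two sports and two finance documents.
docs = [
    ["ball", "team", "win"],
    ["team", "score"],
    ["stock", "market"],
    ["market", "price", "win"],
]
labels = ["sports", "sports", "finance", "finance"]
top_terms = select_features(docs, labels, 2)
```

In this toy corpus, "win" occurs equally often in both classes and so receives a score of zero. Each term is scored independently of every other term, so two synonyms are never merged into one feature; this is the limitation that motivates the feature extraction approach of the thesis.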
Keywords/Search Tags: text classification, dimension reduction, feature extraction