Study On Feature Extraction Based On Maximizing The Distance Between Classes

Posted on:2015-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:F Wang

Full Text:PDF

GTID:2298330431998677

Subject:Computer application technology

Abstract/Summary:

With the popularization of the Internet and the extensive use of online electronic texts, the volume of text has grown explosively. Text classification, an important technology for organizing and managing large amounts of text information, has been widely used in many fields and has become a hot topic in data mining. However, the high dimensionality of text data is a major obstacle to classification performance. A huge feature space not only increases computational complexity but also degrades classification performance and generalization ability, resulting in the "over-learning" (overfitting) phenomenon. Therefore, dimension reduction has become an essential step in text classification.

Generally speaking, feature selection and feature extraction are the two common methods for reducing feature dimension. Feature selection constructs a rating function to score the features and then selects features according to the resulting ranking, which is a simple and straightforward method. By comparison, feature extraction generates a new set of features by combining the original features and then selects among the new features, which can address the problems of synonymy and polysemy.

Because feature selection neglects synonymy and polysemy, it can lead to low text classification performance. This paper therefore presents a feature extraction method that constructs an optimization function maximizing the distance between documents of different categories in order to obtain a mapping matrix. Evaluation on the Chinese text corpus released by Fudan University demonstrates the effectiveness of the model and shows that it significantly outperforms the chi-square statistic-based model currently used in the field.
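The abstract does not give the exact optimization function, so the following is only a minimal sketch under one common assumption: "maximizing the distance between classes" is taken to mean maximizing trace(W^T S_b W) subject to W^T W = I, where S_b is the between-class scatter matrix of the document-term vectors. Under that assumption the mapping matrix W consists of the top-k eigenvectors of S_b. The function name `between_class_projection` and the toy term-count matrix are hypothetical and not taken from the thesis.

```python
import numpy as np


def between_class_projection(X, y, k):
    """Return a d x k mapping matrix W with orthonormal columns.

    Assumed objective (not necessarily the thesis's own):
        maximize trace(W^T S_b W)  subject to  W^T W = I,
    where S_b is the between-class scatter matrix.
    """
    X = np.asarray(X, dtype=float)  # documents x terms
    y = np.asarray(y)               # class label per document
    overall_mean = X.mean(axis=0)
    d = X.shape[1]

    S_b = np.zeros((d, d))
    for c in np.unique(y):
        X_c = X[y == c]
        diff = (X_c.mean(axis=0) - overall_mean).reshape(-1, 1)
        # Weight each class's mean offset by its number of documents.
        S_b += X_c.shape[0] * (diff @ diff.T)

    # S_b is symmetric, so eigh returns real eigenvalues (ascending order).
    eigvals, eigvecs = np.linalg.eigh(S_b)
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    return eigvecs[:, top]


# Hypothetical toy data: 6 documents, 4 terms, 2 categories.
X = np.array([
    [3, 0, 1, 0],
    [2, 0, 2, 0],
    [4, 1, 1, 0],
    [0, 3, 0, 2],
    [0, 4, 1, 3],
    [1, 2, 0, 2],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

W = between_class_projection(X, y, k=1)
Z = X @ W  # documents mapped into the reduced feature space
print(Z.ravel())
```

For C classes, S_b has rank at most C - 1, so choosing k larger than C - 1 only adds directions with zero between-class spread. Methods such as linear discriminant analysis additionally account for within-class scatter; this sketch deliberately uses the between-class term alone to mirror the abstract's wording.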

Keywords/Search Tags:

text classification, dimension reduction, feature extraction

Related items

1	Research On Feature Dimension Reduction In Text Classification
2	Study On Feature Extraction Based On Maximizing The Distance Between Classes
3	Research And Application Of Feature Dimension Reduction Algorithm In Text Classification
4	Text Emotional Classification Based On Text Mining
5	Dimension Reduction Method Research In Text Classification
6	Chinese Keyword Extraction Method Based On Word Span And Its Application In Text Classification
7	Research On Text Classification Based On Feature Selection And Feature Weighting Algorithm
8	Text Categorization And Feature Dimension Reduction Research
9	Research On The Feature Extraction Of Polarized SAR Image And The Method Of Ground Object Classification
10	Dimension Reduction Technology Research Based On Text Features