
Research On Feature Reduction And Semantic Weighting Algorithm Based On N-grams

Posted on: 2016-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: S X Liu
Full Text: PDF
GTID: 2428330548977869
Subject: Software engineering

Abstract/Summary:
Text features are crucial to text categorization: they directly affect both the performance of the classification model and the final test results. Compared with other features, n-grams offer many advantages, but three shortcomings limit their wide application in text categorization: 1) excessively sparse data; 2) feature redundancy; 3) high dimensionality. To overcome these three defects so that n-grams can be applied more effectively to text categorization, this paper proposes a feature reduction algorithm based on the n-gram language model together with a semantic weighting algorithm. First, the algorithm reduces the dimension of the raw n-gram feature set, lowering the cost of handling each individual n-gram feature; it then removes the redundant words within each n-gram feature to complete the reduction; finally, the test or training text is weighted so that absolute zero-valued matches are avoided. Experimental results on the NetEase text corpus show that the proposed algorithm accurately selects high-quality n-gram features from the text while avoiding high dimensionality, redundant words, and sparse data. When classification is performed with a Support Vector Machine (SVM), performance improves markedly over the baseline algorithms.
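The three-stage pipeline described in the abstract (dimension reduction of the n-gram feature set, removal of redundant words inside each n-gram, and weighting that avoids absolute zero-valued matches) can be sketched in Python. Note that the document-frequency threshold, the stopword-based redundancy filter, and the word-overlap similarity below are illustrative stand-ins chosen for this sketch, not the exact formulas proposed in the thesis.

```python
from collections import Counter

STOPWORDS = {"with", "for", "the", "of"}  # assumed stopword list for the sketch

def extract_ngrams(tokens, n):
    """All contiguous word n-grams of a tokenised document."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def reduce_features(docs, n=2, min_df=2):
    """Stage 1: keep only n-grams that occur in at least min_df documents,
    shrinking the raw (high-dimensional, sparse) feature set."""
    df = Counter()
    for tokens in docs:
        df.update(set(extract_ngrams(tokens, n)))
    return {g for g, c in df.items() if c >= min_df}

def strip_redundant(feature):
    """Stage 2: drop redundant (stopword) tokens inside one n-gram feature."""
    return tuple(w for w in feature if w not in STOPWORDS)

def semantic_weight(feature, doc_ngrams):
    """Stage 3: exact match scores 1.0; a near-match earns partial credit
    proportional to word overlap, so an absent feature is not forced to an
    absolute 0 (a simple proxy for semantic-approximation weighting)."""
    if feature in doc_ngrams:
        return 1.0
    fset = set(feature)
    best = max((len(fset & set(g)) / len(fset) for g in doc_ngrams),
               default=0.0)
    return 0.5 * best  # damped partial credit for approximate matches
```

On a toy corpus, `reduce_features` discards every bigram seen in only one document, and `semantic_weight` gives a document containing "deep learning" a small nonzero score for the feature ("machine", "learning") instead of a hard 0, which is the behaviour the abstract attributes to the semantic weighting step.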
Keywords/Search Tags: feature selection, feature weighting, n-grams language model, semantic approximation, redundancy, relevance