
Research On Feature Extraction And Classification Algorithm In Text Categorization

Posted on: 2019-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: H F Li
Full Text: PDF
GTID: 2428330548976808
Subject: Management Science and Engineering

Abstract/Summary:
The gradual maturity of the Internet and the development of social media such as micro-blog have greatly changed the way people live, and more and more users publish information, view real-time information, and evaluate information through the network. The rapid expansion of network traffic leads to a rapid increase in network data, which creates a contradiction between the speed at which users can find the information they need and the sheer amount of data available. Text categorization is one of the effective ways to handle and reduce information overload, and feature extraction and classification modeling are its two most important tasks.

Chinese text exhibits problems such as polysemy and synonymy, which makes semantic methods effective for feature extraction. In addition, boundary samples and class-center samples contribute differently to classification, and enhancing the role of boundary samples is beneficial to classification performance. At the same time, traditional classification modeling usually relies on a single classifier; the characteristics of a single classifier lead to different classification effects in different application scenarios, so a single classifier loses part of the sample information in some scenarios. In view of the above problems, this paper carries out the following research:

(1) In feature extraction, to address the problem that the Sprinkling semantic feature extraction method does not fully consider the class weight of samples, this paper presents a K-Sprinkling feature extraction method based on a sample weight function, where the weight function represents the contribution of each sample to its class. The method uses the Cauchy distribution membership function as the sample class weight and, after analyzing the shortcomings of that membership function, improves it to construct a sample class weight function that combines sample membership and sample tightness; the resulting sample weights are then integrated into Sprinkling. In the feature vector space, singular value decomposition is used to mine deeper-level features, completing the text classification task.

(2) In classification modeling, considering that different feature extraction methods focus on different aspects of the text, a text classification method based on multi-type classifier fusion is proposed. The method combines word2vec, principal component analysis, latent semantic indexing, and TF-IDF as the feature extraction stage for multi-type classifier fusion. To address the problem that weighted voting over multi-type classifiers ignores class information, a class-weighted classifier weight calculation method is proposed: the classification accuracy of each classifier on the training set is taken as the weight of the corresponding sample class, and these weights are then used in the voting decision to complete text classification modeling.

The experimental results show that the methods used in this paper are more effective than the feature extraction methods they improve upon, with clear improvements in accuracy, recall, and F1 values, and they also achieve good classification results on unbalanced data sets and on data sets from specific scenarios. The work also provides support for text classification in other fields.
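The abstract does not give the exact form of the improved Cauchy membership function or of the tightness term, so the following is only a minimal Python sketch of the idea: each sample of a class receives a weight that decays with its distance from the class centre (Cauchy-style membership) and grows with how densely it is surrounded by same-class neighbours (tightness). The function names, the k-nearest-neighbour definition of tightness, and the convex combination via `alpha` are assumptions, not the thesis's formulas.

```python
import numpy as np
from scipy.spatial.distance import cdist

def cauchy_membership(X, gamma=1.0):
    """Cauchy-style membership of each sample in its own class:
    samples far from the class centre get smaller values.
    X is assumed to be a dense feature matrix of one class's samples;
    gamma is a hypothetical scale parameter."""
    center = X.mean(axis=0)
    d2 = np.sum((X - center) ** 2, axis=1)
    return 1.0 / (1.0 + gamma * d2)

def tightness(X, k=5):
    """Hypothetical tightness term: mean distance of each sample to its
    k nearest same-class neighbours, mapped into (0, 1]."""
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)          # ignore self-distance
    knn = np.sort(D, axis=1)[:, :k]      # k closest same-class samples
    return 1.0 / (1.0 + knn.mean(axis=1))

def class_sample_weights(X, alpha=0.5, gamma=1.0, k=5):
    """Combine membership and tightness into one weight per sample.
    The convex combination via alpha is an assumption; the abstract
    does not state how the two terms are merged."""
    return alpha * cauchy_membership(X, gamma) + (1 - alpha) * tightness(X, k)
```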
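Sprinkling augments the term-document matrix with artificial class-label columns before latent semantic analysis; one reading of K-Sprinkling, consistent with the abstract, is that these sprinkled columns are scaled by the sample class weights above before singular value decomposition. The sketch below follows that reading; the `boost` parameter and the exact scaling are hypothetical.

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def sprinkle(X, y, sample_w, classes, boost=1.0):
    """Append one artificial 'class term' column per class to the
    TF-IDF matrix X; row i gets a non-zero entry only in the column of
    its own class, scaled by its sample weight.  Scaling by the sample
    weight is our reading of K-Sprinkling; plain Sprinkling uses a
    constant for every sample."""
    cols = np.zeros((X.shape[0], len(classes)))
    for i, (label, w) in enumerate(zip(y, sample_w)):
        cols[i, classes.index(label)] = boost * w
    return hstack([X, csr_matrix(cols)])

# Hypothetical usage (docs, y and precomputed weights are assumed):
# X = TfidfVectorizer().fit_transform(docs)
# Xs = sprinkle(X, y, weights, classes=sorted(set(y)))
# Z = TruncatedSVD(n_components=100).fit_transform(Xs)  # latent features
```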
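For the second contribution, a minimal sketch of class-weighted classifier fusion, assuming scikit-learn pipelines as the member classifiers: each member is trained on its own feature representation, its per-class accuracy on the training set serves as a class-dependent voting weight, and the fused prediction is the class with the largest weighted vote. The choice of member models, the use of truncated SVD as a stand-in for latent semantic indexing, and the omission of the word2vec and PCA channels are simplifications, not the thesis's exact setup.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def build_members():
    """Member pipelines over different representations (illustrative)."""
    return [
        make_pipeline(TfidfVectorizer(), MultinomialNB()),
        make_pipeline(TfidfVectorizer(), TruncatedSVD(100), LogisticRegression(max_iter=1000)),
        make_pipeline(TfidfVectorizer(), TruncatedSVD(100), LinearSVC()),
    ]

def class_weights(members, docs, y, classes):
    """Per-classifier, per-class accuracy on the training set,
    used as voting weights (our reading of the abstract)."""
    W = []
    for m in members:
        pred = m.predict(docs)
        W.append({c: float(np.mean(pred[y == c] == c)) for c in classes})
    return W

def fuse_predict(members, W, docs, classes):
    """Class-weighted vote: each member's vote for its predicted class
    counts with the weight it earned for that class."""
    votes = np.zeros((len(docs), len(classes)))
    for m, w in zip(members, W):
        pred = m.predict(docs)
        for j, c in enumerate(classes):
            votes[:, j] += w[c] * (pred == c)
    return np.array(classes)[votes.argmax(axis=1)]

# Hypothetical usage (docs_train, y_train, docs_test are assumed):
# members = [m.fit(docs_train, y_train) for m in build_members()]
# classes = sorted(set(y_train))
# W = class_weights(members, docs_train, np.array(y_train), classes)
# y_pred = fuse_predict(members, W, docs_test, classes)
```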
Keywords/Search Tags:text classification, feature extraction, sample weight, feature extraction method fusion