Research On Text Classification Algorithm Based On Support Vector Machine And Neural Network

Posted on:2020-10-15

Degree:Master

Type:Thesis

Country:China

Candidate:W F Zhu

Full Text:PDF

GTID:2428330590995461

Subject:Circuits and Systems

Abstract/Summary:

Nowadays, the development of the Internet has produced an explosion of data, especially text messages and news. Obtaining useful information from it quickly has become a research hotspot. The main work of this thesis is to explore text mining and analysis methods that combine traditional text feature extraction with the Support Vector Machine (SVM). At the same time, to take the semantic information of the text into account and to reduce the influence of human factors, the text classification method also uses the self-learning characteristics of neural networks.

Text representation is the basis of text categorization. Traditional text representation generally relies on statistical methods such as Information Gain (IG), CHI-Square and Mutual Information. These methods assume that words are independent of one another and ignore the redundancy between feature words. For SVM-based text classifiers, classification performance degrades because a single kernel function cannot completely match the data distribution. In addition, most traditional machine learning algorithms are shallow models. As a result, problems such as the loss of feature information easily occur because of larger training sets and longer tests. Meanwhile, feature selection methods based on traditional statistics increase the influence of noise. Therefore, this thesis utilizes the self-learning characteristics of neural networks in deep learning to reduce the impact of human factors.

To address the above problems in text classification, the main innovations are as follows:

1. Because traditional feature extraction methods do not account for the redundancy between feature words, this thesis proposes a two-stage text feature selection algorithm that combines IG with an improved Minimal Redundancy Maximal Relevance (MRMR) criterion. First, the algorithm uses IG to extract a feature set with strong class correlation. Then the weight of the mutual information between feature and class in the MRMR algorithm is adjusted dynamically by a class-difference method, which yields the optimal feature subset. The simulation results indicate that the proposed algorithm has better feature representation ability than traditional feature selection algorithms: it needs fewer features to reach a given precision.

2. To improve the performance of SVM in text classification, the thesis introduces a hybrid Fourier kernel function and proposes an SVM text classification model based on it. The thesis not only gives the equation of the hybrid Fourier kernel but also proves the rationality of the proposed kernel function. The experimental simulation results show that the new function greatly reduces the amount of training required and improves the final text classification accuracy.

3. Traditional machine learning algorithms require manual feature screening and long training times. To solve these problems, this thesis proposes a short text classification model based on the attention mechanism (ABLGCNN), which uses the attention mechanism in a convolutional model to connect Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU) in parallel. The results demonstrate that the proposed model has significant advantages in final classification accuracy and convergence speed.
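The two-stage idea in innovation 1 (an IG pre-filter followed by an MRMR-style search) can be illustrated with a minimal sketch. It is not the thesis's algorithm: it uses the standard greedy MRMR score (relevance minus mean redundancy) rather than the improved, dynamically weighted variant, and it treats each term as a binary present/absent feature. The function names, parameters `k_ig` and `k_final`, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def two_stage_select(docs, labels, k_ig, k_final):
    """Hypothetical helper: IG pre-filter, then standard greedy MRMR."""
    vocab = sorted({term for doc in docs for term in doc})
    # Binary presence vector of each term over the documents.
    col = {t: [int(t in doc) for doc in docs] for t in vocab}
    # For a binary term feature, information gain equals I(term; class).
    relevance = {t: mutual_information(col[t], labels) for t in vocab}

    # Stage 1: keep the k_ig terms with the highest information gain.
    candidates = sorted(vocab, key=lambda t: -relevance[t])[:k_ig]

    # Stage 2: repeatedly add the candidate with the best
    # relevance-minus-mean-redundancy score (plain MRMR, fixed weights).
    selected = []
    while candidates and len(selected) < k_final:
        def score(t):
            if not selected:
                return relevance[t]
            redundancy = sum(mutual_information(col[t], col[s])
                             for s in selected) / len(selected)
            return relevance[t] - redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy corpus (made up): "goal" and "match" always co-occur, so they are redundant.
docs = [
    {"goal", "match", "news", "today"},
    {"goal", "match", "news"},
    {"goal", "match", "news"},
    {"news"},
    {"stock", "news", "today"},
    {"stock", "news"},
    {"news"},
    {"news"},
]
labels = ["sport"] * 4 + ["finance"] * 4

print(two_stage_select(docs, labels, k_ig=3, k_final=2))  # → ['goal', 'stock']
```

On this toy data, ranking by IG alone would keep both "goal" and "match", which carry the same information. The redundancy term makes the second stage prefer "stock", which is less relevant on its own but complements "goal".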

Keywords/Search Tags:

Text classification, Feature extraction, Support vector machine, Kernel function, Deep learning, Attention mechanism

Related items

1	Research Of Automatic Text Classification Method Based On Machine Learning
2	Text Classification Based On Machine Learning
3	Research On Text Classification Based On Support Vector Machine With Mixture Of Kernels
4	Kernels For Feature Extraction And Research On Nonlinear Multiple Kernel Learning
5	Research On Application Of Support Vector Machine In Liver B Ultrasonic Images Classification
6	Research And Implementation Of Text Classification Based On Depth Learning Theory And SVM Technology
7	Research On Feature Description And Classifier Construction Algorithm In Chinese Text Classification
8	Research On Text Classification Of Mixed-kernel Parallel Support Vector Machine Based On Hadoop
9	Text Classification Algorithm Based On Deep Learning And Support Vector Machine
10	Research Of Images Classification Based On Support Vector Machine And Semi-supervised Deep Belief Network Learning