Research On Text Classification Algorithm Based On Support Vector Machine And Neural Network

Posted on: 2020-10-15; Degree: Master; Type: Thesis
Country: China; Candidate: W F Zhu
GTID: 2428330590995461; Subject: Circuits and Systems
Abstract/Summary:
Nowadays, the development of the Internet has led to an explosion of data, especially text messages and news, and obtaining useful information from this data quickly has become a research hotspot. The main work of this thesis is to explore text mining and analysis methods that combine traditional text feature extraction with the Support Vector Machine (SVM). At the same time, to take the semantic information of the text into account and to reduce the influence of human factors, the thesis also exploits the self-learning capability of neural networks for text classification.

Text representation is the basis of text categorization. Traditional text representation generally relies on statistical feature selection methods such as Information Gain (IG), CHI-square and Mutual Information. These methods assume that words are independent of one another and therefore ignore the redundancy between feature words. For SVM-based text classifiers, classification performance degrades because a single kernel function cannot fully match the distribution of the data. In addition, most traditional machine learning algorithms are shallow models, so as training sets grow larger and tests take longer, problems such as the loss of feature information easily occur. Meanwhile, feature selection based on traditional statistical methods amplifies the influence of noise. Therefore, this thesis uses the self-learning capability of deep neural networks to reduce the impact of human factors.

To address the above problems in text classification, the main innovations are as follows:

1. Because traditional feature extraction methods ignore the redundancy between feature words, this thesis proposes a two-stage text feature selection algorithm that combines IG with an improved Minimal Redundancy Maximal Relevance (MRMR) criterion. First, the algorithm uses IG to extract a feature set with strong class correlation. Then, the weight of the mutual information between each feature and the class in the MRMR criterion is adjusted dynamically by a class-difference method, which yields the optimal feature subset. Simulation results indicate that the proposed algorithm has better feature representation ability than traditional feature selection algorithms and needs fewer features to reach a given precision.

2. To improve the performance of SVM in text classification, the thesis introduces a hybrid Fourier kernel function and proposes an SVM text classification model based on it. The thesis both gives the equation of the hybrid Fourier kernel and proves the validity of the proposed kernel function. Experimental simulation results show that the new kernel greatly reduces the amount of training required and improves the final text classification accuracy.

3. Traditional machine learning algorithms require manual feature screening and long training times. To solve these problems, this thesis proposes an attention-based short text classification model (ABLGCNN), which applies an attention mechanism within a convolutional model and connects a Long Short-Term Memory (LSTM) network and a Gated Recurrent Unit (GRU) network in parallel. The results demonstrate that the proposed model has significant advantages in final classification accuracy and convergence speed.
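The two-stage feature selection of innovation 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's algorithm: documents are assumed to be sets of tokens with binary term-presence features, stage 2 uses the standard MRMR difference criterion (relevance minus mean redundancy), and because the abstract does not give the class-difference formula for the dynamic weight, `relevance_weight` is a fixed, hypothetical placeholder. All function names and the toy corpus are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def information_gain(docs, labels, term):
    """IG of binary term presence: H(C) - H(C | term present / absent)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


def two_stage_select(docs, labels, ig_top, final_k, relevance_weight=1.0):
    """Stage 1: keep the ig_top terms with the highest IG.
    Stage 2: greedy MRMR (weighted relevance minus mean redundancy).
    relevance_weight is a hypothetical fixed stand-in for the thesis's
    dynamically adjusted weight, which the abstract does not specify."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    candidates = ranked[:ig_top]
    # Binary presence vector of each candidate term across all documents.
    vectors = {t: [int(t in d) for d in docs] for t in candidates}
    relevance = {t: mutual_information(vectors[t], labels) for t in candidates}

    def mrmr_score(term, selected):
        score = relevance_weight * relevance[term]
        if selected:
            # Mean MI with already selected terms measures redundancy.
            score -= sum(mutual_information(vectors[term], vectors[s])
                         for s in selected) / len(selected)
        return score

    selected = []
    while candidates and len(selected) < final_k:
        best = max(candidates, key=lambda t: mrmr_score(t, selected))
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy corpus: "match" and "team" always co-occur.
docs = [{"match", "team", "goal"}, {"match", "team", "score"},
        {"vote", "party"}, {"vote", "policy"},
        {"chip", "code"}, {"chip", "app"}]
labels = ["sport", "sport", "politics", "politics", "tech", "tech"]
print(two_stage_select(docs, labels, ig_top=4, final_k=3))
```

In the toy corpus, "match", "team", "vote" and "chip" share the same IG, so a pure IG ranking cannot tell them apart. Once "match" or "team" is selected, its twin's redundancy largely cancels its relevance, and MRMR prefers the non-redundant terms "chip" and "vote".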
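The abstract does not reproduce the equation of the hybrid Fourier kernel in innovation 2, so the sketch below assumes one plausible form: a convex combination of the regularized Fourier kernel, a product over dimensions of (1 - q^2) / (2(1 - 2q cos(x_i - y_i) + q^2)) with 0 < q < 1, and a Gaussian RBF kernel. The mixing weight, q and gamma values, and all function names are hypothetical. Because a non-negative weighted sum of valid kernels is itself a valid kernel, this combination satisfies Mercer's condition; `is_psd` adds a numerical check of that property on a sample Gram matrix.

```python
import math


def fourier_kernel(x, y, q=0.5):
    """Regularized Fourier kernel (0 < q < 1), taken as a product over dimensions."""
    value = 1.0
    for xi, yi in zip(x, y):
        value *= (1 - q * q) / (2 * (1 - 2 * q * math.cos(xi - yi) + q * q))
    return value


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def hybrid_kernel(x, y, weight=0.5, q=0.5, gamma=1.0):
    """Hypothetical hybrid kernel: convex combination of Fourier and RBF kernels."""
    return weight * fourier_kernel(x, y, q) + (1 - weight) * rbf_kernel(x, y, gamma)


def gram_matrix(points, kernel):
    """Kernel (Gram) matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(a, b) for b in points] for a in points]


def is_psd(matrix, eps=1e-9):
    """Numerical check: Cholesky factorization of K + eps*I succeeds only if
    K + eps*I is positive definite, i.e. K is (nearly) positive semidefinite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] + (eps if i == j else 0.0)
            s -= sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    return False
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


points = [[0.0, 0.0], [1.0, 0.5], [2.0, -1.0], [0.3, 3.0]]
print(is_psd(gram_matrix(points, hybrid_kernel)))
```

To train an actual SVM with such a kernel, the Gram matrix can be passed to scikit-learn's `SVC(kernel="precomputed")`, or `hybrid_kernel` can be wrapped in a callable that returns the kernel matrix between two sample arrays.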
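The abstract does not specify how ABLGCNN in innovation 3 arranges its convolution, recurrent, and attention layers, so the sketch below covers only one plausible component, stated as an assumption: additive attention pooling over the concatenated per-time-step outputs of parallel LSTM and GRU branches. The branch outputs are supplied as plain lists instead of being computed by recurrent layers, and the parameters `w` and `v` (learned in a real model) and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(lstm_states, gru_states, w, v):
    """Additive attention pooling over two parallel branches (hypothetical sketch).

    lstm_states, gru_states: T per-time-step output vectors from each branch.
    w: attention_dim x (d_lstm + d_gru) matrix; v: context vector of attention_dim.
    score_t = v . tanh(W h_t), alpha = softmax(score), context = sum_t alpha_t h_t.
    """
    # Concatenate the two branch outputs at each time step.
    fused = [a + b for a, b in zip(lstm_states, gru_states)]
    scores = []
    for h in fused:
        projected = [math.tanh(sum(wij * hj for wij, hj in zip(row, h))) for row in w]
        scores.append(sum(vi * ui for vi, ui in zip(v, projected)))
    alphas = softmax(scores)
    dim = len(fused[0])
    context = [sum(a * h[k] for a, h in zip(alphas, fused)) for k in range(dim)]
    return context, alphas


lstm = [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5]]  # hypothetical LSTM outputs, T=3
gru = [[1.0], [-1.0], [0.5]]                  # hypothetical GRU outputs, T=3
w = [[0.2, -0.1, 0.3], [0.5, 0.4, -0.2]]
context, alphas = attention_pool(lstm, gru, w, [1.0, -0.5])
print(context, alphas)
```

In a full model, `lstm_states` and `gru_states` would come from recurrent layers such as PyTorch's `nn.LSTM` and `nn.GRU` run over the same (possibly convolved) embeddings, and `context` would feed a softmax classifier. Setting `v` to zeros yields uniform weights, so `context` reduces to the mean of the fused states.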
Keywords/Search Tags: Text classification, Feature extraction, Support vector machine, Kernel function, Deep learning, Attention mechanism