Font Size: a A A

Research On Chinese Text Classification Based On Deep Learning

Posted on:2019-09-05Degree:MasterType:Thesis
Country:ChinaCandidate:C B SongFull Text:PDF
GTID:2428330623468771Subject:Engineering
Abstract/Summary:PDF Full Text Request
Natural language processing is the use of computer to deal with text,mainly includes the analysis of the text(for example: emotional analysis,text classification),extract the key information(e.g.keywords fetching)and use different ways to represent the same text information(such as machine translation).As mobile Internet,new media and social network platform of explosive growth,in a network with a lot of the lack of effective information organization but has the research value of the text,and text classification as one of the key technology of natural language processing,can effectively solve the problem of information clutter and is widely used in the search engines,spam filtering,personalized news and information classification,etc.Deep learning has been widely used in natural language processing,However,because of the special nature of natural language structure,texts have a certain correlation in the context.Compared with image processing and speech recognition,there is still a big gap between the depth,lead to the existing neural network model for text categorization,the classification effect is not ideal,at the same time,as a result of the statement in Chinese and English sentence structure is different,lead to the common method to Chinese word segmentation relative to the original statement after word segmentation ambiguity,also to a certain extent,affected the accuracy of text classification.In view of the above problems,this paper studies and improves Chinese segmentation,text expression and classification model.In the part of Chinese word segmentation,this paper presents a Chinese word segmentation method based on bidirectional LSTM,using the records of the history of arbitrary length of hidden layers on the advantages of LSTM,combining words tagging method at the same time,make effective use of context in the process of scoring word history relationship,to a certain extent,reduces the segmentation of ambiguity;In the neural network classification model,the text may have a strong context dependent relationship with the previous text because of the natural language phenomenon such as inversion and advance,which leads to the existing problems of the existing text classification models,such as the difficulty in determining the size of the convolution kernel,the high dimension of the text vector and so on,and these text classification models and image processing and language.Compared with the field of voice recognition,its shallow depth is difficult to extract the deep and complex features of the text,resulting in the final classification effect is not ideal.In order to solve the above problem,this paper proposes a deep convolution neural network model.In this model,the word embedding layer is introduced into the word vector.The word embedding is a vector which can use the low dimension vector and characterizing the relationship between words and words.The vector of relationship between words,after input into the neural network model,can be well extracted features,and then effectively improve the classification effect of the text,while incorporating the Batch Normalization and Res Nets in the CNN model.Shortcut effectively solved the problem of the disappearance of gradients(Vanishing Gradients)and the decline in classification accuracy as the depth of the network layer increases.In view of the relationship between the natural language and the long text,This paper proposes a a hybrid model of deep convolution neural network and Long Short-Term Memory network(LSTM),In the process of feature extraction,the hybrid model has the ability to retain both the historical information and the information of the previous text,which makes up the shortage of CNN.Finally,the CNN model and the CNN and LSTM hybrid models are used for classification experiments.The experimental results are compared with other classification models,the accuracy of text classification has been significantly improved...
Keywords/Search Tags:text classification, CNN, LSTM, word embedding, ResNets
PDF Full Text Request
Related items