
Research on Chinese Text Classification Method Based on Long Short-Term Memory Network

Posted on: 2022-08-14    Degree: Master    Type: Thesis
Country: China    Candidate: H O Chen    Full Text: PDF
GTID: 2518306329951209    Subject: Computer Science and Technology
Abstract/Summary:
To improve the accuracy of text classification, this thesis makes improvements in two areas: data preprocessing and the text classification model. First, the word embedding method used during preprocessing is improved; then the classification model is improved on the basis of an existing deep learning model. Together, these two improvements raise text classification performance. The specific research results of the thesis are as follows.

First, for text word embedding, a new method, the BW word embedding method, is proposed by fusing the BERT and Word2Vec algorithms. The method combines the advantages of both. BERT uses the Transformer structure, which, compared with earlier embedding methods, captures longer-distance dependencies and obtains contextual information more effectively, achieving truly bidirectional, deep acquisition of information. Word2Vec produces lower-dimensional vectors than earlier embedding methods, so it is faster; its static vectors are general-purpose and can be used across a variety of natural language processing tasks. At the same time, each algorithm compensates for the other's shortcomings: BERT's understanding of the full text addresses Word2Vec's inability to disambiguate polysemous words or to be optimized dynamically for a specific task, while Word2Vec's shallow embeddings mitigate the problem that BERT's masking can hide too much information and thus prevent word meaning from being expressed correctly. Combining the two methods yields a better-performing word embedding, as sketched below.

Second, for the text classification model, this thesis proposes an improved LSTM model to address the problem that the traditional long short-term memory (LSTM) network cannot automatically select the most important latent semantic factors for text classification. First, the traditional LSTM is extended to a bidirectional mode, so that the network fully captures the forward and backward context of the input feature words; then a pooling layer is added before the output layer, so that the latent semantic factors most important to the classification result can be selected from among the many words (see the second sketch below). Combining these two improvements, the model proposed in this thesis achieves a higher classification accuracy.
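A minimal sketch of the BW fusion idea follows. The abstract does not specify the fusion operator, so a simple per-token concatenation of BERT's contextual vector with Word2Vec's static vector is assumed here; the model name bert-base-chinese, the vector file w2v_chinese.kv, and the helper bw_embed are all illustrative, not the thesis's actual code.

```python
import torch
from transformers import BertModel, BertTokenizerFast
from gensim.models import KeyedVectors

# Hypothetical "BW" fusion: concatenate a BERT contextual embedding with a
# Word2Vec static embedding for each token. Concatenation is an assumption;
# the abstract only says the two methods are combined.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
w2v = KeyedVectors.load("w2v_chinese.kv")  # hypothetical pretrained vectors

def bw_embed(tokens):
    """Return one fused vector per token: [BERT contextual ; Word2Vec static]."""
    enc = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        ctx = bert(**enc).last_hidden_state[0]           # (num_pieces, 768)
    fused = []
    for i, tok in enumerate(tokens):
        # word_ids() maps each word piece back to its original token index
        piece_idx = [j for j, w in enumerate(enc.word_ids()) if w == i]
        bert_vec = ctx[piece_idx].mean(dim=0)            # average word pieces
        static = torch.tensor(
            w2v[tok] if tok in w2v else [0.0] * w2v.vector_size
        )
        fused.append(torch.cat([bert_vec, static]))      # (768 + w2v_dim,)
    return torch.stack(fused)
```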
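The improved classification model is described as a bidirectional LSTM with a pooling layer before the output layer. The sketch below assumes max-over-time pooling (the pooling type is not stated in the abstract) and uses illustrative dimensions.

```python
import torch
import torch.nn as nn

class BiLSTMMaxPool(nn.Module):
    """Bidirectional LSTM with a pooling layer before the output layer.

    A minimal sketch of the improved model described above: pooling over
    all time steps (max pooling is assumed here) lets the classifier pick
    the strongest latent semantic features across the whole sequence.
    """

    def __init__(self, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):              # x: (batch, seq_len, embed_dim)
        out, _ = self.lstm(x)          # (batch, seq_len, 2 * hidden_dim)
        pooled, _ = out.max(dim=1)     # max over time: (batch, 2 * hidden_dim)
        return self.fc(pooled)         # (batch, num_classes)

# Usage with fused BW embeddings (all dimensions are illustrative):
model = BiLSTMMaxPool(embed_dim=768 + 300, hidden_dim=128, num_classes=10)
logits = model(torch.randn(4, 50, 768 + 300))  # fake batch of 4 texts
```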
Keywords/Search Tags: Natural Language Processing, Text Classification, Recurrent Neural Network, Long Short-Term Memory Network