
Research on Chinese Text Classification Method Based on Long Short-Term Memory Network

Posted on: 2022-08-14    Degree: Master    Type: Thesis
Country: China    Candidate: H O Chen    Full Text: PDF
GTID: 2518306329951209    Subject: Computer Science and Technology
Abstract/Summary:
To improve the accuracy of text classification, this thesis makes improvements in two areas: data preprocessing and the text classification model. First, the word embedding method used during preprocessing is improved; then the classification model is improved on the basis of an existing deep learning model. Together, these two improvements raise text classification performance. The specific research results of the thesis are as follows.

First, for text word embedding, a new method, the BW word embedding method, is proposed by fusing the BERT and Word2Vec algorithms. The method combines the advantages of both. BERT uses the Transformer structure, which, compared with earlier embedding methods, captures longer-distance dependencies and obtains contextual information more effectively, achieving truly bidirectional, deep acquisition of information. Word2Vec produces lower-dimensional vectors than earlier embedding methods, so it is faster; its static vectors are general-purpose and can be used across a variety of natural language processing tasks. At the same time, each algorithm compensates for the other's shortcomings: BERT's understanding of the full text addresses Word2Vec's inability to disambiguate polysemous words or to be optimized dynamically for a specific task, while Word2Vec's shallow embeddings mitigate the problem that BERT's masking can hide too much information and thus prevent word meaning from being expressed correctly. Combining the two methods yields a better-performing word embedding, as sketched below.

Second, for the text classification model, this thesis proposes an improved LSTM model to address the problem that the traditional long short-term memory (LSTM) network cannot automatically select the most important latent semantic factors for text classification. First, the traditional LSTM is extended to a bidirectional mode, so that the network fully captures the forward and backward context of the input feature words; then a pooling layer is added before the output layer, so that the latent semantic factors most important to the classification result can be selected from among the many words (see the second sketch below). Combining these two improvements, the model proposed in this thesis achieves a higher classification accuracy.
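A minimal sketch of the BW fusion idea follows. The abstract does not specify the fusion operator, so a simple per-token concatenation of BERT's contextual vector with Word2Vec's static vector is assumed here; the model name bert-base-chinese, the vector file w2v_chinese.kv, and the helper bw_embed are all illustrative, not the thesis's actual code.

```python
import torch
from transformers import BertModel, BertTokenizerFast
from gensim.models import KeyedVectors

# Hypothetical "BW" fusion: concatenate a BERT contextual embedding with a
# Word2Vec static embedding for each token. Concatenation is an assumption;
# the abstract only says the two methods are combined.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
w2v = KeyedVectors.load("w2v_chinese.kv")  # hypothetical pretrained vectors

def bw_embed(tokens):
    """Return one fused vector per token: [BERT contextual ; Word2Vec static]."""
    enc = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        ctx = bert(**enc).last_hidden_state[0]           # (num_pieces, 768)
    fused = []
    for i, tok in enumerate(tokens):
        # word_ids() maps each word piece back to its original token index
        piece_idx = [j for j, w in enumerate(enc.word_ids()) if w == i]
        bert_vec = ctx[piece_idx].mean(dim=0)            # average word pieces
        static = torch.tensor(
            w2v[tok] if tok in w2v else [0.0] * w2v.vector_size
        )
        fused.append(torch.cat([bert_vec, static]))      # (768 + w2v_dim,)
    return torch.stack(fused)
```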
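The improved classification model is described as a bidirectional LSTM with a pooling layer before the output layer. The sketch below assumes max-over-time pooling (the pooling type is not stated in the abstract) and uses illustrative dimensions.

```python
import torch
import torch.nn as nn

class BiLSTMMaxPool(nn.Module):
    """Bidirectional LSTM with a pooling layer before the output layer.

    A minimal sketch of the improved model described above: pooling over
    all time steps (max pooling is assumed here) lets the classifier pick
    the strongest latent semantic features across the whole sequence.
    """

    def __init__(self, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):              # x: (batch, seq_len, embed_dim)
        out, _ = self.lstm(x)          # (batch, seq_len, 2 * hidden_dim)
        pooled, _ = out.max(dim=1)     # max over time: (batch, 2 * hidden_dim)
        return self.fc(pooled)         # (batch, num_classes)

# Usage with fused BW embeddings (all dimensions are illustrative):
model = BiLSTMMaxPool(embed_dim=768 + 300, hidden_dim=128, num_classes=10)
logits = model(torch.randn(4, 50, 768 + 300))  # fake batch of 4 texts
```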
Keywords/Search Tags: Natural Language Processing, Text Classification, Recurrent Neural Network, Long Short-Term Memory Network