Research On Long Text Classification Based On Word Embedding Technology

Posted on: 2020-08-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Z M Yang
GTID: 2518306353464544 | Subject: Control Engineering
Abstract/Summary:
Text classification has long been one of the most important tasks in natural language processing. In languages written in Latin script, words are naturally separated by spaces, so text classification in those languages achieved strong results early. Chinese text, by contrast, has no natural word boundaries between characters, and it also involves complications such as character ambiguity and word polysemy, so many Chinese text-processing tasks have not yet reached comparable results. This thesis takes the classification of long Chinese texts as its research object. Building on existing word embedding technology, it aims to reduce the difficulty of text classification in the Chinese context, and further to handle the multiple meanings that a single long text can carry, which it frames as a fine-grained classification problem.

First, the thesis analyzes the typical pipeline and underlying principles of text classification. It describes deep-learning methods for text classification in detail, from the original neural network prototype through technical details such as activation functions and stochastic gradient descent. It then explains how convolutional neural networks (CNNs) and recurrent neural networks (RNNs) work and why they are useful for text classification. To prepare for applying deep learning to this task, the thesis also explains the foundational role of word embedding technology in Chinese natural language processing, which provides the theoretical basis for the deep-learning classification models introduced later.

Second, the thesis studies the conventional text classification problem, in which each input sequence corresponds to a single category, as in sentiment analysis. Building on this, it introduces an attention mechanism implemented by a CNN module. In parallel, a bidirectional long short-term memory (BiLSTM) network serves as the sequence input module, exploiting the LSTM's ability to capture long-distance dependencies in long texts. The BiLSTM output is combined with the attention scores produced by the CNN module to form the final semantic representation of the text, which is then used for classification. Among the text classification models compared, this model achieved the best results on the Chinese long-text classification task.

In addition, to address the complex semantics of long texts, the thesis proposes a fine-grained text classification problem: in a long text, a single sample may contain several central words (aspect terms), each with its own sentiment polarity. Based on several sets of comparative experiments, the thesis builds a central-word extraction module and an opinion summarization module. The model automatically extracts and labels every central word in the input sequence that carries classification value and classifies its sentiment polarity, and it ultimately achieved satisfactory classification accuracy.
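The abstract describes the single-label classifier only at the block level: a BiLSTM encodes the sequence, a CNN module produces attention scores, and the two are combined into one text representation for classification. The plain-Python sketch below illustrates one plausible reading of that combination under stated assumptions, not the thesis's actual implementation. The BiLSTM is not implemented; its per-position hidden states are passed in as a list of vectors. The convolution is assumed to run over those hidden states (the thesis may instead apply it to the word embeddings). The function names `conv_attention_scores`, `attention_pool` and `classify` are hypothetical. Raw scores are normalized with a softmax over positions, the hidden states are averaged with those weights, and a linear layer plus softmax gives class probabilities.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def conv_attention_scores(hidden: Matrix, kernel: Matrix, bias: float = 0.0) -> Vector:
    """Slide a (window x dim) kernel over the sequence with zero 'same' padding,
    producing one raw attention score per position (hypothetical CNN module)."""
    window, dim = len(kernel), len(hidden[0])
    left = window // 2
    zeros = [0.0] * dim
    padded = [zeros] * left + hidden + [zeros] * (window - 1 - left)
    scores = []
    for t in range(len(hidden)):
        s = bias
        for k in range(window):
            s += sum(w * x for w, x in zip(kernel[k], padded[t + k]))
        scores.append(s)
    return scores


def attention_pool(hidden: Matrix, weights: Vector) -> Vector:
    # Weighted sum of the hidden states: one context vector for the whole text.
    dim = len(hidden[0])
    return [sum(a * h[j] for a, h in zip(weights, hidden)) for j in range(dim)]


def classify(hidden: Matrix, kernel: Matrix,
             out_weights: Matrix, out_bias: Vector) -> Tuple[Vector, Vector]:
    """hidden: BiLSTM outputs, one [forward; backward] vector per position (assumed given)."""
    alpha = softmax(conv_attention_scores(hidden, kernel))
    context = attention_pool(hidden, alpha)
    logits = [sum(w * c for w, c in zip(row, context)) + b
              for row, b in zip(out_weights, out_bias)]
    return softmax(logits), alpha


# Toy example: 3 positions, 2-dimensional hidden states, 2 classes.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]  # only the centre tap is non-zero
probs, alpha = classify(hidden, kernel, [[1.0, 1.0], [0.0, 0.0]], [0.0, 0.0])
print([round(a, 3) for a in alpha])  # [0.212, 0.212, 0.576]
print(probs)  # class 0 receives the higher probability
```

Because the kernel spans a window of neighbouring positions, each attention score can depend on local context rather than on one position alone. With the toy kernel only the centre tap is active, so the third position, whose hidden state has the largest sum, receives the most weight.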
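For the fine-grained task, the abstract states only that a central-word extraction module finds and labels every central word and that the sentiment polarity of each one is then classified. A common way to frame the extraction step, assumed here rather than taken from the thesis, is character-level BIO sequence tagging. The sketch below decodes such tags into spans and pairs each span with the text that follows it, up to the next central word. The tagger itself is not shown (tags are given as input), and `toy_lexicon_polarity` with its two tiny word lists is a hypothetical stand-in for a learned polarity classifier, included only so the example runs. `decode_bio` and `aspect_polarities` are hypothetical names as well.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # [start, end) character indices


def decode_bio(tags: List[str]) -> List[Span]:
    """Turn B/I/O tags into spans; a stray I without a preceding B opens a new span."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def aspect_polarities(chars: List[str], tags: List[str],
                      polarity_fn: Callable[[str, str], str]) -> List[Tuple[str, str]]:
    """Extract each central word and classify it from the text that follows it,
    up to the next central word (an assumed, deliberately simple context rule)."""
    spans = decode_bio(tags)
    results = []
    for k, (s, e) in enumerate(spans):
        ctx_end = spans[k + 1][0] if k + 1 < len(spans) else len(chars)
        aspect = "".join(chars[s:e])
        context = "".join(chars[e:ctx_end])
        results.append((aspect, polarity_fn(aspect, context)))
    return results


# Hypothetical stand-in for a trained polarity classifier.
POSITIVE = ("清晰", "好")
NEGATIVE = ("差", "慢")


def toy_lexicon_polarity(aspect: str, context: str) -> str:
    if any(w in context for w in NEGATIVE):
        return "negative"
    if any(w in context for w in POSITIVE):
        return "positive"
    return "neutral"


# "The screen is very clear, but the battery is poor."
chars = list("屏幕很清晰但是电池很差")
tags = ["B", "I", "O", "O", "O", "O", "O", "B", "I", "O", "O"]
print(aspect_polarities(chars, tags, toy_lexicon_polarity))
# [('屏幕', 'positive'), ('电池', 'negative')]
```

Taking the text up to the next central word as its context is a simplification: in a long text, the opinion about a central word can appear before it or far away from it, which is why a learned polarity model is normally used in place of a fixed context rule.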
Keywords/Search Tags: word embedding, fine-grained, convolutional neural network, recurrent neural network, attention mechanism