Research On Text Classification Based On Multi Word Vector Integration And Neural Network

Posted on: 2019-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Peng
GTID: 2428330563453729
Subject: Computer application technology
Abstract/Summary:
Text classification is an important research field in text mining and natural language processing. It aims to assign text to one or more predefined categories, and many different applications can ultimately be reduced to classification problems. Traditional text classification methods often suffer from data sparsity and the loss of word order, and traditional classifiers often generalize poorly and are difficult to tune. In recent years, deep learning based on neural networks has brought new ideas to text classification. Building on current research into text classification methods, this thesis proposes a text classification method based on neural networks. The main work of this thesis is as follows.

First, we survey the state of research on text classification, focusing on the techniques behind neural network based text classification and explaining their theoretical basis in detail. We compare traditional text classification with neural network text classification, summarize the neural network structures used for text classification, and introduce the various word vector representation techniques.

Second, we propose a new neural network text classification model based on multi word vector integration. The model integrates multiple sets of word vectors and uses the rich word sense information they contain to generate a high-quality text representation. It consists of four modules: input, text representation vector generation, text representation vector correction, and classification. The model initializes the input layer with several kinds of word vectors and no longer relies on traditional text feature representations, which avoids the data sparsity problem. At the same time, through a specific network structure (such as convolution), the model can effectively learn the word order and context information of the text. To integrate the different word vectors more reasonably, an adaptive correction strategy modifies the text representation generated from each set of word vectors. This ensures that the final text representation expresses the meaning of the original text accurately and improves classification accuracy. Experimental results on a number of Chinese and English classification data sets show that the model achieves good classification results and outperforms multiple benchmark models.

Third, we design and implement the adaptive correction strategy for text representation vectors. Because they are trained with different models and on different corpora, different sets of word vectors capture different semantic information about words, and therefore contribute differently to a specific classification task. It is thus necessary to distinguish between the word vectors and adjust their impact on the final text representation. This thesis proposes two correction strategies for text representation vectors: one based on the Highway network and one based on attention. A correction strategy takes as input the text representations generated from each set of word vectors. The representation vectors are then weighted by a specific network structure, and the final text representation is produced by merging them. The correction strategy gives the model the ability to discriminate between word vectors: an important set of word vectors has more influence on the final text representation, while the influence of an unimportant one is weakened, which avoids its negative impact on classification. Experimental results show that both correction strategies effectively improve the classification accuracy of the model.
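
The two correction strategies named in the abstract can be illustrated with a small sketch. This is not the thesis's implementation, which the abstract does not specify; it only shows the standard form of each idea on toy vectors. The function names (`attention_merge`, `highway_correct`), the dot-product scoring against a `query` vector, and the `tanh` transform are all assumptions made for illustration, and in a real model the weights would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, bias, x):
    """Return weights @ x + bias for a matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def attention_merge(reps, query):
    """Attention-style merge (assumed form, not the thesis's exact design).

    Each text representation in `reps` comes from one set of word vectors.
    It is scored by a dot product with `query`, the scores are normalized
    with softmax, and the representations are summed with those weights.
    """
    scores = [sum(r * q for r, q in zip(rep, query)) for rep in reps]
    weights = softmax(scores)
    merged = [sum(w * rep[i] for w, rep in zip(weights, reps))
              for i in range(len(query))]
    return merged, weights


def highway_correct(x, w_h, b_h, w_t, b_t):
    """Highway layer: y = t * tanh(W_h x + b_h) + (1 - t) * x.

    The gate t = sigmoid(W_t x + b_t) decides, per dimension, how much of
    the transformed representation replaces the original one.
    """
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    t = [1.0 / (1.0 + math.exp(-v)) for v in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


if __name__ == "__main__":
    # Two toy text representations, one per (hypothetical) word vector set.
    reps = [[1.0, 0.0], [0.0, 1.0]]
    merged, weights = attention_merge(reps, query=[2.0, 0.0])
    print("attention weights:", weights)
    print("merged representation:", merged)
```

In the attention sketch, a representation that aligns more closely with the query receives a larger weight, which matches the abstract's description of important word vectors gaining influence while unimportant ones are weakened. In the highway sketch, a gate near zero passes the input representation through unchanged.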
Keywords/Search Tags:Text Classification, Word Embedding, Neural Network, Highway Network, Attention