
Research On Deep Learning Method Based On Word Vector Representation In Text Classification

Posted on: 2020-01-13    Degree: Master    Type: Thesis
Country: China    Candidate: T E Li    Full Text: PDF
GTID: 2428330596994507    Subject: Air transportation big data project
Abstract/Summary:
With the popularization of Internet technology in daily life, a large number of industries such as social media and mobile intelligence have emerged, and human life and ways of thinking have undergone tremendous changes. Text classification, as one of the key technologies that helps people manage and use text data efficiently, has long been a research hotspot in natural language processing. This thesis applies deep learning methods based on word vector representation to the text classification problem, systematically reviews common word vector representation techniques and deep learning models, and proposes two classification methods. The main research work is as follows:

(1) To address the problem that neural networks taking word vectors as input cannot make full use of the semantic structure of the text and have difficulty representing the importance of each word in a sentence, this thesis proposes a hierarchical semantic representation model, a Bi-Directional Hierarchical Semantic Neural Network based on Self-attention. First, the word vectors are encoded by a two-layer bidirectional LSTM to obtain the text representation and alleviate the long-distance dependence problem. Second, multi-aspect self-attention estimates the importance of each word in the sentence, which reduces the weight of noise words, captures more hidden information, and improves classifier performance.

(2) Aiming at mainstream models' lack of effective attention to semantic information and topic features, and at the information loss incurred when a word-level attention mechanism alone generates the weighted text representation, a two-channel model combining a topic-augmented CNN with a local BiLSTM is applied to text classification. The model extracts feature representations of discrete topic information through multi-scale parallel CNNs, and uses a localized hierarchical semantic network to capture the relationships between phrases within a sentence, again alleviating the long-distance dependence problem. A phrase-level attention mechanism highlights key phrases to optimize the feature extraction process. Finally, the topic features and text features are fused and used as the input of the classifier, which reduces information loss and redundancy in the feature vector extraction process.
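The first model is, in essence, a stacked bidirectional LSTM encoder followed by a multi-aspect self-attention layer over the word-level hidden states. The sketch below illustrates that structure only; it is not the author's implementation. It assumes PyTorch, and the class name BiLSTMSelfAttention and all hyper-parameter values (embedding size, hidden size, number of attention aspects, class count) are illustrative placeholders.

```python
# Minimal sketch: two-layer BiLSTM encoder + multi-aspect self-attention,
# assuming PyTorch; hyper-parameters are placeholders, not the thesis settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMSelfAttention(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128,
                 attn_dim=64, num_aspects=4, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Two stacked bidirectional LSTM layers encode long-distance dependencies.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                              bidirectional=True, batch_first=True)
        # Multi-aspect self-attention: each aspect learns its own word weighting.
        self.attn_w1 = nn.Linear(2 * hidden_dim, attn_dim, bias=False)
        self.attn_w2 = nn.Linear(attn_dim, num_aspects, bias=False)
        self.classifier = nn.Linear(2 * hidden_dim * num_aspects, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        h, _ = self.bilstm(self.embedding(token_ids))        # (batch, seq, 2*hidden)
        scores = self.attn_w2(torch.tanh(self.attn_w1(h)))   # (batch, seq, aspects)
        alpha = F.softmax(scores, dim=1)                      # attention weights over words
        # Weighted sum per aspect, then concatenate into one sentence vector.
        sent = torch.einsum('bsa,bsh->bah', alpha, h).flatten(1)
        return self.classifier(sent)

# Usage with toy dimensions: a batch of 8 sentences, 40 tokens each.
model = BiLSTMSelfAttention(vocab_size=10000)
logits = model(torch.randint(1, 10000, (8, 40)))
print(logits.shape)  # torch.Size([8, 2])
```

Because each attention aspect produces its own weighting over the words before the aspect-wise sentence vectors are concatenated, noise words can receive low weight while different aspects emphasize different parts of the sentence, which is the role the abstract attributes to the multi-aspect self-attention component.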
Keywords/Search Tags: Deep learning, Text classification, Attention mechanism, Hierarchical semantic representation