
Research On Key Technologies Of Short Text Classification Based On Deep Learning

Posted on: 2021-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Chen
GTID: 2518306107450234
Subject: Computer technology
Abstract/Summary:
A large amount of short text data (e.g., Weibo posts, reviews, search queries, and customer service Q&A) has accumulated on the mobile internet, and it carries rich semantic knowledge. This volume of text has also produced the dilemma of "information overload and knowledge scarcity". Semantic analysis (e.g., feature extraction and pattern matching) and classification of massive short text data are therefore of great research and commercial value: they can uncover the implicit associations and dependencies among texts and identify high-level semantic knowledge that humans can ultimately understand.

Short texts, however, are limited in length and characterized by sparse word co-occurrence, nonstandard language, and strong context dependence. Conventional text categorization methods (e.g., the vector space model) ignore the inner semantic relationships between words and so cannot handle the resulting high-dimensional data modeling problem, which seriously affects classification accuracy. Existing short text classification approaches focus on learning the implicit semantic features of short texts. The well-known methods fall roughly into three classes: tree-structured models based on recursive neural networks, sequence-structured models based on recurrent neural networks, and N-gram models based on convolutional neural networks. These methods have several limitations, including the need for external or prior knowledge in recursive neural network based methods, a heavy dependence on N-gram features, and difficulty in effectively storing and using dependency information.

To address these problems, this thesis focuses on how to learn semantic representations of short text effectively in order to improve classification accuracy, and proposes a hierarchical "word-phrase-sentence" representation learning framework. Within the framework, phrase-level semantic features (entities) are learned with a graph attention network, and sentence-level semantic features (e.g., emotion and sentence pattern) are learned with a capsule neural network. These are combined with general features to improve both the representation ability of the data and the accuracy of text classification. Experiments conducted on real-world data demonstrate that word vectors learned with a graph convolution model significantly improve the classification performance of the graph attention capsule neural network on intention recognition data compared with the baselines, especially on the TREC data, where the reported result is 0.948.
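
The vector space model limitation described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the example queries, the whitespace tokenizer, and the function names `bow_vector` and `cosine_similarity` are all assumptions made for illustration. It shows that a bag-of-words representation assigns zero similarity to two short texts that share a meaning but no terms.

```python
import math
from collections import Counter


def bow_vector(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical short queries (not from the thesis data).
q1 = bow_vector("cheap flights paris")
q2 = bow_vector("low cost airfare france")  # related meaning, no shared terms
q3 = bow_vector("cheap flights rome")       # different city, two shared terms

sim_related = cosine_similarity(q1, q2)
sim_overlap = cosine_similarity(q1, q3)
print(sim_related, sim_overlap)
```

Because the first two queries share no surface terms, their similarity is zero even though they are semantically close, while the third query scores higher purely through word overlap. Learned semantic representations of the kind the thesis proposes are meant to reduce this dependence on exact term matches.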
Keywords/Search Tags: Deep Learning, Short Text Classification, Graph Attention Mechanism, Capsule Neural Network