
Study And Application Of Deep Features Learning In Sentence-Level Text Classification

Posted on: 2019-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Wang
GTID: 2428330566960652
Subject: Computer Science and Technology

Abstract:
Text categorization is a fundamental task in Natural Language Processing (NLP), and text feature representation is the first step of any classification task: the quality of the representation directly affects classifier performance. Studying high-quality text representations is therefore important. Deep learning techniques, which have developed rapidly in recent years, can automatically extract powerful features and have achieved satisfactory results on most NLP tasks. When applying deep learning to text classification, three challenges affect a model's final performance: 1) how to obtain better word vectors; 2) how to better extract and combine information between words; 3) how to make the final sentence vector capture the text's semantics accurately and comprehensively. To address these three challenges, this thesis carries out three pieces of work.

First, this thesis proposes a new gate mechanism that combines characters and words to obtain richer word vectors. During word vectorization, the common practice of discarding or randomly initializing the large number of out-of-vocabulary (OOV) words seriously harms final classification performance. To solve this problem, and to capture the morphological features of words, the characters of a word are combined with its word vector. We propose a new gate mechanism for this combination and build a text classification model on it. The work achieved good results in the SemEval 2018 multilingual Emoji prediction task, and the related system description paper was published in the SemEval 2018 Workshop.

Second, we present a multi-attention mechanism for sentence representation. Following the success of the attention mechanism in neural machine translation, attention has been widely applied to many tasks. In text categorization, however, the existing attention mechanism uses a single information vector to extract text features across multiple categories, which is limiting. We therefore propose a multi-category attention matrix that extracts text features for each category separately. This model ranked fourth in the news headline classification task of NLPCC 2017, and the work was published at IJCNN (CCF-C) 2018.

Finally, we propose a sentence-centers mechanism to optimize text representation. In the representation space, samples of the same category should have similar representation vectors and should cluster together. We therefore propose a neural network framework constrained by global category center vectors to optimize text representations and classify texts. The category center vectors assist the neural network model in extracting category features of the text from a category-global perspective. This work was published at PAKDD (CCF-C) 2018.

We conducted extensive experiments on several real-world text classification datasets (SST2, Yelp13, Yelp14, NLPCC 2017, Twitter Emoji, etc.) and on different text classification tasks. The experiments show that the three models proposed in this thesis handle the three corresponding challenges well and obtain better classification results.
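The character–word gate described above can be sketched in NumPy as a per-dimension sigmoid gate that interpolates between a word embedding and a character-derived embedding. This is a minimal illustration under assumed shapes; the function and parameter names (`gated_combine`, `W`, `b`) are hypothetical, not the thesis's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_combine(word_vec, char_vec, W, b):
    """Mix a word-level vector with a character-derived vector.

    A per-dimension gate g = sigmoid(W @ [word; char] + b) decides how much
    of the character representation to keep:
        output = g * char + (1 - g) * word
    OOV words with no pretrained embedding can then lean on g -> 1.
    """
    g = sigmoid(W @ np.concatenate([word_vec, char_vec]) + b)
    return g * char_vec + (1.0 - g) * word_vec

# Toy usage: 4-dimensional vectors, randomly initialized gate parameters.
rng = np.random.default_rng(0)
dim = 4
word = rng.standard_normal(dim)
char = rng.standard_normal(dim)
W = rng.standard_normal((dim, 2 * dim))
b = np.zeros(dim)
mixed = gated_combine(word, char, W, b)
```

Because the gate lies in (0, 1), each output dimension is a convex combination of the corresponding word and character dimensions.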
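The multi-category attention idea — one query vector per class rather than a single shared information vector — can be sketched as follows. Shapes and names (`multi_category_attention`, `Q`) are illustrative assumptions for a single sentence; the thesis's model may differ in detail.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_category_attention(H, Q):
    """Attend over hidden states with one learned query per category.

    H: (seq_len, hidden)     encoder hidden states for one sentence
    Q: (num_classes, hidden) query matrix, one row per category
    Returns (num_classes, hidden): one attended sentence vector per category.
    """
    scores = Q @ H.T                  # (num_classes, seq_len) relevance scores
    alpha = softmax(scores, axis=-1)  # attention weights over token positions
    return alpha @ H                  # category-specific sentence vectors

rng = np.random.default_rng(1)
H = rng.standard_normal((7, 16))   # 7 tokens, hidden size 16
Q = rng.standard_normal((5, 16))   # 5 categories
S = multi_category_attention(H, Q)
```

Each row of the result is a weighted average of hidden states, so every category gets its own view of the same sentence; a classifier can then score category k against its own sentence vector S[k].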
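The global category-center constraint can be sketched as a center-style auxiliary loss: each sentence vector is pulled toward its class's center vector. The function name `center_loss` and the fixed toy centers below are assumptions for illustration; in the thesis framework the centers would be learned jointly with the network.

```python
import numpy as np

def center_loss(features, labels, centers):
    """Mean squared distance from each sentence vector to its class center.

    features: (batch, dim)        sentence representations
    labels:   (batch,)            integer class labels
    centers:  (num_classes, dim)  global category center vectors
    """
    diff = features - centers[labels]
    return float((diff ** 2).sum(axis=1).mean())

# Toy check: samples sitting exactly on their centers contribute zero loss.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
feats = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
labels = np.array([0, 1, 0])
loss = center_loss(feats, labels, centers)
```

Adding this term to the classification loss encourages same-class representations to cluster, which is exactly the geometric property the sentence-centers mechanism targets.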
Keywords: Deep Learning, Sentence Representation, Text Classification, Attention, Gate