Research On Text Classification Based On Deep Learning

Posted on: 2021-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zhong
Full Text: PDF
GTID: 2428330620464108
Subject: Engineering
Abstract/Summary:
With the continuous development of the Internet, the volume of text data online keeps growing, and this data contains rich knowledge and information. Classifying text makes it easier to extract useful information from such massive collections, so text classification is an indispensable part of text processing. Building a knowledge graph is itself a process of processing text data and mining the valuable information in it. For human readers, stylistic classification helps build stylistic awareness and supports writing and reading comprehension. For machines, it helps in interpreting the information a text conveys and in generating the text a user needs. It also supports later stages of knowledge graph construction, such as entity extraction, relation extraction, summary extraction, and knowledge reasoning. Stylistic classification is therefore of great significance to knowledge graph construction.

The research goal of this thesis is to classify an input text as narrative, argumentative, or expository. Massive text data from various fields can be roughly divided into short text and long text. During feature extraction, short text carries less information, so keyword information matters more; long text carries more information, so the relationships between contexts matter more. Based on these considerations, this thesis studies the short-text and long-text tasks separately and uses deep learning methods for both. The main work and contributions of this thesis are as follows:

1. This thesis proposes a stylistic classification model based on stylistic features. Based on an analysis of the stylistic characteristics of short texts, a stylistic feature vector is designed from their lexical and syntactic features. Because current word vectors do not make full use of the stylistic information associated with the classification categories, the stylistic feature vector is integrated with the word vector to increase the amount of category information contained in the word embedding. A convolutional neural network (CNN) then extracts features from these vectors, yielding a stylistic classification model based on the stylistic feature vector.

2. This thesis proposes a stylistic classification model based on word order features for long texts. Following an analysis of the stylistic characteristics of long texts, each long text is segmented. Taking advantage of the pre-trained BERT model, sentence vector representations of the long text are obtained from BERT. Because a bidirectional recurrent neural network can effectively identify and extract semantic features in text data, one is used to learn the word order features of the text, and an attention mechanism is introduced. Finally, local features extracted by a CNN supplement these features, yielding a stylistic classification model based on word order features.

3. The effectiveness of the two models is demonstrated through comparative experiments. Based on the above ideas, an automatic stylistic classification system is designed and implemented. The system consists of a model training part and an automatic text classification part. The model training part trains on text data retrieved from the database and saves the resulting model parameters. The automatic text classification part loads the corresponding model according to the length of the input text and returns the classification result.
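As a rough illustration of the short-text idea (fusing a stylistic feature vector with word vectors, then extracting features with a CNN), the following is a minimal NumPy sketch and not the thesis's implementation. It assumes the stylistic features are available per token and are fused by simple concatenation; the abstract does not specify either point. All names (`fuse_embeddings`, `conv1d_max_pool`), dimensions, and the random weights are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_embeddings(word_vecs, style_feats):
    """Concatenate per-token word vectors with per-token stylistic features.

    word_vecs:   (seq_len, d_word)
    style_feats: (seq_len, d_style)  # assumed per-token lexical/syntactic cues
    """
    return np.concatenate([word_vecs, style_feats], axis=1)

def conv1d_max_pool(x, filters):
    """Convolve over the token axis, apply ReLU, then max-over-time pooling.

    x:       (seq_len, d)
    filters: (n_filters, width, d)
    """
    n_filters, width, _ = filters.shape
    n_windows = x.shape[0] - width + 1
    # Stack every window of `width` consecutive tokens: (n_windows, width, d)
    windows = np.stack([x[i:i + width] for i in range(n_windows)])
    # One activation per (window, filter) pair: (n_windows, n_filters)
    feature_maps = np.einsum("nwd,fwd->nf", windows, filters)
    return np.maximum(feature_maps, 0.0).max(axis=0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes: 12 tokens, 50-dim word vectors, 6 stylistic features,
# 8 filters of width 3, and 3 classes (narrative, argumentative, expository).
seq_len, d_word, d_style, n_filters, width, n_classes = 12, 50, 6, 8, 3, 3

word_vecs = rng.normal(size=(seq_len, d_word))      # stand-in for real embeddings
style_feats = rng.normal(size=(seq_len, d_style))   # stand-in for real features

fused = fuse_embeddings(word_vecs, style_feats)     # (12, 56)
filters = rng.normal(size=(n_filters, width, d_word + d_style)) * 0.1
pooled = conv1d_max_pool(fused, filters)            # (8,)

W = rng.normal(size=(n_filters, n_classes)) * 0.1   # untrained output layer
probs = softmax(pooled @ W)                         # class probabilities
```

With trained weights, `probs` would hold the predicted probabilities for the three text types; here the weights are random, so only the shapes and data flow are meaningful.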
Keywords/Search Tags: deep learning, feature extraction, stylistic classification, text representation