Research On Text Representation And Classification Based On Deep Learning

Posted on: 2020-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: S C Liang
Full Text: PDF
GTID: 2428330599477361
Subject: Control engineering
Abstract/Summary:
With the wide application of information technology in daily life, the volume of text information is growing exponentially. How to manage massive text collections effectively and quickly extract their value has become a focus of research, and text representation and text classification are the key technologies for text information management. Traditional text representation relies on statistical methods that ignore semantic information and assume that words are independent of each other; the extracted text features are sparse and high-dimensional, and a large amount of textual information is lost. Today's text is semantically rich and covers diverse themes, which poses a greater challenge to text classification. For long text in particular, traditional shallow text classifiers have limited generalization ability and cannot meet the requirements of classification management. The layer-by-layer structure of deep learning models can extract high-level features from shallow or intermediate features, which addresses the above problems and supports both accurate text representation and the construction of accurate text classification models. Building on a study of various deep learning algorithms, this paper applies deep learning to text representation and text classification. The main work is as follows:

(1) An improved Fasttext model is proposed for Chinese long text classification to address the problem that the Fasttext model loses too much contextual information in complex long text classification. Experiments on the THUCNews data set show that the improved Fasttext model not only maintains text classification accuracy but also shortens word vector training time.

(2) To address the long training time and unsatisfactory classification results caused by representing long texts with word vectors during feature extraction, an unsupervised learning method based on the PV-DM model is proposed to generate sentence vectors, enabling sentence-level text analysis and speeding up the analysis of long texts.

(3) To address the difficulty of extracting key semantic features and the poor performance of long text classifiers, a BGRU-CNN hybrid model based on a recurrent neural network and a convolutional neural network is established to classify long text accurately. The training sets of the THUCNews and SogouC data sets were used to train the BGRU-CNN hybrid model separately, test experiments were conducted, and the results were compared with five text classification models: CNN, LSTM, GRU, B-LSTM, and B-GRU. The experimental comparison and analysis demonstrate the effectiveness of the BGRU-CNN hybrid model.

This paper proposes an improved Fasttext text representation method and a BGRU-CNN text classification method for classifying complex Chinese long texts, which provides a theoretical solution and also has guiding significance in application. The paper contains 32 figures, 9 tables, and 64 references.
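
For orientation, the kind of baseline that item (1) starts from can be sketched as follows. This is a minimal, untrained illustration of a Fasttext-style classifier as it is commonly described (hashed unigram and bigram features, averaged embeddings, a linear softmax layer); it is not the improved model proposed in the thesis, whose details the abstract does not give. All names (`ngram_features`, `bucket`, `text_vector`, `classify`), the sizes `DIM` and `BUCKETS`, and the CRC32 hashing are assumptions made for this example, and the input is assumed to be already segmented into words.

```python
import math
import random
import zlib

DIM = 8          # embedding size (illustrative assumption)
BUCKETS = 1000   # number of hash buckets for features (illustrative assumption)


def ngram_features(tokens, n=2):
    """Return the unigrams followed by all contiguous word n-grams."""
    feats = list(tokens)
    feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def bucket(feature):
    """Map a feature string to a stable bucket index (CRC32 is an arbitrary choice)."""
    return zlib.crc32(feature.encode("utf-8")) % BUCKETS


def make_embeddings(seed=0):
    """Random, untrained embedding table: one DIM-sized row per bucket."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(BUCKETS)]


def text_vector(tokens, embeddings):
    """Average the embeddings of all unigram and bigram features of a text."""
    feats = ngram_features(tokens)
    if not feats:
        return [0.0] * DIM
    vec = [0.0] * DIM
    for feat in feats:
        row = embeddings[bucket(feat)]
        for d in range(DIM):
            vec[d] += row[d]
    return [v / len(feats) for v in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, embeddings, weights):
    """Apply a linear layer and softmax to the averaged text vector."""
    vec = text_vector(tokens, embeddings)
    scores = [sum(w * v for w, v in zip(row, vec)) for row in weights]
    return softmax(scores)
```

Averaging unigram embeddings alone discards word order entirely; the bigram features recover only local order, which is one way to read the abstract's remark that the baseline loses too much context on long texts.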
Keywords/Search Tags: Text Information, Text Representation, Text Classification, Deep Learning, Long Text