
Research On Methods Of Text Representation And Classification Based On Deep Learning In Information Retrieval

Posted on: 2020-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: A D Xu
GTID: 2428330590971808
Subject: Control Science and Engineering
Abstract/Summary:
Text representation and classification are prerequisites for high-quality information retrieval. At present, the main problems in text representation and classification methods for information retrieval are highly sparse, high-dimensional text features and low retrieval accuracy. To retrieve the required information efficiently and accurately, the construction of text representation and classification methods with excellent performance has become one of the research hotspots in the field of information retrieval. This thesis carries out in-depth research on multi-class text representation and classification and on multi-label text representation and classification, respectively. The main work is as follows:

(1) The traditional multi-class text representation and classification method based on bag-of-words (BOW) has the inherent disadvantages of high sparsity and high dimensionality. To solve this problem, this thesis proposes a Deep Belief Convolutional Neural Network (DBCNN) that combines a Deep Belief Network (DBN) with a Text Convolutional Neural Network (TextCNN). First, the DBCNN reduces the dimension of the text features through DBN pre-training while retaining the text information. Then, convolution and pooling are applied to the initially dimension-reduced text features to obtain a low-dimensional, dense, high-level feature vector representation of the text. The results show that the DBCNN model performs better than the traditional methods in multi-class text representation and classification, with accuracy improved by 6.18% on average; that keyword word-vector embedding improves model performance more effectively than ordinary word-vector embedding; that the closer the number of nodes in the front layer of the DBN structure is to the size of the input vocabulary, the better the DBCNN model performs; and that adding L2 regularization and a sliding average model effectively improves the classification accuracy of the DBCNN model.

(2) Traditional multi-label text representation and classification methods suffer from low search accuracy and high Hamming loss. To address this, this thesis proposes a Bi-Long Short Time Convolutional Neural Network (Bi-LSTCNN) that combines Bi-Long Short-Term Memory (Bi-LSTM) with TextCNN. First, the context feature vector of the text is obtained through Bi-LSTM. Second, a richer fused text feature vector is obtained by fusing the context feature vector with the model input. Finally, TextCNN is used to reduce the dimension of the fused text feature vector and obtain the high-level feature vector representation of the text. The results show that the Bi-LSTCNN model performs better than the traditional methods in multi-label text representation and classification: accuracy is increased by 9.4% and Hamming loss is reduced by 0.374 on average. Adding L2 regularization and a sliding average model effectively improves the classification accuracy of the Bi-LSTCNN model.

(3) As the number of labels increases, the output space of multi-label text classification grows exponentially, making it difficult for the classifier to obtain accurate label sets. To solve this problem, this thesis proposes a multi-label classification strategy based on a layering label tree and introduces it into the Bi-LSTCNN model to improve its performance. The Bi-LSTCNN model with the layering label tree performs better than the Bi-LSTCNN model without it: recall, accuracy and F1 value are improved by 2.2%, 2.9% and 2.5% respectively, and Hamming loss is reduced by 0.187.
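The abstract reports Hamming loss as a multi-label metric and notes that the multi-label output space grows exponentially with the number of labels. The following minimal sketch illustrates both ideas using the standard textbook definition of Hamming loss (the fraction of label slots predicted incorrectly). The function names `hamming_loss` and `label_set_count` and the toy data are assumptions made for illustration; the thesis may compute or scale the metric differently.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (sample, label) slots where the prediction differs from the truth.

    Both arguments are binary indicator matrices: one row per document,
    one column per label (1 = label assigned, 0 = not assigned).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("rows must have the same number of labels")
        total += len(true_row)
        wrong += sum(1 for t, p in zip(true_row, pred_row) if t != p)
    if total == 0:
        raise ValueError("inputs must contain at least one label slot")
    return wrong / total


def label_set_count(num_labels: int) -> int:
    """Number of distinct label subsets a multi-label classifier could output."""
    return 2 ** num_labels


if __name__ == "__main__":
    # Toy example (hypothetical data): 2 documents, 4 labels each.
    y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
    y_pred = [[1, 0, 0, 0], [0, 1, 1, 0]]
    print(hamming_loss(y_true, y_pred))  # 2 wrong slots out of 8
    print(label_set_count(10))           # size of the output space for 10 labels
```

Under this definition a lower value is better, and a perfect prediction gives 0. The exponential count from `label_set_count` is the reason a hierarchical strategy such as a label tree can help: each level only has to choose among a small group of labels rather than among all subsets at once.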
Keywords/Search Tags: text representation and classification, search accuracy, layering label tree, DBCNN, Bi-LSTCNN