Research On Deep Learning Text Classification Based On Fusion Topic Features

Posted on: 2018-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: W L Mao
Full Text: PDF
GTID: 2428330623950915
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of Internet technology has led to exponential growth in text data. Effectively analyzing and using these data to fully uncover the value they contain is the primary task of big text data analysis and processing, and text classification is an important branch of it. Most tasks in information retrieval, information recommendation, and text mining build on text classification. Although text classification technology has existed for a long time, major breakthroughs have been difficult because of past limitations in data volume, learning models, and hardware. With the development of deep learning, big data processing is now moving toward higher efficiency and higher precision, but this brings further challenges. Traditional text classification technology is reasonably mature, yet it offers insufficient support for imbalanced text and data-stream text. In addition, although large amounts of information and data are now available on smart terminals, most of these data are unorganized and unlabeled, and research on how to use unlabeled data effectively to discover valuable information is still at a preliminary stage. As deep learning has matured, both the feature representation and the processing of text have changed. How to exploit the advantages of deep-learning-based text methods to mine the semantic information of text more fully remains to be studied. To address the inadequate expressiveness of word-level text features and the ineffective use of unlabeled text data, this thesis makes the following contributions:

(1) First, at the level of semantic text representation, a topic vector representation method based on a topic model is proposed to compensate for the global semantic information that word2vec loses. Parameter estimation for the LDA topic model is implemented with variational Bayesian inference, and topic vectors are extracted from the LDA "topic-word" matrix for use in the subsequent text classification. The neural-network text classification models base1-CNN, base2-CNN, TE-1, and TE-2 are built with the TensorFlow framework and evaluated on two tasks, topic classification and sentiment classification. The neural network text classification method that fuses topic vectors with word vectors achieves good results in both test accuracy and convergence speed. The results of base2-CNN also confirm that the topic vector captures only part of the semantic information and that some semantic information is missing from it. The topic vector can therefore serve only as an auxiliary embedding representation.

(2) Text classification tasks often include a large amount of unlabeled text whose semantic information cannot be fully used. To address this, a semi-supervised text classification network, d-CNN, is proposed, building on a word2vec-based text representation and a convolutional neural network text classifier. The network is also built on the TensorFlow framework. It uses "virtual" labels for the unlabeled text and the true labels for the labeled text to train two CNN networks, and learns the respective weight parameters of the two networks at the top layer. In this way, the semantic information of the text is used more efficiently and classification accuracy is effectively improved. Test results on the sentiment classification and topic classification tasks show that the d-CNN model performs better than the fully supervised deep learning text classification model.
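
As a rough illustration of the topic-vector idea in contribution (1), the following minimal sketch derives a per-word topic vector from a toy "topic-word" matrix and fuses it with a word vector by concatenation. The function names (`topic_vectors`, `fuse`), the toy numbers, the per-word normalization, and the choice of concatenation as the fusion step are all assumptions made for illustration; the abstract does not specify how the thesis extracts or fuses these vectors.

```python
# Minimal sketch (hypothetical names and toy data, not the thesis code):
# derive per-word topic vectors from an LDA "topic-word" matrix and
# fuse them with word vectors by concatenation.


def topic_vectors(topic_word, vocab):
    """Map each vocabulary word to a K-dimensional topic vector.

    topic_word[k][v] is the weight of word v under topic k.
    The vector for word v is column v, rescaled to sum to 1
    (an assumed normalization; the abstract does not state one).
    """
    vectors = {}
    for v, word in enumerate(vocab):
        column = [row[v] for row in topic_word]
        total = sum(column)
        vectors[word] = [p / total for p in column] if total > 0 else column
    return vectors


def fuse(tokens, word_vecs, topic_vecs):
    """Concatenate the word vector and topic vector of each known token."""
    return [
        word_vecs[t] + topic_vecs[t]  # list concatenation, not addition
        for t in tokens
        if t in word_vecs and t in topic_vecs
    ]


# Toy vocabulary and a 2-topic "topic-word" matrix (rows are topics).
vocab = ["goal", "match", "stock"]
topic_word = [
    [0.5, 0.4, 0.1],  # topic 0: sports-like
    [0.1, 0.1, 0.8],  # topic 1: finance-like
]

# Toy 3-dimensional word vectors standing in for word2vec output.
word_vecs = {
    "goal": [0.2, -0.1, 0.7],
    "match": [0.3, 0.0, 0.6],
    "stock": [-0.5, 0.4, 0.1],
}

t_vecs = topic_vectors(topic_word, vocab)
fused = fuse(["goal", "stock"], word_vecs, t_vecs)
```

Each fused row here has 3 word-vector dimensions followed by 2 topic dimensions, and a sequence of such rows could serve as the input matrix of a CNN text classifier. Consistent with the abstract's finding for base2-CNN, the topic part is treated as a supplement to the word vector rather than a replacement for it.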
Keywords/Search Tags: text classification, deep learning, topic model, semi-supervised classification