Font Size: a A A

Research Of Short Text Classification Based On Improved Convolutional Neural Network

Posted on:2019-03-01Degree:MasterType:Thesis
Country:ChinaCandidate:Y L GaoFull Text:PDF
GTID:2428330548959132Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the continuing development of the internet,especially the popularity of social media,the way of people's life and thinking has changed a lot,every individual serves as the role of data production and transmission,which makes the internet data explosive growth.In the era of big data,a new genre of text,short text,has become the most popular way of expressing personal feelings,suggestions and opinions.Short text data reflects the ideas and judgements to social phenomena,goods,and also summarizes the personal routine and experience,thus it contains massive general rules which have important reference values to both enterprises and individuals.Compared to traditional document,short text has a few unique features: 1)short length of the text;2)strong sparseness;3)immediacy;4)nonstandardization.It is difficult for traditional machine learning methods to deal with short text classification mainly because too limited features can be effectively extracted.Deep learning network has made a huge success in the field of computer vision and audio recognition,but not of natural language processing.However,with much more attention being paid to by researchers,many short text classification models have been proposed in recent years which achieve satisfactory results.In this paper,we propose two improved short text classification models in extracting characteristics of the features from data in the CNN model.The main contents are as follows:1.Short text classification model based on sparse and self-taught convolutional neural network: In the current classification model,the input of the convolutional layer is usually artificial,which often requires prior knowledge about the data,and there is some interference from human factors,although it has achieved good results,it often deviates from the optimal value.In the proposed model,self-taught strategy is added to the convolutional layer by letting the node learn the combination from the input of the previous layer,eliminating the human factors;during the training process,using L1,L2 norm to make most nodes of convolutional layer depressed and only a few nodes activated,the complexity of the proposed model can be effectively decreased,on the contrary,the accuracy of the proposed model can be effectively increased;2.Short text classification model based on integrated deep network: In this proposed model,the short text is considered as the sequence of words,and thus there is complex connection between the words.Using the recurrent neural network to abstract the complex dependence between the words of the text based on extracting characteristics of the features from data in the CNN model,the sematic information is captured as well in the meantime.Experiments on the open corpus datasets and comparisons between the current models show that the proposed models are effective for the task of the short text classification.
Keywords/Search Tags:Short text, classification, CNN, RNN, self-taught
PDF Full Text Request
Related items