
Literature Analysis Based On Convolution Neural Network

Posted on: 2018-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Li
Full Text: PDF
GTID: 2348330536459570
Subject: Software engineering
Abstract/Summary:
With the rapid growth of text information on the Internet, people spend more and more time searching for and sorting through papers online. In this context, using text clustering to organize large volumes of literature has important application prospects and research significance. Text clustering is an important text mining technique that is widely used in text mining and information retrieval; it is valuable for organizing and browsing large-scale text collections and for building hierarchical classifications of them. The first problem in text clustering is how to represent text data in mathematical form. In addition, traditional text clustering algorithms ignore the semantic correlation between the words in a text, and their clustering results are unstable. This thesis focuses on these problems. We use the Chinese Sogou text corpus as the experimental data set, use the word2vec tool to convert words into word vectors, and use a convolutional neural network to extract features from the text data. An improved clustering algorithm based on K-means, called KSDM, is proposed to classify the literature.

The main work of this thesis is organized as follows:

1. The significance of text clustering and the state of research at home and abroad are discussed, and the shortcomings of traditional text clustering algorithms are analyzed.

2. Several commonly used text clustering algorithms are studied, together with the basic principles of convolutional neural networks, word vector conversion, and the word2vec tool.

3. A text feature extraction method based on a convolutional neural network is designed. The network model is built and its parameters are selected. Experiments verify the effectiveness of this feature extraction method.

4. The improved KSDM clustering algorithm based on K-means is designed. Building on the traditional K-means algorithm, a new outlier detection algorithm and a new cluster center selection algorithm are proposed. Experimental results demonstrate the effectiveness of the KSDM algorithm.

5. On the basis of this theoretical research, a literature analysis framework is proposed that combines the word2vec tool, the convolutional neural network, and the KSDM clustering algorithm. First, the text data is segmented into words, stop words are removed, and the remaining words are converted into word vectors. The word vectors are then stored as a matrix and fed to the pre-trained convolutional neural network to extract text features. Finally, the extracted features are clustered with the KSDM algorithm to classify the test literature. Experimental results show that the proposed approach improves on the accuracy of existing text clustering algorithms and offers high scalability and flexibility.
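The last two stages of the framework described above (convolutional feature extraction followed by clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `conv_max_features` and `kmeans`, the toy 2-dimensional "word vectors", and the hand-written filters are hypothetical, whereas a real pipeline would use word2vec vectors and filters learned during training. The abstract does not give the details of KSDM's outlier detection or center selection, so plain K-means with random initial centers is used here in its place.

```python
import math
import random


def conv_max_features(word_vectors, filters):
    """Text-CNN style features for one document.

    word_vectors: list of word vectors (one row per word).
    filters: list of filters, each a list of h rows matching the vector size.
    Each filter slides over consecutive words; the ReLU-clipped maximum
    activation (max-over-time pooling) becomes one feature.
    """
    features = []
    for filt in filters:
        h = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to 0
        for start in range(len(word_vectors) - h + 1):
            window = word_vectors[start:start + h]
            act = sum(w * x
                      for f_row, x_row in zip(filt, window)
                      for w, x in zip(f_row, x_row))
            best = max(best, act)
        features.append(best)
    return features


def kmeans(points, k, iters=20, seed=0):
    """Plain K-means (a stand-in for KSDM, whose details are not given)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy usage: two documents near topic axis 0, two near topic axis 1.
toy_docs = [
    [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1]],
    [[0.8, 0.2], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0], [0.1, 0.9]],
    [[0.2, 0.8], [0.0, 1.0]],
]
toy_filters = [
    [[1.0, 0.0], [1.0, 0.0]],  # responds to two consecutive "topic 0" words
    [[0.0, 1.0], [0.0, 1.0]],  # responds to two consecutive "topic 1" words
]
toy_features = [conv_max_features(doc, toy_filters) for doc in toy_docs]
toy_labels, toy_centers = kmeans(toy_features, k=2)
print(toy_labels)
```

Max-over-time pooling gives every document a feature vector of the same length (one value per filter) regardless of how many words it contains, which is what allows a distance-based algorithm such as K-means to compare documents directly.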
Keywords/Search Tags: convolution neural network, word2vec, text clustering, feature extraction