Automatic Chinese Text Summarization Method Based On Convolutional Neural Network

Posted on: 2018-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Yu
Full Text: PDF
GTID: 2348330533469443
Subject: Computer technology
Abstract/Summary:
Data on the Internet is growing rapidly. In the big data era, automatic summarization of large-scale long-text data is of great significance, because it lets people quickly obtain the information they need from massive amounts of data. Previous research on automatic summarization has mostly focused on small text datasets, which makes it difficult to meet the needs of the big data era. In addition, the lack of a large-scale long-text summarization dataset has restricted the application of deep learning methods to the automatic summarization task. For these reasons, this thesis constructs a large-scale long-text summarization dataset and, using this corpus, applies a convolutional neural network model and an LSTM model to automatically summarize long Chinese texts.

To address the lack of a large-scale long-text summarization dataset, which has restricted research on automatic summarization of long Chinese texts, this thesis studies network data and builds a micro-blog crawler and a webpage text extraction algorithm, with Sina micro-blog as the platform. We crawl micro-blog posts containing links published by media users and extract the content that the links point to. After denoising and filtering, we obtain a large-scale Chinese text summarization dataset of 200,000 pairs, each consisting of a micro-blog post and the corresponding original text. To support the data labeling task in the experiments, we also built a labeling system that highlights the words shared by a micro-blog post and its original text.

Previous automatic text summarization methods suffer from low efficiency and poor performance, and have difficulty meeting the needs of large-scale automatic summarization in the big data era. Building on the constructed dataset, we study the application of deep learning in natural language processing in depth and propose two deep learning methods, one based on an LSTM model and one based on a convolutional neural network model, for automatic summarization of long Chinese texts.

In the LSTM-based method, the original text and each of its sentences are fed into the LSTM model as two sequences of word vectors. After the LSTM layer and a mean pooling layer, we obtain two feature vectors that represent the semantics of the two input sequences. A logistic regression layer computes their matching probability, and the summary is obtained according to this probability.

In the method based on the convolutional neural network, the original text and each of its sentences are represented as word vector matrices. Convolution and max pooling yield two feature vectors that represent their semantic information. A nonlinear fully connected neural network scores the match between the two feature vectors, and the sentences with higher scores are taken as the summary.

To verify the performance of the proposed methods, we manually labeled 1,000 original articles and their corresponding micro-blog posts from the constructed corpus as the test dataset for each experiment, and evaluated the results with the ROUGE method. The evaluation shows that, compared with traditional methods, deep learning methods have many advantages in text semantic representation. The convolutional neural network method obtains better results than the LSTM method, improving the intelligence and quality of text summarization.
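
The abstract names ROUGE as the evaluation method but does not say which variant or tokenization was used. As an illustration only, the following minimal sketch computes ROUGE-N recall, the fraction of reference n-grams that also appear in the candidate summary. The function names `ngrams` and `rouge_n_recall` are hypothetical, and splitting Chinese text into single characters is an assumption made here for simplicity, not necessarily the thesis's setup.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match so a candidate n-gram is not credited more often
    # than it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Assumption: character-level tokens stand in for a real Chinese word segmenter.
reference = list("今天天气很好")  # e.g. the human-written micro-blog text
candidate = list("今天很好")      # e.g. the sentences selected by a model
print(rouge_n_recall(candidate, reference, n=1))
print(rouge_n_recall(candidate, reference, n=2))
```

In practice the candidate would be the sentences chosen by the LSTM or convolutional model and the reference would be the manually labeled micro-blog text; a word segmenter would normally replace the character split.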
Keywords/Search Tags:automatic text summarization, Chinese text summarization dataset, deep learning, convolutional neural network