
Research On The Article Title Generation System Based On Clustering And Neural Network

Posted on: 2021-05-16; Degree: Master; Type: Thesis
Country: China; Candidate: S S Zhang; Full Text: PDF
GTID: 2428330647467276; Subject: Control engineering
Abstract/Summary:
With the development of the Internet, text resources on the network are growing exponentially. Much of this text is non-standard, such as the large number of untitled microblogs and the many "title party" (clickbait) articles on self-media platforms. Creating accurate titles for such non-standard texts has become a challenging task. Traditional manual summarization and editing involves a huge workload and consumes a great deal of manpower and time. In engineering applications, automatic text summarization offers an economical and efficient way to generate accurate, concise and appropriate article titles. Article title generation is a variant of the text summarization task that takes the content of an article as input and produces its title as output, and it is an important research direction in natural language processing. However, unlike the relatively simple forms of expression in English, Chinese has its own grammar rules and a word's part of speech can vary with context. These factors reduce the accuracy of Chinese title generation and make it prone to the "unlisted words" (out-of-vocabulary) problem, so there is relatively little research on Chinese title generation. Against this background, this thesis studies a Chinese article title generation system based on generative (abstractive) automatic text summarization. The main work is divided into the following three aspects:

(1) Research on Mul-CBOW: a word vector model integrating the co-occurrence of Chinese parts of speech. First, for text vector representation, a word vector model named Mul-CBOW is proposed that takes the part-of-speech variation of Chinese into account by integrating part-of-speech co-occurrence relationships. Title generation experiments with different word vector models show that Mul-CBOW improves the fluency of the generated titles.

(2) Research on TGMCN: an article title generation model based on clustering and neural networks. Second, to bring the generated titles closer to human writing habits, the DBSCAN clustering algorithm is combined with an attention-based encoder-decoder title generation network to form the TGMCN model. In addition, the TGMCN model builds a prior distribution dictionary from the original content to alleviate the "unlisted words" problem and improve the quality of the generated titles. Experiments on the LCSTS data set show that the ROUGE-1 and ROUGE-L scores of the TGMCN model improve to 35.43% and 30.95% respectively, which shows that the TGMCN model can effectively improve the accuracy and fluency of the generated titles.

(3) Design and implementation of a prototype title generation system. Finally, on the basis of the above research, a prototype title generation system is designed and implemented, and its effectiveness is verified through a detailed demonstration of the system's functions.
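
As a rough illustration of the two evaluation metrics cited in the abstract, the following minimal sketch computes ROUGE-1 and ROUGE-L for a single generated title against a single reference title. It is not the thesis's evaluation code. The function names, the choice to report F1 (rather than recall or precision), and the toy token lists are all assumptions made for illustration; the abstract does not state which ROUGE variant or tokenization was used.

```python
from collections import Counter


def rouge_1_f1(candidate, reference):
    """Unigram-overlap F1 between two token lists (assumed variant: F1)."""
    if not candidate or not reference:
        return 0.0
    # Clipped unigram matches: each token counts at most as often as it
    # appears in the other sequence.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """LCS-based F1 between two token lists (assumed variant: F1)."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy tokens; for Chinese text these would typically be
# characters or segmented words.
generated = ["a", "b", "c", "d"]
reference = ["a", "c", "b", "d"]
print(rouge_1_f1(generated, reference), rouge_l_f1(generated, reference))
```

In this toy case every unigram matches, so ROUGE-1 is maximal, while the swapped word order shortens the longest common subsequence and lowers ROUGE-L. That difference is why the two scores are usually read together: one reflects content overlap, the other is sensitive to ordering.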
Keywords/Search Tags:deep learning, natural language processing, title generation, text clustering, neural network, prototype system