
Research On Chinese Automatic Summarization Based On Keyword Filtering And Text Structure

Posted on: 2019-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: X T Sun
GTID: 2428330566977998
Subject: Computer Science and Technology
Abstract:
With the development of the Internet and the information society, the amount of electronic information on the network has increased rapidly, and obtaining useful information accurately and in a timely manner has become increasingly important. An abstract condenses the core content of a document: it helps readers find useful information quickly and efficiently and improves the efficiency with which information is identified and absorbed. Automatic summarization has therefore become a key research topic in natural language processing (NLP) in recent years.

This thesis first introduces the definition of an abstract, the state of automatic summarization research in China and abroad, and the main research methods and key technologies. Then, focusing on keyword extraction, an important step in automatic summarization, it proposes a keyword extraction method based on Word2vec and an automatic summarization algorithm, KS-TextRank, based on keyword filtering and text structure:

(1) A word vector model is trained with the Word2vec tool to obtain the similarity between words. This similarity is used to optimize the initial weights of the graph nodes and to merge synonyms, producing a better keyword set. The method captures associations between words more comprehensively and addresses the lack of semantic relatedness between terms in traditional keyword extraction methods.

(2) The improved method is used to extract a high-quality keyword set, which filters irrelevant sentences out of the candidate summary sentence set and improves the topical relevance and accuracy of the candidates. In addition, the weights of the sentence ranking algorithm are adjusted using each sentence's position and its similarity to the chapter title, which improves the quality of the generated summary.

Finally, using a manually collected and processed corpus of papers and the single-document automatic summarization corpus of the Information Retrieval Research Center of Harbin Institute of Technology, the thesis compares three summary extraction methods in detail in terms of similarity to the original abstract, accuracy, recall, and uniformity of the summaries. The three methods are the traditional TextRank summarization algorithm, KS-TextRank with keywords extracted using a co-occurrence window, and KS-TextRank with keywords extracted using word vectors. The experiments show that, compared with traditional TextRank, the proposed KS-TextRank algorithm significantly improves summary quality, and that the proposed word-vector-based keyword extraction method also outperforms the traditional method.
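The keyword step in (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: word vectors are passed in as a plain dict standing in for a trained Word2vec model, synonyms are merged when their cosine similarity reaches a hypothetical threshold of 0.8, and each word's mean similarity to the other words is used as its prior in the (1 - d) term of TextRank, because in plain damped TextRank the starting scores do not change the converged result. The helper names `merge_synonyms` and `extract_keywords` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def merge_synonyms(words, vectors, threshold=0.8):
    """Map each word to the first earlier representative it is similar enough to.

    `threshold` is a hypothetical cut-off, not a value taken from the thesis.
    """
    canon, reps = {}, []
    for w in words:
        if w in canon:
            continue
        target = w
        if w in vectors:
            for r in reps:
                if r in vectors and cosine(vectors[w], vectors[r]) >= threshold:
                    target = r
                    break
        if target == w:
            reps.append(w)
        canon[w] = target
    return canon


def extract_keywords(tokens, vectors, top_k=5, window=3, d=0.85, iters=100, threshold=0.8):
    """Similarity-weighted TextRank over a word co-occurrence graph (sketch)."""
    canon = merge_synonyms(tokens, vectors, threshold)
    seq = [canon[t] for t in tokens]  # synonyms collapsed into one node
    nodes = sorted(set(seq))

    # Undirected co-occurrence edges between words at most window-1 positions apart.
    nbrs = defaultdict(set)
    for i, w in enumerate(seq):
        for j in range(i + 1, min(i + window, len(seq))):
            if seq[j] != w:
                nbrs[w].add(seq[j])
                nbrs[seq[j]].add(w)

    # Prior: mean non-negative vector similarity of each word to the other words.
    prior = {}
    for w in nodes:
        sims = [max(0.0, cosine(vectors[w], vectors[o]))
                for o in nodes if o != w and w in vectors and o in vectors]
        prior[w] = sum(sims) / len(sims) if sims else 0.0
    total = sum(prior.values())
    prior = {w: p / total if total else 1.0 / len(nodes) for w, p in prior.items()}

    # Damped TextRank iteration; the prior replaces the uniform (1 - d) term.
    score = dict(prior)
    for _ in range(iters):
        score = {w: (1 - d) * prior[w]
                    + d * sum(score[u] / len(nbrs[u]) for u in nbrs[w])
                 for w in nodes}
    return sorted(nodes, key=lambda w: (-score[w], w))[:top_k]
```

With toy three-dimensional vectors in which `abstract` is nearly parallel to `summary`, the two words collapse into a single node before ranking, so only one of them can appear in the keyword set.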
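Step (2) combines keyword filtering with structural weighting. The following is a minimal sketch, assuming sentences are already segmented into word lists: sentences containing none of the keywords are dropped, the remaining ones are ranked with TextRank using the overlap similarity from the original TextRank formulation (shared words divided by log|S_i| + log|S_j|), and each score is then boosted for the first and last sentences and by the fraction of title words a sentence contains. The boost formula `score * (1 + alpha * position + beta * title_overlap)`, the weights `alpha` and `beta`, and the helper names are hypothetical; the exact weighting used by KS-TextRank is not given in this abstract.

```python
import math


def overlap_sim(a, b):
    """TextRank sentence similarity: shared words / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0


def textrank_sentences(sents, d=0.85, iters=100):
    """Weighted TextRank scores for a list of tokenized sentences."""
    n = len(sents)
    w = [[overlap_sim(sents[i], sents[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each sentence
    score = [1.0] * n
    for _ in range(iters):
        score = [(1 - d) + d * sum(w[j][i] / out[j] * score[j]
                                   for j in range(n) if out[j] > 0)
                 for i in range(n)]
    return score


def select_summary_sentences(sents, title, keywords, k=2, alpha=0.5, beta=0.5):
    """Hypothetical KS-TextRank-style selection; returns indices in document order."""
    kw = set(keywords)
    # 1) Keyword filtering: keep sentences sharing at least one keyword
    #    (fall back to all sentences if none match).
    cand = [i for i, s in enumerate(sents) if kw & set(s)] or list(range(len(sents)))
    base = textrank_sentences([sents[i] for i in cand])

    # 2) Structural boost from sentence position and overlap with the title.
    last, t = len(sents) - 1, set(title)
    scored = []
    for b, i in zip(base, cand):
        position = 1.0 if i in (0, last) else 0.0
        title_overlap = len(t & set(sents[i])) / len(t) if t else 0.0
        scored.append((b * (1 + alpha * position + beta * title_overlap), i))

    # 3) Take the k best sentences and restore document order.
    best = sorted(scored, key=lambda x: (-x[0], x[1]))[:k]
    return sorted(i for _, i in best)
```

Because the keyword filter runs before ranking in this sketch, off-topic sentences never enter the sentence graph and cannot pass score to the others.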
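For the comparison of the three methods, one common way to score an extractive summary is sentence-level precision and recall against a reference set of sentences. The sketch below reads the abstract's "accuracy" and "recall" that way, which is an assumption: the thesis may define them differently, for example over words rather than whole sentences. The function name is hypothetical.

```python
def precision_recall_f1(selected, reference):
    """Sentence-level precision, recall and F1 of an extractive summary.

    `selected` and `reference` are collections of sentence indices (or ids).
    """
    sel, ref = set(selected), set(reference)
    hits = len(sel & ref)
    p = hits / len(sel) if sel else 0.0
    r = hits / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For example, selecting sentences `[0, 4]` when the reference is `[0, 2, 4]` gives precision 1.0, recall 2/3 and an F1 of approximately 0.8.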
Keywords: automatic summarization, word vector, keyword extraction, text structure, sentence features