
Research on Chinese Single-Document Summarization Based on Doc2Vec and Improved TextRank

Posted on: 2020-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: X T Xu
Full Text: PDF
GTID: 2428330620951754
Subject: Computer system architecture
Abstract/Summary:
Since the 20th century, the spread of computers has greatly changed people's daily lives, and people can now draw the knowledge they need from the large amount of information on the Internet. Automatic text summarization technology processes and analyzes a text and outputs a summary, enabling readers to reach its key information quickly. The technology has developed considerably since it was introduced in the 1950s, and text summarization is already widely applied abroad with good results. Research on automatic summarization of Chinese started later, and because the Chinese language has its own peculiarities, methods developed abroad cannot be applied to it directly; summarization methods suited to Chinese text are needed. Existing systems in China still leave room for improvement, so improving Chinese text summarization technology is of real significance.

This paper proposes the DK-TextRank algorithm, which combines the strengths of Doc2Vec, K-means clustering, and the TextRank algorithm. First, the Doc2Vec tool is used to vectorize the sentences of the text; next, the bisecting K-means clustering algorithm groups the sentence vectors into clusters; finally, an improved TextRank algorithm ranks the sentences within each cluster, and the most representative sentence of each cluster is selected to form the final summary.

To evaluate DK-TextRank, we set up an experimental environment and analyzed the performance of a Chinese text summarization system built on the algorithm. The experiments used 50,000 news reports covering fields such as finance, sports, politics, and society. The results indicate that DK-TextRank performs better and covers the content of an article more comprehensively. The algorithm was also compared with the TF-IDF algorithm and the traditional TextRank algorithm, and the evaluation shows that it outperforms both. These results suggest that DK-TextRank is well suited to Chinese text and achieves satisfactory results in Chinese text summarization.
Keywords/Search Tags:Doc2Vec model, K-means, TextRank, Automatic summary, Weight influence factor
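
The three stages named in the abstract (vectorize sentences, cluster them, rank within each cluster and keep the top sentence) can be illustrated with a minimal sketch. This is not the thesis implementation, and several parts are assumptions made only to keep the example self-contained: bag-of-words count vectors stand in for Doc2Vec, the bisecting step always splits the largest cluster, plain similarity-weighted TextRank stands in for the improved TextRank (the weight influence factors are not described in the abstract), and sentences are split on whitespace, whereas Chinese text would first need a word segmenter. All function names are hypothetical.

```python
import math
from collections import Counter

def vectorize(sentences):
    """Stand-in for Doc2Vec (assumption): bag-of-words count vectors."""
    vocab = sorted({w for s in sentences for w in s.split()})
    counts = [Counter(s.split()) for s in sentences]
    return [[c[w] for w in vocab] for c in counts]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def two_means(vecs, ids, iters=10):
    """Split one cluster in two with a small deterministic 2-means."""
    # Seeds: the first sentence and the sentence least similar to it.
    first = ids[0]
    far = min(ids, key=lambda i: cosine(vecs[first], vecs[i]))
    centers = [vecs[first], vecs[far]]
    groups = [list(ids), []]
    for _ in range(iters):
        groups = [[], []]
        for i in ids:
            sims = [cosine(vecs[i], c) for c in centers]
            groups[0 if sims[0] >= sims[1] else 1].append(i)
        if not groups[0] or not groups[1]:
            break
        # Recompute each center as the mean of its members.
        centers = [[sum(col) / len(g) for col in zip(*(vecs[i] for i in g))]
                   for g in groups]
    return groups

def bisecting_kmeans(vecs, k):
    """Repeatedly split the largest cluster until there are k clusters."""
    clusters = [list(range(len(vecs)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        biggest = clusters.pop()
        left, right = two_means(vecs, biggest)
        if not left or not right:  # cluster cannot be split further
            clusters.append(biggest)
            break
        clusters += [left, right]
    return clusters

def textrank(vecs, ids, d=0.85, iters=50):
    """Plain TextRank over the sentences of one cluster."""
    n = len(ids)
    sim = [[cosine(vecs[a], vecs[b]) if a != b else 0.0 for b in ids] for a in ids]
    out = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(sim[j][i] / out[j] * scores[j]
                                    for j in range(n) if out[j])
                  for i in range(n)]
    return dict(zip(ids, scores))

def summarize(sentences, k=2):
    """Pick the top-ranked sentence of each cluster, in document order."""
    vecs = vectorize(sentences)
    picks = []
    for cluster in bisecting_kmeans(vecs, k):
        scores = textrank(vecs, cluster)
        picks.append(max(cluster, key=lambda i: scores[i]))
    return [sentences[i] for i in sorted(picks)]

demo = [
    "stocks rose on the market today",
    "the market closed higher as stocks rallied",
    "investors bought stocks on the market",
    "the team won the football match",
    "fans cheered as the team won the match",
    "the football team celebrated the match win",
]
print(summarize(demo, k=2))
```

Clustering before ranking is what lets the summary draw one sentence from each topic: ranking the whole document at once tends to favor whichever topic has the most mutually similar sentences.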