Research And Implementation Of News Keyword Extraction Method Based On Semantic Clustering And Weighted TextRank

Posted on: 2022-05-18  Degree: Master  Type: Thesis
Country: China  Candidate: D R Liu  Full Text: PDF
GTID: 2518306341952109  Subject: Computer technology
Abstract/Summary:
With the rapid development of science and technology, information spreads and circulates faster and faster. Countless current events and news items spread across the Internet every day. Resources on the network are increasingly abundant, but also increasingly complex. With limited time and energy, it is difficult to find the target content accurately in such massive data. There is therefore an urgent need for a way to read news summaries quickly and grasp current hot spots, sparing readers the trouble of screening news and reading whole articles; this benefits readers and also helps news work. As a series of words that briefly summarize the theme of an article, keywords are of practical significance for expressing its main theme and help readers quickly understand the main idea of the text. An effective keyword analysis technology is therefore needed to obtain useful information. Some mature keyword extraction algorithms exist, but they all have disadvantages, such as low accuracy and incomplete consideration of word features. Based on the idea of the TextRank algorithm, this paper uses the latest deep learning model to highlight the semantic features of words and develops an efficient and accurate keyword extraction method. On top of this algorithm, a news push system based on keyword analysis is designed and implemented. The main functions of the system are news acquisition, text processing, keyword extraction, hotspot analysis, and news push. The main work of this paper in the design and implementation of the system is as follows:

(1) This paper studies web crawling and the retrieval of real-time news from news websites, and implements the data source acquisition module of the system. The module can be started on a schedule to obtain the latest news from the major mainstream news websites, parse the web pages accurately, extract and store the news data, and provide an effective data source for subsequent processing in the system.

(2) This paper studies a feasible keyword extraction method and takes it as the core function of the system. The work first studies existing algorithms such as TextRank and TF-IDF. TextRank is derived from the PageRank algorithm. Its basic principle is to use a word graph model to propagate the weights of words and then sort the final weights to obtain keywords. In the TextRank algorithm, the tendency of keyword weight propagation depends only on the frequency of words. To address this and improve the algorithm's effectiveness, this paper proposes a TextRank keyword extraction algorithm based on semantic and statistical features. First, various word vector models are studied and compared. The word vectors generated by the latest deep learning model are clustered with k-means to represent the semantic clustering features. The TF-IDF value of a word is used to represent the statistical features of the candidate words relative to the text library. Finally, combined with the position features of the last word in the text, a new TextRank weight transfer probability matrix is constructed. This matrix is used for the iterative calculation over the word graph and for keyword extraction. In experimental simulation, compared with the traditional TextRank algorithm and the TF-IDF algorithm, the algorithm proposed in this paper shows an obvious improvement in accuracy, recall, and F1 value. The algorithm can therefore serve as the core keyword extraction function of the system.

(3) On the basis of the previous work, this paper uses the Spring Boot framework to implement a news recommendation system based on keyword analysis. The system provides basic features such as text processing and keyword extraction. It can display the latest news and its keywords, analyze news hotspots according to the frequency of recent news keywords, and push news to users according to these hotspots, so as to meet the need to select relevant or similar news based on words.
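
The weighted TextRank idea in item (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the formula that combines the three features, so the equal-weight linear combination below is an assumption, as are the function names, the co-occurrence window, and the toy feature values. In a real pipeline the `cluster_w`, `tfidf_w`, and `position_w` dictionaries would come from k-means over word vectors, TF-IDF over a corpus, and word positions, respectively.

```python
from collections import defaultdict


def build_cooccurrence_graph(words, window=2):
    """Link every pair of distinct words that appear within `window` tokens."""
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)
    return neighbors


def weighted_textrank(words, cluster_w, tfidf_w, position_w,
                      mix=(1 / 3, 1 / 3, 1 / 3), damping=0.85,
                      iterations=50, window=2):
    """Rank words with a feature-weighted transition matrix.

    Assumption: a word's importance is a linear mix of its three feature
    values, and the probability of moving from u to v is v's importance
    divided by the total importance of u's neighbors.
    """
    neighbors = build_cooccurrence_graph(words, window)
    vocab = sorted(neighbors)
    a, b, c = mix
    importance = {
        w: a * cluster_w.get(w, 0.0) + b * tfidf_w.get(w, 0.0)
           + c * position_w.get(w, 0.0)
        for w in vocab
    }

    def transition(u, v):
        total = sum(importance[n] for n in neighbors[u])
        if total > 0:
            return importance[v] / total
        return 1.0 / len(neighbors[u])  # fall back to uniform weights

    scores = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        scores = {
            v: (1 - damping) + damping * sum(
                transition(u, v) * scores[u] for u in neighbors[v])
            for v in vocab
        }
    ranked = sorted(vocab, key=lambda w: scores[w], reverse=True)
    return ranked, scores


# Toy input: "news" co-occurs with three other words; "topic" has a high TF-IDF.
tokens = ["news", "keyword", "news", "graph", "news", "topic"]
uniform = {w: 1.0 for w in set(tokens)}
tfidf = {"news": 0.1, "keyword": 0.1, "graph": 0.1, "topic": 0.9}
ranked, scores = weighted_textrank(tokens, uniform, tfidf, uniform)
```

With all feature values equal, the transition probabilities reduce to the uniform ones of plain TextRank on an unweighted graph. Raising one word's feature values shifts more of its neighbors' weight toward it, which is the effect the abstract attributes to the new weight transfer probability matrix.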
Keywords/Search Tags: keyword, deep learning, word vector model, TextRank, news hotspot