News Keyword Extraction Research And Implementation

Posted on: 2020-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: M Tian
Full Text: PDF
GTID: 2428330596981809
Subject: Computer technology
Abstract/Summary:
Extracting keywords from news articles helps users quickly locate the central idea of an article, understand its general content, and decide whether to read it in full, thereby improving the user experience. Manual keyword labeling guarantees quality, but it does not scale to massive volumes of news data. This paper analyzes the characteristics of news texts, studies news keyword extraction, and implements a practical news keyword extraction system. Experiments show that the keyword extraction algorithm proposed in this paper outperforms the traditional approach in both performance and accuracy.

This paper focuses on keyword extraction methods based on word-frequency statistics, the word-graph model, and the topic model. Starting from these three aspects, the TF-IDF algorithm, the TextRank algorithm, and the LDA topic model algorithm are explored and improved. Since no public keyword dataset is available among existing resources, this paper uses crawler technology to crawl NetEase News and build a corpus and a keyword test set, and the keywords of the test set are manually cross-labeled. Finally, a news keyword extraction system is implemented, and the improved extraction methods are applied to it. The system has a simple, clean interface, is easy to operate, and responds quickly.

To improve the accuracy and efficiency of news keyword extraction, this paper makes several innovations around the three aspects above. The inverse document frequency term of the traditional TF-IDF algorithm inflates the weight of some rare words; Zipf's law is introduced to suppress this effect, and the chi-square test is used to add a topic factor to the weight calculation. Experiments show that the accuracy and efficiency of the improved keyword extraction are significantly better. Given the respective strengths and weaknesses of the three extraction methods, the idea of model fusion from machine learning is applied. Two fusion methods, waterfall fusion and parallel combination fusion, are adopted, and five experimental schemes are designed. The final experimental results show that TF-IDF combined with TextRank performs better under waterfall fusion, that the combination of TF-IDF and LDA outperforms the other combination schemes, and that the accuracy and efficiency of the fused extraction algorithm are significantly improved compared with any single algorithm. The paper concludes by summarizing the problems encountered during the research and implementation of news keyword extraction and the shortcomings of the work, and by outlining directions for further improvement of the system.
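The abstract names the two fusion methods but does not define them. The Python sketch below shows one plausible reading, offered as an assumption rather than the thesis's actual implementation: waterfall fusion lets a first scorer shortlist candidates that a second scorer then re-ranks, while parallel combination fusion takes a weighted sum of normalized scores. The baseline `tfidf_scores` uses a standard smoothed IDF and omits the Zipf's law and chi-square improvements described above. All function names, parameters, and the sample scores are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus):
    """Baseline TF-IDF with smoothed IDF (not the thesis's improved variant).

    doc_tokens: list of tokens for one document.
    corpus: list of token lists used to compute document frequencies.
    """
    n_docs = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # count each term once per document
    tf = Counter(doc_tokens)
    total = len(doc_tokens)
    return {
        term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }


def normalize(scores):
    """Scale scores so the largest value is 1.0."""
    hi = max(scores.values(), default=0.0)
    return {t: (s / hi if hi > 0 else 0.0) for t, s in scores.items()}


def waterfall_fusion(first, second, shortlist=10, top_k=5):
    """Assumed reading: the first scorer shortlists, the second re-ranks."""
    candidates = sorted(first, key=first.get, reverse=True)[:shortlist]
    ranked = sorted(candidates, key=lambda t: second.get(t, 0.0), reverse=True)
    return ranked[:top_k]


def parallel_fusion(first, second, weight=0.5, top_k=5):
    """Assumed reading: weighted sum of the two normalized score sets."""
    a, b = normalize(first), normalize(second)
    terms = sorted(set(a) | set(b))  # sorted for deterministic tie-breaking
    combined = {
        t: weight * a.get(t, 0.0) + (1 - weight) * b.get(t, 0.0) for t in terms
    }
    return sorted(terms, key=combined.get, reverse=True)[:top_k]


# Hypothetical scores standing in for, e.g., TF-IDF and TextRank output.
first = {"election": 0.9, "vote": 0.6, "city": 0.5, "the": 0.1}
second = {"vote": 0.8, "election": 0.4, "the": 0.9, "city": 0.2}

waterfall_top = waterfall_fusion(first, second, shortlist=3, top_k=2)
parallel_top = parallel_fusion(first, second, weight=0.5, top_k=2)
```

In this sketch the waterfall variant never returns a term the first scorer ranked outside its shortlist, whereas the parallel variant can promote a term that only one scorer rates highly; which behavior is preferable depends on how much each scorer is trusted.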
Keywords/Search Tags: News, Keyword, Fusion, Data, Algorithm, Extraction