Research On Evaluation Of News Headlines And Content Correspondence Based On Text Mining

Posted on: 2019-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Mei
Full Text: PDF
GTID: 2428330545970814
Subject: Applied statistics
Abstract/Summary:
With the development of Internet technology, online news is everywhere: not only in news clients and news portals, but also on social media such as WeChat and Weibo. In this era of rapid consumption, news headlines have not been spared. "Clickbait" appears again and again, which undermines the credibility of the media and is not conducive to the harmonious development of society. Starting from the practical problem of how to identify "Clickbait", this thesis draws on text mining techniques, proposes the concept of the degree of fit between a news headline and its content, and uses an example analysis to propose a recognition method.

The specific content is as follows. First, an LDA topic model is built over the sentences of a news article, yielding two distributions: document-topic and topic-term. Based on the characteristics and results of the topic model, the similarity between each topic's representative words and the article is calculated, and topic words are selected according to the size of that similarity and the corresponding selection rules. Then the degree of fit is proposed as a measure of how well the headline matches the topic words. In essence it is a measure of text similarity: it is built from multiple similarity calculation methods, and the final fit is weighted by the proportion that each topic word's similarity accounts for.

Analysis shows that, under the decision rule of marking a news item as "Clickbait" only when its degree of fit is 0, the accuracy in identifying "headline party" (clickbait) news on the labeled data is about 69%, and the F value is 0.739. This demonstrates the applicability of the model to some extent. Experiments show that selecting words by computing word similarity from the topic model distributions is effective: the keyword overlap rate with the TF-IDF algorithm is 82.7%, and the topic word extraction accuracy is 76.2%. As the accuracy of keyword extraction improves, the recognition accuracy also rises slightly.

The degree of fit proposed in this thesis is also applicable to measuring the similarity between other pairs of long and short texts. However, because this is the first time it has been applied to the identification of "Clickbait", the detailed decision rules remain open to question. The existing results offer construction ideas and a practical reference for further in-depth research.
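
The degree-of-fit idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say which similarity measures the thesis combines, so the two used below (exact token match and a character-bigram Dice coefficient) are stand-ins. The function names and the example data are hypothetical, and the topic words with their similarity scores are assumed to have already been selected from an LDA model. Only the overall shape follows the abstract: several similarity measures are combined, each topic word is weighted by its share of the total topic-word similarity, and a headline is flagged only when the fit is exactly 0.

```python
def bigrams(word):
    """Set of adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def exact_match(title_tokens, topic_word):
    """1.0 if the topic word appears verbatim in the headline, else 0.0."""
    return 1.0 if topic_word in title_tokens else 0.0


def bigram_dice(title_tokens, topic_word):
    """Best Dice coefficient over character bigrams against any headline token."""
    target = bigrams(topic_word)
    best = 0.0
    for token in title_tokens:
        candidate = bigrams(token)
        if target and candidate:
            dice = 2 * len(target & candidate) / (len(target) + len(candidate))
            best = max(best, dice)
    return best


def fit_degree(title_tokens, topic_words, measures=(exact_match, bigram_dice)):
    """Weighted headline/topic-word fit.

    topic_words maps each selected topic word to its (non-negative)
    similarity with the article; that similarity's share of the total
    is used as the word's weight. The measures are illustrative only.
    """
    total = sum(topic_words.values())
    if total == 0:
        return 0.0
    fit = 0.0
    for word, similarity in topic_words.items():
        share = similarity / total
        # Average the assumed similarity measures for this topic word.
        score = sum(m(title_tokens, word) for m in measures) / len(measures)
        fit += share * score
    return fit


def is_clickbait(title_tokens, topic_words):
    """Decision rule from the abstract: flag only when the fit is 0."""
    return fit_degree(title_tokens, topic_words) == 0


# Hypothetical topic words and headlines, invented for this example.
topic_words = {"earthquake": 0.5, "rescue": 0.3, "province": 0.2}
matching_title = ["earthquake", "rescue", "teams", "arrive"]
vague_title = ["you", "will", "not", "believe", "what", "happened", "next"]

print(fit_degree(matching_title, topic_words), is_clickbait(matching_title, topic_words))
print(fit_degree(vague_title, topic_words), is_clickbait(vague_title, topic_words))
```

Because the rule fires only at a fit of exactly 0, the choice of measures matters: a measure that rarely returns 0 (for example, a loose character-level ratio) would almost never flag a headline, which is one reason the detailed decision rules are described as still open to question.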
Keywords/Search Tags: News headline, Text mining, Topic model, Degree of fit