
Research On The Method Of Topic Discovery And Hotness Evaluation For News

Posted on: 2018-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Gao
GTID: 2348330515969297
Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology, the Internet has been widely applied across industries and fields. As network data grows rapidly, collecting and organizing information becomes increasingly difficult, and retrieving the required information from massive information flows has become a pressing problem. Topic Detection and Tracking (TDT) is a key technology for solving this problem: it aims to discover new topics in a news stream and to track existing topics over time. Topic detection is one of the most important research tasks in TDT; it clusters similar news stories into topics so that users can query them more easily. Information can thus be organized at the granularity of topics, which helps people understand the activities related to an event. Based on the current state of research, in China and abroad, on topic detection and on methods for evaluating topic hotness, this thesis analyzes and studies the main technologies involved. The main work is as follows.

First, the pre-processing of news text and the text representation model are studied. Based on the characteristics of news reports, the feature-word weight calculation is optimized in two respects: the position of a word (in the headline or in the body text) and the word's incremental document frequency. This takes into account both the headline and the body of a news story as well as the incremental nature of news; it highlights the importance of headlines and improves detection efficiency. The news text is then represented with the vector space model (VSM), which converts it into data that a computer can process.

Second, a news topic discovery algorithm is proposed. The classical Single-Pass clustering algorithm is improved: based on the temporal and dynamic characteristics of news, a time factor is added to the similarity calculation, and the centroid vector of each topic is updated dynamically during clustering. The experimental data set consists of headline news collected by a web crawler, together with a corpus. The experimental results show that the improved algorithm has lower cost and fewer false alarms than the classical algorithm, which verifies its validity and accuracy.

Third, a method for evaluating topic hotness is proposed. It takes the attention of both the media and users into account to evaluate the hotness of the clustered topics. Analyzing topic hotness yields the hot topics on the network within a given period, together with their ranking, and the topic hotness index is used to analyze how a topic develops over time.
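The abstract does not give formulas, so the following is only a minimal sketch of the general idea behind the first two contributions: headline terms weighted more heavily than body terms, a time factor applied to the similarity score, and a topic centroid that is updated as stories join. Everything specific here is an assumption made for illustration and is not taken from the thesis: the function names, whitespace tokenization, raw term counts in place of the thesis's feature weights, the `TITLE_BOOST` value, the exponential form of the time decay, and the `THRESHOLD` value. The incremental document frequency component is omitted.

```python
import math
from collections import Counter

TITLE_BOOST = 2.0      # assumed extra weight for headline terms
DECAY_PER_DAY = 0.1    # assumed time-decay rate (not from the thesis)
THRESHOLD = 0.3        # assumed similarity threshold for joining a topic


def vectorize(title, body, title_boost=TITLE_BOOST):
    """Term vector in which each headline occurrence adds title_boost."""
    vec = Counter()
    for term in body.lower().split():
        vec[term] += 1.0
    for term in title.lower().split():
        vec[term] += title_boost
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def time_adjusted_similarity(vec, day, topic, decay=DECAY_PER_DAY):
    """Cosine similarity damped by the gap (in days) to the topic's latest story."""
    gap = abs(day - topic["last_day"])
    return cosine(vec, topic["centroid"]) * math.exp(-decay * gap)


def single_pass(stories, threshold=THRESHOLD):
    """Cluster (title, body, day) tuples in arrival order; returns a list of topics."""
    topics = []
    for title, body, day in stories:
        vec = vectorize(title, body)
        best, best_sim = None, 0.0
        for topic in topics:
            sim = time_adjusted_similarity(vec, day, topic)
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            # Dynamic centroid update: running mean over all member stories.
            n = best["size"]
            old = best["centroid"]
            best["centroid"] = {
                t: (old.get(t, 0.0) * n + vec.get(t, 0.0)) / (n + 1)
                for t in set(old) | set(vec)
            }
            best["size"] = n + 1
            best["last_day"] = max(best["last_day"], day)
            best["titles"].append(title)
        else:
            # No sufficiently similar topic: the story seeds a new one.
            topics.append(
                {"centroid": vec, "size": 1, "last_day": day, "titles": [title]}
            )
    return topics
```

Calling `single_pass` on a list of `(title, body, day)` tuples returns the topics in order of creation. With the assumed decay rate, two textually identical stories published 100 days apart fall into separate topics, which is the intended effect of a time factor: similar wording alone is not enough to merge stories from different periods.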
Keywords/Search Tags: Topic Detection, Vector Space Model, Single-Pass Algorithm, Similarity Calculation, Topic Hotness