
Research On Automatic Network Hot Topics Detection

Posted on: 2009-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: H J Gong
Full Text: PDF
GTID: 2178360245457964
Subject: Computer software and theory
Abstract/Summary:
Hot topics are topics that arise in various fields and attract great public concern during a certain period. Recognizing and monitoring hot topics helps people stay aware of the community's focus during that period and discover public opinion in time. Although some organizations publish hot topics through the media every year, many of these are chosen by manual judgment, so the objectivity and timeliness of the results are limited. The focus news released by some search engine companies covers only the short term, so users cannot follow the course and development of an entire topic. Because the amount of information on the Internet is enormous and difficult to supervise, detecting hot topics from numerous network information sources is becoming increasingly important. This thesis focuses on how to detect hot topics automatically and promptly from a network news corpus. The main work of this thesis includes the following aspects:

(1) Proposed an incremental multi-level clustering algorithm based on multi-strategy optimization to detect topics. We downloaded news pages from popular portal web sites, preprocessed them, and then recognized the named entities in every story. We introduced several optimizations, such as an incremental df (document frequency) method and a time attenuation function, and then implemented incremental multi-level clustering based on the vector space model to obtain topic lists. This algorithm can dynamically and promptly detect new topics in various fields on popular portal web sites within a certain period.

(2) Designed a model to recognize hot topics. After analyzing the characteristics and trend curves of hot topics from past years, we divided topic attention into media attention and user attention and identified the features that influence topic attention. We quantified these features and proposed a formula to calculate the attention of every topic. We then selected hot topics from the topic lists according to their attention scores and topic development curves.

(3) Designed and implemented a hot topic detection system based on the work above. This system can detect hot topics in various fields within a certain period and provide users with related information about each hot topic, such as the topic title, topic description, group of topic-related words, and topic-related documents.
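The abstract does not give the algorithm or the attention formula, so the following is only a minimal sketch of how steps (1) and (2) might look, under stated assumptions. It shows single-pass incremental clustering over TF-IDF vectors with incrementally updated document frequencies and a time decay on similarity, plus a toy attention score. The class and function names, the exponential half-life decay, the similarity threshold, the log-scaled features, and the weights are all hypothetical choices and are not taken from the thesis. The multi-level clustering and the named entity recognition described above are omitted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Single-level, single-pass clustering (an assumption, not the thesis's method)."""

    def __init__(self, threshold=0.3, half_life_days=7.0):
        self.threshold = threshold            # hypothetical similarity cut-off
        self.half_life_days = half_life_days  # hypothetical decay half-life
        self.doc_count = 0
        self.df = Counter()                   # document frequencies, updated per story
        self.topics = []                      # each: {"tf": Counter, "last_day": float, "docs": list}

    def _weights(self, tf):
        # TF-IDF computed from the document frequencies seen so far.
        return {
            term: count * math.log((self.doc_count + 1) / (self.df[term] + 0.5))
            for term, count in tf.items()
        }

    def _decay(self, days_apart):
        # Exponential time attenuation: similarity halves every half_life_days.
        return 0.5 ** (max(0.0, days_apart) / self.half_life_days)

    def add(self, doc_id, tokens, day):
        """Assign a story to the best matching topic or open a new one; return the topic index."""
        tf = Counter(tokens)
        self.doc_count += 1
        self.df.update(tf.keys())             # incremental df update
        doc_vec = self._weights(tf)

        best_index, best_sim = None, 0.0
        for index, topic in enumerate(self.topics):
            sim = cosine(doc_vec, self._weights(topic["tf"]))
            sim *= self._decay(day - topic["last_day"])
            if sim > best_sim:
                best_index, best_sim = index, sim

        if best_index is not None and best_sim >= self.threshold:
            topic = self.topics[best_index]
            topic["tf"].update(tf)
            topic["last_day"] = max(topic["last_day"], day)
            topic["docs"].append(doc_id)
            return best_index

        self.topics.append({"tf": tf, "last_day": day, "docs": [doc_id]})
        return len(self.topics) - 1


def attention_score(report_count, source_count, comment_count, media_weight=0.5):
    """Toy attention score mixing media and user attention (features and weights are assumptions)."""
    media = math.log1p(report_count) + math.log1p(source_count)
    user = math.log1p(comment_count)
    return media_weight * media + (1.0 - media_weight) * user


clusterer = IncrementalClusterer()
first = clusterer.add("d1", ["earthquake", "sichuan", "rescue"], day=0)
second = clusterer.add("d2", ["earthquake", "sichuan", "relief", "rescue"], day=1)
other = clusterer.add("d3", ["olympics", "beijing", "torch"], day=1)
late = clusterer.add("d4", ["earthquake", "sichuan", "rescue"], day=60)
```

In this sketch, the story that arrives 59 days after the last update to the earthquake topic opens a new topic even though its vocabulary matches, because the decayed similarity falls below the threshold. That is the intended effect of a time attenuation function in this kind of design. Ranking topics by `attention_score` and keeping the highest ones would correspond loosely to the selection step in (2).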
Keywords/Search Tags: TDT, topic detection, hot topic detection, incremental multi-level clustering