
Research On News Topic Detection Based On VSM Model And ILDA Model

Posted on: 2017-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Cheng
Full Text: PDF
GTID: 2308330488985685
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, it has become increasingly convenient for users to follow the hot topics that interest them, and Internet news has become one of the most important channels through which people obtain news. Compared with traditional news media, Internet news offers more prompt and interactive coverage of major events. In a world where time is money and efficiency is life, organizing and presenting a hot news topic effectively, so that users can keep up with developments, has become a pressing need.

In recent years, topic detection and tracking (TDT) has been an active research area in natural language processing (NLP) and information retrieval. It can organize the massive amount of information on the Internet and relieve the pressure people face when confronted with large volumes of news. In addition, national institutions need to make swift decisions while tracking public opinion when major events occur. Topic detection and tracking for network news is therefore of great significance.

Previous research has usually relied on the traditional VSM or the LDA model, each of which gives good results on its own when applied to TDT. However, both have defects. VSM is convenient to compute but ignores the semantic relations in text; LDA handles semantic relations well but requires the number of topics K to be set in advance.

To address these problems, we carried out the following work. The main contributions and innovations of this thesis are as follows:

First, we combine the ILDA model with VSM to design a topic detection and tracking algorithm. The combined modeling approach makes full use of the respective advantages of the two models, avoids the shortcomings of using either model alone, and effectively solves the problem of text representation.

Secondly, because a title briefly summarizes the main idea of a story, the keywords in a news title deserve a higher weight, and the traditional TF-IDF algorithm is improved on this basis.

Thirdly, we introduce aging theory and propose a topic ranking algorithm. We first model topics using the theory, then compute the energy of each topic in every time slice, and finally build associations between adjacent time slices. Topics that are gradually aging are removed, and hot topics are retained by ranking them in decreasing order of energy.

A large number of experiments on a real news corpus are carried out to verify the effectiveness of the method.
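
The abstract does not give the formulas behind the title-weighted TF-IDF or the energy-based topic ranking, so the following is only a minimal sketch of the two ideas under stated assumptions, not the thesis's actual method. The function names, the `title_boost` factor, the smoothed IDF, the exponential `decay`, and the `min_energy` cutoff are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def title_weighted_tfidf(docs, title_boost=2.0):
    """Compute TF-IDF vectors in which title terms count extra.

    docs: list of (title_tokens, body_tokens) pairs.
    title_boost: assumed extra count added for each title occurrence.
    Returns one {term: weight} dict per document.
    """
    n = len(docs)
    # Document frequency over the union of title and body terms.
    df = Counter()
    for title, body in docs:
        df.update(set(title) | set(body))

    vectors = []
    for title, body in docs:
        counts = Counter(body)
        for term in title:
            counts[term] += title_boost  # assumption: additive boost
        total = sum(counts.values())
        vectors.append({
            # Assumed smoothed IDF: log(n / df) + 1, so no term gets weight 0.
            term: (c / total) * (math.log(n / df[term]) + 1.0)
            for term, c in counts.items()
        })
    return vectors


def rank_topics(slices, decay=0.5, min_energy=1.0):
    """Rank topics by an energy value carried across time slices.

    slices: list of {topic: story_count} dicts, one per time slice.
    Assumed update rule: energy = decay * previous_energy + story_count.
    Topics whose energy falls below min_energy are dropped as aged out.
    Returns (topic, energy) pairs in decreasing order of energy.
    """
    energy = {}
    for counts in slices:
        topics = set(energy) | set(counts)
        energy = {
            t: decay * energy.get(t, 0.0) + counts.get(t, 0)
            for t in topics
        }
        energy = {t: e for t, e in energy.items() if e >= min_energy}
    return sorted(energy.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `rank_topics([{"quake": 4, "election": 2}, {"quake": 1}, {"quake": 3, "cup": 2}])` keeps "quake" and "cup" and drops "election", whose energy decays below the cutoff after two slices without new stories.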
Keywords/Search Tags:network news, hot topics, ILDA, VSM, Aging theory