Research On Topic Detection Based On Adaptive Gravity Vector

Posted on: 2014-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: L Fang
Full Text: PDF
GTID: 2268330401482051
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, the volume of information is growing rapidly, and how to organize these scattered resources has become a topic of concern. Scholars have proposed various methods of information extraction, retrieval, and organization to integrate these resources, and Topic Detection and Tracking (TDT) technology emerged in response. TDT was first applied to news. It integrates and organizes information automatically, so that users can understand a topic fully and, at the same time, discover its relations to other topics. It also avoids the redundant results returned by general-purpose information retrieval. In recent years, TDT has been widely developed, especially in public opinion analysis and emergency detection. It also has broad application prospects in scientific research, such as historical research.

Because of the special nature of news, the location, time, and other distinctive characteristics of reports and topics play an important role in distinguishing different pieces of news. Therefore, a named entity vector, a time vector, and a vector of other feature words are used to describe each news story. This paper also uses gravity vectors to describe a topic, which can express how the distribution of characteristics differs among topics.

Topic detection aims to find and organize topics that have not appeared before. In the adaptive topic detection algorithm, an existing topic can be updated with the news stories that have been detected, so that the topic's characteristics are extended. A threshold strategy is used to restrict these topic updates. The result is a self-learning platform for topic detection. Through this platform, users can monitor news reports: it calculates the similarity of each story to the existing topics and guides the follow-up operation through the threshold strategy.

This paper improves the clustering algorithm and the online topic detection algorithm. First, the method of initial center selection is improved, and all stories are clustered to obtain the topics. Second, using a density-based incremental clustering method, a threshold is set to guide the clustering: each news story is clustered with the topics that are similar to it. This optimizes the local topic distribution and improves the performance of topic detection as a whole.
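
The threshold-restricted topic update described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes each story is a single sparse term-weight vector (the thesis uses separate named entity, time, and feature-word vectors), uses cosine similarity, and stands in a plain running centroid for the gravity vector. The names `Topic`, `detect`, `new_threshold`, and `update_threshold`, and the two threshold values, are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Topic:
    """A topic summarized by the running mean of its member stories.

    Assumption: the mean vector is only a stand-in for the gravity vector.
    """

    def __init__(self, story):
        self.total = Counter(story)  # summed term weights
        self.n = 1                   # number of stories absorbed

    def centroid(self):
        return {t: w / self.n for t, w in self.total.items()}

    def add(self, story):
        self.total.update(story)     # Counter.update adds the weights
        self.n += 1


def detect(stories, new_threshold=0.2, update_threshold=0.5):
    """Single-pass online detection with a two-threshold strategy (assumed).

    - similarity below new_threshold: the story starts a new topic
    - otherwise the story is assigned to its most similar topic
    - the topic is updated only if similarity reaches update_threshold
    """
    topics, labels = [], []
    for story in stories:
        best, best_sim = None, 0.0
        for i, topic in enumerate(topics):
            sim = cosine(story, topic.centroid())
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < new_threshold:
            topics.append(Topic(story))
            labels.append(len(topics) - 1)
        else:
            labels.append(best)
            if best_sim >= update_threshold:
                topics[best].add(story)
    return labels, topics
```

Under these assumptions, a story that is only loosely related to a topic is still reported under that topic, but it does not change the topic's description. This is one way to read the abstract's statement that the threshold restricts topic updates.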
Keywords/Search Tags: Topic Detection, Clustering, Density distribution, Distance, Gravity vector, Named entity