The Research Of Chinese Story Link Detection And Topic Tracking

Posted on: 2014-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Chen
Full Text: PDF
GTID: 2268330401985889
Subject: Computer application technology
Abstract/Summary:
With the rapid development of network technology, the scale of the network has grown sharply, and automatic, efficient information processing technology for handling news has become necessary. Topic detection and tracking takes streams of news data as its research object and presents topics to users after detecting, recognizing and tracking them, which is of great significance for public opinion analysis and information research.

This thesis studies Chinese story link detection and topic tracking, two subtasks of topic detection and tracking. After analyzing the state of research in China and abroad, it proposes improvements intended to raise the effectiveness and reduce the recognition errors of both tasks. For story link detection, it proposes an algorithm that extracts the correlative words of the news elements and computes the similarity of those correlative words to supplement the representation of a story. For topic tracking, it proposes an improved algorithm based on KNN with class selection.

In the study of story link detection, the thesis relies on the observation that related articles share essentially the same news elements. Each report is divided into vectors of time, place, people and content, which are compared using cosine similarity. The correlative words of the elements are extracted to supplement the story representation model, and a method is proposed for computing the similarity of correlative words, which provides a basis for judging whether two stories are related.

In topic tracking, the thesis addresses two problems of the KNN algorithm: it requires a large amount of computation and it is sensitive to the distribution of the training samples. Each topic is represented by its features with high average weight. For an incoming story, the algorithm first finds the k nearest neighbor topics, then finds the k nearest neighbor stories within those topics, and finally computes the average similarity for each of the candidate topics that contain these nearest neighbor stories.
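The two-stage neighbor search described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`cosine`, `centroid`, `track`), the parameters `k_topics`, `k_stories`, `top_n` and `threshold`, and the representation of stories as term-weight dictionaries are all assumptions made for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(stories, top_n=50):
    """Assumed topic model: the top_n terms with the highest average weight."""
    total = Counter()
    for s in stories:
        total.update(s)  # adds the weights term by term
    avg = {t: w / len(stories) for t, w in total.items()}
    top = sorted(avg.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return dict(top)

def track(story, topics, k_topics=2, k_stories=3, threshold=0.3):
    """Assign `story` to a topic id, or return None if no topic is close enough.

    topics maps a topic id to a non-empty list of story vectors.
    """
    # Stage 1: keep only the k_topics topics whose centroids are most similar.
    cents = {tid: centroid(ss) for tid, ss in topics.items()}
    near = sorted(cents, key=lambda tid: cosine(story, cents[tid]),
                  reverse=True)[:k_topics]
    # Stage 2: nearest stories are searched inside the candidate topics only,
    # which avoids comparing against every training story.
    cands = [(cosine(story, s), tid) for tid in near for s in topics[tid]]
    cands.sort(key=lambda c: c[0], reverse=True)
    neighbors = cands[:k_stories]
    if not neighbors:
        return None
    # Average similarity per topic among the selected neighbor stories.
    sums, counts = Counter(), Counter()
    for sim, tid in neighbors:
        sums[tid] += sim
        counts[tid] += 1
    best = max(sums, key=lambda tid: sums[tid] / counts[tid])
    return best if sums[best] / counts[best] >= threshold else None
```

Averaging per topic rather than counting votes is one way to reduce the bias toward topics with many training stories; whether the thesis normalizes in exactly this way is not stated in the abstract.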
To reduce the influence of topic drift, the topic model is updated dynamically.

The experiments show that the method based on the correlative words of the news elements reduces both the miss rate and the false alarm rate of story link detection, lowering the detection cost by nearly 10 percent. The improved KNN algorithm proposed in this thesis also tracks topics well: compared with the traditional KNN algorithm, it doubles the running efficiency and reduces the tracking cost by 9 percent.
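For reference, the element-wise story comparison and the cost measure mentioned in this abstract can be sketched as follows. This is a hedged illustration only: the equal element weights, the function names, and the story layout (one term-weight dictionary per element) are assumptions, the correlative-word step is omitted because the abstract does not specify it, and the cost parameters are the values conventionally used in TDT evaluations, which the thesis may or may not have adopted.

```python
import math

ELEMENTS = ("time", "place", "people", "content")

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def story_similarity(s1, s2, weights=None):
    """Weighted sum of per-element cosine similarities.

    Equal weights are an assumption; the abstract does not say how the
    four element similarities are combined.
    """
    weights = weights or {e: 1.0 / len(ELEMENTS) for e in ELEMENTS}
    return sum(weights[e] * cosine(s1.get(e, {}), s2.get(e, {}))
               for e in ELEMENTS)

def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Detection cost combining the miss rate and the false alarm rate.

    Defaults are conventional TDT evaluation settings (assumed here).
    """
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
```

Two stories would be judged linked when `story_similarity` exceeds a chosen threshold; lowering either the miss rate or the false alarm rate at that threshold lowers `detection_cost`.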
Keywords/Search Tags:story link detection, correlative word, news elements, topic tracking, KNN algorithm, excursive, imbalanced