Topic Detection And Tracking Research Based On Semantic Structures And Temporal Features

Posted on: 2010-01-12 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y Hong
GTID: 1118360302965517 | Subject: Computer application technology
Abstract/Summary:
With the worldwide spread and development of the Web, the transmission and exchange of information are no longer restricted by time or space. Because of these advantages, news media have adopted the network as an important platform for distributing news stories. However, the sheer volume of unordered and continually evolving information hampers the effective identification, collection and organization of news topics. Automatically mining news topics and learning their dynamic evolution intelligently and with high precision has therefore become a crucial issue in network information processing. Topic Detection and Tracking (TDT) is the research area that addresses these problems, and it provides a new multilingual platform for evaluating techniques from information retrieval, data mining and natural language processing.
First, for the link detection task, the paper proposes a relevance-determination method based on a semantic domain language model. Because link detection rests on basic research into text comprehension and semantic analysis for judging relevance, it is of great significance for subsequent TDT research. The core of the semantic domain language model is to establish cohesive structures for specific semantics, so that the relevance between stories can be determined from their semantic similarity. The paper demonstrates that segmenting and organizing story content by semantics is beneficial for building a clear and comprehensible topic model.
Second, for the new event detection (NED) task, the paper proposes two detection strategies: one based on division comparison of subtopics and one based on a temporal topic model. NED focuses on mining the seminal events of news topics and establishing their initial centroids, which serve as the reference point for identifying subsequent on-topic stories. New event detection is therefore an important complement to the topic tracking task.
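The abstract does not give the formulas of the semantic domain language model, so the Python below is only a minimal sketch of the general idea under stated assumptions: each story is assumed to be already segmented into semantic domains (lists of terms), each segment receives a Jelinek-Mercer smoothed unigram model, segments are compared with Jensen-Shannon divergence, and two stories are linked when their symmetric best-match similarity reaches a threshold. The function names, the smoothing weight `lam` and the threshold of 0.5 are hypothetical choices, not values taken from the dissertation.

```python
import math
from collections import Counter

# Hypothetical sketch: stories are assumed to be pre-segmented into
# "semantic domains", each given as a non-empty list of terms.

def segment_model(terms, background, lam=0.8):
    """Jelinek-Mercer smoothed unigram model of one segment (requires lam < 1)."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {w: lam * counts[w] / total + (1 - lam) * p for w, p in background.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = maximally different)."""
    div = 0.0
    for w in p:
        m = 0.5 * (p[w] + q[w])
        div += 0.5 * p[w] * math.log2(p[w] / m) + 0.5 * q[w] * math.log2(q[w] / m)
    return div

def story_relevance(segs_a, segs_b, lam=0.8):
    """Symmetric best-match similarity between the segments of two stories."""
    all_terms = [t for seg in segs_a + segs_b for t in seg]
    n = len(all_terms)
    background = {w: c / n for w, c in Counter(all_terms).items()}
    models_a = [segment_model(s, background, lam) for s in segs_a]
    models_b = [segment_model(s, background, lam) for s in segs_b]

    def best_match(src, dst):
        # For each segment in src, keep the similarity of its closest segment in dst.
        return sum(max(1 - js_divergence(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (best_match(models_a, models_b) + best_match(models_b, models_a))

def is_linked(segs_a, segs_b, threshold=0.5):
    """Link decision: do the two stories discuss the same topic? (threshold is hypothetical)"""
    return story_relevance(segs_a, segs_b) >= threshold
```

With the toy stories `[["earthquake", "rescue"]]` and `[["election", "vote"]]`, the relevance is about 0.47, below the hypothetical threshold, so the pair is not linked; a story compared with itself scores 1.0.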
The detection method based on division comparison of subtopics builds on the theory of semantic domain segmentation. Specifically, it models a topic as a set of subtopics and measures the relevance between the subtopics of the model and those of a story. The final relevance between the topic and the story is then determined by the probability distribution of the relevant subtopics. Building on this, the temporal topic model attributes the origin of subtopics to the occurrence of different on-topic stories, so it models a topic as a set of events, each corresponding to a different time. Moreover, it compares the topic and the story efficiently by applying the rule that "the same event must happen at the same time". In addition, the model mines the seminal and novel events of a topic from the distribution of temporal expressions, which lets it apportion weights in the relevance calculation according to each event's influence on the topic's evolution. This improves the accuracy of new event detection.
Third, the paper proposes an incremental novelty learning method for the adaptive topic tracking task in TDT. The main task of tracking is to identify subsequent on-topic stories in a time-ordered story stream. The difficulty lies in improving the adaptability of the topic model by automatically learning, from feedback, the trend of topic evolution and the trigger points of topic shifts. The incremental novelty learning method retains the key role of novel events in capturing topic evolution trends and, on that basis, adds the mining and use of bursty novel events, which further improves the topic model's ability to follow topic shifts.
Finally, the paper proposes an information filtering method based on the distribution of two-dimensional similarity.
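The subtopic division comparison used for new event detection can be illustrated with a small sketch. Here a topic is assumed to be a list of `(prior, term_vector)` pairs whose priors sum to 1, a story is a list of term segments, a subtopic counts as relevant when its best cosine match to any story segment reaches a threshold, and the topic-story relevance is the total prior mass of the relevant subtopics. The cosine measure, both thresholds and all function names are illustrative assumptions; the abstract does not specify them.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def subtopic_relevance(topic, story_segments, sim_threshold=0.3):
    """Hypothetical: sum the priors of topic subtopics matched by some story segment."""
    seg_vecs = [Counter(seg) for seg in story_segments]
    score = 0.0
    for prior, vec in topic:
        best = max((cosine(vec, s) for s in seg_vecs), default=0.0)
        if best >= sim_threshold:
            score += prior
    return score

def is_new_event(story_segments, topics, novelty_threshold=0.5):
    """A story seeds a new topic if no existing topic is relevant enough (thresholds hypothetical)."""
    return all(subtopic_relevance(t, story_segments) < novelty_threshold for t in topics)
```

Note that `is_new_event` returns `True` when there are no topics yet, so the first story in the stream always starts a new topic.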
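The temporal topic model's matching rule can also be sketched in Python. In this illustration, which is an assumption rather than the dissertation's formulation, a topic is a mapping from event dates to term vectors and a story is a mapping from dates (for example, resolved from its temporal expressions) to terms. Only events sharing a date are compared, following the rule that the same event must happen at the same time, and the earliest (seminal) and latest (novel) events receive larger, hypothetical weights `w_seminal` and `w_novel`.

```python
import math
from collections import Counter
from datetime import date

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def event_weights(topic_events, w_seminal=2.0, w_novel=1.5):
    """Hypothetical weights: seminal (earliest) and novel (latest) events count more."""
    dates = sorted(topic_events)
    weights = {d: 1.0 for d in dates}
    if dates:
        weights[dates[-1]] = w_novel
        weights[dates[0]] = w_seminal  # a single event is treated as seminal
    return weights

def temporal_relevance(topic_events, story_events):
    """Compare only events that share a date; return the weighted mean cosine."""
    weights = event_weights(topic_events)
    shared = [d for d in story_events if d in topic_events]
    if not shared:
        return 0.0  # no common date, hence no common event under this rule
    num = sum(weights[d] * cosine(topic_events[d], Counter(story_events[d])) for d in shared)
    return num / sum(weights[d] for d in shared)

if __name__ == "__main__":
    topic = {date(2009, 1, 1): Counter(["quake", "strike"]),
             date(2009, 1, 3): Counter(["aid", "arrive"])}
    print(temporal_relevance(topic, {date(2009, 1, 3): ["aid", "arrive"]}))
```

A story with exactly the same words but dated 2009-01-05 scores 0.0 here, which is the point of the date-matching rule: lexical overlap alone does not make two reports the same event.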
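For adaptive tracking, the following sketch shows one way incremental feedback learning with burst detection could work; it is an assumption-laden illustration, not the dissertation's incremental novelty learning algorithm. The topic model is a term-weight vector that decays by a learning rate `alpha` and absorbs each story confirmed as on-topic. A term is treated as bursty when its document rate in a sliding window of recent on-topic stories is at least `burst_ratio` times its add-one-smoothed rate in earlier ones, and bursty terms receive a hypothetical `boost` when added to the model. The class name and all parameters are hypothetical.

```python
from collections import Counter, deque

class AdaptiveTopicModel:
    """Hypothetical incremental tracker with burst-term boosting."""

    def __init__(self, seed_terms, alpha=0.1, window=5, burst_ratio=3.0, boost=2.0):
        self.weights = Counter(seed_terms)   # topic model: term -> weight
        self.alpha = alpha                   # learning rate for feedback updates
        self.burst_ratio = burst_ratio       # recent/past rate needed to call a burst
        self.boost = boost                   # extra weight for bursty terms
        self.recent = deque(maxlen=window)   # term sets of recent on-topic stories
        self.history = Counter()             # document frequencies of older stories
        self.n_history = 0

    def burst_terms(self):
        """Terms whose recent document rate jumps relative to their smoothed past rate."""
        recent_df = Counter()
        for terms in self.recent:
            recent_df.update(terms)
        n_recent = max(len(self.recent), 1)
        bursts = set()
        for term, df in recent_df.items():
            recent_rate = df / n_recent
            past_rate = (self.history[term] + 1) / (self.n_history + 1)  # add-one smoothing
            if recent_rate / past_rate >= self.burst_ratio:
                bursts.add(term)
        return bursts

    def feedback(self, story_terms):
        """Absorb a story confirmed as on-topic; return the bursty terms found."""
        if len(self.recent) == self.recent.maxlen:
            # The oldest story is about to leave the window: move it to history.
            self.history.update(self.recent[0])
            self.n_history += 1
        self.recent.append(set(story_terms))
        bursts = self.burst_terms()
        for term in list(self.weights):
            self.weights[term] *= 1 - self.alpha  # decay old evidence
        for term, tf in Counter(story_terms).items():
            factor = self.boost if term in bursts else 1.0
            self.weights[term] += self.alpha * tf * factor
        return bursts
```

With `window=2`, three stories about `quake`/`rescue` followed by two about `quake`/`aftershock` flag `aftershock` as bursty only on the fifth story, once enough history exists for its recent rate to stand out.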
The main task of information filtering is to obtain relevant information more precisely by screening out noise from a dynamic information stream. The paper incorporates information filtering into its study of TDT because both detection and tracking are generally hampered by noise in the time-ordered story stream. It therefore attempts to improve noise filtering by modifying statistical models and exploiting the distributional characteristics of the data. In particular, the filtering technique based on the distribution of two-dimensional similarity removes noise terms from the statistical model by learning the distributional characteristics of relevant information and of noise.
Overall, the paper unites the main TDT tasks into a connected, systematic body of research focused on new techniques for the effective identification, mining and organization of news information. For every task, the research designs its technical approach around particular characteristics of news information while improving on existing statistical models, and these innovations achieved good results. Nevertheless, this work is only an exploratory study of TDT, and the field still contains many challenging issues awaiting further exploration.
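The abstract does not say what the two similarity dimensions are. As a purely hypothetical stand-in for the two-dimensional similarity filter, the sketch below places each term at the point (fraction of relevant stories containing it, fraction of noise stories containing it) and removes from the topic model any term that does not sit at least `margin` toward the relevant side of the diagonal. Whereas the dissertation learns the distributions of relevant information and noise, this sketch uses a fixed, assumed margin; the function names are also hypothetical.

```python
from collections import Counter

def term_points(relevant_docs, noise_docs):
    """Map each term to (rate in relevant stories, rate in noise stories)."""
    rel_df = Counter(t for doc in relevant_docs for t in set(doc))
    noi_df = Counter(t for doc in noise_docs for t in set(doc))
    vocab = set(rel_df) | set(noi_df)
    return {t: (rel_df[t] / len(relevant_docs), noi_df[t] / len(noise_docs)) for t in vocab}

def filter_noise_terms(topic_weights, relevant_docs, noise_docs, margin=0.2):
    """Keep only terms lying at least `margin` above the diagonal toward relevance."""
    points = term_points(relevant_docs, noise_docs)
    kept = {}
    for term, w in topic_weights.items():
        x, y = points.get(term, (0.0, 0.0))  # terms with no evidence land at the origin and are dropped
        if x - y >= margin:
            kept[term] = w
    return kept
```

For instance, with relevant stories `[["quake", "rescue", "said"], ["quake", "aid", "said"]]` and noise stories `[["said", "market"], ["said", "vote"]]`, the term `said` lies on the diagonal at (1.0, 1.0) and is removed, while `quake` at (1.0, 0.0) is kept.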
Keywords/Search Tags: topic detection and tracking, link detection, new event detection, adaptive tracking, semantic structure, temporal feature