
Research On The Representation Model And Technologies Of Link Detection And Tracking On News Topic

Posted on: 2011-02-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Zhang
GTID: 1118330332486936
Subject: Computer Science and Technology
Abstract/Summary:
Topic detection and tracking (TDT) analyzes a stream of news stories in order to find, track and thread the topics embedded in it. A topic is defined as a specific event or activity plus directly related events or activities. Since being established as a research field in 1996, TDT has been a research issue in the area of Natural Language Processing. Great progress has been made so far, and TDT techniques have been widely used in many applications, especially public opinion monitoring and new knowledge mining.

This dissertation concentrates on two tasks in TDT: story link detection and topic tracking. Techniques are proposed for the representation model, link detection and tracking methods.

Story link detection is the problem of deciding, for each pair in a stream of story pairs, whether the two stories discuss the same topic. It is the key technique of TDT. The results achieved on this task are as follows:

Event model: Based on an analysis of the feature selection, the similarity function and the partition criterion of the multi-vector model, an event model is proposed according to the event framework. When using the event model, we adopt the uneven SVM model to address the imbalance in the training data. Fuzzy matching between two models has also been tried. As the experiments indicate, the performance of the story link detection system improves significantly when the event model is used.

Dynamic information extending: To overcome the limited story length, the data sparseness and the possible topic drift within a story, we break the independence assumption between story pairs and propose a technique of dynamic information extending. It extends the current story with the latest previous story that is topically related to it. In addition, we study the refinement of the extended information. Three kinds of information, namely kernel information, noun entities (person, location, organization) and noun dependencies of noun entities, are selected to improve the effectiveness of the representation model.
The experimental results indicate that the dynamic extending and the refinement method are effective and that both clearly improve the performance of story link detection systems.

Topic tracking associates the incoming stories in a stream with a topic pre-identified by a few stories, and finds all the stories related to that topic. It is the only task in TDT research that has prior information. The results achieved on this task are as follows:

Dynamic topic model: To overcome the topic drift problem, a dynamic model is designed to represent a tracked topic, continuing the research on the dynamic extending described above. This model selects the features used to update a topic model globally from all the incoming related stories. The information in pseudo-related stories can be ignored in this procedure. In addition, a topic-based weighting method is proposed, which treats the training data as topic-clustered and measures a feature from the perspective of topics. The latest unrelated story is also used to filter noise from the topic model. The experimental results indicate that the dynamic topic model handles drifted topics well and improves tracking performance.

Joint tracking method: Since a topic description usually does not provide enough information and the new information in incoming stories cannot be handled, we propose a joint tracking method, which is also a new way of using story link detection techniques for topic tracking. This method first constructs a tracking method from a kind of topic-independent, linkage-based features derived from data about other topics, and then linearly combines it with the tracking method based on predefined related information. The experimental results indicate that the joint tracking method can solve the above problem.
More importantly, it can integrate most of the techniques proposed in this dissertation, and the improvements achieved are cumulative.

Future work will focus on further study of TDT and other topic-related applications, such as network monitoring and topic-based summarization. In addition, although our work is tested and evaluated on the Chinese subset of TDT4, it should be independent of the language and the representation style.
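
To make the two tasks in the abstract concrete, here is a minimal Python sketch. It is not the dissertation's method. It assumes whitespace tokenization, plain term-frequency vectors and cosine similarity in place of the event model and the trained linkage-based features. The function names, the threshold of 0.2 and the weight of 0.5 are all hypothetical. `linked` illustrates the yes/no decision of story link detection, and `joint_score` illustrates the idea of linearly combining a topic-description score with a link-based score.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def linked(story_a, story_b, threshold=0.2):
    """Story link detection: do the two stories discuss the same topic?

    The threshold is a hypothetical value, not one from the dissertation.
    """
    return cosine(tf_vector(story_a), tf_vector(story_b)) >= threshold


def joint_score(story, topic_stories, weight=0.5):
    """Linearly combine a topic-description score with a link-based score.

    Both component scores are simplified stand-ins (assumptions):
    - topic score: similarity to the merged vector of the on-topic stories
    - link score: best pairwise similarity to any single on-topic story
    """
    story_vec = tf_vector(story)
    topic_vec = Counter()
    for s in topic_stories:
        topic_vec.update(tf_vector(s))
    topic_score = cosine(story_vec, topic_vec)
    link_score = max(
        (cosine(story_vec, tf_vector(s)) for s in topic_stories),
        default=0.0,
    )
    return weight * topic_score + (1.0 - weight) * link_score
```

A story would be tracked as on-topic when `joint_score` exceeds a decision threshold tuned on held-out data. In the dissertation the link-based component is instead learned from data about other topics, which this sketch does not attempt to reproduce.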
Keywords/Search Tags: Topic detection and tracking, Story link detection, Topic tracking, Topic