The Research On Topic Search Technology Based On Topic Detection And Tracking

Posted on: 2011-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: L Yuan
GTID: 2178330338979994
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid spread of the Internet, the Web has become a new kind of medium. Compared with traditional media such as newspapers and television, information on the network travels quickly, arrives in bursts, and is interactive, and news published on the Web shows these characteristics most clearly. People have formed the habit of reading news online: they follow topics that interest them or search for information such as "what major events have happened recently". To help people find information faster, we need to mine the content of web news in depth, automatically detect topics in the stream of news reports, track existing topics, and provide a search service whose unit is the topic.
This thesis studies topic detection and tracking technology and applies it to web news. Using topic detection and topic tracking, we mine the content of news stories, group stories that report the same topic, process the resulting topic information with indexing and retrieval techniques, and finally provide a topic search service.
First, the thesis introduces the concepts and key techniques of text clustering and text classification that relate to topic detection and tracking, and describes the key techniques used in search engines. It then describes data acquisition and preprocessing, including news page collection, news information extraction, and the model representation of news stories. To suit the characteristics of web news, we design a topic detection and tracking algorithm that assigns different weights to words according to their part of speech and uses an improved cosine similarity formula. Experimental results show that the new similarity formula improves the algorithm's performance. We also test the topic detection and tracking algorithm under different threshold values.
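
The abstract does not give the improved similarity formula or the detection procedure in detail, so the following is only a minimal sketch of the general idea: term weights scaled by part of speech, cosine similarity over those weights, and a threshold that decides whether a story joins an existing topic or starts a new one. The weight values, the tag names, the single-pass strategy, and the names `POS_WEIGHTS`, `weighted_vector`, and `detect_topics` are all assumptions made for illustration, not the thesis's actual method.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights; the abstract does not state the real values.
POS_WEIGHTS = {"NOUN": 2.0, "VERB": 1.5}
DEFAULT_WEIGHT = 1.0


def weighted_vector(tagged_tokens):
    """Build a term vector where each occurrence counts by its POS weight."""
    vec = Counter()
    for word, pos in tagged_tokens:
        vec[word] += POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)
    return vec


def cosine(a, b):
    """Standard cosine similarity between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_topics(stories, threshold):
    """Single-pass clustering (an assumed strategy): assign each story to the
    most similar existing topic if the similarity reaches the threshold,
    otherwise open a new topic. Returns one topic index per story."""
    topics = []       # one accumulated term vector per topic
    assignments = []
    for tagged_tokens in stories:
        vec = weighted_vector(tagged_tokens)
        best_idx, best_sim = -1, 0.0
        for i, topic_vec in enumerate(topics):
            sim = cosine(vec, topic_vec)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= threshold:
            topics[best_idx].update(vec)   # fold the story into the topic
            assignments.append(best_idx)
        else:
            topics.append(Counter(vec))    # start a new topic
            assignments.append(len(topics) - 1)
    return assignments
```

In this sketch a higher threshold splits stories into more, narrower topics, while a lower one merges them into fewer, broader topics, which is the kind of trade-off the threshold experiments would explore.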
In implementing the topic search module, we propose a method for extracting topic information, index the resulting structured topic information, implement two ways of sorting search results (by content relevance and by time), and show results on a real corpus collected from the Internet. Finally, we summarize the overall design and present part of the demo interface.
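
The two result orderings can be illustrated with a small sketch. The abstract does not say how relevance is scored or whether the time ordering is newest-first, so the `TopicHit` record, its fields, and the descending order used here are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TopicHit:
    """Hypothetical search hit for one topic; field names are assumptions."""
    title: str
    relevance: float      # assumed to come from the retrieval engine
    last_updated: date    # assumed: date of the topic's latest story


def sort_by_relevance(hits):
    """Most relevant topics first."""
    return sorted(hits, key=lambda h: h.relevance, reverse=True)


def sort_by_time(hits):
    """Most recently updated topics first (assumed direction)."""
    return sorted(hits, key=lambda h: h.last_updated, reverse=True)
```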
Keywords/Search Tags: topic detection, topic tracking, topic search