Research On Techniques Of Domain-Specific Topic Searching

Posted on: 2009-12-22  Degree: Master  Type: Thesis
Country: China  Candidate: Y P Zhao  Full Text: PDF
GTID: 2178360272979594  Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, the amount of information on the Internet is growing geometrically, and the Internet has become the largest information repository. Acquiring information promptly, accurately, and completely has become an urgent task. The appearance of search engines solves the problem of information acquisition to some extent. However, as information continues to explode in all directions, general search engines cannot satisfy the search needs of specialized users, who require professional, precise, and in-depth results. Domain-specific topic search technology has therefore become an active focus of Internet information retrieval research in recent years.

First, domain-specific topic search engines are compared with general search engines in terms of architecture, principles, and key techniques, and the research status and development direction of topic search technology are analyzed.

Next, two key technologies of topic search are studied in this thesis: the construction and update of the domain knowledge base, and domain topic identification. The emphasis is on the structure and building method of the subject dictionary and on the construction of the topic characteristic model and the page information model; related algorithms are proposed.

Then, the heuristic search strategy of the topic network crawler is studied, several typical search algorithms are compared, and a search strategy based on integrated value is proposed. Based on this strategy, a topic network crawler is designed.

Finally, based on the above research results, a prototype domain-specific topic search engine is designed. The system crawls topic pages accurately and automatically, while also saving network bandwidth and showing good stability. Several typical experiments are carried out to test the precision ratio, the recall ratio, and the topic satisfaction ratio of the system; all of these evaluation indicators reach a high level, and the system achieves good results.
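
The abstract does not define how the integrated value is computed, so the following is only a minimal sketch of how a best-first crawler frontier ordered by such a value might look, together with the standard precision and recall measures used in the evaluation. The names `term_overlap`, `integrated_value`, `Frontier`, and `precision_recall`, the weight `alpha`, and the choice of anchor text and parent-page text as the combined signals are all assumptions made for illustration, not the thesis's actual method.

```python
import heapq
from itertools import count


def term_overlap(text, topic_terms):
    """Fraction of (lowercase) topic terms that occur in the text; a crude relevance proxy."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def integrated_value(anchor_text, parent_text, topic_terms, alpha=0.6):
    """Hypothetical integrated value: weighted mix of anchor-text and parent-page relevance."""
    return (alpha * term_overlap(anchor_text, topic_terms)
            + (1 - alpha) * term_overlap(parent_text, topic_terms))


class Frontier:
    """Best-first URL queue: the URL with the highest score is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._tie = count()  # insertion counter breaks ties between equal scores

    def push(self, url, score):
        if url in self._seen:  # never enqueue the same URL twice
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, next(self._tie), url))

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved pages; recall = hits / relevant pages."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In a crawl loop, each link extracted from a fetched page would be scored with `integrated_value` and pushed onto the `Frontier`, so that links judged more likely to lead to on-topic pages are fetched first. Skipping low-scoring links in this way is one plausible route to the bandwidth savings the abstract mentions.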
Keywords/Search Tags: topic search, domain, information model, topic crawler