
Design And Implementation Of The Focused Crawler System Based On Customized Domain Conceptions

Posted on: 2008-10-07 | Degree: Master | Type: Thesis
Country: China | Candidate: K Jiang | Full Text: PDF
GTID: 2178360212474597 | Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of information on the Internet, retrieving that information quickly and accurately is becoming more and more important. Focused search engine technology addresses this problem well. A focused search engine is a specialized search engine that serves only one field or one topic. Compared with a general-purpose search engine, a focused search engine collects domain information more precisely and covers its field more broadly. Three questions are therefore central to research on focused crawler systems: how to design suitable topic rules for the domain concepts; how to analyse web pages effectively so that irrelevant resources are filtered out and highly relevant topic resources are retained; and how to enlarge the range of topic resources that are found. First, the thesis introduces the relevant search engine technologies and the HTTP protocol, explains the working principle and framework of a general-purpose search engine system, and describes in detail the working principle and framework of a focused search engine system. It then proposes a structural design for the topic relevance ("relativity degree") module and the importance degree module. Second, the thesis studies the key algorithms for determining topic relevance in a focused crawler system and analyses the application of each. It then presents a web page evaluation algorithm based on customized domain concepts and designs four algorithm modules: topic establishment, initial seed URL optimization, topic relevance analysis, and hyperlink importance analysis. Finally, the thesis implements the focused crawler system based on customized domain concepts and tests it on the topic of football news. The test data indicate that the focused crawler system performs well in both search accuracy and coverage.
Keywords/Search Tags: Focused Crawler, PageRank Algorithm, HITS Algorithm, Relativity Degree, Importance Degree
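
The abstract does not give the thesis's actual formulas, so the following is only a minimal sketch of the general idea it describes: score text against a set of weighted domain concepts, then crawl the most promising links first. The concept weights, the function names `relevance` and `link_priority`, the `Frontier` class, and the mixing parameter `alpha` are all assumptions made for illustration and are not taken from the thesis.

```python
import heapq
import re

# Hypothetical weighted domain concepts for a "football news" topic.
# The terms and weights are illustrative assumptions, not the thesis's data.
CONCEPTS = {"football": 3.0, "goal": 2.0, "league": 2.0, "match": 1.0}


def relevance(text, concepts=CONCEPTS):
    """Return the share of total concept weight whose terms occur in text (0..1)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    total = sum(concepts.values())
    hit = sum(weight for term, weight in concepts.items() if term in words)
    return hit / total if total else 0.0


def link_priority(page_text, anchor_text, alpha=0.5):
    """Blend the source page's relevance with the link's anchor-text relevance.

    alpha is an assumed mixing weight; a link-analysis score such as PageRank
    or HITS could stand in for the anchor-text term.
    """
    return alpha * relevance(page_text) + (1 - alpha) * relevance(anchor_text)


class Frontier:
    """Crawl frontier that hands out the highest-priority unseen URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, priority):
        # Skip URLs that were already queued so each is crawled at most once.
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated priority.
            heapq.heappush(self._heap, (-priority, url))

    def pop(self):
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)
```

In a crawl loop, each fetched page would be scored with `relevance`, pages under some threshold would be discarded, and the outgoing links of the remaining pages would be pushed onto the `Frontier` with `link_priority` as their priority.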