With the rapid growth of the WWW, Web information retrieval systems with higher performance are required, and research on Web information retrieval has consequently become a focus. Recently, focused crawling systems have been proposed to serve users who need specialized knowledge from the WWW. This dissertation introduces all key aspects of a focused crawling system and then discusses in depth the classification problem within such a system. Most current Web page classification methods use only the content of a page and ignore the links between pages entirely. In fact, links between Web pages sometimes reflect the topics of the linked pages. This dissertation therefore designs a new Web page classification method that uses both the links and the content of a page to decide its class. Experimental results show an improvement over methods that consider page content only. The dissertation then designs an improved focused crawling system that uses this content-and-link classifier to decide each page's class, and experiments show an improvement over the conventional approach. To evaluate these methods, we developed a focused crawling system in VC++ 6.0.
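The abstract does not say how link evidence and content evidence are combined, so the following is only a minimal sketch under stated assumptions, not the dissertation's method. It assumes a content score given by the cosine similarity between a page's term-frequency vector and a list of topic terms. It also assumes a link score given by the fraction of already-classified neighbouring pages that are on-topic, and that the two scores are blended linearly. The function names, the weight `alpha` and the `threshold` are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def content_score(text: str, topic_terms: list) -> float:
    """Hypothetical content evidence: similarity of page text to the topic terms."""
    return cosine(Counter(text.lower().split()), Counter(topic_terms))


def link_score(neighbours: list, known_labels: dict):
    """Hypothetical link evidence: share of already-labelled neighbours that are
    on-topic (label 1). Returns None when no neighbour has a label yet."""
    labelled = [known_labels[n] for n in neighbours if n in known_labels]
    if not labelled:
        return None
    return sum(labelled) / len(labelled)


def classify_page(text, neighbours, topic_terms, known_labels,
                  alpha=0.7, threshold=0.5):
    """Assumed linear blend of content and link evidence; falls back to
    content alone when no linked page has been classified."""
    c = content_score(text, topic_terms)
    l = link_score(neighbours, known_labels)
    score = c if l is None else alpha * c + (1 - alpha) * l
    return score >= threshold, score
```

For example, `classify_page("web page classification for a focused crawler", ["a.html", "b.html", "c.html"], ["crawler", "classification", "web"], {"a.html": 1, "b.html": 1})` labels the page on-topic, with both labelled neighbours adding to its score. Falling back to content alone when no neighbour is labelled is one way to handle pages reached early in a crawl, before their link context is known.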