Research and Implementation of a Subject-Oriented Search Engine

Posted on: 2012-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Liu
Full Text: PDF
GTID: 2218330338955853
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of the Internet and the growing volume of online information, widely applied information technology has brought great convenience to daily life and has improved both quality of life and working efficiency. General search engines provide a convenient information service and therefore hold an irreplaceable position in the Internet age. However, the rapid growth of online information also makes it difficult for users of general search engines to find subject-oriented information. Subject-oriented search engines were developed to solve this problem. Improving the efficiency of the subject crawler and its strategy for crawling the web has become the focus of research on subject search engines. The dynamic, complex and heterogeneous nature of the Internet requires the subject crawler to reach the needed information effectively; that is, it must obtain high-quality information and obtain it efficiently.

This thesis reviews the development of search engines and analyzes their composition and working principles. The subject crawler, as the core component of a subject search engine, is analyzed in detail. A modified algorithm for the URL crawling strategy is presented. In addition, to support the timeliness advantage that a subject search engine has over a general search engine, an incremental algorithm is proposed so that new web pages on the Internet can be found effectively.

The thesis also describes the implementation of the search engine, which is based on the open-source framework Nutch, and improves Nutch's original Chinese word segmentation. Experiments and tests show that the modified URL strategy algorithm and the incremental algorithm for subject crawling are effective and have practical value.
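
The abstract does not give the details of the URL strategy or the incremental algorithm, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes a simple term-overlap relevance score and a best-first URL frontier that skips URLs recorded in earlier crawls, so that only new pages are queued. The names `relevance` and `Frontier` and the example URLs are hypothetical.

```python
import heapq
from typing import Iterable, List, Set, Tuple


def relevance(text: str, topic_terms: Set[str]) -> float:
    """Fraction of topic terms found in the text (an assumed, very simple score)."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


class Frontier:
    """Best-first URL frontier that ignores URLs seen in earlier crawls."""

    def __init__(self, seen: Iterable[str] = ()) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._seen: Set[str] = set(seen)  # URLs known from previous runs
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def add(self, url: str, score: float) -> bool:
        """Queue a URL; return False if it was already known (incremental step)."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the best URL first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1
        return True

    def pop(self) -> str:
        """Return the queued URL with the highest relevance score."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

In this sketch, the score for a URL would come from text associated with the link (for example its anchor text), and the `seen` set would be loaded from the previous crawl so that a re-crawl only queues pages that are new.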
Keywords/Search Tags: subject-oriented search, crawler, search strategy, Nutch, add-extract