The Research Of Web Page Crawling Strategy For Topical Search Engine Based On Web Mining

Posted on: 2015-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: S T Jin
Full Text: PDF
GTID: 2308330461992430
Subject: Management Science and Engineering
Abstract/Summary:
The rapid development of the Internet has changed the times. The Web holds a wealth of valuable information and, as a new kind of resource, has become increasingly important. The main task of Web mining research is to acquire the required knowledge efficiently from this mass of Web information. General search engines, however, lack specificity: a search returns a large amount of irrelevant information, so the results are of limited use. Search engines oriented to specific domains emerged in response. They address the drawbacks of general search engines well, and their core is topical web crawling technology, which has become a current research hot spot and trend.

First, the thesis briefly introduces the background and current state of Web mining technology and the development of search engine technology in China and abroad. It analyzes what the two technologies have in common and the feasibility of combining them. It then introduces the development and important role of topical search engines. Taking the crawling strategy of the topical search engine's web crawler as the main research subject, and improving the recall and precision of crawled topical pages as the starting point, it analyzes in detail several current topical crawling methods and their advantages and disadvantages. Next, starting from the drawbacks of the Best-First Search algorithm used by topical crawlers, the algorithm is further optimized by combining it with a non-greedy strategy and other methods, and experiments demonstrate the superiority of the new algorithm. Finally, the design and implementation of a prototype topical Web mining system are presented, covering the crawler (spider) classes, the queue, and the database design, among others, and the performance of the whole system is tested.
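The abstract does not spell out the optimized algorithm, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes that the "non-greedy strategy" can be approximated by occasionally picking a random URL from the frontier instead of the top-scored one. The function name `crawl`, the `epsilon` parameter, and the in-memory `graph` and `scores` dictionaries are all hypothetical stand-ins for a real fetcher and relevance classifier.

```python
import heapq
import random


def crawl(graph, scores, seed_url, max_pages, epsilon=0.0, rng=None):
    """Return the order in which pages are visited.

    graph   : dict mapping a URL to the list of URLs it links to (hypothetical data)
    scores  : dict mapping a URL to an estimated topical relevance (hypothetical data)
    epsilon : probability of a non-greedy step; 0.0 gives plain Best-First Search
    """
    rng = rng or random.Random(0)
    # heapq is a min-heap, so scores are negated to pop the highest score first.
    frontier = [(-scores.get(seed_url, 0.0), seed_url)]
    seen = {seed_url}
    order = []
    while frontier and len(order) < max_pages:
        if rng.random() < epsilon:
            # Non-greedy step (assumption): take a random queued URL so that
            # pages reachable only through low-scored links still get a chance.
            idx = rng.randrange(len(frontier))
            frontier[idx], frontier[-1] = frontier[-1], frontier[idx]
            _, url = frontier.pop()
            heapq.heapify(frontier)  # restore the heap invariant after the swap
        else:
            # Greedy step: always expand the most relevant queued URL.
            _, url = heapq.heappop(frontier)
        order.append(url)
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-scores.get(link, 0.0), link))
    return order


if __name__ == "__main__":
    graph = {"seed": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
    scores = {"seed": 1.0, "a": 0.2, "b": 0.9, "c": 0.8, "d": 0.1}
    print(crawl(graph, scores, "seed", max_pages=5))
    print(crawl(graph, scores, "seed", max_pages=5, epsilon=0.3))
```

In this toy graph the relevant page `c` is reachable only through the low-scored page `a`, which is the kind of situation where a purely greedy frontier tends to delay or miss relevant pages under a tight page budget. How the thesis actually chooses and weights its non-greedy steps is not stated in the abstract.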
Keywords/Search Tags: Web mining, Topical search engine, Best-First algorithm, Non-greedy strategy