Initial URLS Optimization In Search Engine

Posted on: 2008-10-24  Degree: Master  Type: Thesis
Country: China  Candidate: L Xiang  Full Text: PDF
GTID: 2178360212495650  Subject: Computer software and theory
Abstract/Summary:
The amount of information on the Internet is enormous, and how to retrieve the information that is important and relevant to a particular user is a significant research problem. Search engines emerged against this background. However, the number of results a search engine returns is very large, and users find it difficult to locate the useful information among them. How should search results be organized, and how can the useful information be found? The focus of this research is helping users find information that interests them by choosing good initial URLs.
The web crawler is one of the essential components of a search engine, since the engine depends on it to fetch pages. This thesis starts from the web crawler and concentrates on the starting point of crawling: the formation of the initial URLs. Personalizing the initial URLs is the means by which the results are personalized for the user. Following this line of thought, we studied initial URL formation in depth. The main contributions are as follows:
1. This thesis proposes the basic idea and method for forming candidate initial URLs. Based on the user's input, part of the web pages returned by well-known search engines (AltaVista, DirectHit, Excite, Google, HotBot, Lycos, Yahoo, etc.) are taken as initial URLs. Concretely, the thesis uses the Google web service API to have Google return a large set of URLs that serve as the starting point for the subsequent work.
2. Using the ordered concept lattice proposed by other researchers, we obtain the user's traversal paths and, given a minimum access frequency, discover the frequent traversal paths. We order these paths by frequency of occurrence and then derive user interest seeds from the user's clicks in preparation for subsequent crawling. We present the algorithm together with a worked example.
3. By mining the user's browser history and logs, we obtain the user's interests. We then combine these with the results of item 1 to produce the interesting seeds. These seeds can be clicked directly; that is, they serve both as the seeds for the next round of crawling and as results returned to the user.
4. Finally, the thesis develops a web crawler (my spider) and evaluates the seed-forming method on the Xihua University website. Compared with Google, Baidu, and Learnable crawler, my spider scores higher on user satisfaction, user relevance, and web recall ratio. The experiments also indicate that the results my spider returns after three rounds of crawling are sufficient, reasonable, and efficient.
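As a rough illustration of item 2, the sketch below counts contiguous traversal paths across user sessions, keeps those that reach a minimum access frequency, and orders them by frequency. It is not the ordered concept lattice algorithm used in the thesis; a plain subsequence count is swapped in for brevity. The function name, the session format, and the `max_len` cap are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Path = Tuple[str, ...]


def frequent_traversal_paths(
    sessions: Sequence[Sequence[str]],
    min_count: int,
    max_len: int = 3,
) -> List[Tuple[Path, int]]:
    """Return contiguous paths seen in at least `min_count` sessions.

    Hypothetical helper: each session is an ordered list of visited URLs,
    and a path is counted at most once per session.
    """
    counts: Counter = Counter()
    for session in sessions:
        seen = set()
        # Enumerate every contiguous path of 2..max_len pages.
        for length in range(2, max_len + 1):
            for start in range(len(session) - length + 1):
                seen.add(tuple(session[start:start + length]))
        counts.update(seen)

    frequent = [(p, c) for p, c in counts.items() if c >= min_count]
    # Most frequent first; longer paths first on ties; then alphabetical.
    frequent.sort(key=lambda pc: (-pc[1], -len(pc[0]), pc[0]))
    return frequent


if __name__ == "__main__":
    # Made-up sessions, for illustration only.
    demo_sessions = [
        ["/home", "/news", "/sports"],
        ["/home", "/news", "/tech"],
        ["/home", "/news", "/sports"],
        ["/about", "/home"],
    ]
    for path, count in frequent_traversal_paths(demo_sessions, min_count=2):
        print(count, " -> ".join(path))
```

The last page of each frequent path could then be offered to the user as a candidate interest seed, although how the thesis actually selects seeds from the paths is not specified in this abstract.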
Keywords/Search Tags: Initial URLS, Meta search, Formal concept analysis, Data mining, Interesting seeds