Initial URLs Optimization In Search Engine

Posted on:2008-10-24

Degree:Master

Type:Thesis

Country:China

Candidate:L Xiang

Full Text:PDF

GTID:2178360212495650

Subject:Computer software and theory

Abstract/Summary:

The amount of information on the Internet is now enormous, and how to acquire information that is both important and relevant to the user is a significant research problem. Search engines emerged against this background. However, the number of search results is usually very large, and it is difficult for users to find useful information among them. How should search results be organized, and how can the useful information be found? Helping users find information of interest through good initial URLs is the focus of this research.

The web crawler is one of the most important parts of a search engine, since the engine relies on it to fetch pages. This thesis starts from the web crawler and concentrates on the starting point of web crawling: the formation of the initial URLs. Personalizing the initial URLs is how we personalize results for the user. Following this line of thought, we studied the formation of initial URLs in depth; the main results are as follows:

1. This thesis proposes the basic idea and method for forming candidate initial URLs. Given the user's query, a portion of the web pages returned by well-known search engines (AltaVista, DirectHit, Excite, Google, HotBot, Lycos, Yahoo, etc.) is taken as the initial URLs. In practice, this thesis uses the Google web service API to have Google return a large number of URLs, which serve as the starting point for the subsequent work.

2. Using the ordered concept lattice proposed by other researchers, we obtain the user's traversal paths and, given a minimum access frequency, discover the frequent traversal paths. We order these paths by frequency of occurrence and then derive user interest seeds from the user's clicks, in preparation for subsequent crawling. Finally, we present the algorithm and a worked instance.

3. By mining the user's browser history and logs, we obtain the user's interests. We then combine these with the results of item 1 to obtain the interest seeds. These seeds can be clicked directly; that is, they serve not only as seeds for the next crawl but also as results returned to the user.

4. Finally, this thesis develops a web crawler (my spider) and evaluates the seed URL formation method on the Xihua University website. Compared with Google, Baidu, and the Learnable crawler, my spider scores higher in user satisfaction, user relevance, and web recall ratio. The experiments also show that the results after three rounds of crawling are sufficient, reasonable, and efficient.
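
The idea of discovering frequent traversal paths under a minimum access frequency, and then using them to pick interest seeds from the candidate URLs, can be illustrated with a small sketch. This is not the thesis's algorithm: the thesis works with an ordered concept lattice, whereas the code below simply counts contiguous sub-paths in browsing sessions. The function names (`frequent_paths`, `rank_seeds`), the `max_len` parameter, and the scoring rule are assumptions made for illustration only.

```python
from collections import Counter


def frequent_paths(sessions, min_support, max_len=3):
    """Return (path, count) pairs for contiguous sub-paths seen in at least
    min_support sessions, most frequent first.

    Assumption: a plain sub-path count stands in for the ordered concept
    lattice used in the thesis.
    """
    counts = Counter()
    for session in sessions:
        seen = set()
        for n in range(2, max_len + 1):
            for i in range(len(session) - n + 1):
                seen.add(tuple(session[i:i + n]))
        counts.update(seen)  # a session counts each path at most once
    frequent = [(p, c) for p, c in counts.items() if c >= min_support]
    # Order by frequency, then by longer paths first, then alphabetically.
    frequent.sort(key=lambda pc: (-pc[1], -len(pc[0]), pc[0]))
    return frequent


def rank_seeds(frequent, candidates):
    """Keep candidate URLs that lie on a frequent path, best-scoring first.

    Hypothetical scoring rule: a URL's score is the summed count of the
    frequent paths that contain it.
    """
    score = Counter()
    for path, count in frequent:
        for url in set(path):
            score[url] += count
    kept = [u for u in candidates if score[u] > 0]
    return sorted(kept, key=lambda u: (-score[u], u))
```

Here `sessions` would be lists of page URLs taken from the user's browsing log, and `candidates` would be the URLs returned by the search engine for the user's query; both are supplied by the caller.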

Keywords/Search Tags:

Initial URLs, Meta search, Formal concept analysis, Data mining, Interesting seeds

Related items

1	Research Of The Result Merging Of Meta Search Engine Based On Formal Concept Analysis
2	Study On Search Results Clustering Based On Formal Concept Analysis
3	Study On Image Mining Based On Formal Concept Analysis
4	Improvement And Implementation Of A Number Of Algorithms In Concept Analysis
5	Use Case Mining Based On Formal Concept Analysis
6	The Improvement Of Godin Algorithm And The Application Of Formal Concept Analysis In The Intelligent Search Engine
7	Study Of Formal Concept Analysis In Data Mining
8	Research On Selection Of Initial-URLs Based On User Ontology
9	Research And Application On Information Retrieval Model Based On FCA
10	The Research Of Concept Lattice Pruning Method And Its Application In Web Mining