Algorithm Research on Web Structure Mining in Search Engine Technology

Today, when searching for information on the Web, one usually submits a query to a search engine. Many search engines are term-based and return a list of Web pages whose content matches the query. For broad-topic queries, such searches often return a huge set of documents, many of which are irrelevant to the user. However, much information is contained in the link structure of Web pages, and Web structure mining technology can extract useful information from it. This information can be used to enhance search engine technology.

In this context, Jon M. Kleinberg (in his paper "Authoritative sources in a hyperlinked environment") introduced the following notions:
1. Authoritative pages: a small subset of the pages that match the query and that are authoritative. Pages in this subset have many incoming links.
2. Hub pages: pages that link to multiple authoritative pages.

Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub points to many authorities, and a good authority is pointed to by many hubs. In light of this, he devised an algorithm aimed at finding authoritative Web pages. Brin and Page also devised the PageRank algorithm and applied it in the Google search engine.

Both algorithms, however, have deficiencies. By analyzing what they have in common and where they fall short, we devise a general algorithm for finding authoritative pages. We then define several notions for evaluating these algorithms, test our new algorithm and the Hub/Authority algorithm on artificial Web topologies, and compare the results. We conclude that our new algorithm performs better than the Hub/Authority algorithm.
Keywords/Search Tags: Web mining, Web structure mining, search engines, authoritative, hub.
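
The mutually reinforcing relationship between hubs and authorities described in the abstract can be illustrated with a minimal sketch of a Kleinberg-style hub/authority iteration. This is not the new algorithm proposed in this work, whose details the abstract does not give. The function names `hits` and `normalize`, the iteration count, and the toy link graph are all assumptions made for illustration only.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Hub/authority iteration on a link graph.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    Returns two dicts: hub scores and authority scores.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages it links to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, [])) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Toy topology (assumed): three hub-like pages pointing at two targets.
toy_links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
}
hub, auth = hits(toy_links)
print(sorted(auth, key=auth.get, reverse=True)[:2])
print(sorted(hub, key=hub.get, reverse=True)[:2])
```

In this toy graph, `a1` receives links from all three hub-like pages and so ends up with the highest authority score, while `h1` and `h2`, which point at both targets, receive higher hub scores than `h3`.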
