
Research On XML-based Index And Page Rank Technology In Vertical Search

Posted on: 2010-07-02  Degree: Master  Type: Thesis
Country: China  Candidate: B You  Full Text: PDF
GTID: 2178360272479367  Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, the volume of information on the network is growing exponentially. The Internet links people from all walks of life closely together, and information sharing is receiving increasing attention. Under these conditions, search engine technology has developed rapidly. Users now also demand timeliness, relevance, and accuracy in the information they retrieve, and vertical search engines, which are search systems specialized for a particular domain, came into being to meet this demand. Traditional search engines are based on HTML, which emphasizes the appearance of content rather than its meaning, so the accuracy of locating information suffers and search performance needs to be improved. The W3C's Extensible Markup Language (XML) resolves the precision problem to some extent. XML tags are rich and clear in meaning, which helps in understanding the content they mark up. By relying on the relationship between tags and content, a search engine can locate its target accurately, narrow the scope of the search, and improve search accuracy. Against this background, this paper studies an XML-based vertical search engine.

First, this paper compares XML with HTML, introduces the principles of search engines and several related technologies, and explains why combining search engines with XML can improve search precision. It also makes some improvements to the traditional Chinese word segmentation approaches used in Chinese search engines. Next, it designs an XML-based vertical search engine model and presents its design concept and general framework. The model consists of a page crawling module, a page conformity module, an XML analysis module, an index module, and a user query module. The paper describes the structure of each module and the ideas behind its implementation in detail.
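The point above about tags narrowing the scope of a search can be illustrated with a small sketch. It is not taken from the thesis: the sample record, its element names (`book`, `title`, `author`, `summary`), and the helper functions `tokens`, `flat_match`, and `tag_match` are all assumptions chosen for illustration. It contrasts a match over the whole text, as in markup-blind search, with a match restricted to one kind of element.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are illustrative, not from the thesis.
DOC = """
<book>
  <title>Learning XML</title>
  <author>Ray</author>
  <summary>Compares XML with HTML for search.</summary>
</book>
"""


def tokens(text):
    """Lowercase word tokens (a naive tokenizer, no Chinese segmentation)."""
    return re.findall(r"\w+", text.lower())


def flat_match(xml_text, word):
    """Match the word anywhere in the document text, ignoring the markup."""
    root = ET.fromstring(xml_text)
    return word.lower() in tokens(" ".join(root.itertext()))


def tag_match(xml_text, tag, word):
    """Match the word only inside elements named `tag`."""
    root = ET.fromstring(xml_text)
    return any(word.lower() in tokens(el.text or "") for el in root.iter(tag))


if __name__ == "__main__":
    print(flat_match(DOC, "html"))          # matches somewhere in the record
    print(tag_match(DOC, "title", "html"))  # the title does not mention HTML
    print(tag_match(DOC, "title", "xml"))   # the title mentions XML
```

In this sketch a query for "html" hits the record as a whole but not its title, which is the kind of distinction a tag-aware engine can use to discard results that only mention a term in passing.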
The paper then focuses on the index analysis module of the XML-based vertical search engine. Because of the characteristics of XML documents, the index analysis module designed here has two parts: a parser, which parses XML documents, and an indexer, which indexes them. Both the structure and the content of XML documents are indexed, and the implementation methods are discussed in detail.

Finally, the paper improves the widely used PageRank algorithm and proposes the CP_PageRank algorithm. PageRank considers only the links between pages, so it is prone to problems such as "topic drift". CP_PageRank additionally takes keyword frequency into account on top of the basic PageRank algorithm, and the new algorithm handles topic drift well.
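The two-part index analysis module described above, a parser plus an indexer covering both structure and content, might look roughly like the following. This is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract gives no data structures, so the `parse` function, the `Indexer` class, the use of element paths as the structure key, and the sample documents are all hypothetical. The tokenizer is a plain regular expression and does not perform the Chinese word segmentation the thesis discusses.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def parse(xml_text):
    """Parser: yield (element path, stripped text) for every element."""
    root = ET.fromstring(xml_text)
    stack = [(root, "/" + root.tag)]
    while stack:
        element, path = stack.pop()
        yield path, (element.text or "").strip()
        for child in element:
            stack.append((child, path + "/" + child.tag))


class Indexer:
    """Indexer: keeps a structure index and a content index (assumed layout)."""

    def __init__(self):
        self.structure = defaultdict(set)  # element path -> document ids
        self.content = defaultdict(set)    # term -> (document id, element path)

    def add(self, doc_id, xml_text):
        for path, text in parse(xml_text):
            self.structure[path].add(doc_id)
            for term in re.findall(r"\w+", text.lower()):
                self.content[term].add((doc_id, path))

    def search(self, term, path=None):
        """Return sorted document ids containing `term`, optionally under `path`."""
        hits = self.content.get(term.lower(), set())
        return sorted({doc for doc, p in hits if path is None or p == path})


if __name__ == "__main__":
    index = Indexer()
    index.add(1, "<book><title>XML search</title><summary>index design</summary></book>")
    index.add(2, "<book><title>Index basics</title><summary>about XML</summary></book>")
    print(index.search("xml"))
    print(index.search("xml", path="/book/title"))
```

Keeping the element path alongside each term is what lets a query be restricted to a part of the document structure; a production index would also need positions, term frequencies, and persistent storage, none of which are shown here.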
Keywords/Search Tags: vertical search engine, index, word segmentation, page rank, XML