Research On Algorithm Of Authoritative Page Mining Based On Unitary Transformation

Posted on: 2011-10-03; Degree: Master; Type: Thesis
Country: China; Candidate: Y Sun; Full Text: PDF
GTID: 2178330332460342; Subject: Computer application technology
Abstract/Summary:
The World Wide Web is a huge, widely distributed, global information service center that supports a variety of applications. It contains a rich and dynamic collection of hyperlink information and page access and usage information, which provides abundant sources for data mining. The goal of Web mining is to discover access patterns and hidden information in this large collection of documents, hyperlinks, and access and usage records; that is, to extract potentially useful patterns and hidden information from Web documents and Web activities.
First, this thesis gives a systematic survey of search engines and Web page mining. It describes the operating principle of search engines and the classification of search engines and of Web mining. Several typical classical authoritative page mining algorithms are then reviewed along with their taxonomy; they are discussed in detail, and their advantages and disadvantages are summarized. Next, the background on unitary transformations and the power method is introduced, in particular the singular value decomposition (SVD) and the truncated singular value decomposition (TSVD). These topics form the foundation of the research.
The core content of the thesis is Web mining technology. By combining Web content mining with Web structure mining, the thesis introduces a new algorithm: an authoritative page mining algorithm based on the truncated singular value decomposition. The TSVD-based algorithm has two parts. First, a weighting transformation that combines Web structure mining with page content mining produces a link weight matrix. Second, the algorithm applies the TSVD to the link weight matrix, and the result gives the final ranking of authoritative pages.
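The two-step procedure described above (link weight matrix, then TSVD, then ranking) can be illustrated with a minimal Python sketch. The abstract does not state how the weights are computed or how scores are read off the decomposition, so everything specific here is an assumption made for illustration: the function name `tsvd_authority_scores`, the convention that `W[i, j]` is the weight of the link from page `i` to page `j`, and the choice to score page `j` by the norm of column `j` of the rank-`k` approximation of `W`.

```python
import numpy as np

def tsvd_authority_scores(W, k):
    """Score pages from a rank-k truncated SVD of a link weight matrix.

    Assumption (not from the thesis): W[i, j] is the weight of the link
    from page i to page j, and the authority of page j is taken to be the
    Euclidean norm of column j of the rank-k approximation of W.
    """
    W = np.asarray(W, dtype=float)
    # Thin SVD: W = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = max(1, min(k, len(s)))
    # Truncation: keep only the k largest singular triplets.
    s_k, Vt_k = s[:k], Vt[:k, :]
    # The columns of U are orthonormal, so the norm of column j of
    # U_k @ diag(s_k) @ Vt_k equals the norm of diag(s_k) @ Vt_k[:, j].
    scores = np.sqrt(((s_k[:, None] * Vt_k) ** 2).sum(axis=0))
    total = scores.sum()
    return scores / total if total > 0 else scores

# Hypothetical 4-page topology: pages 0, 1, 2 link to page 3; page 3 links to page 0.
W = np.array([
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])
scores = tsvd_authority_scores(W, k=2)
ranking = np.argsort(-scores)  # page indices, most authoritative first
print(ranking, scores)
```

Keeping only `k` singular triplets is what reduces the work in later steps; with a large sparse matrix, a partial solver that computes only the leading triplets would be the natural substitute for the dense `np.linalg.svd` call used here.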
The authoritative page mining algorithms adopted in the literature mostly use a single mechanism. Algorithms based on Web content mining are more numerous and relatively mature, whereas algorithms based on Web structure mining are relatively few. Each mechanism has its own advantages and disadvantages. Based on these considerations, this thesis presents a TSVD-based algorithm for mining authoritative pages. In essence, it combines the two kinds of Web mining methods in a complementary manner, with the aim of improving precision and recall. The truncated singular value decomposition is used to reduce the amount of computation, filter out redundant calculations, and improve search response time.
Finally, using the MATLAB simulation tool, the thesis carries out a comparative analysis of three weighting calculation schemes. It describes how the TSVD algorithm is applied to a typical link topology of static pages to mine authoritative pages. The ranking produced by the classic PageRank algorithm is compared with the results of the TSVD algorithm. Simulation results show that the TSVD-based authoritative page mining algorithm has better query performance and higher query accuracy than the classic PageRank algorithm.
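For reference, the classic PageRank baseline used in the comparison can be sketched as a power iteration in Python. This is a generic textbook formulation, not the thesis's MATLAB code. The function name `pagerank`, the damping factor of 0.85, the uniform handling of pages without outgoing links, and the example topology are all assumptions made for illustration.

```python
import numpy as np

def pagerank(A, damping=0.85, tol=1e-10, max_iter=1000):
    """Classic PageRank by power iteration.

    A[i, j] > 0 means page i links to page j. The damping value 0.85 is
    the conventional choice and is an assumption here, not a value taken
    from the thesis.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)  # total outgoing weight per page
    # Row-stochastic transition matrix; a page with no outgoing links
    # is treated as linking to every page uniformly.
    safe_out = np.where(out > 0, out, 1.0)
    P = np.where(out > 0, A / safe_out, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (P.T @ r) + (1.0 - damping) / n
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Hypothetical 4-page topology: pages 0, 1, 2 link to page 3; page 3 links to page 0.
A = np.array([
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])
ranks = pagerank(A)
print(np.argsort(-ranks), ranks)  # page indices, highest rank first
```

Running a baseline like this on the same link topology as the TSVD-based method gives two orderings of the same pages that can be compared directly.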
Keywords/Search Tags: Web Mining, Unitary Transformation, TSVD Transformation, PageRank, Weight Computing