Research and Improvement of the PageRank Algorithm in Literature Retrieval Ranking

Posted on: 2017-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Wang
Full Text: PDF
GTID: 2348330488477981
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, more and more information is stored and exchanged in electronic form. Information retrieval technology emerged in response and continues to develop and improve. As an important way of obtaining information resources, literature retrieval has become an important field within information retrieval. Well-designed literature retrieval helps researchers summarize and refer to previous research results. It not only promotes the rapid utilization of literature resources but also helps avoid problems such as overlapping research.

Most traditional literature retrieval systems sort results by one of the following criteria: citation count, publication date, or the frequency with which keywords appear in the paper. Such ranking takes a single point of view and ignores the flow of value along citation links between papers. As a result, some papers tend to be ranked too high or too low. Many scholars have therefore pointed out that the PageRank algorithm can be applied to literature retrieval, and have achieved some improvements. Some special circumstances remain unaddressed, however: the value of a paper may decline over time, and it is unclear how to evaluate papers that were published too recently to have received any citations.

To solve these problems, this thesis proposes a multidimensional search ranking method. It is based on an analysis of the network structure formed by citations among papers, takes various influencing factors into account together, and introduces the concept of literature activity to quantify the weight of each paper. Finally, this thesis uses the open-source crawler Heritrix as the sample collection tool, and uses the Struts framework, HTML, JavaScript, and an Oracle 11g database to design a simple online literature retrieval system. The system fetches the description documents of papers from CNKI as HTML and ASPX files, then parses these files and builds the experimental data sets from the parsing results. Experimental results show that the multidimensional search ranking method outperforms the traditional literature retrieval ranking methods, and that it remains efficient because most of the additional computation introduced by the iterative weighting is completed offline. The method thus improves retrieval accuracy while maintaining retrieval efficiency.
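The general idea of ranking papers by PageRank over a citation network, with an adjustment for publication age, can be illustrated with a minimal Python sketch. It is not the thesis's method: the abstract does not give the formula for literature activity, so the exponential decay in `recency_prior`, its `half_life` parameter, the damping factor of 0.85, and all function and variable names here are assumptions chosen for illustration only.

```python
def pagerank(citations, damping=0.85, prior=None, iters=100, tol=1e-10):
    """Power-iteration PageRank on a citation graph.

    citations maps each paper to the list of papers it cites.
    prior is an optional per-paper weight used for the teleport step;
    it is normalized here, and defaults to a uniform distribution.
    """
    papers = sorted(set(citations) | {q for cited in citations.values() for q in cited})
    n = len(papers)
    if prior is None:
        prior = {p: 1.0 / n for p in papers}
    total = sum(prior[p] for p in papers)
    prior = {p: prior[p] / total for p in papers}

    rank = dict(prior)
    for _ in range(iters):
        # Papers that cite nothing spread their score according to the prior.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new = {p: (1 - damping) * prior[p] + damping * dangling * prior[p]
               for p in papers}
        # Each paper passes its score, split equally, to the papers it cites.
        for p in papers:
            cited = citations.get(p, [])
            for q in cited:
                new[q] += damping * rank[p] / len(cited)
        delta = sum(abs(new[p] - rank[p]) for p in papers)
        rank = new
        if delta < tol:
            break
    return rank


def recency_prior(pub_year, current_year, half_life=5.0):
    """Hypothetical stand-in for a time-aware weight: halves every half_life years."""
    return {p: 0.5 ** ((current_year - year) / half_life)
            for p, year in pub_year.items()}


# Toy data (invented): D is recent and uncited, B is older and uncited.
citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["A"]}
pub_year = {"A": 2010, "B": 2012, "C": 2005, "D": 2016}

uniform = pagerank(citations)
timed = pagerank(citations, prior=recency_prior(pub_year, current_year=2017))
for paper in sorted(timed, key=timed.get, reverse=True):
    print(paper, round(uniform[paper], 4), round(timed[paper], 4))
```

With a uniform prior, the two uncited papers B and D receive the same score. With the age-based prior, the more recent paper D scores higher than B, which is one simple way to address recently published papers that have not yet been cited. As the abstract notes for the thesis's own weighting, the iteration can be run offline and its scores stored, so query-time ranking only needs a lookup.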
Keywords/Search Tags: literature retrieval, multidimensional search ranking method, PageRank algorithm, literature activity