
The Prediction Algorithm Of Paper Ranking Based On The Author’s Authority Value

Posted on: 2013-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: R Q Xue
Full Text: PDF
GTID: 2248330371984046
Subject: Computer application technology
Abstract/Summary:
Paper ranking means ordering papers from high to low by a score assigned according to certain criteria. Common criteria include citation count, PageRank value, and impact factor. A prediction algorithm for paper ranking, in turn, uses some algorithm to predict the future order of that ranking. In a paper citation network, such an algorithm can help scientists and researchers pick out the papers that will receive wide attention in the future.

Statistical results show an accelerated-growth relationship between the number of papers published and the number of citation links. This explosive growth in the quantity of papers makes literature retrieval extremely difficult for scientists and researchers. A prediction algorithm for paper ranking can help them find important papers, judged by how much attention the papers can attract from authorities. That is, current citation count, PageRank value, and impact factor are no longer the criteria; instead, a paper's future citation count and future PageRank position become the criteria. The algorithm can also be used in other fields. For example, businesses can apply ranking prediction to networks and websites to pick out quality sites that will become popular in the future. If they find such sites through the algorithm and buy long-term advertising at a relatively low price, they stand to earn more in the future.

This thesis is about prediction algorithms for paper ranking; concretely, it is about predicting the paper ranking two years ahead. It proposes two algorithms that reach their results in different ways and whose respective advantages complement each other.

The first algorithm: the prediction algorithm of paper ranking based on the authors' authority values.
This algorithm does not compute the papers' PageRank values. Instead, it estimates whether a paper will be popular in the future from known knowledge and experience: whether the paper is written by authorities, whether it is cited by authorities, whether it has recently received abundant citation links, and whether it was published relatively recently. Concretely, the algorithm involves the following two steps.

1) Compute the authority values of both the writers and the citers; these reflect the level of an author's writing ability and citing ability. This step is similar to the HITS algorithm. It is an iterative process between the writing authority values of the writers and the citing authority values of the citers: they assign values to each other until convergence. An author's writing authority value and citing authority value form a pair of scoring standards that define each other. In other words, a writer's writing authority value depends on the number of citers of the writer's papers and on the citing authority values of those citers. Conversely, a citer's citing authority value depends on the writing authority values of the writers of the cited papers.

2) Predict the future ranking of papers with the two kinds of authority values. The score of a paper consists of two parts: the writing authority values of all its writers, and the citing authority values of its citers. The final score is the weighted sum of the two parts.

The second algorithm: the prediction algorithm of paper ranking based on the Hidden Markov Model. This algorithm predicts the ranking from another point of view.
It assumes that paper citations exhibit historical repetition; in other words, observation vectors similar to that of the current paper can be found in the historical paper data. An observation vector consists of three parts: the number of citations the paper received in the last three years, the writing authority values of its writers, and the citing authority values of its citers. The paper's future citation count and PageRank value can then be estimated from these similar observation vectors. Specifically, the algorithm uses a hidden Markov model (HMM), and the prediction is completed in three steps:

1) Train an HMM on the observation vectors of the training data set.
2) Use the trained HMM to compute the log-likelihood values of the observation vectors in the training data set (T1) and of the current observation vector.
3) Find the observation vector in T1 whose likelihood value is closest to that of the current observation vector, and use it to predict the future citation count of the current paper.

The experimental results show that the two algorithms complement each other's advantages. The advantages of the first algorithm are its computing speed and its prediction precision. Compared with the FutureRank algorithm, it has clear advantages in both respects. Its computing speed is 10 times faster than that of FutureRank. In prediction precision, the correlation between its predicted ranking and the actual future PageRank ranking is 0.68, which is higher than that of FutureRank (0.59). However, the shortcomings of the first algorithm are also obvious. It requires users to set the parameters themselves, and the parameters have an enormous influence on the experimental results: bad parameters make the prediction precision decrease dramatically. Users therefore need some knowledge and experience in setting parameters.
The second algorithm, by contrast, learns and generates its model automatically without requiring users to set parameters, but its computing speed and prediction precision are slightly lower than those of the first.

The two algorithms proposed in this thesis use some of the same characteristics of highly cited papers, for example attracting many citation links recently, being written by excellent writers, and being cited by excellent citers. However, they reach their results in different ways, and their respective advantages complement each other. Users can choose between them according to their purposes and situations, based on the algorithms' different characteristics.
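
The two steps of the first algorithm can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no update formulas, so the sum-based updates, the L2 normalization, the stopping tolerance, the summing of authority values per paper, and the weight `alpha` are all assumptions, as are the function and variable names. The recency factors mentioned in the abstract (recent citation links, publication date) are left out.

```python
from collections import defaultdict
from math import sqrt


def _normalize(values):
    """Scale values to unit L2 norm (assumed normalization, as in HITS)."""
    norm = sqrt(sum(v * v for v in values.values()))
    return {k: v / norm for k, v in values.items()} if norm > 0 else dict(values)


def author_authorities(paper_authors, citations, max_iter=100, tol=1e-9):
    """Step 1: mutually reinforcing writing and citing authority values.

    paper_authors: dict mapping paper id -> list of author names
    citations: list of (citing paper id, cited paper id) pairs
    """
    # Author-level edges (citing author, cited author), one per citation link.
    edges = [
        (citer, writer)
        for src, dst in citations
        for citer in paper_authors[src]
        for writer in paper_authors[dst]
    ]
    authors = {a for names in paper_authors.values() for a in names}
    if not authors:
        return {}, {}
    writing = dict.fromkeys(authors, 1.0)
    citing = dict.fromkeys(authors, 1.0)

    for _ in range(max_iter):
        # Writing authority: sum of the citing authority of everyone citing you.
        new_writing = dict.fromkeys(authors, 0.0)
        for citer, writer in edges:
            new_writing[writer] += citing[citer]
        new_writing = _normalize(new_writing)

        # Citing authority: sum of the writing authority of everyone you cite.
        new_citing = dict.fromkeys(authors, 0.0)
        for citer, writer in edges:
            new_citing[citer] += new_writing[writer]
        new_citing = _normalize(new_citing)

        delta = max(
            abs(new_writing[a] - writing[a]) + abs(new_citing[a] - citing[a])
            for a in authors
        )
        writing, citing = new_writing, new_citing
        if delta < tol:  # assumed convergence test
            break
    return writing, citing


def score_papers(paper_authors, citations, writing, citing, alpha=0.5):
    """Step 2: weighted sum of writers' and citers' authority (alpha assumed)."""
    citers_of = defaultdict(list)
    for src, dst in citations:
        citers_of[dst].extend(paper_authors[src])
    return {
        paper: alpha * sum(writing[a] for a in names)
        + (1 - alpha) * sum(citing[a] for a in citers_of[paper])
        for paper, names in paper_authors.items()
    }


if __name__ == "__main__":
    # Hypothetical toy network, for illustration only.
    toy_authors = {"p1": ["alice"], "p2": ["bob"], "p3": ["carol"], "p4": ["dave"]}
    toy_citations = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p4", "p2")]
    w, c = author_authorities(toy_authors, toy_citations)
    toy_scores = score_papers(toy_authors, toy_citations, w, c)
    print(sorted(toy_scores, key=toy_scores.get, reverse=True))
```

The weight `alpha` stands in for the kind of user-set parameter that the abstract says strongly affects prediction precision; the value 0.5 here is arbitrary.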
Keywords/Search Tags: Citation network, Ranking prediction, PageRank, Time series prediction, Hidden Markov Model