
Citation Recommendation Based On Gradient Boosted Regression Trees

Posted on: 2017-04-11    Degree: Master    Type: Thesis
Country: China    Candidate: J P Chen
GTID: 2308330503458990    Subject: Software engineering
Abstract/Summary:
In recent years, science and technology have developed rapidly, and the number of scientific publications has grown explosively. Publishing papers promotes academic exchange among researchers, which in turn accelerates scientific and technological progress. However, the sheer volume of literature makes it hard for researchers to find suitable references. The citation recommendation task is to recommend suitable references for a user's query, which makes searching for references more efficient.

This thesis studies citation recommendation: given the title and abstract of a paper, recommend papers that are likely to be cited as its references. We designed and implemented a citation recommender system based on gradient boosted regression trees, treating the task as a classification problem. The two key points of this work are estimating the influence of candidate citations and measuring the relevance between the query input and the candidate citations. We estimate a paper's influence by its future citation count and treat citation count prediction as a regression problem.

To predict citation counts, we extract content features from the papers' topic distributions and author features from an author collaboration matrix, and we ensemble several regression models with the Stacking method. To recommend citations, we extract relevance features based on the vector space model and KL divergence, and use gradient boosted regression trees as the classification model.
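The abstract names the vector space model and KL divergence as the basis of the relevance features but does not give their formulas. The sketch below is one plausible reading, not the thesis's code: it assumes TF-IDF weighted term vectors compared by cosine similarity, and unigram language models with additive smoothing compared by KL divergence. The tokenizer, the smoothing constant `alpha`, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(corpus_tokens):
    """Smoothed inverse document frequency computed over the candidate corpus."""
    n = len(corpus_tokens)
    df = Counter(t for doc in corpus_tokens for t in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_cosine(q_tokens, d_tokens, idf):
    """Vector space model: cosine similarity of TF-IDF vectors.
    Terms missing from the corpus get weight 1.0 (an assumption)."""
    q = {t: c * idf.get(t, 1.0) for t, c in Counter(q_tokens).items()}
    d = {t: c * idf.get(t, 1.0) for t, c in Counter(d_tokens).items()}
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def kl_divergence(q_tokens, d_tokens, alpha=0.1):
    """KL(P_query || P_candidate) between unigram models with additive
    smoothing over the joint vocabulary; smaller means more similar."""
    vocab = set(q_tokens) | set(d_tokens)
    q, d = Counter(q_tokens), Counter(d_tokens)
    q_total = len(q_tokens) + alpha * len(vocab)
    d_total = len(d_tokens) + alpha * len(vocab)
    kl = 0.0
    for t in vocab:
        p = (q[t] + alpha) / q_total
        r = (d[t] + alpha) / d_total
        kl += p * math.log(p / r)
    return kl


def relevance_features(query, candidate, corpus):
    """Return [cosine similarity, KL divergence] for one query/candidate pair."""
    idf = idf_weights([tokenize(doc) for doc in corpus])
    q, d = tokenize(query), tokenize(candidate)
    return [tfidf_cosine(q, d, idf), kl_divergence(q, d)]


if __name__ == "__main__":
    corpus = [
        "Gradient boosted regression trees for ranking",
        "Topic models of scientific literature",
        "Predicting citation counts of papers",
    ]
    query = "Citation count prediction with regression trees"
    for doc in corpus:
        print(doc, relevance_features(query, doc, corpus))
```

Cosine similarity rises and KL divergence falls as a candidate's vocabulary approaches the query's. Both can be passed to a tree-based classifier as-is, since tree splits are unaffected by the features' different scales.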
The classification features include the content and author features used to predict citation counts, together with the relevance features between the query input and the candidate citations. Experimental results on the KDD Cup data show that predicting citation counts with the Stacking method outperforms prediction with a single regression model, and that recommending citations with gradient boosted regression trees outperforms Lucene-based search. These results indicate that the proposed approach is effective.
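To illustrate how such features could feed a gradient boosted classifier, the following minimal sketch boosts depth-1 regression trees (stumps) on the logistic loss and ranks candidates by predicted probability. It is not the thesis's implementation: the feature layout `[predicted citation count, cosine similarity, KL divergence]`, the hyperparameters, the toy data, and the names `GradientBoostedStumps` and `recommend` are assumptions. Production GBRT implementations typically use deeper trees and Newton-style leaf values rather than the plain residual means used here.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree: the (feature, threshold) split that
    minimizes the squared error of the residuals, with mean-valued leaves."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # every feature is constant: use a single constant leaf
        m = sum(residuals) / len(residuals)
        return 0, math.inf, m, m
    return best[1:]


class GradientBoostedStumps:
    """Binary classifier: gradient boosting of regression stumps on log loss."""

    def __init__(self, n_estimators=50, learning_rate=0.3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        rate = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.init = math.log(rate / (1 - rate))  # log-odds of the positive class
        self.stumps = []
        F = [self.init] * len(X)
        for _ in range(self.n_estimators):
            # The negative gradient of log loss with respect to F is y - sigmoid(F).
            residuals = [yi - sigmoid(f) for yi, f in zip(y, F)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            F = [f + self.learning_rate * self._leaf(stump, row) for f, row in zip(F, X)]
        return self

    @staticmethod
    def _leaf(stump, row):
        j, thr, left_value, right_value = stump
        return left_value if row[j] <= thr else right_value

    def predict_proba(self, row):
        score = self.init + sum(self.learning_rate * self._leaf(s, row) for s in self.stumps)
        return sigmoid(score)


def recommend(model, candidates, k):
    """candidates: list of (paper_id, feature_vector); return the top-k ids."""
    ranked = sorted(candidates, key=lambda c: model.predict_proba(c[1]), reverse=True)
    return [paper_id for paper_id, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy rows: [predicted citation count, cosine similarity, KL divergence].
    X = [[12, 0.8, 0.2], [30, 0.7, 0.3], [5, 0.9, 0.1],
         [8, 0.1, 1.5], [40, 0.05, 2.0], [2, 0.2, 1.2]]
    y = [1, 1, 1, 0, 0, 0]  # 1 = the candidate was actually cited
    model = GradientBoostedStumps().fit(X, y)
    print(recommend(model, [("p1", [10, 0.85, 0.15]), ("p2", [3, 0.1, 1.8])], k=1))
```

Stumps keep the example short. Deeper trees would let the model capture feature interactions, for instance a highly cited candidate being useful only when it is also relevant to the query.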
Keywords/Search Tags: citation count prediction, citation recommendation, Stacking method, gradient boosted regression trees