Author-Analysis Based On Content And Structure

Posted on: 2013-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: W M Zhang
Full Text: PDF
GTID: 2218330374467139
Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet and the exponential growth of web information, text has become one of its key components. Scientific papers are an essential way of sharing and learning knowledge, and they have become a significant resource in academic activities. Moreover, the number of papers keeps increasing with the continuous development of science and technology. However, papers differ from other literary forms in both content and structure.

Reviewing is an indispensable step that every paper goes through before publication. Although double-blind review, in which the reviewers do not know the real authors of the paper, has been widely adopted in the reviewing process, most reviewers can guess the authors by combining information from the paper itself with their own background knowledge. Many systems focus on academic papers, but a system that guesses the authors from the reviewers' side is rare.

In this thesis, we analyzed the structure of papers and built a model for them. We constructed a feature vector for each candidate author drawn from the references. A Rank SVM model was then employed to imitate the reviewers' way of thinking and identify the authors. We not only guessed the authors of papers under double-blind review, but also analyzed the continuity and diversity of researchers' work. Experiments conducted on the SIGMOD and VLDB datasets showed that Rank SVM identifies authors effectively and with high precision. An author-centric analysis system, ACARP, has also been built. The main contributions of this thesis are listed below.

First, we proposed a model for papers based on their compact structure. Each paper can be divided into five parts: title, authors, abstract, content and references. Among these, the references are the main source of information.

Second, Rank SVM was introduced to guess the authors. Based on the paper model, feature vectors of candidate authors were constructed from the information extracted from the paper. A Rank SVM model was then trained to predict the authors. Moreover, our research on authors goes beyond prediction: we not only provide a method to guess authors, but also analyze the continuity and diversity of authors' research. Unlike other studies on authors, our prediction combines information from the main body and the references with citation maps. In addition, we analyzed the styles of researchers using the prediction results.

Finally, a set of experiments was carried out, and an author-centric analysis system was built. We used datasets from SIGMOD and VLDB, the two most important conferences in the database community, covering 1994 to 2010. The experiments focused on precision analysis, feature analysis and related aspects. The author-centric system, ACARP, is a mining-based system for analyzing research papers in the database community.

In conclusion, we not only introduced author guessing with Rank SVM, but also analyzed the diversity and continuity of authors based on the guessing results. The experiments show that Rank SVM can predict the authors effectively and precisely, and that using information from both the main text and the citations raises the accuracy of the guesses. Furthermore, the results showed that introducing double-blind review has no obvious effect.
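The ranking step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the two features (how often a candidate appears in the reference list, and a topic-overlap score), the toy data, and all function names are assumptions made for illustration, and the linear Rank SVM is trained here with plain subgradient descent on the pairwise hinge loss rather than with a dedicated SVM solver.

```python
import random

def pairwise_differences(groups):
    """Build difference vectors (true author minus non-author) per paper.

    Each group is one paper: a list of (feature_vector, is_true_author).
    """
    diffs = []
    for candidates in groups:
        positives = [x for x, y in candidates if y == 1]
        negatives = [x for x, y in candidates if y == 0]
        for p in positives:
            for n in negatives:
                diffs.append([a - b for a, b in zip(p, n)])
    return diffs

def train_rank_svm(diffs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Learn w so that w . (x_author - x_other) >= 1 where possible."""
    rng = random.Random(seed)
    w = [0.0] * dim
    diffs = list(diffs)
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            for i in range(dim):
                grad = reg * w[i]        # L2 regularisation term
                if margin < 1.0:         # hinge loss is active for this pair
                    grad -= d[i]
                w[i] -= lr * grad
    return w

def rank_candidates(w, candidates):
    """Return candidate names sorted by descending score w . x."""
    scored = [(sum(wi * xi for wi, xi in zip(w, x)), name)
              for name, x in candidates]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical features: [times cited in the reference list, topic overlap].
train_groups = [
    [([5.0, 0.9], 1), ([1.0, 0.2], 0), ([2.0, 0.4], 0)],
    [([4.0, 0.7], 1), ([3.0, 0.1], 0), ([0.0, 0.3], 0)],
]
w = train_rank_svm(pairwise_differences(train_groups), dim=2)

# Candidate authors for an unseen, anonymised paper (toy values).
test_candidates = [("A", [1.0, 0.1]), ("B", [6.0, 0.8]), ("C", [2.0, 0.5])]
ranking = rank_candidates(w, test_candidates)
print(ranking)
```

The pairwise formulation matches the task: for each paper, only the relative order of candidates matters, so the model is trained on differences between a true author's feature vector and those of the other candidates from the same paper.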
Keywords/Search Tags: Paper, Author Prediction, Learning to Rank, Rank SVM