Scientific Data Acquiring And Experts Similarity Researching Based On Open Access

Posted on: 2017-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: X Huang
Full Text: PDF
GTID: 2308330482981786
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the way people obtain scientific literature has changed fundamentally. Reading and accessing literature over the network has become mainstream. Against this background, Open Access (OA) emerged to promote the dissemination and use of research output. The core feature of OA is providing free academic information and research output over the Internet while respecting the interests involved. This work is part of a project, an open access scientific data platform, which belongs to the National Natural Science Foundation. The platform can alleviate the current difficulty of obtaining domestic scientific literature data and enhance the ability of government or research institutions to manage scientific data.
The scientific research data in the National Natural Science Foundation's database has many problems, such as non-standardized data formats, incorrect information, and missing properties. We build multi-mode data acquisition and processing components to solve these problems. We obtain about 8,000,000 papers from many different authorized data sources, fill in missing properties, and delete duplicate records. In doing so, we establish and maintain a massive academic database.
Academic papers are rigorous and rich in information, and exploiting that information makes it possible to analyse the similarity between experts and between domains. Based on the database we established, this paper proposes a new method of generating experts' scientific labels using a topic model. The method computes each expert's topic distribution from that expert's papers. Combined with the word distribution of each topic, this yields the words that contribute most to the expert, and these words are used as labels. We compare the method with the traditional TF-IDF method.
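The label-generation step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how an expert's topic distribution is aggregated from papers, so the sketch assumes a simple average of per-paper topic distributions, and it scores each word by the sum over topics of (expert topic weight × word probability in that topic). The function names `expert_topic_distribution` and `expert_labels` and the toy numbers are hypothetical.

```python
from typing import Dict, List


def expert_topic_distribution(paper_topics: List[List[float]]) -> List[float]:
    """Average the per-paper topic distributions of one expert.

    Averaging is an assumption of this sketch, not a detail from the abstract.
    """
    n_papers = len(paper_topics)
    n_topics = len(paper_topics[0])
    return [sum(p[k] for p in paper_topics) / n_papers for k in range(n_topics)]


def expert_labels(theta: List[float], phi: List[Dict[str, float]], top_n: int = 3) -> List[str]:
    """Return the top_n words ranked by sum_k theta[k] * phi[k][word]."""
    scores: Dict[str, float] = {}
    for k, topic_weight in enumerate(theta):
        for word, prob in phi[k].items():
            scores[word] = scores.get(word, 0.0) + topic_weight * prob
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Toy word distributions for two topics (made-up values).
phi = [
    {"topic": 0.5, "model": 0.3, "corpus": 0.2},
    {"protein": 0.6, "gene": 0.3, "model": 0.1},
]
# Toy topic distributions of two papers by the same expert.
papers = [[0.9, 0.1], [0.7, 0.3]]

theta = expert_topic_distribution(papers)
labels = expert_labels(theta, phi, top_n=2)
```

In this toy case the expert is dominated by the first topic, so the highest-scoring words come mostly from that topic's word distribution. In practice the topic and word distributions would come from a fitted topic model rather than hand-written values.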
In addition, we compute the similarity between experts using distributed representations. Specifically, we compute the similarity between labels and combine it with each expert's labels to obtain the similarity between experts. The expert similarity is used for recommendation. Experiments show that this similarity reflects very well the connections between experts and the similarity between their domains and research content.
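One way to turn label similarity into expert similarity is sketched below. The abstract does not specify the combination rule, so this is an assumption: label similarity is the cosine of the labels' vectors, and expert similarity is the symmetric average of each label's best match among the other expert's labels. The names `cosine` and `expert_similarity` and the two-dimensional toy vectors are hypothetical. Real vectors would come from a trained distributed-representation model.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expert_similarity(labels_a: List[str], labels_b: List[str],
                      vectors: Dict[str, Vector]) -> float:
    """Symmetric average of best-match label similarities (an assumed rule)."""
    def directed(src: List[str], dst: List[str]) -> float:
        # For each source label, take its most similar label on the other side.
        best = [max(cosine(vectors[s], vectors[d]) for d in dst) for s in src]
        return sum(best) / len(best)

    return 0.5 * (directed(labels_a, labels_b) + directed(labels_b, labels_a))


# Toy label vectors (made-up values).
vectors = {
    "lda": [1.0, 0.0],
    "topic model": [0.9, 0.1],
    "genomics": [0.0, 1.0],
}

sim_close = expert_similarity(["lda"], ["topic model"], vectors)
sim_far = expert_similarity(["lda"], ["genomics"], vectors)
```

With these toy vectors, the experts labelled "lda" and "topic model" score higher than the pair labelled "lda" and "genomics", which is the ordering a recommendation step would rely on.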
Keywords/Search Tags: Open Access, expert similarity, scientific papers, research domain, topic model, distributed representation