
Automatic Data Usage Identification In Scientific Articles

Posted on: 2018-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q Z Zhang
GTID: 2348330515997537
Subject: Information Science
Abstract/Summary:
With the development of data processing and storage technology, the effective management of scientific data and data-based research behavior have received growing attention. To better study data usage behavior, this thesis sets the research goal of identifying data usage behavior through automatic text mining and, on the basis of that identification, analyzes data usage behavior within the field. In computer science, computational experiments are usually run exactly as specified by the author's program code. Both the experimental data and the program code can therefore be reused, and sharing and reusing such scientific data would significantly advance the discipline. Meanwhile, data management practices in computer science remain at an early stage compared with other disciplines such as biomedicine. Against this background, the author takes data usage behavior in computer science as the object of study.

We first construct a training set from academic articles in computer science and use a bootstrapping-based unsupervised training process to obtain patterns for extracting data usage statements. By analyzing the part-of-speech tags and word frequencies of these data usage statements, we obtain a candidate set of open-access datasets. Rule-based automatic filtering, supplemented by partial manual intervention, then refines these candidates into the final collection of open-access datasets in computer science. Data usage identification is finally achieved by combining data usage statement extraction with this collection of open-access datasets in the field.

With this method, the pattern list learned from the training set extracts data usage statements effectively, so whether an article uses data can be judged from the number of such statements it contains. Experimental results show an F1 score above 85% at the article level and, with the help of the open-access dataset collection, an overall accuracy of 72.88% at the dataset level.

As a preliminary application of data usage identification, this thesis examines data usage behavior in the subfield of pattern recognition. The results show that, whether viewed from the subjects that use data or from the data objects being used, data usage and data reuse are becoming increasingly widespread in pattern recognition. Data usage tendencies across countries and institutions are broadly consistent, although they differ slightly in their preference for self-built versus third-party data.
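The abstract names bootstrapping as the way extraction patterns are learned from a small set of known data usage statements, after which the patterns both find new dataset names and flag data usage statements for article-level judgment. The Python sketch below illustrates that general loop; it is not the thesis's implementation. Everything in it is a hypothetical simplification: the function names (`bootstrap`, `is_data_usage_statement`, `article_uses_data`), the toy corpus and seed datasets, the choice of a two-token left context as a "pattern", the `min_support` and `threshold` parameters, and the capitalization rule that stands in for the part-of-speech and word-frequency analysis described above.

```python
"""Sketch of bootstrapped pattern learning for data usage statements.

Hypothetical simplifications (not the thesis's method): a pattern is the
two lowercase tokens directly before a dataset mention, a dataset mention
is a single token, and new dataset candidates pass a capitalization rule
instead of part-of-speech and word-frequency analysis.
"""
import re
from collections import Counter

# Hypothetical toy corpus used for illustration only.
EXAMPLE_SENTENCES = [
    "We evaluate on MNIST and report accuracy.",
    "Experiments were conducted on CIFAR-10 with ResNet.",
    "We evaluate on CIFAR-10 as well.",
    "Experiments were conducted on ImageNet using GPUs.",
    "We evaluate on SVHN for comparison.",
    "Additional experiments were conducted on SVHN.",
    "Our theory does not depend on any dataset.",
]


def tokenize(sentence):
    """Split a sentence into tokens, keeping names like 'CIFAR-10' whole."""
    return re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", sentence)


def context(tokens, i):
    """Return the lowercase two-token left context of position i."""
    return (tokens[i - 2].lower(), tokens[i - 1].lower())


def passes_filter(candidate):
    """Hypothetical rule-based filter: keep capitalized or numeric names."""
    return candidate[0].isupper() or candidate[0].isdigit()


def bootstrap(sentences, seed_datasets, iterations=2, min_support=2):
    """Alternate between learning patterns and harvesting dataset names."""
    token_lists = [tokenize(s) for s in sentences]
    datasets = set(seed_datasets)
    patterns = set()
    for _ in range(iterations):
        # Step 1: count left contexts of mentions of already-known datasets
        # and accept those seen at least min_support times as patterns.
        pattern_counts = Counter(
            context(tokens, i)
            for tokens in token_lists
            for i in range(2, len(tokens))
            if tokens[i] in datasets
        )
        patterns |= {p for p, n in pattern_counts.items() if n >= min_support}
        # Step 2: tokens that follow an accepted pattern and pass the
        # filter are added as new dataset names.
        for tokens in token_lists:
            for i in range(2, len(tokens)):
                if context(tokens, i) in patterns and passes_filter(tokens[i]):
                    datasets.add(tokens[i])
    return patterns, datasets


def is_data_usage_statement(sentence, patterns):
    """A sentence counts as a data usage statement if any pattern matches."""
    tokens = tokenize(sentence)
    return any(context(tokens, i) in patterns for i in range(2, len(tokens)))


def article_uses_data(sentences, patterns, threshold=1):
    """Judge an article by how many data usage statements it contains."""
    count = sum(is_data_usage_statement(s, patterns) for s in sentences)
    return count >= threshold
```

On the toy corpus, calling `bootstrap(EXAMPLE_SENTENCES, {"MNIST", "CIFAR-10"})` accepts only the `("evaluate", "on")` pattern in the first iteration and adds SVHN; in the second iteration the new name gives `("conducted", "on")` enough support, which in turn surfaces ImageNet. This gradual growth of patterns and names is the essence of bootstrapping, and it is also why a real system needs the filtering and manual review the abstract mentions to keep noisy candidates from compounding.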
Keywords: Data usage identification, Data reuse tracking, Information extraction, Open Information Extraction, Bootstrapping