Automatic Data Usage Identification In Scientific Articles

Posted on:2018-08-23

Degree:Master

Type:Thesis

Country:China

Candidate:Q Z Zhang

Full Text:PDF

GTID:2348330515997537

Subject:Information Science

Abstract/Summary:

With advances in data processing and storage technology, the effective management of scientific data and data-based research behavior have received increasing attention. To better study data usage behavior, this paper sets the goal of identifying data usage through automatic text mining and, building on that identification, analyzes data usage behavior in the field. In computer science, computational experiments usually run exactly as specified by the author's program code. Both the experimental data and the program code can therefore be reused, and sharing and reusing such scientific data will significantly facilitate the development of the discipline. Meanwhile, data management practices in computer science are still at an early stage relative to other disciplines such as biomedicine. Against this background, the author chooses data usage behavior in computer science as the object of study.

In this paper, we first construct a training set from academic articles in computer science and use a bootstrapping-based unsupervised training process to obtain patterns for extracting data usage statements. By analyzing the part-of-speech tags and word frequencies of data usage statements, we obtain a candidate set of open-access datasets. Supplemented by rule-based automatic filtering and some manual intervention, the final collection of open-access datasets in the computer science field is constructed. Finally, data usage identification is achieved by combining data usage statement extraction with the constructed collection of open-access datasets. With our method, the pattern list obtained from the training set is effective for extracting data usage statements, so an article can be judged by the number of such statements it contains. The experimental results show that the F1 value at the article level is more than 85%, and the comprehensive accuracy at the dataset level is 72.88% with the help of the open-access dataset collection.

As a first application of data usage identification, this paper explores data usage behavior in the sub-field of pattern recognition. The results show that, from the perspective of both the subjects using data and the data objects used, data usage and data reuse are increasingly common in pattern recognition. Data usage tendencies across countries and institutions are basically consistent, while there is a slight difference in the tendency to choose self-built data or third-party data.
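
The article-level judgement described in the abstract (count the data usage statements an article contains, then score the predictions with F1) can be sketched as follows. This is a minimal illustration only: the two regular expressions in `PATTERNS`, the naive sentence splitter, and the `min_statements` threshold are all assumptions made for the example. The thesis learns its pattern list by bootstrapping, and the abstract gives neither the patterns nor the threshold.

```python
import re

# Hypothetical hand-written patterns standing in for the bootstrapped pattern list.
PATTERNS = [
    re.compile(r"\bexperiments? (?:were|was|are|is) (?:conducted|performed|carried out) on\b", re.I),
    re.compile(r"\bwe (?:use|used|evaluate|evaluated) .{0,40}?\b(?:dataset|data set|corpus|database)\b", re.I),
]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def data_usage_statements(text):
    """Return the sentences that match at least one pattern."""
    return [s for s in split_sentences(text) if any(p.search(s) for p in PATTERNS)]


def uses_data(text, min_statements=1):
    """Judge an article by how many data usage statements it contains.

    min_statements is an assumed threshold, not a value from the thesis.
    """
    return len(data_usage_statements(text)) >= min_statements


def f1_score(predicted, actual):
    """F1 over parallel lists of boolean article-level labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    article = (
        "We propose a new model. "
        "Experiments were conducted on the MNIST dataset. "
        "We used the CIFAR-10 dataset for validation."
    )
    print(data_usage_statements(article))
    print(uses_data(article))
```

Dataset-level identification would additionally match the extracted statements against the collection of open-access dataset names, which this sketch leaves out.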

Keywords/Search Tags:

Data usage identification, Data reuse tracking, Information extraction, Open Information Extraction, Bootstrapping

Related items

1	The Research Of Open Information Extraction System
2	Neural Network-based Open Information Extraction And Its Application
3	A Grammar And Dependency Information Based Relation Extraction System For Streaming Data
4	Research Of Named Entity Relation Extraction Method Based On Bootstrapping
5	Research On Key Technologies Of Web Data Extraction And Mining On Open Source Community
6	Clause Based Open Domain Information Extraction
7	Research On Data Infringement Tracking Problem Based On Numerical Information Extraction Technology
8	Research And Implementation Of Data Extraction Oriented To Knowledge Graph
9	Research On Web Data Extraction Technology
10	Design And Implementation Of Large-Scale Open Information Extraction System