Research On Automatic Metadata Extraction And Content Mining Of Scientific Reference In Patent Based On Representation Learning

Posted on: 2020-05-21; Degree: Master; Type: Thesis
Country: China; Candidate: Y M Hu
GTID: 2428330623464302; Subject: Library and Information Science
Abstract/Summary:
Scientific references in patents (SRPs) are an important category of non-patent citations. Analyzing SRPs helps reveal the relationship between science and technology and trace the flow of knowledge from science into technology. Most current studies examine this relationship through bibliometrics. Analyzing the metadata of SRP content can give a more accurate and comprehensive picture. Such metadata can be extracted from SRPs, but traditional metadata extraction methods leave room for improvement. Efficient SRP metadata extraction would improve SRP analysis and provide more data to support SRP content mining.

To address these problems, this thesis proposes a method for SRP metadata extraction and content mining based on representation learning. It introduces representation learning into SRP metadata extraction to improve on traditional methods and achieve more accurate extraction. It then uses the extracted metadata to mine SRP content, a part of the work that is exploratory. First, an SRP recognition method is designed to identify SRPs among non-patent references (NPRs) of various formats. Next, metadata is extracted from the SRPs, and the method's effectiveness is measured through comparison experiments. Finally, SRP content is mined and analyzed on the basis of the extracted metadata, with an empirical analysis in the field of nanotechnology. The experimental and empirical results show that the proposed metadata extraction method is more effective than traditional machine learning methods. They also show that the content mining method is usable and practical.

The thesis mainly addresses the following two problems:

(1) Automatic extraction of SRP metadata. First, every NPR is converted into a vector by representation learning, and a classification algorithm identifies the SRPs among all categories of NPRs. Second, each SRP is divided into units by rules, and the semantic and location features of every unit are vectorized. Finally, a classification algorithm extracts the SRP metadata from these unit vectors.

(2) Content mining of SRPs. The SRP titles obtained in the metadata extraction experiment are searched in a journal database to retrieve content metadata such as abstracts and keywords. Clustering, co-citation analysis, and similarity calculation are then performed on this content metadata. This supports exploratory research on related-patent recommendation and on the similarities and differences between scientific research and technical application.
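The unit-level extraction step in (1) can be illustrated with a minimal Python sketch. This is not the thesis's method, and every detail below is an assumption for illustration:

- The splitting rule in `split_units` is hypothetical.
- The hand-crafted features in `unit_features` stand in for learned semantic vectors. Relative position serves as the location feature.
- A 1-nearest-neighbour classifier stands in for whatever classification algorithm the thesis used.
- The reference strings are made-up placeholders.

```python
import re

# Hypothetical splitting rule: break a reference at commas, semicolons,
# colons, or a period followed by whitespace.
SPLIT_RULE = re.compile(r"[,;:]\s*|\.\s+")


def split_units(ref):
    """Split one reference string into candidate metadata units."""
    return [p.strip(" .") for p in SPLIT_RULE.split(ref) if p.strip(" .")]


def unit_features(unit, index, total):
    """Stand-in features: simple lexical cues plus relative position."""
    tokens = unit.split()
    return [
        1.0 if re.search(r"\b(?:19|20)\d{2}\b", unit) else 0.0,  # year-like number
        1.0 if re.search(r"\d", unit) else 0.0,                  # contains a digit
        sum(t[:1].isupper() for t in tokens) / len(tokens),      # share of capitalized tokens
        min(len(tokens), 20) / 20.0,                             # unit length, capped at 20 tokens
        index / max(total - 1, 1),                               # location feature (0 = first unit)
    ]


def featurize(ref):
    """Return the units of a reference and one feature vector per unit."""
    units = split_units(ref)
    n = len(units)
    return units, [unit_features(u, i, n) for i, u in enumerate(units)]


def build_training(ref, labels):
    """Pair each unit vector of a hand-labeled reference with its metadata field."""
    _, vectors = featurize(ref)
    return list(zip(vectors, labels))


def extract(ref, training):
    """Label each unit with the field of its nearest training unit (1-NN, squared Euclidean)."""
    units, vectors = featurize(ref)
    result = []
    for unit, x in zip(units, vectors):
        _, label = min(
            training,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])),
        )
        result.append((unit, label))
    return result


if __name__ == "__main__":
    fields = ["author", "author", "title", "journal", "year", "volume", "pages"]
    training = build_training(
        "Author A, Author B. Title of paper. Journal Name, 2005, 12: 34-40.", fields
    )
    for unit, label in extract(
        "Writer C, Writer D. Another study. Some Journal, 2010, 22: 100-105.", training
    ):
        print(f"{label:8} {unit}")
```

Run as a script, the example labels the units of the second placeholder reference with the same field sequence as the labeled one. A practical system would replace the hand-crafted features with learned unit vectors and train on many labeled references.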
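The content-mining step in (2) can be sketched in the same spirit. This minimal Python example is a hypothetical illustration, not the thesis's procedure, and rests on these assumptions:

- Two SRPs count as co-cited when the same patent cites both.
- Each patent is described by the union of the keywords retrieved for its SRPs.
- Jaccard similarity over those keyword sets stands in for the thesis's similarity measure when recommending related patents.
- The patent IDs, SRP IDs, and keywords are placeholders.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(patent_refs):
    """Count how many patents cite each (sorted) pair of SRPs together."""
    counts = Counter()
    for refs in patent_refs.values():
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def patent_profiles(patent_refs, srp_keywords):
    """Describe each patent by the union of the keywords of the SRPs it cites."""
    return {
        patent: set().union(*(srp_keywords.get(r, set()) for r in refs))
        for patent, refs in patent_refs.items()
    }


def recommend(target, patent_refs, srp_keywords, k=1):
    """Return the k patents whose SRP keyword profiles best match the target's."""
    profiles = patent_profiles(patent_refs, srp_keywords)
    scored = [
        (jaccard(profiles[target], profile), patent)
        for patent, profile in profiles.items()
        if patent != target
    ]
    # Highest similarity first; ties broken by patent ID for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [patent for _, patent in scored[:k]]


if __name__ == "__main__":
    patent_refs = {"P1": ["S1", "S2"], "P2": ["S1", "S2", "S3"], "P3": ["S4"]}
    srp_keywords = {
        "S1": {"nanotube", "transistor"},
        "S2": {"graphene"},
        "S3": {"sensor"},
        "S4": {"battery", "electrolyte"},
    }
    print(co_citation_counts(patent_refs).most_common(1))  # [(('S1', 'S2'), 2)]
    print(recommend("P1", patent_refs, srp_keywords))       # ['P2']
```

Keyword-set overlap is used here only because it keeps the sketch dependency-free. Vector similarity over abstracts, or clustering of those vectors, would fit the same structure.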
Keywords/Search Tags: Scientific references in patent, Representation learning, Metadata extraction, Content mining