Information Recognition And Extraction From Chinese Periodical Papers Based On Conditional Random Fields

Posted on: 2020-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: H H Xue
GTID: 2428330575951848
Subject: Agricultural information management
Abstract/Summary:
As carriers of knowledge and an important channel through which researchers acquire professional knowledge, journal papers play an important role in promoting professional technology and disseminating research results. Research based on the full text of journal articles helps improve the value of journal resources and the efficiency with which users access information. Many tools already exist for extracting information from papers, but they are not efficient at extracting information from Chinese journal articles. This thesis therefore improves on existing paper information extraction tools so that they can be better applied to Chinese-language papers. After a comparative analysis of the methods and tools for journal paper information extraction, it selects the conditional random field (CRF) algorithm and the GROBID tool to recognize and extract information from Chinese journal papers. The main research contents and results are as follows.

(1) An in-depth comparative analysis of the methods and tools for recognizing and extracting journal paper information found that the CRF algorithm and the GROBID tool achieve higher accuracy in paper information extraction. This thesis therefore uses them to recognize and extract information from Chinese periodical papers. The key technologies of CRF-based information extraction from Chinese journal articles are also introduced in detail.

(2) Based on the CRF algorithm and the GROBID tool, a cascade model for recognizing and extracting information from Chinese journal papers is constructed. It comprises the segmentation model, the header model, the reference-segmentation model, the citation model and the fulltext model. The models are designed and implemented around the characteristics of Chinese journal papers through text preprocessing, feature selection, sequence labeling and feature template design.

(3) Papers from 12 agricultural journals are used to train the models. Accuracy, precision, recall and F1 are used to evaluate each model, and the results are compared with the extraction performance of the GROBID tool. The experiments show that the segmentation model, the header model, the reference-segmentation model and the citation model improve significantly on the GROBID tool. The resulting model can accurately and efficiently recognize and extract header information and citation information from Chinese journal papers.
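The four evaluation measures named in (3) can be illustrated with a minimal Python sketch. It assumes token-level scoring, in which each token's predicted label is compared with its gold label; the abstract does not say whether the thesis scores at token level or field level. The function name `evaluate` and the labels `title`, `author` and `abstract` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def evaluate(gold, pred):
    """Return token-level accuracy and per-label precision, recall and F1.

    Assumption: gold and pred are equal-length lists of label strings,
    one label per token (hypothetical setup, not taken from the thesis).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # label p was predicted but is wrong
            fn[g] += 1   # label g was missed

    scores = {}
    for label in sorted(set(gold) | set(pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(gold) if gold else 0.0
    return accuracy, scores


# Hypothetical header tokens: the second token is mislabeled as "author".
gold = ["title", "title", "author", "author", "abstract", "abstract"]
pred = ["title", "author", "author", "author", "abstract", "abstract"]
accuracy, scores = evaluate(gold, pred)
```

In this toy example one of the six tokens is wrong, so `title` loses recall and `author` loses precision, while `abstract` is unaffected. Reporting the scores per label shows which fields a model confuses, which a single overall accuracy figure would hide.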
Keywords/Search Tags:Information extraction of paper, Conditional random field, Cascade extraction model, GROBID