A Study On Fine-grained Numerical Information Extraction

Posted on: 2021-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: J J Yu
Full Text: PDF
GTID: 2428330647950758
Subject: Computer technology
Abstract/Summary:
Because text data is massive in volume and rich in content, it is a very important data source for information extraction. Researchers and industry have paid much attention to the related technologies and have defined several classic information extraction tasks, such as named entity recognition, relation extraction, attribute extraction, and event extraction. Beyond the information these classic tasks can extract, text data also contains a large amount of numerical information. This numerical information is expressed in highly diverse ways, which poses challenges for methods that identify and process it. In recent years, related fields have explored semantic role frameworks for numerical information, but their representations of numerical information are not precise enough, and they mainly focus on specific aspects with special expressions, such as time and currency. Therefore, this paper focuses on the problem of fine-grained numerical information extraction. The main research work includes the following aspects:

(1) Problem definition. This paper describes the target scope, refines the problem definition by examining the numerical information that appears in text, and designs the numerical semantic framework to be extracted, called NIR.

(2) Extraction method. This paper proposes a numerical information extraction method called BERT-NIE, which consists of four steps: preprocessing, trigger word identification, numerical information unit extraction, and determination of the relationships between numerical structures. For the last two key steps, this paper applies deep learning by constructing a network based on the BERT model, in order to reduce the precision loss and labor cost of traditional rule-based and statistical methods.

(3) Dataset construction and experiments. This paper designs an annotation guideline for NIR and selects 96 Wikipedia pages to construct the FGNIED dataset. It then evaluates numerical information unit extraction and numerical information structure relation classification on this dataset. The results show that the joint learning model for numerical information unit extraction outperforms the independent model on some sub-tasks. In addition, this paper analyzes the selection strategy for key parameters in the two tasks.

The experiments show that the NIR semantic framework and the BERT-NIE method proposed in this paper are effective for the semantic extraction of numerical information.
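The four steps named in the extraction method can be illustrated with a minimal sketch. This is not the BERT-NIE implementation: steps 3 and 4 are replaced here by simple regular-expression and co-occurrence rules purely to show how the stages hand data to one another, and every name in it (`NumericUnit`, `preprocess`, `find_triggers`, `extract_units`, `relate`, `extract`) is a hypothetical choice for illustration, not taken from the thesis.

```python
import re
from dataclasses import dataclass

# Hypothetical rule-based stand-in for the four stages named in the abstract.
# The thesis uses BERT-based networks for steps 3 and 4; this sketch does not.

NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


@dataclass
class NumericUnit:
    sentence_id: int
    value: float
    unit: str  # the word directly after the number, or "" if there is none


def preprocess(text):
    """Step 1: split raw text into sentences (naive punctuation-based split)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_triggers(sentence):
    """Step 2: locate numeric trigger spans as (start, end) offsets."""
    return [m.span() for m in NUMBER.finditer(sentence)]


def extract_units(sentence_id, sentence):
    """Step 3: build one unit per trigger from the number and the next word."""
    units = []
    for start, end in find_triggers(sentence):
        value = float(sentence[start:end].replace(",", ""))
        following = re.match(r"\s*([A-Za-z%]+)", sentence[end:])
        units.append(
            NumericUnit(sentence_id, value, following.group(1) if following else "")
        )
    return units


def relate(units):
    """Step 4: propose a candidate relation for each pair of units that
    share a sentence (a placeholder for a learned relation classifier)."""
    return [
        (a, b)
        for i, a in enumerate(units)
        for b in units[i + 1:]
        if a.sentence_id == b.sentence_id
    ]


def extract(text):
    """Run the four stages in order and return (units, candidate_relations)."""
    units = []
    for sentence_id, sentence in enumerate(preprocess(text)):
        units.extend(extract_units(sentence_id, sentence))
    return units, relate(units)


if __name__ == "__main__":
    sample = (
        "The bridge is 1,200 meters long and cost 3.5 million dollars. "
        "It opened in 1998."
    )
    found_units, found_relations = extract(sample)
    for unit in found_units:
        print(unit)
    print(len(found_relations), "candidate relation(s)")
```

In a learned version, `extract_units` and `relate` would presumably be replaced by model predictions, while the surrounding control flow could stay the same.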
Keywords/Search Tags: Numerical Information, Information Extraction, Joint Learning, Semantic Role Labeling, Attention Mechanism