
Research on Information Extraction Technology and Its Application in the Financial Field

Posted on: 2023-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Que
GTID: 2569306770462064
Subject: Applied statistics
Abstract/Summary:
As Internet technology continues to develop, obtaining information automatically and accurately from the vast and complex body of online information has become very important. Information extraction technology automatically classifies, extracts, and restructures unstructured text; its main tasks are named entity recognition, entity relation extraction, and event extraction. Among vertical domains, the financial industry has especially pressing information needs: obtaining structured information from massive volumes of financial text quickly and accurately helps regulators and investors make well-founded decisions. In practice, financial information is carried mainly by announcements, financial news, and research reports, which are long texts containing descriptions of dynamic events. Motivated by this application value, this thesis aims to perform event extraction on long financial texts. Because event extraction from such complex texts is difficult, the work is divided into two sequential parts: first, entity relation extraction on short encyclopedia texts that are not limited to any specialized domain; then, event extraction on long texts in the financial domain. Both parts build on sequence labeling models and propose improvements tailored to the complexity of each scenario. The main contents are summarized as follows.

First, entity relation extraction is performed with a pipeline approach. In the first stage, a BERT-BiLSTM-CRF model is built for named entity recognition; in the second stage, the recognized entities are paired and passed to a relation classification model. Comparative experiments between a TextCNN-based model and the R-BERT model, which modifies the input layer, show that dynamic word vectors capture semantic information better than static word vectors, and that R-BERT, which integrates entity information by transforming the input layer, is simple and concise and achieves better results.

Second, entity relation extraction is performed with a joint approach, which avoids both the error accumulation of pipeline methods and the sample redundancy caused by entity pairing. Treating the task as sequence labeling, a BERT-BiGRU-CRF model is constructed, and the original BIO tags are extended with the relation type, the subject/object role, and the entity type. Relation triples can therefore be obtained in a single pass, and subject-object misalignment is reduced. Because the extended tagging scheme increases the number of label categories, the BiLSTM layer is replaced with a BiGRU to speed up training. Comparative experiments show that the BERT and BiGRU components each strengthen the model's ability to learn semantic features.

Finally, building on this experience, event extraction is performed on document-level financial texts, which presents three main difficulties: (1) a document exceeds the maximum input sequence length of BERT; (2) a document may contain multiple events that share arguments; and (3) financial language is highly standardized and rich in specialized terminology. To address these difficulties, this thesis proposes a BERT-based multi-layer semantic enhancement model and uses a pipeline to decouple the subtasks: trigger word extraction with event type determination, and argument extraction with role assignment. Arguments are divided into entity arguments and attribute arguments. Attribute argument extraction is cast as BERT-based text classification, while trigger word extraction and entity argument extraction are cast as sequence labeling; all of these tasks use the BERT-based multi-layer semantic enhancement model. The text is split into segments that are fed to BERT to obtain dynamic word vector representations. These are spliced back together and passed to a BiGRU, which learns contextual sentence-level features, and to a CNN, which learns lexical-level features over spans of different lengths. The two sets of features are concatenated and decoded with a CRF. Entity argument extraction additionally incorporates the event type obtained in the previous stage: the event type and the original text are spliced into a sentence pair and input to BERT, so that BERT pays more attention to the arguments in the text that relate to that event type. Extracting arguments separately for each event type avoids the sharing and overlapping of arguments across different events. Comparative experiments demonstrate that the proposed method alleviates the difficulties of event extraction from financial text to a certain extent.
Keywords: relation extraction, event extraction, financial text, sequence labeling, BERT
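The pipeline approach described in the abstract hands the entity tagger's output to a relation classifier by pairing the recognized entities. The following is a minimal sketch of that hand-off, assuming a plain BIO scheme and whitespace-tokenized English input for readability (the thesis works on Chinese text); `bio_to_spans` and `candidate_pairs` are hypothetical helper names, not code from the thesis.

```python
from itertools import permutations
from typing import List, Optional, Tuple

# (start, end_exclusive, entity_type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence (the stage-one tagger output) into entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype = ""
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # token extends the open entity
        if start is not None:
            spans.append((start, i, etype))  # any other tag closes the open entity
            start = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # "O" and stray I- tags (no matching open entity) start nothing
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


def candidate_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Every ordered (subject, object) pair becomes one stage-two classification sample."""
    return list(permutations(spans, 2))
```

Because every ordered pair becomes a classification sample, n entities yield n(n-1) candidates, most of which hold no relation; this is the sample redundancy that the joint approach is meant to avoid.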
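R-BERT injects entity information at the input layer by surrounding the two entities with special marker tokens; the original R-BERT formulation uses `$` around the first entity and `#` around the second. The sketch below shows only that input transformation, assuming pre-tokenized input and non-overlapping spans. `add_entity_markers` is a hypothetical helper, and R-BERT's downstream pooling of the entity vectors is omitted.

```python
from typing import List, Tuple


def add_entity_markers(tokens: List[str], subj: Tuple[int, int], obj: Tuple[int, int],
                       subj_marker: str = "$", obj_marker: str = "#") -> List[str]:
    """Wrap the subject span in subj_marker and the object span in obj_marker.

    Spans are (start, end) with end exclusive. [CLS] and [SEP] are added so the
    result looks like a BERT input sequence.
    """
    n = len(tokens)
    for a, b in (subj, obj):
        if not 0 <= a < b <= n:
            raise ValueError(f"invalid span ({a}, {b})")
    (s0, s1), (o0, o1) = subj, obj
    if not (s1 <= o0 or o1 <= s0):
        raise ValueError("subject and object spans overlap")

    out: List[str] = ["[CLS]"]
    for i, tok in enumerate(tokens):
        # Opening markers go before the first token of each entity.
        if i == s0:
            out.append(subj_marker)
        if i == o0:
            out.append(obj_marker)
        out.append(tok)
        # Closing markers go after the last token of each entity.
        if i == s1 - 1:
            out.append(subj_marker)
        if i == o1 - 1:
            out.append(obj_marker)
    out.append("[SEP]")
    return out
```

Because the markers are ordinary tokens, the encoder itself is unchanged; this is what makes the input-layer approach simple compared with modifying the network architecture.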
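The joint approach extends each BIO tag with the relation type, the subject/object role, and the entity type, so triples can be read directly off the tag sequence. The abstract does not give the thesis's exact tag format, so the sketch below assumes a hypothetical `B-<relation>-<role>-<type>` format with roles `SUB` and `OBJ` (relation names must not contain `-`). It pairs each subject with the nearest object of the same relation, which is one common heuristic and not necessarily the thesis's rule.

```python
from typing import List, Tuple

# Hypothetical tag format: "B-founded-SUB-PER", "I-founded-SUB-PER", "O", ...
LabeledSpan = Tuple[int, int, str, str, str]  # (start, end_exclusive, relation, role, entity_type)


def decode_spans(tags: List[str]) -> List[LabeledSpan]:
    """Group B-/I- tags carrying the same full label into spans."""
    spans: List[LabeledSpan] = []
    start, label = None, ""
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, rest = tag.partition("-")
        if prefix == "I" and start is not None and rest == label:
            continue  # token extends the open span
        if start is not None:
            rel, role, etype = label.split("-")
            spans.append((start, i, rel, role, etype))
            start = None
        if prefix == "B":
            start, label = i, rest
    return spans


def decode_triples(tags: List[str], tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Pair each subject with the nearest object that shares its relation type."""
    spans = decode_spans(tags)
    triples = []
    for s0, s1, rel, role, _etype in spans:
        if role != "SUB":
            continue
        objects = [(o0, o1) for o0, o1, r, ro, _t in spans if r == rel and ro == "OBJ"]
        if not objects:
            continue  # subject without a matching object yields no triple
        o0, o1 = min(objects, key=lambda o: abs(o[0] - s0))
        triples.append((" ".join(tokens[s0:s1]), rel, " ".join(tokens[o0:o1])))
    return triples
```

Since the role is part of the tag, the subject and object cannot be swapped at decoding time, which is how this kind of scheme reduces subject-object misalignment; the cost is a much larger label set.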
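Long financial documents exceed BERT's input limit (512 positions for the standard pretrained checkpoints), so the text is segmented, each segment is encoded, and the resulting token vectors are spliced back together before the BiGRU and CNN layers. The following is a minimal sketch assuming non-overlapping, fixed-size chunks; `encoder` is a stand-in for a real BERT forward pass, and both helper names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_segments(tokens: Sequence[str], max_len: int = 512) -> List[List[str]]:
    """Cut a token sequence into consecutive chunks that fit the encoder's limit.

    Two positions per chunk are reserved for [CLS] and [SEP].
    """
    body = max_len - 2
    if body <= 0:
        raise ValueError("max_len must exceed 2")
    return [list(tokens[i:i + body]) for i in range(0, len(tokens), body)]


def encode_long_text(tokens: Sequence[str],
                     encoder: Callable[[List[str]], List[Vector]],
                     max_len: int = 512) -> List[Vector]:
    """Encode each segment separately and splice the token vectors back together.

    `encoder` receives ["[CLS]", *segment, "[SEP]"] and must return one vector
    per input token; the result has exactly one vector per original token.
    """
    spliced: List[Vector] = []
    for seg in split_into_segments(tokens, max_len):
        vectors = encoder(["[CLS]", *seg, "[SEP]"])
        spliced.extend(vectors[1:-1])  # drop the [CLS] and [SEP] positions
    return spliced
```

Fixed-size cuts can split a sentence across two segments, so tokens near a boundary see less context; cutting at sentence boundaries or using overlapping windows are common alternatives, and the abstract does not say which the thesis uses.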
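For entity argument extraction, the event type predicted in the previous stage is combined with the original text as a BERT sentence pair, and one input is built per event type so that each event's arguments are tagged separately. The following is a minimal sketch assuming pre-tokenized input; `build_pair_input` and `inputs_per_event` are hypothetical helpers, and the segment ids follow BERT's usual convention of 0 for the first sentence and 1 for the second.

```python
from typing import Dict, List


def build_pair_input(event_tokens: List[str], text_tokens: List[str],
                     max_len: int = 512) -> Dict[str, object]:
    """Build "[CLS] event type [SEP] text [SEP]" with matching token_type_ids.

    The text is truncated to fit max_len. `text_offset` maps text token i to
    sequence position text_offset + i, so predicted argument tags can be
    aligned back to the original text.
    """
    first = ["[CLS]", *event_tokens, "[SEP]"]
    room = max_len - len(first) - 1  # keep one slot for the final [SEP]
    if room <= 0:
        raise ValueError("event type too long for max_len")
    second = [*text_tokens[:room], "[SEP]"]
    return {
        "tokens": first + second,
        "token_type_ids": [0] * len(first) + [1] * len(second),
        "text_offset": len(first),
    }


def inputs_per_event(event_types: List[List[str]], text_tokens: List[str]) -> List[Dict[str, object]]:
    """One sentence-pair input per predicted event type over the same text."""
    return [build_pair_input(ev, text_tokens) for ev in event_types]
```

Calling `inputs_per_event([["EquityPledge"], ["EquityFreeze"]], tokens)` with these hypothetical event-type tokens yields two inputs over the same text, so an argument shared by both events can be tagged once in each input instead of competing for a single label.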
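Each tagger described in the abstract ends in a CRF layer, whose decoding step finds the highest-scoring tag sequence with the Viterbi algorithm. The following is a minimal pure-Python sketch, assuming the emission scores (from the BiLSTM, BiGRU, or BiGRU+CNN features) and the learned transition scores are already computed; the start and end transition scores that CRF implementations usually include are omitted for brevity, and `viterbi_decode` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the best tag-index sequence and its score for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag to transition from into tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow back-pointers from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]
```

A large negative transition score, such as from `O` to an `I-` tag, is how a trained CRF rules out invalid BIO sequences that a per-token classifier could still produce.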