
A Joint Model For Evidence Information Extraction From Court Record Document

Posted on: 2021-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: P Tao
Full Text: PDF
GTID: 2428330629984459
Subject: Cyberspace security
Abstract/Summary:
Most existing information extraction research focuses on general-domain text, and less attention has been paid to extraction in professional fields. This thesis proposes a joint model for information extraction from court record documents (CRDs) in the judicial field. Unlike general-domain text, evidence information in CRDs spans multiple sentences, which makes extraction more difficult. To address this problem, an end-to-end joint model that performs extraction at the paragraph level is proposed by adding a paragraph classification task alongside the extraction task. The model combines paragraph classification with evidence extraction and uses the intermediate paragraph category information to assist the final evidence information extraction. The main research content includes:

(1) Tagging system: Based on the CRDs dataset, a new labeling strategy is proposed, which provides a theoretical basis for the joint model and effectively prevents error propagation in the pipeline model. The use of paragraph category information also compensates for the shortcomings of using the BIO tagging strategy directly.
(2) Baseline models: Using the dataset generated by the labeling strategy for the classification model and the sequence labeling model, two pipeline models and four end-to-end evidence information extraction models were trained and compared with the joint model.
(3) Joint model: Because the evidence information in CRDs may span multiple sentences, inputs are encoded at the paragraph level, three word embedding methods are compared to select the best one, and a label attention mechanism is applied before label prediction, which enriches the vector representation and better encodes paragraph information.
(4) Experimental comparison: Extensive experimental analysis supports the design of the proposed model, and error analysis explains why results for certain categories are unsatisfactory. Finally, ideas for future improvement are discussed.

Experimental results confirm the effectiveness of the proposed model. Its F1 score on the constructed dataset is 72.36%, which exceeds that of existing models.
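
The abstract does not spell out the labeling scheme, so the following is only a minimal sketch of the general idea of combining a paragraph-level category with BIO tags. It assumes a hypothetical paragraph category `EVIDENCE` and hypothetical tag names such as `B-NAME` and `I-CONTENT`; none of these names, and neither function, come from the thesis itself.

```python
from typing import List, Tuple

Span = Tuple[str, str]  # (label, text)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Span]:
    """Collect (label, text) spans from a BIO-tagged token sequence."""
    spans: List[Span] = []
    label = None
    words: List[str] = []

    def flush() -> None:
        # Close the span currently being built, if any.
        if label is not None:
            spans.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            # "O" tag, or an I- tag that does not continue the open span.
            flush()
            label, words = None, []
    flush()
    return spans


def extract_evidence(paragraph_category: str,
                     tokens: List[str],
                     tags: List[str]) -> List[Span]:
    """Keep token-level spans only when the paragraph is classified as evidence.

    "EVIDENCE" is an assumed category name used for illustration.
    """
    if paragraph_category != "EVIDENCE":
        return []
    return decode_bio(tokens, tags)


tokens = ["Exhibit", "A", "shows", "the", "signed", "contract"]
tags = ["B-NAME", "I-NAME", "O", "O", "B-CONTENT", "I-CONTENT"]
evidence_spans = extract_evidence("EVIDENCE", tokens, tags)
other_spans = extract_evidence("OTHER", tokens, tags)
```

In this sketch the paragraph category acts as a hard filter on the token-level output. The thesis describes a joint model in which the category is intermediate information learned together with the extraction task, which a post-hoc filter like this does not reproduce.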
Keywords/Search Tags:Natural Language Processing, Information Extraction, Attention Mechanism, Joint Model, Court Record Document