A Joint Model For Evidence Information Extraction From Court Record Document

Posted on:2021-05-01

Degree:Master

Type:Thesis

Country:China

Candidate:P Tao

GTID:2428330629984459

Subject:Cyberspace security

Abstract/Summary:

Most existing information extraction research focuses on general-domain text, and less attention has been paid to information extraction in professional domains. This thesis proposes a joint model for extracting evidence information from court record documents (CRDs) in the judicial domain. Unlike general-domain text, evidence information in CRDs can span multiple sentences, which makes extraction more difficult. To address this problem, we propose an end-to-end joint model that performs extraction at the paragraph level by adding a paragraph classification task alongside the extraction task. The model combines paragraph classification with evidence extraction and uses the intermediate paragraph category information to assist the final evidence information extraction. Our main research content includes:

(1) Tagging system: Based on the CRDs dataset, we propose a new labeling strategy, which provides a theoretical basis for the joint model and effectively avoids the error propagation of pipeline models. The use of paragraph category information also makes up for the shortcomings of applying the BIO tagging strategy directly.

(2) Baseline models: Using the dataset generated by the labeling strategy to train classification models and sequence labeling models, we built two pipeline models and four end-to-end evidence information extraction models and compared them with our joint model.

(3) Joint model: Because the evidence information in CRDs may span multiple sentences, we encode inputs at the paragraph level, compare three word embedding methods to select the best one, and apply a label attention mechanism before predicting the labels, which enriches the vector representation and encodes paragraph information more effectively.

(4) Experimental comparison: We carried out extensive experiments to support the design of the proposed model and used error analysis to explain why the results for certain categories are unsatisfactory. Finally, we discuss ideas for future improvement.

Experimental results confirm the effectiveness of the proposed model. It achieves an F1 score of 72.36% on the constructed dataset, exceeding that of existing models.
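
The paragraph-level labeling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the thesis's actual label set or data format, so everything here is an assumption made for illustration: the function names `bio_tags` and `label_paragraph`, the span format, and the labels `EVID`, `EVIDENCE` and `OTHER` are all hypothetical. The sketch only shows how one paragraph could carry both a paragraph category (for the classification task) and token-level BIO tags (for the extraction task), with a span free to cross sentence boundaries.

```python
from typing import List, Tuple

# Hypothetical span format: (start index, end index exclusive, evidence type).
Span = Tuple[int, int, str]


def bio_tags(tokens: List[str], spans: List[Span]) -> List[str]:
    """Turn token-level spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span ({start}, {end}) is outside the paragraph")
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):     # remaining tokens of the span
            tags[i] = f"I-{label}"
    return tags


def label_paragraph(tokens: List[str], spans: List[Span]) -> Tuple[str, List[str]]:
    """Return (paragraph category, BIO tags) for a single paragraph.

    Assumption: a paragraph is labeled "EVIDENCE" when it contains at least
    one evidence span and "OTHER" otherwise. The thesis may use a richer
    set of paragraph categories.
    """
    category = "EVIDENCE" if spans else "OTHER"
    return category, bio_tags(tokens, spans)


if __name__ == "__main__":
    # One paragraph, two sentences; the evidence span crosses the sentence boundary.
    paragraph = ["a", "loan", "contract", ".", "signed", "in", "2015", "."]
    category, tags = label_paragraph(paragraph, [(0, 7, "EVID")])
    print(category)
    print(list(zip(paragraph, tags)))
```

Because tagging is done over the whole paragraph rather than sentence by sentence, a single `B-`/`I-` run can cover evidence that spans several sentences, and the paragraph category is available as an extra training signal for a joint model.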

Keywords/Search Tags:

Natural Language Processing, Information Extraction, Attention Mechanism, Joint Model, Court Record Document

Related items

1	Question Answering Model Based On Self-Attention Mechanism
2	Research And Application Of Document Semantic Representation Method
3	Research On Natural Language Understanding Of Air Travel Based On Joint Modeling
4	Statistic-based Automatic Keypharse Extraction And Summarization From Multi-document
5	BERT-based Two-stage Long Document Retrieval Model Fused With Supplementary Information
6	Research And Implementation Of Text-oriented Entity Relation Extraction Technology
7	Relation Extraction Based On Multi-layered Attention Mechanism And Bias Adjustment
8	Research On Machine Learning For Natural Language Processing And Transmission
9	Reading Comprehension Model Based On Two-way Attention Mechanism And Conditional Random Field
10	Natural Language Processing Aiming To The Core Texts Of Scientific Literature