Triple extraction, which represents unstructured text in a structured form, is an important research topic in natural language processing. Triple extraction from judgment documents plays a critical role in the upstream construction of knowledge graphs, the establishment of retrieval systems, automatic question answering systems, and so on. It supports knowledge representation and knowledge reasoning in the judicial system and promotes the construction of intelligent justice. In judgment documents, the relations between entities are difficult to cover with a predefined relation set, and large-scale annotated corpora are lacking, which makes traditional entity relation extraction methods that train a classifier or neural network model unsuitable. Meanwhile, the common open Chinese triple extraction methods usually consider only the shallow syntactic and positional features of a sentence, or start only from the head word of the sentence, and therefore cannot extract triples effectively and completely. To address these problems, this paper proposes a triple extraction method based on the Dependency Syntax Extraction Pattern (DSEP) and a triple extraction method that combines a pre-trained model with DSEP, and designs comparative experiments for verification. This work is supported by the National Key R&D Program project "Internal and External Connected Trial Execution and Litigation Services Collaborative Support Technology Research" (2018YFC0831300). The main work of this paper is as follows:

(1) To address the difficulty of using a predefined relation set and the lack of annotated data sets for judgment documents, a triple extraction method based on DSEP is proposed. This paper conducts a statistical analysis of the text of judgment documents and summarizes three common language features. The analysis finds that these language features are effectively reflected by the dependency parse tree of a sentence. Based on this, the entities and relation descriptors of a sentence are mapped onto the dependency parse tree, eight dependency syntax extraction patterns (DSEP) are proposed, and an extraction algorithm is designed on top of them. The method treats all nouns and noun phrases in a sentence as entities, combines them into candidate entity pairs, and discovers relation descriptors by matching the pairs against DSEP; it is not limited to using the head word as the relation. The method does not rely on any manual labeling. The experimental results show that its accuracy and recall are higher than those of the existing methods CORE, UnCORE and ZORE.

(2) To further improve triple extraction, and in view of the LTP processing errors and incomplete DSEP coverage in the above method, a triple extraction method combining a pre-trained model with DSEP is proposed. The extraction results of the former method, supplemented by a small number of manually annotated samples, form an annotated data set. A neural network model is then designed and trained on this data set to correct the extraction results of the former method. In this paper, triple extraction is modeled as a sequence labeling task in which the entities and relation descriptors in a sentence are multi-labeled, and the pre-trained model BERT (Bidirectional Encoder Representations from Transformers) is used as the sentence encoder. BERT captures the context information of the input sentence and produces a distributed representation of it; downstream, a fully connected layer with Softmax activation performs multi-label classification for each word; and, because the labels of the output sequence are contextually correlated, a CRF layer is used to introduce the context information of the labels. The experimental results show that this method effectively improves on the extraction results of the previous method, with gains in both accuracy and recall.
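
As an illustration of the pattern-matching idea behind the first method, the following minimal sketch pairs up the nouns of a sentence and tests each candidate pair against a single dependency pattern. It is not one of the eight DSEP patterns of this paper, which are not listed in the abstract. The `Token` class, the function names, the hand-written parse, and the one subject-verb-object pattern are all assumptions made for the example; the labels SBV (subject-verb), VOB (verb-object) and HED (root) follow the LTP-style naming convention.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Token:
    """One node of a dependency parse (hypothetical structure for this sketch)."""
    idx: int      # 1-based position in the sentence
    word: str
    pos: str      # coarse part-of-speech tag, e.g. "n" (noun), "v" (verb)
    head: int     # index of the syntactic head, 0 for the root
    deprel: str   # dependency label, e.g. "SBV", "VOB", "HED"


def match_subject_verb_object(e1: Token, e2: Token,
                              tokens: List[Token]) -> Optional[Triple]:
    """Illustrative pattern: e1 is the subject and e2 the object of the same verb."""
    if (e1.deprel == "SBV" and e2.deprel == "VOB"
            and e1.head == e2.head and e1.head > 0):
        verb = tokens[e1.head - 1]  # heads are 1-based, the list is 0-based
        if verb.pos == "v":
            return (e1.word, verb.word, e2.word)
    return None


def extract_triples(tokens: List[Token]) -> List[Triple]:
    """Treat every noun as an entity and test each ordered pair against the pattern."""
    nouns = [t for t in tokens if t.pos.startswith("n")]
    triples = []
    for e1, e2 in permutations(nouns, 2):
        triple = match_subject_verb_object(e1, e2, tokens)
        if triple is not None:
            triples.append(triple)
    return triples


if __name__ == "__main__":
    # Hand-written parse of "plaintiff sued defendant" (assumed, not parser output).
    sentence = [
        Token(1, "plaintiff", "n", 2, "SBV"),
        Token(2, "sued", "v", 0, "HED"),
        Token(3, "defendant", "n", 2, "VOB"),
    ]
    print(extract_triples(sentence))
```

In a real pipeline the parse would come from a dependency parser such as LTP rather than being written by hand, and several patterns would be tried for each candidate pair, so that the relation descriptor need not be the head word of the sentence.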