Most existing relation extraction methods rely on a relation classification model, which cannot effectively capture new relations contained in the text. Open relation extraction has emerged to alleviate this problem. It does not require the categories of relations to be defined in advance, and it can effectively discover new relations among entities. However, it brings two new problems. First, mainstream open relation extraction models are limited by the "experience" of experts, and their preprocessing tools propagate errors that reduce extraction accuracy. Second, open relation extraction cannot focus on a particular domain, so the extracted relation tuples do not reflect the domain background. Building on previous research, this paper proposes an improved open relation extraction method and identifies the domain of the extraction results. It also introduces deep learning into the Chinese open relation extraction task, and designs and implements a new extraction approach. The main research work is as follows:

(1) This paper proposes an improved Chinese open relation extraction method based on syntactic structure, and summarizes eight dependency path rules according to the characteristics of the language structure of the corpus. The core of this method is to map the dependency syntax tree of a sentence onto the dependency path rules, without restricting the relative position of entities and relations, and to mine the dependency semantics behind each dependency path. To improve extraction accuracy, shallow sentence features are used to filter the candidate tuples. The experimental results show that, compared with the reference model, the F value increases by 4.22%. This method also provides a high-quality corpus for the subsequent extraction model based on deep learning.

(2) A method of domain relation recognition is proposed. Its purpose is to process the results of open relation extraction and identify the relation tuples that belong to the target domain. The core of this method is to use a pre-trained language model to transform each relation tuple into a dynamic semantic matrix, which alleviates the semantic deficiency of relation tuples, and then to use the TextCNN model to extract features and generate a deep semantic representation of the tuple. The experimental results show that, compared with the reference model, the F values increase by 4.5% and 3% respectively.

(3) This paper proposes PG-CORE, a Chinese open relation extraction model based on the pointer-generator network. The core of the model is to cast the open relation extraction task as a text summarization problem, and to use the pointer-generator mechanism so that the model focuses on generating word units that appear in the original sentence. The training set of the model comes from the high-quality corpus provided by the first research work, so that the model can effectively capture the semantic and structural information contained in the sentences to be extracted. The experimental results show that, compared with the reference model, the accuracy improves by 1.61%, which demonstrates the effectiveness and feasibility of deep learning models for Chinese open relation extraction.

Finally, based on the first and second research work, a domain-oriented real-time open relation extraction prototype system is built. Kafka and Canal are used as middleware to incrementally synchronize the extracted results from MySQL to Neo4j in real time, and a visual interface is provided for users to query the extracted results.
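The copy behaviour described in item (3) can be illustrated with a minimal sketch of how a pointer-generator network typically mixes a vocabulary distribution with attention over the source sentence. This follows the commonly used formulation, P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention on source positions holding w); whether PG-CORE uses exactly this form is not stated in this abstract. The function name `final_distribution`, the toy tokens, and all numeric values are assumptions chosen for illustration only.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities (illustrative sketch only).

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_dist    -- dict mapping vocabulary word -> generation probability
    attention     -- attention weights, one per source position
    source_tokens -- tokens of the source sentence, aligned with attention
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    final = defaultdict(float)
    # Generation path: scale every vocabulary probability by p_gen.
    for word, prob in vocab_dist.items():
        final[word] += p_gen * prob
    # Copy path: attention mass on each source position is added to the
    # token at that position, so out-of-vocabulary source words can be output.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Hypothetical toy input: a low p_gen pushes the model towards copying.
source = ["Huawei", "headquarters", "located_in", "Shenzhen"]
attn = [0.5, 0.1, 0.3, 0.1]
vocab = {"located_in": 0.6, "is": 0.4}
dist = final_distribution(0.2, vocab, attn, source)
best = max(dist, key=dist.get)
```

In this toy input the entity "Huawei" is not in the vocabulary, yet it receives the largest probability because most of the mass comes from the copy path. This is the property that lets such a model emit entities and relation words taken directly from the sentence being extracted.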