Research And Realization Of Domain Knowledge Graph Construction Method Based On Text Mining

Posted on: 2020-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liu
Full Text: PDF
GTID: 2428330578457083
Subject: Electronic and communication engineering
Abstract/Summary:
A knowledge graph is a semantic network that describes real-world entities and the relations between them, and it is now widely used across industries. A well-built domain knowledge graph helps a computer understand the knowledge of that domain, which in turn can improve the efficiency and quality of practitioners' work. The core technology in knowledge graph construction is entity relation extraction, and the accuracy of Chinese entity relation extraction currently stays at only about 60% to 70%. Domain knowledge graph construction also faces further problems: training corpora are scarce, the process depends heavily on manual labor, and construction methods are difficult to port across domains. At the same time, the legal field holds large and complex data resources that urgently need to be organized and used effectively. In response, this thesis proposes and implements a method for constructing a legal domain knowledge graph based on text mining. The specific work is as follows:

(1) To address the lack of domain training corpora, a corpus construction method based on distant supervision is proposed. Structured information under encyclopedia entries related to legal concepts is collected as the initial triples, and the training corpus is then obtained automatically by using distant supervision to retrieve encyclopedia texts. In addition, a triple expansion algorithm and a filter based on relation characteristic words are proposed to address the small size and the noise of the automatically obtained corpus.

(2) Two extraction methods are proposed for different types of entity relation extraction tasks. The first is based on the maximum entropy model. It treats extraction as relation classification and uses an n-pattern feature extraction method to represent the differences between texts expressing different relations, so it handles extraction over a limited set of relation types. The second combines CRF with the syntax analysis tree. It relies on sequence tagging and syntactic analysis and can handle extraction of any relation type. The experimental results show that the accuracy of both methods exceeds 72%, an evident improvement over existing entity relation extraction methods.

(3) Taking the legal knowledge triples obtained in the preceding steps as the data source, an RDF file storage scheme based on the Neo4j graph database is proposed to store the triples. The thesis further studies the modular division between the construction of the legal knowledge graph and the application system. In addition, the query and display functions of the legal knowledge graph are realized through the visualization platform of the system's application module.
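The distant supervision step in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Triple` type, the `distant_label` function, the `cue_words` mapping, and the sample sentences are all hypothetical, and the substring matching stands in for whatever entity matching the author actually used. The idea shown is only the general one: a sentence that mentions both entities of a seed triple is labeled with that triple's relation, and a relation-specific word list filters out likely noise.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def distant_label(sentences, triples, cue_words=None):
    """Label every sentence that mentions both entities of a seed triple.

    cue_words (hypothetical) maps a relation name to words that must occur
    in the sentence; it stands in for a characteristic-word noise filter.
    """
    labeled = []
    for sentence in sentences:
        for triple in triples:
            if triple.head not in sentence or triple.tail not in sentence:
                continue  # distant supervision needs both entities present
            cues = (cue_words or {}).get(triple.relation)
            if cues and not any(cue in sentence for cue in cues):
                continue  # both entities occur, but no relation cue: treat as noise
            labeled.append((sentence, triple))
    return labeled


# Made-up seed triple and sentences, for illustration only.
seeds = [Triple("Contract Law", "enacted_by", "National People's Congress")]
sentences = [
    "The Contract Law was enacted by the National People's Congress in 1999.",
    "The National People's Congress discussed the Contract Law.",
    "Courts apply the Contract Law daily.",
]
cue_words = {"enacted_by": ["enacted", "adopted"]}

unfiltered = distant_label(sentences, seeds)
filtered = distant_label(sentences, seeds, cue_words)
```

Without the cue-word filter, the second sentence is labeled `enacted_by` even though it does not express that relation, which is the kind of noise the filtering step is meant to reduce.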
Keywords/Search Tags: knowledge graph, distant supervision, entity relation extraction, triplet, Neo4j