
Research And Application Of Knowledge Graph Oriented To The Field Of History

Posted on: 2024-03-26  Degree: Master  Type: Thesis
Country: China  Candidate: J J Du  Full Text: PDF
GTID: 2555307178973919  Subject: Software engineering
Abstract/Summary:
The digitization of China's historical documents is now complete, thanks to the rapid development of computer technology. However, most of the digitized data remains unstructured, which makes it difficult to use efficiently. In recent years, research in natural language processing has advanced knowledge graphs, which structure and organize knowledge resources to provide more efficient and accurate support for human learning, understanding, and application. A knowledge graph is essentially a semantic network that reveals relationships between entities, built from triples of the form "entity-relationship-entity" and "entity-property name-property value". This thesis uses historical documents to build a knowledge graph for the historical domain, with three main contributions:

(1) A supervised Named Entity Recognition (NER) method for unstructured historical textbook data. We annotated a corpus to build a historical NER dataset covering 11 entity categories, with 3,483 training examples and 1,161 examples each for validation and testing. Based on the entity features in the dataset, we introduce a boundary-detection-based NER model that learns features of entity start and end positions through two self-attention networks and fuses them with sentence features extracted by a Bi-LSTM to improve entity boundary detection. Experiments show that the model's F1 score on the historical NER dataset is 1.7% higher than that of BERT-Bi-LSTM-CRF.

(2) A joint entity and relation extraction method that incorporates semantic dependency relations, for unstructured ancient texts containing multiple triples. The method performs NER and relation extraction in a single model, which avoids the error propagation of pipeline methods. Because most models can recognize only one entity-relation pair in a text, the proposed method uses a cascading framework that identifies the subject (S), the object (O) related to that subject, and their relationship (P) together, so that multiple triples can be extracted from the same text. To explore the contribution of syntactic components and syntactic relationships to information extraction, we take the semantic dependency features produced by LTP, the natural language processing toolkit from Harbin Institute of Technology, and integrate them into the model with graph attention networks. Experimental results demonstrate the effectiveness of this method.

(3) Construction of the historical-domain knowledge graph. We use web crawlers to extract semi-structured data from websites containing information about people and formulate rules to obtain triples. In parallel, we apply the entity recognition and relation extraction methods proposed in this thesis to extract triples from unstructured text. Finally, we use the py2neo module to import the heterogeneous triple data into the Neo4j graph database.
Keywords/Search Tags: Knowledge graph, Named entity recognition, Entity-relation joint extraction