Research And Application Of Knowledge Graph Oriented To The Field Of History

Posted on:2024-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:J J Du

GTID:2555307178973919

Subject:Software engineering

Abstract/Summary:

The digitization of China’s historical documents has been completed, thanks to the rapid development of computer technology. However, most of the digitized data remains unstructured, which makes it difficult to use efficiently. In recent years, research in natural language processing has advanced knowledge graphs, which structure and organize knowledge resources to support human learning, understanding, and application more efficiently and accurately. A knowledge graph is essentially a semantic network that reveals relationships between entities, built from triples of the form "entity-relationship-entity" and "entity-property name-property value". This thesis uses historical documents to build a knowledge graph for the historical domain, with the following main contributions:

(1) A supervised Named Entity Recognition (NER) method for unstructured historical textbook data. We annotated a corpus to construct a historical NER dataset covering 11 entity categories, with 3483 samples in the training set and 1161 samples in each of the validation and test sets. Based on the entity features in the dataset, we introduce a boundary-detection-based NER model that learns features of entity start and end positions through two self-attention networks and fuses them with the sentence features extracted by a Bi-LSTM to improve entity boundary detection. Experimental results show that the F1 value of the model on the historical NER dataset is 1.7% higher than that of BERT-Bi-LSTM-CRF.

(2) A joint entity-relation extraction method that incorporates semantic dependency relations, aimed at unstructured ancient texts that contain multiple triples. The method performs Named Entity Recognition and relation extraction in a single model to avoid the error propagation caused by the pipeline approach. Because most models can recognize only one pair of related entities in a text, the proposed method uses a cascading framework that identifies the subject (S), the objects (O) related to that subject, and their relationship (P) together, so that multiple triples can be extracted from the same text. To explore the contribution of syntactic components and syntactic relationships to information extraction, we take the semantic dependency features produced by LTP, the natural language processing toolkit from Harbin Institute of Technology, and integrate them into the model using graph attention networks. Experimental results demonstrate the effectiveness of this method.

(3) Construction of the historical domain knowledge graph. We use web crawlers to extract semi-structured data from websites containing information about people and formulate rules to obtain triples from it. In parallel, we apply the entity recognition and relation extraction methods proposed in this thesis to extract triples from unstructured text. Finally, we use the methods provided by the py2neo module to import the heterogeneous triple data into the Neo4j graph database.
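The cascading extraction described in (2) can be illustrated with a small decoding sketch. It assumes the model emits binary start/end flags per token, first for subjects and then, for each subject and relation, for objects; this is one common way to realize a cascade and is not confirmed by the abstract as the thesis's exact design. The function names `decode_spans` and `decode_triples`, the relation labels, and the hand-written flags are all hypothetical, standing in for real model output.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]           # inclusive (start, end) token indices
Flags = Tuple[List[int], List[int]]  # (start_flags, end_flags), one 0/1 per token


def decode_spans(start_flags: List[int], end_flags: List[int]) -> List[Span]:
    """Pair each predicted start with the nearest end at or after it."""
    spans = []
    for i, is_start in enumerate(start_flags):
        if not is_start:
            continue
        for j in range(i, len(end_flags)):
            if end_flags[j]:
                spans.append((i, j))
                break
    return spans


def decode_triples(
    tokens: List[str],
    subject_flags: Flags,
    object_flags: Dict[Span, Dict[str, Flags]],
) -> List[Tuple[str, str, str]]:
    """Cascade: decode subjects first, then objects per (subject, relation)."""
    triples = []
    for s_start, s_end in decode_spans(*subject_flags):
        subject = " ".join(tokens[s_start : s_end + 1])
        per_relation = object_flags.get((s_start, s_end), {})
        for relation, (o_starts, o_ends) in per_relation.items():
            for o_start, o_end in decode_spans(o_starts, o_ends):
                obj = " ".join(tokens[o_start : o_end + 1])
                triples.append((subject, relation, obj))
    return triples


# Toy input: flags are written by hand here (assumption), not predicted.
tokens = ["Sima", "Qian", "wrote", "Shiji", "under", "Emperor", "Wu"]
subject_flags = ([1, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0])
object_flags = {
    (0, 1): {
        "wrote": ([0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0]),
        "served_under": ([0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 1]),
    }
}
triples = decode_triples(tokens, subject_flags, object_flags)
print(triples)
```

Because objects are decoded separately for each subject and relation, one sentence can yield several triples that share a subject, which is the case the abstract says single-pair models cannot handle.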

Keywords/Search Tags:

Knowledge graph, Named entity recognition, Entity-relation joint extraction

Related items

1	Research On Named Entity Recognition And Knowledge Graph Construction Of Chinese Classical Literature Texts
2	Research On Technologies Of Knowledge Graph Construction In Cultural Relics
3	Construction Research And System Implementation Of Knowledge Graph Of Cultural Relics In The Context Of Cultural Digitalization
4	Research On English Named Entity Extraction
5	A Named Entity Recognition Method For Text Of Han Dynasty Paintings
6	Research On Construction Technology Of Knowledge Graph Of Cultural Relics Collection
7	Research On Construction Technology Of Knowledge Graph Of Network Film
8	Research On Chinese-Vietnamese Entity Alignment Technology Based On Named Entity Recognition
9	The Construction And Analysis Of The Knowledge Graph Of Mongolian Historical Figures
10	Entity Relation Extraction Of Dermatosis Based On Dependency Syntax Analysis