
Construction And Research Of Knowledge Graph For Thangka Culture

Posted on: 2023-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Li
GTID: 2545306848493944
Subject: Computer technology
Abstract/Summary:
With the development of science and technology, the digitization of text information has accelerated, and the rich knowledge of Thangka culture, covering geography, history, religion, medicine and other fields, has accumulated further. At the same time, this knowledge is complex and redundant, which makes it difficult to extract. To extract the knowledge accumulated in Thangka texts, this paper uses knowledge graph construction technology to organize and systematize Thangka culture, transforming Thangka cultural knowledge from unstructured text into structured knowledge. Following the components of a knowledge graph, this paper focuses on key technologies for Thangka cultural text data, namely entity recognition and relation extraction. The main contributions are as follows:

(1) A Thangka culture text dataset is constructed in two stages: dataset acquisition and dataset annotation. Thangka texts are acquired mainly with a web crawler and Optical Character Recognition (OCR). For annotation, the Brat [1] tool is used, and a total of 3942 entities and 1756 relations between entities are annotated.

(2) For the named entity recognition task, this paper describes in detail how Conditional Random Field (CRF) and Bidirectional Long Short-Term Memory network (Bi-LSTM) models are built on the Thangka dataset. Weighing the strengths and weaknesses of CRF and Bi-LSTM, it then builds a combined Bi-LSTM+CRF model on the Thangka culture dataset. Multiple in-depth experiments show that, with the time overhead taken into account, Bi-LSTM+CRF achieves better named entity recognition results on the Thangka culture dataset than either the CRF or the Bi-LSTM model alone.

(3) A model (Bs-Spert) for the joint extraction of cross-domain entities and entity relations, based on Bidirectional Encoder Representations from Transformers (BERT), is proposed. The model consists of a span classification module, a span filter module, a beam search module and a relation classification module. The model's performance on the Thangka culture dataset is first examined under different beam widths and different pooling functions, and the model is then compared on the same dataset with other classic joint entity-relation extraction models. The results show that the proposed model achieves strong joint entity and relation extraction performance on the Thangka culture dataset.

(4) Building on the Thangka named entity recognition and Thangka entity relation extraction work on the Thangka text dataset, this paper builds a web-based Thangka cultural knowledge graph display platform. The platform supports retrieval over Thangka natural-language texts and provides visual queries for named entity recognition and relation extraction results.
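The Brat annotation step in (1) produces standoff files: each `.txt` document has an `.ann` file in which `T` lines hold text-bound entities (`ID<TAB>TYPE START END<TAB>TEXT`) and `R` lines hold binary relations (`ID<TAB>TYPE Arg1:ID Arg2:ID`). The minimal sketch below reads such a file and counts entities and relations, which is how totals like 3942 entities and 1756 relations can be tallied. It handles only `T` and `R` lines, and the entity and relation types in the example (`Deity`, `Place`, `VeneratedIn`) are hypothetical, not the thesis's label set.

```python
from collections import Counter


def parse_brat_ann(ann_text):
    """Parse Brat standoff annotations (only T and R lines; others are skipped)."""
    entities = {}   # entity id -> (type, [(start, end), ...], surface text)
    relations = []  # (relation type, head entity id, tail entity id)
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            ent_type, offsets = fields[1].split(" ", 1)
            # Discontinuous entities list several "start end" fragments separated by ";".
            spans = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
            text = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = (ent_type, spans, text)
        elif ann_id.startswith("R"):
            rel_type, arg1, arg2 = fields[1].split()
            # Strip the "Arg1:" / "Arg2:" role prefixes to keep only entity ids.
            relations.append((rel_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1]))
    return entities, relations


# Hypothetical example document and its annotation file.
doc = "Green Tara is revered in Lhasa"
ann = "T1\tDeity 0 10\tGreen Tara\nT2\tPlace 25 30\tLhasa\nR1\tVeneratedIn Arg1:T1 Arg2:T2\n"

entities, relations = parse_brat_ann(ann)
print(len(entities), len(relations))                   # entity and relation totals
print(Counter(t for t, _, _ in entities.values()))     # entities per type
```

Summing these counts over every `.ann` file in a corpus gives dataset-level totals.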
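In a Bi-LSTM+CRF tagger such as the one in (2), the Bi-LSTM produces per-token emission scores and the CRF layer adds tag-to-tag transition scores, so inference searches for the highest-scoring whole tag sequence rather than the best tag per token. The sketch below shows that decoding step (Viterbi) in plain Python. The BIO tag names and all scores are made up for illustration; the abstract does not state the thesis's tagging scheme or trained parameters.

```python
def viterbi_decode(emissions, transitions, start, tags):
    """Return the best tag sequence and its score.

    emissions[t][j]   : score of tag j at token t (e.g. from a Bi-LSTM)
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters)
    start[j]          : score of starting the sequence with tag j
    """
    n_tags = len(tags)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag i for ending at tag j at step t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Trace back from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return [tags[j] for j in path], score[best_last]


# Hypothetical BIO tags and toy scores.
TAGS = ["O", "B-DEITY", "I-DEITY"]
NEG = -1e4  # effectively forbids a transition
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-DEITY is invalid in BIO
    [0.0, 0.0, 0.0],  # from B-DEITY
    [0.0, 0.0, 0.0],  # from I-DEITY
]
start = [0.0, 0.0, NEG]  # a sequence cannot start with I-DEITY
emissions = [
    [1.0, 0.8, 0.0],  # token 0: per-token argmax would pick O
    [0.0, 0.0, 2.0],  # token 1: strongly I-DEITY
    [1.0, 0.0, 0.0],  # token 2: O
]
best_tags, best_score = viterbi_decode(emissions, transitions, start, TAGS)
print(best_tags, best_score)
```

Per-token argmax would give the invalid sequence `O, I-DEITY, O`; the transition scores let the decoder choose the valid `B-DEITY, I-DEITY, O` instead, which is the kind of label dependency a CRF layer adds on top of a Bi-LSTM.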
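Span-based joint extractors such as the one described in (3) classify candidate spans rather than individual tokens: every token span up to a maximum width is enumerated, and each span is represented by pooling the encoder's token vectors inside it. The pooling function is one of the settings varied in the experiments; max pooling is the usual choice in SpERT-style models, and mean pooling is a common alternative. The sketch below uses short lists in place of BERT embeddings, and `max_width` is a hypothetical parameter, not a value reported in the thesis.

```python
def enumerate_spans(num_tokens, max_width):
    """All (start, end) spans with end exclusive and width <= max_width."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start + 1, min(start + max_width, num_tokens) + 1)
    ]


def pool_span(token_vectors, span, mode="max"):
    """Pool the token vectors inside a span into one span vector."""
    start, end = span
    rows = token_vectors[start:end]
    dims = range(len(rows[0]))
    if mode == "max":
        return [max(r[d] for r in rows) for d in dims]
    if mode == "mean":
        return [sum(r[d] for r in rows) / len(rows) for d in dims]
    raise ValueError(f"unknown pooling mode: {mode}")


# Toy 2-dimensional "embeddings" for a 3-token sentence (stand-ins for BERT outputs).
vectors = [[1.0, 4.0], [3.0, 2.0], [0.0, 0.0]]
spans = enumerate_spans(len(vectors), max_width=2)
print(spans)
print(pool_span(vectors, (0, 2), "max"), pool_span(vectors, (0, 2), "mean"))
```

In a full model, each pooled span vector would typically be combined with further features (for example a span-width embedding) before the span classifier assigns an entity type or a "none" label.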
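The abstract does not detail how the span filter and beam search modules in (3) interact, so the following is only one plausible reading, labelled as an assumption: spans classified as "none" or scored below a threshold are filtered out, the remaining entity spans are ranked by probability, the top `beam_width` are kept, and ordered pairs of the survivors become candidates for relation classification. This is simple top-k pruning controlled by a beam width; the thesis's actual beam search may be more elaborate. `threshold`, `beam_width` and the `"None"` label are hypothetical names.

```python
def filter_and_pair(span_predictions, threshold, beam_width):
    """Keep confident entity spans, prune to beam_width, and form relation candidates.

    span_predictions maps (start, end) -> (label, probability); label "None" marks a non-entity.
    """
    # Span filter: drop non-entities and low-confidence spans.
    kept = [
        (span, label, prob)
        for span, (label, prob) in span_predictions.items()
        if label != "None" and prob >= threshold
    ]
    # Beam-width pruning: keep only the highest-probability spans.
    kept.sort(key=lambda item: item[2], reverse=True)
    kept = kept[:beam_width]
    # Every ordered pair of distinct kept spans is a (head, tail) relation candidate.
    pairs = [(a[0], b[0]) for a in kept for b in kept if a[0] != b[0]]
    return kept, pairs


# Hypothetical span-classifier outputs.
predictions = {
    (0, 2): ("Deity", 0.9),
    (3, 4): ("Place", 0.8),
    (1, 2): ("None", 0.95),
    (4, 5): ("Place", 0.6),
    (5, 6): ("Deity", 0.4),
}
kept, pairs = filter_and_pair(predictions, threshold=0.5, beam_width=2)
print(kept)
print(pairs)
```

A wider beam passes more entity spans (and quadratically more pairs) to the relation classifier, trading speed for recall, which is why beam width is a natural setting to vary experimentally.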
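For the display platform in (4), extracted results are naturally stored as (head, relation, tail) triples, and an entity query returns the surrounding subgraph for visualization. The sketch below is an in-memory stand-in for whatever storage the platform actually uses (the abstract does not say); the `TripleStore` class, its methods, and the node/edge dictionary layout are hypothetical, chosen to be easy to serialize as JSON for a front-end graph view.

```python
import json
from collections import defaultdict


class TripleStore:
    """Minimal in-memory store of (head, relation, tail) triples indexed by entity."""

    def __init__(self):
        self.triples = set()
        self.by_entity = defaultdict(set)

    def add(self, head, relation, tail):
        triple = (head, relation, tail)
        self.triples.add(triple)
        # Index the triple under both endpoints so either entity can retrieve it.
        self.by_entity[head].add(triple)
        self.by_entity[tail].add(triple)

    def neighbors(self, entity):
        """All triples in which the entity appears, in a stable order."""
        return sorted(self.by_entity.get(entity, ()))

    def subgraph(self, entity):
        """Nodes and labelled edges around an entity, ready for JSON serialization."""
        triples = self.neighbors(entity)
        nodes = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
        edges = [{"source": h, "label": r, "target": t} for h, r, t in triples]
        return {"nodes": nodes, "edges": edges}


# Hypothetical triples produced by the extraction models.
store = TripleStore()
store.add("Green Tara", "VeneratedIn", "Lhasa")
store.add("Lhasa", "LocatedIn", "Tibet")
graph = store.subgraph("Lhasa")
print(json.dumps(graph, indent=2))
```

A web endpoint could return this JSON for a queried entity and let the browser render nodes and edges; a production system would more likely sit on a dedicated graph database.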
Keywords/Search Tags: Thangka Dataset, Entity Recognition, Relation Extraction, Thangka Visualization