
A Study On Construction Of Chinese Local Chronicles Knowledge Graph Under Cold-Start

Posted on: 2022-03-13   Degree: Master   Type: Thesis
Country: China   Candidate: X Xiong   Full Text: PDF
GTID: 2545306725989729   Subject: Information Science
Abstract/Summary:
Local chronicles do more than serve the functions of “preserving history, informing governance, and educating the public”. They are also an essential information resource for China's political and economic development. However, libraries and archives are still at an early stage of computerizing and digitizing local chronicles. The large body of chorographic resources remains in a loose, linear structure and cannot support efficient knowledge association, aggregation, or sharing. Against this background, using the open knowledge-organization capability of knowledge graphs to fully exploit the knowledge value of local chronicles is a promising and challenging research direction in the digital humanities.

Techniques for constructing general-purpose knowledge graphs have gradually matured. Local chronicles, however, vary in length and are characterized by wide spans of space and time, scattered narration, numerous historical figures, and complicated relations. As a result, constructing and applying a Chinese local chronicles knowledge graph still faces a cold-start problem caused by the lack of annotated samples. It also faces further difficulties: limited automation, error propagation and accumulation across multiple tasks, the particularities of the Chinese language, and the inability to acquire dynamic knowledge. Given these challenges, studying and integrating methods for constructing a Chinese local chronicles knowledge graph helps advance the knowledge-oriented development of local chronicles. It is also important for the technical development of domain knowledge graphs.

Targeting the construction and application of such a knowledge graph, this thesis concentrates on Chinese knowledge extraction and representation under cold-start conditions. It proposes KGMTP, a framework based on multi-task learning that divides the whole process into three phases: entity extraction, relation extraction, and knowledge graph visualization. In the entity extraction phase, the HNNEE model is applied to entities that are numerous and follow no obvious compositional rules. Traditional machine learning methods serve as the baseline. Taking into account the particularities of Chinese and the out-of-vocabulary (OOV) problem, I improved the quality of entity extraction by adjusting the strategies for text representation, sequence tagging, and feature engineering. Next, relation extraction is modeled as a sentence-level classification problem. After building a Chinese relation dataset through distant supervision, I classify relations by combining the distance and context information of entity pairs. I also add the categories of candidate entity pairs to constrain their interactions and reduce the overall computational complexity, which improves relation classification. Finally, I use RDF to represent and store the results of entity and relation extraction, and I designed and developed a graphical retrieval system based on the Chinese local chronicles knowledge graph.

The main experimental findings are as follows. Compared with traditional models such as the bag-of-words model and word embedding models, the pre-trained language model is more capable at text representation. CRF is well suited to finding the globally optimal tag sequence in entity extraction, especially for long entities. Chinese word segmentation and entity extraction complement each other. Combining part-of-speech features, as a form of prior knowledge, with the pre-trained language model also improves Chinese entity extraction to some extent. Moreover, the quality of entity extraction directly affects the subsequent relation extraction, and fusing the category features of entity pairs effectively improves relation extraction.

Overall, the KGMTP model, which integrates transfer learning and distant supervision, largely solves the cold-start problem. That problem stems from the lack of annotated samples and the time-consuming, laborious nature of manual labeling. Within a limited scope, the construction and application of a Chinese local chronicles knowledge graph are completed. I hope that these models, methods, and conclusions can be extended to larger corpora covering a wider range and more categories. I also hope they can serve as a reference for constructing domain knowledge graphs that lack annotations.
Keywords/Search Tags: Local Chronicles, Knowledge Graph, Transfer Learning, Distant Supervision, KGMTP, BERT, Char2Vec
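The abstract treats Chinese entity extraction as sequence tagging with a CRF output layer, but it does not name the tagging scheme. The following is a minimal sketch that assumes a character-level BIO scheme with hypothetical PER/LOC labels. It shows how a predicted tag sequence is decoded back into entity spans. `decode_bio` is a hypothetical helper, not code from the thesis.

```python
from typing import List, Tuple

Span = Tuple[str, str, int, int]  # (entity text, entity type, start, end exclusive)

def decode_bio(chars: List[str], tags: List[str]) -> List[Span]:
    """Collect entity spans from one BIO tag per character (assumed scheme).

    An "I-" tag that does not continue an open entity of the same type is
    treated leniently as the start of a new entity.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")  # "B-PER" -> ("B", "-", "PER"); "O" -> ("O", "", "")
        continues = prefix == "I" and start is not None and etype == label
        if not continues:
            # Close any open entity before handling the current character.
            if start is not None:
                spans.append(("".join(chars[start:i]), etype, start, i))
                start, etype = None, None
            if prefix in ("B", "I"):
                start, etype = i, label
    # Flush an entity that runs to the end of the sentence.
    if start is not None:
        spans.append(("".join(chars[start:]), etype, start, len(chars)))
    return spans

# Toy example with illustrative tags for the sentence "王安石生于临川".
print(decode_bio(list("王安石生于临川"),
                 ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]))
```

Decoding at the character level avoids depending on word-segmentation output. The lenient handling of a stray `I-` tag is a design choice. A stricter decoder could discard such tags instead.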
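Distant supervision, which the abstract uses to build the Chinese relation dataset, rests on one heuristic. If a sentence mentions both entities of a knowledge-base triple, it is assumed to express that relation, so the resulting labels are noisy. The sketch below also illustrates one possible way category constraints can prune candidate pairs before classification. The type-signature table `ALLOWED`, the toy knowledge base, and the helpers `candidate_pairs` and `distant_label` are all hypothetical assumptions, not the thesis's implementation.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (surface text, entity type)

# Hypothetical type signatures: (head type, tail type) -> relations allowed between them.
ALLOWED: Dict[Tuple[str, str], Set[str]] = {
    ("PER", "LOC"): {"born_in"},
    ("PER", "PER"): {"father_of"},
}

def candidate_pairs(entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Keep only ordered entity pairs whose type combination can hold some relation."""
    return [(h, t) for h, t in permutations(entities, 2) if (h[1], t[1]) in ALLOWED]

def distant_label(entities: List[Entity],
                  kb: Dict[Tuple[str, str], str]) -> List[Tuple[str, str, str]]:
    """Label each type-compatible pair in one sentence from a knowledge base.

    If the KB links head and tail, assume the sentence expresses that relation;
    otherwise, or if the KB relation contradicts the type signature, use "NA".
    """
    labeled = []
    for (h_text, h_type), (t_text, t_type) in candidate_pairs(entities):
        rel = kb.get((h_text, t_text), "NA")
        if rel not in ALLOWED[(h_type, t_type)]:
            rel = "NA"
        labeled.append((h_text, t_text, rel))
    return labeled

# Toy sentence with three entities: 6 ordered pairs, of which 4 survive the type filter.
toy_kb = {("王安石", "临川"): "born_in"}
print(distant_label([("王安石", "PER"), ("曾巩", "PER"), ("临川", "LOC")], toy_kb))
```

In this toy case, the two LOC-headed pairs are never scored. The type filter saves work because it shrinks the set of pairs a downstream classifier has to process.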
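The abstract states that extraction results are expressed and stored in RDF but gives no schema. The following is a minimal sketch that serializes entities and relations as N-Triples using plain string formatting. The namespace `http://example.org/chronicle/`, the `entity/`, `type/`, and `relation/` path segments, and the `@zh` language tag are assumptions for illustration. Only `rdf:type` and `rdfs:label` are standard W3C vocabulary.

```python
from typing import Iterable, List, Tuple
from urllib.parse import quote

BASE = "http://example.org/chronicle/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def iri(kind: str, name: str) -> str:
    """Build an IRI under the hypothetical namespace, percent-encoding the name."""
    return f"<{BASE}{kind}/{quote(name, safe='')}>"

def literal(text: str) -> str:
    """N-Triples string literal with the escapes the format requires, tagged as Chinese."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n").replace("\r", "\\r")
    return f'"{escaped}"@zh'

def to_ntriples(entities: Iterable[Tuple[str, str]],
                relations: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Emit a type and a label triple per entity, then one triple per (head, relation, tail)."""
    lines: List[str] = []
    for name, etype in entities:
        subj = iri("entity", name)
        lines.append(f"{subj} <{RDF_TYPE}> {iri('type', etype)} .")
        lines.append(f"{subj} <{RDFS_LABEL}> {literal(name)} .")
    for head, rel, tail in relations:
        lines.append(f"{iri('entity', head)} {iri('relation', rel)} {iri('entity', tail)} .")
    return lines

for line in to_ntriples([("王安石", "PER"), ("临川", "LOC")], [("王安石", "born_in", "临川")]):
    print(line)
```

Percent-encoding keeps the IRIs ASCII-only, while the human-readable Chinese name is kept in the `rdfs:label` literal. A dedicated RDF library would handle serialization details more robustly in practice.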