Research and Application of a Knowledge Graph Construction Method for a Celebrity Thematic Data Platform

Posted on: 2022-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: P Q Yang
GTID: 2518306779971879
Subject: Automation Technology
Abstract/Summary:
Since the beginning of the 21st century, building thematic data platforms on famous people with regional cultural characteristics, drawing on new-generation information technology and existing collection resources, has become one of the important tasks in digital humanities construction. A celebrity thematic data platform is a digital platform for distinctive collection resources, created by collecting, collating, developing, and utilizing data on celebrities with public influence; it thereby promotes local economic development and distinctive regional culture.

Most traditional celebrity materials exist in scattered, independent, unstructured form, so the relationships among them (for example between one document and another, between documents and deeds, between one deed and another, and between deeds and places, people, and dates) cannot be accessed conveniently, effectively, or accurately. This makes it difficult to build a celebrity thematic data platform that is strongly interlinked and coherent as a whole. To address these problems, this thesis studies the Chinese named entity recognition task and knowledge graph construction methods in order to identify named entities in celebrity documents more effectively, completes the construction of a celebrity thematic knowledge graph, and applies the graph to realize a Chen Yi thematic data platform with strong connections between documents, deeds, places, people, and dates. The research work is as follows.

(1) A feature enhancement module based on BiLSTM and a multi-head attention mechanism is designed and proposed. It addresses the inability of character-based model training to integrate the overall features of the sentence or the neighborhood features of a word into the word vector, and it enhances the feature extraction ability of the model. Adversarial training is proposed as the model's regularization method; it addresses the model's poor generalization when the samples in the self-built dataset are imperfect and enhances the model's resistance to interference. The experimental results show that the F1 score, recall, and accuracy of the model improve after the adversarial training and feature enhancement modules are added.

(2) A triple extraction model, DSTE, is designed and proposed that combines dependency syntactic analysis with the Chinese named entity recognition model. DSTE extracts triples using the trained Chinese named entity recognition model and eight triple extraction patterns designed for different syntactic structures. This addresses the problem that descriptive entity and relation extraction in celebrity literature texts is difficult to constrain with predefined datasets. The experimental results show that DSTE achieves an F1 score of 78% on the Chen Yi event database dataset and can effectively provide data services for the Chen Yi thematic data platform.

(3) A knowledge graph construction system and the Chen Yi thematic data platform are designed and implemented. The knowledge graph construction system provides developers with services that turn text into structured data, with functional modules for syntax analysis, graph display, triple extraction, and text uploading, and thus realizes the construction of knowledge graphs. The Chen Yi thematic data platform provides users with diversified data services on Marshal Chen Yi, with functional modules such as a biography list, an event tree, and a visual map, and completes the integration of information from Chen Yi's distinctive collection resources.
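The pattern-based extraction described in contribution (2) can be illustrated with a minimal sketch. It is not the thesis's DSTE implementation: the `Token` type, the `extract_svo` function, the LTP-style labels `SBV` (subject) and `VOB` (object), and the toy sentence are all assumptions made for illustration, and only one subject-verb-object pattern is shown rather than the eight patterns the thesis mentions. The dependency parse and the entity positions are written by hand here; in a real pipeline they would come from a parser and the trained named entity recognition model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the sentence
    text: str    # surface form of the token
    head: int    # idx of the head token, 0 for the root
    deprel: str  # dependency label (assumed LTP-style: SBV, VOB, HED)


def extract_svo(tokens, entity_ids):
    """Hypothetical pattern: a verb with both an SBV and a VOB dependent.

    A triple is kept only if its subject or object position was marked
    as an entity (entity_ids stands in for the NER model's output).
    """
    triples = []
    for verb in tokens:
        subjects = [t for t in tokens
                    if t.head == verb.idx and t.deprel == "SBV"]
        objects = [t for t in tokens
                   if t.head == verb.idx and t.deprel == "VOB"]
        for s in subjects:
            for o in objects:
                if s.idx in entity_ids or o.idx in entity_ids:
                    triples.append((s.text, verb.text, o.text))
    return triples


# Made-up example sentence with a hand-written parse.
sentence = [
    Token(1, "Chen Yi", 2, "SBV"),
    Token(2, "visited", 0, "HED"),
    Token(3, "Shanghai", 2, "VOB"),
]
entity_ids = {1, 3}  # positions assumed to be tagged as entities

print(extract_svo(sentence, entity_ids))
```

Filtering on entity positions is one simple way to combine the two signals: the dependency pattern proposes candidate triples, and the entity recognizer decides which candidates are worth keeping.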
Keywords/Search Tags: Chinese named entity recognition, Dependency syntax analysis, Knowledge graph construction, Celebrity thematic data platform