Design And Implementation Of Entity Linking System For Chinese Novels

Posted on:2023-06-09

Degree:Master

Type:Thesis

Country:China

Candidate:C Wang

Full Text:PDF

GTID:2568306914957809

Subject:Electronic and communication engineering

Abstract/Summary:

Natural language is full of ambiguity and irregularity, and Chinese in particular is rich in semantics and forms of expression. Literary works such as novels often contain large numbers of characters, nicknames, complex scenes, and organizations. These entities introduce many ambiguities, which pose great challenges for downstream natural language processing tasks on novels. For example, attribute extraction, relation extraction, and knowledge graph construction for novels all require entity ambiguity to be resolved first, so an automated entity disambiguation system for novels has real practical value. In recent years, with the rapid development of mobile devices and Internet technology, digital novel reading platforms for mobile phones and tablets have emerged quickly to meet people’s fragmented reading needs. Although entity disambiguation has been studied extensively in the general domain, disambiguation in complex Chinese texts such as novels still relies on inefficient manual processing and lacks a systematic solution.

To this end, this thesis builds a deep-learning-based Chinese entity linking model and a pipelined entity linking process, and designs and implements an Entity Linking (EL) system for Chinese novels. The system disambiguates entities in novel text by recognizing entity mentions and linking them to entities in an entity knowledge base. It consists mainly of entity mention recognition, candidate entity generation, and candidate entity ranking. The main innovations of this thesis are as follows:

(1) In novel scenarios, candidate entity generation based on alias tables has a low entity recall rate and cannot handle out-of-vocabulary words effectively, while candidate entity generation based on deep learning has low retrieval efficiency. To address both problems, this thesis adopts entity retrieval based on k-nearest neighbors (KNN) and builds a dual-tower coarse-ranking model, BiMatch Model, based on similarity matching. The candidate entity set is produced through two rounds of candidate screening, which effectively improves candidate recall while preserving retrieval speed.

(2) Novels present a complex text environment, short text inputs lack topic-level information, and a traditional system based on fine-grained entity types is difficult to build. To address these problems, this thesis designs a deep semantic matching fine-ranking model, Cross-Attention Model, based on an interactive attention mechanism. This second ranking stage strengthens the interaction between entity mentions and candidate entities through interactive mapping, and makes more effective use of semantic information and entity type information. At the same time, to strengthen the interaction between the models of the individual modules, the interactive model is used as a teacher, following the idea of knowledge distillation, to guide the dual-tower model. This effectively improves the dual-tower model and the accuracy of the entire entity linking system.

In addition, based on a novel corpus, a Chinese named entity recognition dataset, a Chinese encyclopedia, and a knowledge base, this thesis constructs a Chinese novel entity mention dataset and a Chinese novel entity linking dataset, used for entity mention recognition and entity linking in novels respectively. For entity mention recognition in novels, this thesis designs a BERT+FLAT+CRF (BF-CRF) scheme based on the Flat-Lattice Transformer, a lexicon-enhanced model for Chinese named entity recognition.

Finally, by integrating the models of each module with the design and implementation of the system's front end and back end, a pipelined entity linking system is built. It has a four-layer structure consisting of a service layer, a scheduling layer, a business capability layer, and a data layer, which together provide front-end visualization, back-end interaction, the algorithm modules, and data processing and storage. Functional and performance testing verifies that the system meets its expected functions and performance targets, with high entity linking accuracy, low latency, and good scalability.
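The retrieve-then-rank flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the hand-written vectors stand in for dual-tower (BiMatch-style) embeddings, and the token-overlap scorer stands in for the cross-attention fine-ranking model. All function names, entity identifiers, vectors, and descriptions here are hypothetical toy data chosen for illustration only.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_candidates(
    mention_vec: Vector, entity_vecs: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Stage 1 (coarse): return the k entities closest to the mention vector."""
    scored = [(eid, cosine(mention_vec, vec)) for eid, vec in entity_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def overlap_score(context: str, description: str) -> float:
    """Stand-in fine ranker: Jaccard overlap of whitespace tokens.

    Assumption: a real system would score the (mention, candidate) pair
    jointly with an interaction model instead of counting shared tokens.
    """
    ctx = set(context.lower().split())
    desc = set(description.lower().split())
    union = ctx | desc
    return len(ctx & desc) / len(union) if union else 0.0


def link_mention(
    mention_vec: Vector,
    context: str,
    entity_vecs: Dict[str, Vector],
    descriptions: Dict[str, str],
    k: int = 2,
    rerank: Callable[[str, str], float] = overlap_score,
) -> str:
    """Stage 2 (fine): re-rank the coarse candidates and return the best id."""
    candidates = knn_candidates(mention_vec, entity_vecs, k)
    best_id, _ = max(candidates, key=lambda c: rerank(context, descriptions[c[0]]))
    return best_id


# Hypothetical toy knowledge base (3-dimensional vectors for readability).
entity_vecs = {
    "sun_wukong": [0.9, 0.1, 0.0],
    "tang_sanzang": [0.7, 0.6, 0.1],
    "flower_fruit_mountain": [0.0, 0.2, 0.9],
}
descriptions = {
    "sun_wukong": "monkey king who wields a golden staff",
    "tang_sanzang": "monk who travels west to fetch scriptures",
    "flower_fruit_mountain": "mountain home of the monkeys",
}
mention_vec = [0.8, 0.3, 0.0]
context = "the monk set out west to fetch the scriptures"

coarse = knn_candidates(mention_vec, entity_vecs, k=2)
linked = link_mention(mention_vec, context, entity_vecs, descriptions, k=2)
```

In this toy data the coarse stage ranks `sun_wukong` first by vector similarity, while the fine stage selects `tang_sanzang` because its description shares more tokens with the mention context. This illustrates why a cheap, recall-oriented retrieval step is commonly paired with a more expensive ranking step over a short candidate list.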

Keywords/Search Tags:

entity linking, entity disambiguation, named entity recognition, fiction texts

Related items

1	Research On Named Entity Recognition And Disambiguation Based On Network Semantic Resource
2	Research And Implementation Of Named Entity Disambiguation Based On Wikipedia
3	Research On Document Oriented Entity Linking Method
4	Named Entity Linking Based On Multisource Knowledge
5	Research On Chinese Entity Linking Based On Online Encyclopedia
6	Research On Named Entity Recognition And Disambiguation For Short Text
7	Entity Linking Algorithm Research And System Implementation Based On Wikipedia
8	Research On Named Entity Recognition And Entity Link Method For Short Text Questions
9	Research On Key Technologies Of Named Entity Recognition And Linking Based On Representation Learning
10	Research And Implementation Of English Entity Discovery And Linking System Based On Freebase