Natural language is full of ambiguity and irregularity. The Chinese language and its culture are extensive and profound, with rich semantics and forms of expression. This is especially true of literary works such as novels, which often contain large numbers of characters and nicknames as well as many complex scenes and organizations. These character, scene, and organization entities introduce many ambiguities, which pose great challenges to downstream natural language processing tasks on novels. For example, attribute extraction, relation extraction, and knowledge graph construction for novels all need to resolve entity ambiguity first. An automated entity disambiguation system for novels is therefore of great practical value. In recent years, with the rapid development of mobile electronic devices and Internet technology, digital novel reading platforms for mobile devices such as phones and tablets have emerged rapidly to meet people's fragmented reading needs. Although entity disambiguation has been studied extensively in the general domain, for complex Chinese texts such as novels the ambiguity problem still relies on inefficient manual processing and lacks a systematic solution. To this end, this thesis constructs a deep-learning-based Chinese entity linking model and a pipelined entity linking process, and designs and implements an Entity Linking (EL) system for Chinese novel scenarios. The system performs entity disambiguation in novel text by identifying entity mentions and linking them to entities in an entity knowledge base; it mainly comprises entity mention recognition in novels, candidate entity generation, and candidate entity ranking. The main innovations of this thesis are as follows: (1) To address the low entity recall of alias-table-based candidate entity generation, which cannot effectively handle out-of-vocabulary (unregistered) words, and the low retrieval efficiency of candidate
entity generation methods based on deep learning in novel scenarios, this thesis adopts entity retrieval based on k-nearest neighbors (KNN). The method also constructs a dual-tower neural network coarse-ranking model, the BiMatch Model, based on similarity matching, and generates the candidate entity set through two rounds of candidate entity screening, which effectively improves candidate entity recall while maintaining retrieval speed. (2) To address the complex text environment of novels, the lack of topic-level information in short text input, and the difficulty of building a traditional fine-grained entity type system, this thesis designs a deep semantic matching fine-ranking model, the Cross-Attention Model, based on an interactive attention mechanism. This second-stage ranking strengthens the interaction between entity mentions and candidate entities through interactive mapping and makes more effective use of semantic information and entity type information. In addition, to strengthen the interaction between the algorithm models of the modules, the interactive model is used, following the idea of knowledge distillation, as a teacher model to guide the dual-tower model, which effectively improves the dual-tower model and the accuracy of the entire entity linking system. Furthermore, based on novel corpora, Chinese named entity recognition datasets, Chinese encyclopedias, and knowledge bases, this thesis constructs a Chinese novel entity mention dataset and a Chinese novel entity linking dataset, which are used for novel entity mention recognition and novel entity linking respectively. For entity mention recognition in novels, this thesis designs a BERT+FLAT+CRF (BF-CRF) scheme based on the Flat-Lattice Transformer, a lexicon-enhanced model for Chinese named entity recognition. Finally, by integrating the algorithm models of each module and the design and implementation of the front and
back ends of the system, a pipelined entity linking system is constructed with a four-layer structure: a service layer, a scheduling layer, a business capability layer, and a data layer, which provide front-end visualization, back-end interaction, algorithm modules, and data processing and storage. Functional and performance tests verify that the system's functions and performance indicators meet expectations, with high entity linking accuracy, low latency, and good scalability.
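
The retrieve-then-rerank flow summarized above can be illustrated with a minimal sketch. It is not the thesis implementation: all function names, the toy vectors, and the entity descriptions are hypothetical. Cosine similarity over precomputed vectors stands in for the KNN retrieval and dual-tower coarse ranking, and a simple token-overlap score stands in for the Cross-Attention fine-ranking model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_candidates(
    mention_vec: Vector, entity_vecs: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Coarse stage: return the k entities whose vectors are closest to the mention.

    Assumption: vectors come from some encoder (a dual-tower model in the thesis);
    here they are supplied directly by the caller.
    """
    scored = [(eid, cosine(mention_vec, vec)) for eid, vec in entity_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def token_overlap(context: str, description: str) -> float:
    """Stand-in fine-ranking score: Jaccard overlap of whitespace tokens."""
    a, b = set(context.split()), set(description.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rerank(
    context: str,
    candidates: List[Tuple[str, float]],
    descriptions: Dict[str, str],
    score_fn: Callable[[str, str], float] = token_overlap,
) -> Tuple[str, float]:
    """Fine stage: rescore each candidate against the mention context, keep the best."""
    rescored = [(eid, score_fn(context, descriptions[eid])) for eid, _ in candidates]
    return max(rescored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical toy knowledge base with two-dimensional entity vectors.
    entity_vecs = {"E1": [1.0, 0.0], "E2": [0.9, 0.1], "E3": [0.0, 1.0]}
    descriptions = {
        "E1": "the swordsman of the northern sect",
        "E2": "the innkeeper in the capital city",
        "E3": "a mountain temple",
    }
    candidates = knn_candidates([1.0, 0.05], entity_vecs, k=2)
    best = rerank("the innkeeper greeted the guests in the capital", candidates, descriptions)
    print(candidates, best)
```

The split mirrors the trade-off described in the abstract: the coarse stage is cheap and only needs to keep the correct entity inside the top k, while the more expensive pairwise scorer runs on that short list alone.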