| As an important carrier of human communication, natural language contains rich semantic knowledge, but because language is highly ambiguous and variable, this implicit knowledge is difficult for computers to understand directly. Meanwhile, humans are also building machine-understandable knowledge bases to store knowledge explicitly. How to connect language and knowledge automatically has always been a core problem in natural language processing. Entity linking is a core technology for bridging language and knowledge: it links entity mentions in natural language to the corresponding entities in a target knowledge base. Entity linking has important applications in knowledge base construction, knowledge base question answering, information retrieval, and other fields. Early entity linking methods were mainly based on feature engineering, measuring the contextual similarity between an entity mention and its candidate entities with manually designed features. With the advent of the deep learning era, deep learning-based entity linking methods gradually replaced feature engineering-based methods. Recently, with the emergence of large-scale pre-trained language models, models can learn more generic text representations from large-scale unsupervised pre-training. However, the design of entity linking models in the pre-training era remains to be explored. In addition, previous work on entity linking has mainly focused on textual data. As data become increasingly multi-modal, tabular data also carry rich semantic information, and how to design effective entity linking methods for tabular data also needs to be studied. Finally, current work on entity linking mainly focuses on the technology itself, but research on the application of entity linking is also critical. Therefore, against the background of the rise of pre-trained language models, this thesis carries out a series of studies on natural language in the form of textual and tabular
data, aiming to improve the performance and efficiency of entity linking technology and to explore its applications in downstream tasks. The main research contents of this thesis cover the following four aspects: 1. Entity Linking over Textual Data by Modeling Latent Entity Type Information. Existing state-of-the-art neural entity linking methods tend to link entity mentions to incorrect entities whose types are inconsistent with the mention. To address this problem, this thesis proposes a textual entity linking method that models latent entity type information with a pre-trained language model. By modeling the consistency between the immediate contexts of the entity mention and the candidate entities, this method implicitly captures type consistency and overcomes the insufficient disambiguation information of existing methods. This thesis verifies the effectiveness of the method through in-domain and out-of-domain evaluation on standard benchmark datasets and reveals why the proposed model is effective through detailed experimental analysis. 2. Autoregressive Entity Linking over Textual Data via Referring Expression Generation. Current deep learning-based textual entity linking models mostly rely on pre-trained entity embeddings. The storage required by such methods grows linearly with the number of entities, so they suffer from poor storage efficiency. The recently proposed autoregressive entity linking method formulates entity linking as a sequence generation problem, which effectively addresses the storage efficiency problem. However, the current autoregressive entity linking method cannot be generalized to knowledge bases that lack unique entity names. This thesis proposes a unique entity name generation method based on the referring expression generation framework, which improves the generality of the autoregressive entity linking method. The experimental results demonstrate that this method significantly improves the generalization performance of the autoregressive entity linking model by
injecting entity type and property information into the entity names. 3. Entity Linking over Tabular Data by Modeling Structure Constraints. Current table entity linking methods rely on labeled data and assume that tables contain rich metadata. To address these limitations, this thesis proposes an unsupervised table entity linking method based on structure constraints. This method does not rely on any metadata associated with the table and uses only the structure constraints of the table, i.e., type consistency along the column and property relatedness within the row. This thesis verifies the effectiveness of this method on standard benchmark datasets. Furthermore, this thesis explores how to leverage the results of table entity linking to solve the downstream task of column type prediction. 4. Question Entity Linking and Its Application in Knowledge Base Question Answering System. A detailed analysis of the effect of entity linking in downstream applications is missing. Therefore, this thesis explores in depth the application of entity linking in a downstream task, knowledge base question answering. This thesis first designs a knowledge base question answering framework based on a retriever-transducer-checker architecture. It then discusses in detail the role of the entity linking module in this framework and analyzes the effect of different entity linking systems on the performance of the knowledge base question answering system. Finally, this thesis finds that the downstream semantic parser with an execution mechanism in the knowledge base question answering framework can take advantage of program execution constraints to further improve the performance of the question entity linking system. In general, this thesis studies entity linking technology for natural language carried in different forms. According to the characteristics of context information in different forms of data, this thesis designs corresponding entity linking methods and discusses their applications
in some downstream tasks. This thesis has achieved some preliminary results, which will hopefully be helpful to researchers in natural language processing, knowledge graphs, and related areas. |