With the rapid development of Internet technology, the volume of available information has grown exponentially. Massive amounts of information are now easy to obtain, so effective information extraction methods are needed to identify what is valuable. Entity linking, an important subtask of information extraction, links mentions in text to the corresponding entities in an external knowledge base, thereby disambiguating them and supporting a range of natural language understanding and knowledge acquisition tasks. This thesis studies the entity linking task and implements a prototype entity linking system that performs entity linking on user-supplied text and displays the results. The main research content covers the following three aspects:

(1) First, this thesis proposes an entity linking algorithm based on an end-to-end architecture. Entity linking consists of two stages, mention detection and entity disambiguation, and most current research concentrates on the disambiguation stage. However, methods that focus only on entity disambiguation ignore the important dependencies between the two stages, and errors introduced by mention detection propagate irreversibly into the subsequent disambiguation stage. To address this, the thesis proposes an end-to-end entity linking model that treats mention detection and entity disambiguation as a whole and trains them jointly. The parameters of both tasks are updated simultaneously during training, so the model can exploit the dependencies between the two stages and learn more feature information from them. Experimental results show that the proposed end-to-end neural network model effectively improves performance on the entity linking task.

(2) Second, the thesis optimizes the pre-training tasks of BERT and proposes the EL-BERT model. BERT's pre-training tasks are not well matched to entity linking: the masked language model (MLM) task masks at the word level, so the pre-trained model can learn only word-level feature information, and the next sentence prediction (NSP) task is relatively easy and contributes little to the result. To address these problems, EL-BERT builds on the original BERT model and optimizes its pre-training tasks, adopting a multi-task mask method and a sentence semantic prediction task that are better suited to entity linking. Experimental results show that, with the improved pre-training tasks, the pre-trained model extracts feature information more effectively and achieves better results on the entity linking task.

(3) Finally, the thesis designs and implements a prototype entity linking system based on the Django framework. The system provides functions such as display of entity linking results, display of the candidate entity list, entity query, and visualization of linked entities. It also includes modules such as corpus management and user management, so that it can be applied in a production environment. System testing shows that the entity linking system meets the design requirements and achieves the expected goals.
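
The limitation of word-level masking noted in (2) can be illustrated with a small sketch. The abstract does not define the multi-task mask method, so the code below is not the EL-BERT procedure. It only contrasts masking a single word with masking a whole mention span, under the assumption that mention boundaries are already known. The function names `mask_words` and `mask_spans` and the example sentence are hypothetical.

```python
from typing import List, Tuple

MASK = "[MASK]"


def mask_words(tokens: List[str], positions: List[int]) -> List[str]:
    """Word-level masking: each chosen position is masked on its own."""
    chosen = set(positions)
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


def mask_spans(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Span-level masking: every token in a [start, end) span is masked together."""
    out = list(tokens)
    for start, end in spans:
        for i in range(start, end):
            out[i] = MASK
    return out


# Hypothetical example: the mention "Michael Jordan" covers tokens 0 and 1.
tokens = ["Michael", "Jordan", "played", "for", "the", "Chicago", "Bulls"]

# Word-level: "Michael" stays visible, so "Jordan" is easy to guess locally.
word_masked = mask_words(tokens, [1])

# Span-level: the whole mention is hidden and must be inferred from context.
span_masked = mask_spans(tokens, [(0, 2)])

print(word_masked)
print(span_masked)
```

With word-level masking, the model can often recover the hidden token from the rest of the mention. Hiding the whole mention forces the prediction to rely on the surrounding sentence, which is closer to what entity disambiguation requires.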