
Research on Chinese Short-Text Entity Linking for Retrieval Systems

Posted on: 2022-03-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Q Chong | Full Text: PDF
GTID: 2518306344489904 | Subject: Software engineering
Abstract/Summary:
The Internet provides users with an open platform for sharing information, retrieving information, and expressing opinions. Queries on search engines such as Baidu and Google are mostly Chinese short texts, and these short texts contain large numbers of named entities. Identifying those entities and linking them correctly to a knowledge base is an urgent problem in natural language processing. Compared with entity linking in long Chinese documents, Chinese short-text entity linking faces the following challenges: the text is highly colloquial and unstandardized, often containing meaningless symbols that make entity ambiguity hard to resolve; short texts lack rich contextual information, so their semantics are vague; and, compared with English, characteristics of Chinese such as polysemy and synonymy make short-text entity disambiguation more difficult. Entity linking is one of the core tasks of natural language processing and has a wide range of application scenarios, including translation systems, retrieval systems, and automatic question answering. Research on Chinese short-text entity linking for retrieval systems is therefore of great significance.

To address these problems, this thesis builds on the BiLSTM-CNN model and uses the BERT model together with the attention mechanism to complete the entity-linking task. The main work is as follows:

(1) For the short-text named entity recognition task of the retrieval system, this thesis proposes the BERT-BiLSTM-CNN model, which uses the BERT pre-trained model to capture long-distance dependencies. A half-pointer, half-labeling scheme is proposed to mark entities, and the BiLSTM-CNN layers decode the recognized entities while fully exploiting contextual information. Comparative experiments against SVM, CRF, BiLSTM, BiLSTM-CRF, BiLSTM-CNN, and other models show that the F1 score of BERT-BiLSTM-CNN exceeds all comparison models, indicating a clear advantage in entity recognition.

(2) For the retrieval-system entity disambiguation task, this thesis uses a BERT-Attention model, which shares BERT parameters with BERT-BiLSTM-CNN and applies the attention mechanism to focus on informative entity features while ignoring invalid ones. Comparing the BERT-Attention model with CNN, BERT+TextRank, and other models, experiments show that its accuracy exceeds the comparison models and that it plays a vital role in entity disambiguation.
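The half-pointer, half-labeling idea in (1) can be sketched in PyTorch as follows. This is a minimal illustration, not the thesis's actual configuration: a small `nn.Embedding` stands in for the BERT encoder, and all layer names and sizes are illustrative assumptions. The model emits, for each token, the probability of being an entity start and an entity end, and spans are decoded by pairing each start with the nearest end at or after it.

```python
import torch
import torch.nn as nn

class BiLSTMCNNTagger(nn.Module):
    """Toy BERT-BiLSTM-CNN tagger with half-pointer/half-labeling heads."""
    def __init__(self, vocab_size=100, emb_dim=32, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # stand-in for BERT
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden, hidden, kernel_size=3, padding=1)
        # Half-pointer / half-labeling heads: per-token probabilities of
        # being an entity start and an entity end.
        self.start_head = nn.Linear(hidden, 1)
        self.end_head = nn.Linear(hidden, 1)

    def forward(self, token_ids):
        x = self.embed(token_ids)                  # (B, T, E)
        x, _ = self.bilstm(x)                      # (B, T, 2H)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)  # (B, T, H)
        start = torch.sigmoid(self.start_head(x)).squeeze(-1)  # (B, T)
        end = torch.sigmoid(self.end_head(x)).squeeze(-1)      # (B, T)
        return start, end

def decode_spans(start_probs, end_probs, threshold=0.5):
    """Pair each predicted start with the nearest end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        for e in ends:
            if e >= s:
                spans.append((s, e))
                break
    return spans

model = BiLSTMCNNTagger()
tokens = torch.randint(0, 100, (1, 8))  # one sentence of 8 token ids
start, end = model(tokens)
print(start.shape, end.shape)  # torch.Size([1, 8]) torch.Size([1, 8])
```

The pointer-style decoding avoids committing to a fixed BIO label per token, which is one way to handle the nested and boundary-ambiguous entities common in short queries.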
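The attention-based disambiguation in (2) can likewise be sketched. Again this is a toy illustration under stated assumptions: a shared `nn.Embedding` stands in for the BERT parameters shared with the recognition model, and the additive attention pooling is a simplified version of the mechanism that down-weights uninformative entity features before each candidate is scored against the mention context.

```python
import torch
import torch.nn as nn

class AttentionDisambiguator(nn.Module):
    """Toy BERT-Attention disambiguator: attention-pooled feature matching."""
    def __init__(self, vocab_size=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # stand-in for shared BERT
        self.attn = nn.Linear(dim, 1)               # scores each feature token

    def encode(self, token_ids):
        x = self.embed(token_ids)                   # (T, D)
        w = torch.softmax(self.attn(x), dim=0)      # (T, 1) attention weights
        return (w * x).sum(dim=0)                   # (D,) weighted pooling

    def forward(self, mention_ids, candidate_ids_list):
        m = self.encode(mention_ids)
        # Dot-product similarity between the mention and each candidate's
        # attention-pooled description, normalized over the candidate set.
        scores = torch.stack([self.encode(c) @ m for c in candidate_ids_list])
        return torch.softmax(scores, dim=0)         # probability per candidate

model = AttentionDisambiguator()
mention = torch.randint(0, 100, (6,))                          # mention context
candidates = [torch.randint(0, 100, (10,)) for _ in range(3)]  # KB descriptions
probs = model(mention, candidates)
print(probs.shape)  # torch.Size([3])
```

Normalizing scores over the candidate set frames disambiguation as a ranking problem: the linked entity is simply the candidate with the highest probability.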
Keywords/Search Tags: Chinese Short Text, Named Entity Recognition, Entity Disambiguation, Entity Linking, Attention Mechanism