Research On Named Entity Recognition And Entity Link Method For Short Text Questions

Posted on: 2020-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhao
GTID: 2428330623959900
Subject: Computer technology
Abstract/Summary:
Named entity recognition and entity linking are basic tasks in natural language processing. Identifying the entity mentions in a sentence and mapping them to the corresponding entities in a knowledge base is important for understanding the semantics of the sentence. With the emergence of knowledge base question answering systems, in which these two tasks form a basic step, research on named entity recognition and entity linking for short text questions has important significance and value. For named entity recognition, this thesis treats the task as sequence labeling, implements it with a neural network model, and proposes improvements to the input layer and the decoding layer of the model. For entity linking, this thesis obtains background knowledge about entity mentions from an external corpus and extracts the entity type, the entity relationships, and the neighboring entities as the representation of a candidate entity in the structured knowledge base. The main research contents of this thesis are as follows:

(1) Named entity recognition is based on a BiLSTM+SoftMax neural network model. The character-level features and the part of speech of each word are concatenated with the pre-trained word vector to form the input of the model. Because BiLSTM with SoftMax cannot take the relationships between named entity tags into account, the decoding layer replaces SoftMax with CRF, which models the dependencies between tags and selects the globally optimal label sequence, and hence the label for each word.

(2) The triples containing the entity name attribute are extracted from the Freebase knowledge base, the data is cleaned, and a mention-entity mapping dictionary is constructed. The candidate entity set is filtered by entity popularity to obtain a suitable set of candidate entities. Finally, this thesis proposes a string matching algorithm that ensures the dictionary covers as many entity mentions as possible.

(3) Candidate entities are disambiguated using features from three different angles. Entity popularity: the popularity of an entity in the knowledge base is an inherent attribute of the entity and can be used as an auxiliary feature for entity disambiguation. Question-based features: because entities in a structured knowledge base lack descriptive text, entity categories and entity relationships are used as representations of the candidate entities, and the similarity of each with the mention context is calculated separately. Features based on similar entity mentions: using a word embedding model pre-trained on a Wikipedia corpus, similar entity mentions are obtained as background knowledge for the entity mention, the neighboring entities of the candidate entity are used as its similar entities, and the similarity between the similar entity mentions and the neighboring entities is calculated.

(4) A system for named entity recognition and entity linking for short text questions is implemented. The validity of the method is verified experimentally on commonly used question answering data sets. The named entity recognition experiment shows that adding character-level features in the input layer and using CRF in the decoding layer performs best. The entity linking experiment shows that combining all features gives the best results; among single features, entity popularity and the feature based on similar entity mentions perform relatively well.
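
Point (1) argues that per-word SoftMax cannot account for dependencies between tags, while a CRF decoding layer picks a globally optimal tag sequence. The following is a minimal sketch of that difference, not the thesis's model: the tag set, the emission scores (standing in for BiLSTM outputs), and the transition scores are all made-up illustrative values, and a trained CRF would learn its transition scores rather than have them set by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for a single entity type.
TAGS = ["O", "B-ENT", "I-ENT"]

# Made-up per-token scores for "who directed star wars",
# standing in for the output of a BiLSTM encoder.
EMISSIONS: List[Dict[str, float]] = [
    {"O": 2.0, "B-ENT": 0.1, "I-ENT": 0.0},  # who
    {"O": 2.0, "B-ENT": 0.2, "I-ENT": 0.1},  # directed
    {"O": 1.0, "B-ENT": 0.9, "I-ENT": 0.3},  # star
    {"O": 0.5, "B-ENT": 0.4, "I-ENT": 1.5},  # wars
]

# Assumed transition scores: every transition is neutral except
# O -> I-ENT, which is invalid in BIO tagging and heavily penalized.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    (prev, cur): 0.0 for prev in TAGS for cur in TAGS
}
TRANSITIONS[("O", "I-ENT")] = -10.0


def greedy_decode(emissions: List[Dict[str, float]]) -> List[str]:
    """Pick the best tag for each token independently (SoftMax-style)."""
    return [max(TAGS, key=lambda t: scores[t]) for scores in emissions]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the tag sequence with the highest total score (CRF-style)."""
    # best[tag] = (score of the best path ending in tag, that path)
    best = {t: (emissions[0][t], [t]) for t in TAGS}
    for scores in emissions[1:]:
        new_best = {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[p][0] + transitions[(p, cur)])
            total = best[prev][0] + transitions[(prev, cur)] + scores[cur]
            new_best[cur] = (total, best[prev][1] + [cur])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


print(greedy_decode(EMISSIONS))                 # ['O', 'O', 'O', 'I-ENT']
print(viterbi_decode(EMISSIONS, TRANSITIONS))   # ['O', 'O', 'B-ENT', 'I-ENT']
```

With these values, independent decoding emits an I-ENT tag directly after O, which is not a valid BIO sequence, whereas sequence-level decoding recovers a well-formed two-word mention.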
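
The candidate generation step in point (2) can be illustrated with a small sketch. Everything here is an assumption for illustration: the dictionary entries, entity identifiers, and popularity counts are invented, the top-k cutoff is one possible way to filter by popularity, and the longest-match lookup is a simple stand-in rather than the string matching algorithm proposed in the thesis, which the abstract does not describe.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mention -> [(entity id, popularity)] dictionary.
# In the thesis this is built from Freebase name triples; the ids
# and counts below are invented placeholders.
MENTION_DICT: Dict[str, List[Tuple[str, int]]] = {
    "star wars": [
        ("film:star_wars", 950),
        ("game:star_wars", 120),
        ("book:star_wars", 15),
    ],
    "paris": [("city:paris_france", 900), ("city:paris_texas", 40)],
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(cleaned.split())


def candidates(mention: str, top_k: int = 2) -> List[str]:
    """Return the top_k most popular entities recorded for a mention."""
    entries = MENTION_DICT.get(normalize(mention), [])
    ranked = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [entity_id for entity_id, _ in ranked[:top_k]]


def find_mentions(question: str, max_len: int = 3) -> List[str]:
    """Scan left to right, preferring the longest dictionary match."""
    tokens = normalize(question).split()
    found: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in MENTION_DICT:
                found.append(span)
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return found


for mention in find_mentions("Who directed Star Wars?"):
    print(mention, candidates(mention))
```

Filtering by popularity keeps the candidate set small before the more expensive disambiguation features are computed; the cutoff value is a tuning choice.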
Keywords/Search Tags:Named Entity Recognition, Entity Linking, Freebase, Knowledge Base, Short Text Question