Research On Knowledge Graph Question Answering System For Open Domain

Posted on: 2022-08-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Z H Wang | Full Text: PDF
GTID: 2518306572486374 | Subject: Computer technology
Abstract/Summary:
With the help of big data and artificial intelligence technologies, existing search engines can quickly locate the required information in massive internet data according to users' search queries, which meets most everyday information retrieval needs. However, search engines usually return a list of document links related to the query rather than a direct answer to the question, a limitation that is most apparent for knowledge-based questions. This thesis therefore studies how to construct an open-domain knowledge graph question answering system using existing knowledge graph, natural language processing and deep learning technologies. The goal is to return short, clear answers directly to encyclopedic knowledge questions raised by users. The main contributions of this thesis are as follows:

1. This thesis proposes an entity prediction model based on mention recognition and entity linking to identify candidate entities in questions. For mention recognition, it proposes a BERT-BiLSTM-CRF (BBC) sequence labeling method and an Elasticsearch (ES) precise retrieval method. The BBC sequence labeling method learns the hidden semantic features of questions through BERT and a Bidirectional Long Short-Term Memory (BiLSTM) network, and then uses a Conditional Random Field (CRF) to predict the label sequence of each question, from which candidate mentions are identified. The ES precise retrieval method segments the question sentence into words, filters them, and exactly matches the remaining words against the candidate mentions stored in the ES database. For entity linking, it proposes a feature calculation method and a feature ranking method, which together link candidate mentions to the knowledge graph and obtain the candidate entities related to them. The feature calculation method learns semantic features and statistical features between the question and the entity. The semantic features include the semantic similarity between the question and the entity information. The statistical features include mention importance, entity popularity and character matching. The feature ranking method uses the Logistic Regression (LR) algorithm to model the entity features and obtains the candidate entities after ranking. The experimental results show that the comprehensive recall of mention recognition is 0.961, and that the prediction accuracy of entity linking at Top-5, Top-3 and Top-1 is 0.846, 0.834 and 0.815, respectively. To balance question answering accuracy against computational efficiency, the Top-3 entities are finally selected as the candidate entities of a question.

2. This thesis proposes a relation prediction model based on semantic similarity and representation learning to identify candidate relations in questions. The semantic similarity method calculates the semantic similarity between questions and relations with a multi-level ranking (Word2Vec, BERT) algorithm, and obtains candidate relations after ranking by LR. The representation learning method uses BERT question encoding and the RotatE knowledge graph encoding model to learn the graph information between questions and relations, and obtains candidate relations after ranking by a scoring function. The experimental results show that the semantic similarity model outperforms the representation learning model: their Top-1 relation prediction accuracy is 0.792 and 0.774, respectively. The semantic similarity model is therefore selected to predict the candidate relations of questions.
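The feature ranking step for entity linking described in contribution 1 can be illustrated with a minimal Python sketch. It assumes each candidate entity already carries the four features named in the abstract (semantic similarity, mention importance, entity popularity, character matching), scaled to [0, 1]. The weights, the bias, the feature scaling and the entity names are all hypothetical values chosen for illustration; in the thesis the weights would be learned by Logistic Regression training, and the abstract does not report them.

```python
import math

# Hypothetical LR parameters (assumption: not taken from the thesis).
WEIGHTS = {
    "semantic_similarity": 2.0,
    "mention_importance": 1.0,
    "entity_popularity": 0.5,
    "character_matching": 1.5,
}
BIAS = -2.0


def lr_score(features):
    """Logistic regression probability that a candidate entity is correct."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def top_k_entities(candidates, k=3):
    """Rank candidates by LR score and keep the k best (Top-3 by default)."""
    ranked = sorted(candidates, key=lambda c: lr_score(c["features"]), reverse=True)
    return [c["entity"] for c in ranked[:k]]


# Made-up candidates for a single question mention.
candidates = [
    {"entity": "entity_a", "features": {"semantic_similarity": 0.9, "mention_importance": 0.8,
                                        "entity_popularity": 0.7, "character_matching": 1.0}},
    {"entity": "entity_b", "features": {"semantic_similarity": 0.2, "mention_importance": 0.8,
                                        "entity_popularity": 0.9, "character_matching": 0.3}},
    {"entity": "entity_c", "features": {"semantic_similarity": 0.6, "mention_importance": 0.5,
                                        "entity_popularity": 0.1, "character_matching": 0.6}},
    {"entity": "entity_d", "features": {"semantic_similarity": 0.1, "mention_importance": 0.2,
                                        "entity_popularity": 0.2, "character_matching": 0.0}},
]

print(top_k_entities(candidates))
```

Keeping the cut-off `k` as a parameter mirrors the trade-off reported in the abstract: a larger `k` raises the chance that the correct entity is among the candidates, at the cost of more candidates to process in later stages.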
Keywords/Search Tags:Knowledge graph, Question answering system, Mention recognition, Entity linking, Relation prediction