Knowledge base question answering (KBQA) systems answer questions posed directly in natural language. Compared with retrieving knowledge base content through formal query languages such as SPARQL, question answering is more intelligent and efficient. As a result, KBQA has become a focus of attention in artificial intelligence in recent years and is widely used in industrial scenarios such as search engines and intelligent customer service. Question answering systems built on large-scale knowledge bases usually adopt a pipeline architecture. In this architecture, the system uses natural language processing techniques to identify the topic entity of the question, retrieves the triples near that entity in the knowledge base, and finds the combination of triples most relevant to the question, from which the answer is extracted. A set of interrelated triples in the knowledge base forms a small graph structure, called a query graph, which reflects the reasoning path between entities. Searching for an answer in a KBQA system is therefore equivalent to ranking the query graphs that contain the topic entity by their relevance to the question. This paper studies the main methods involved in each stage of a KBQA system based on query graph ranking. Its research content can be summarized in the following three points.

1. Topic entity recognition for KBQA. To recognize entity mentions in the question sequence, this paper designs a sequence labeling model and an entity boundary prediction model, both based on the pre-trained model ERNIE. Because the span recognized by a model may not be an entity mention that exists in the knowledge base, the model-based entity recognition method is combined with a string-matching entity recognition method.

2. Entity linking for KBQA. All entities corresponding to the entity recognition results are first recalled as candidates. Entity features are then extracted from multiple perspectives and fused by a ranking model to compute the relevance between each entity and the question. This paper also designs a LambdaRank-based method for training the ranking model, which significantly improves the model's ability to learn the relative order of entities and its ranking accuracy.

3. Query graph ranking for KBQA. Dynamic expansion rules are designed to generate the set of candidate query graphs that may contain the answer to the question, and two expansion restriction strategies are designed to reduce the size of this candidate set. Query graph features are extracted from multiple perspectives and fused by a ranking model to compute the relevance between each query graph and the question, and the answer is extracted from the query graph with the highest relevance. In addition, when extracting query graph features, this paper uses the pre-trained model ERNIE and graph embedding techniques to compute the semantic similarity between the query graph and the natural language question.

Our KBQA system achieved an F1 score of 0.8828 on the CCKS 2020 CKBQA test set, which is close to the first-place result in the competition.
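
As an illustration of the query graph ranking step, the following minimal Python sketch scores each candidate query graph by fusing its feature values and selects the highest-scoring one. It is only a sketch under stated assumptions: the names `QueryGraph`, `score`, and `rank_query_graphs`, the example triples, the feature values, and the weights are all hypothetical, and a simple linear scorer stands in for the trained ranking model, whose actual form and features (for example, ERNIE-based similarity and graph embeddings) are not specified here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class QueryGraph:
    """A candidate query graph: a few related triples plus its features."""
    triples: List[Triple]
    features: List[float]  # hypothetical features, e.g. similarity scores


def score(graph: QueryGraph, weights: Sequence[float]) -> float:
    """Fuse the features into one relevance score (linear stand-in)."""
    return sum(w * f for w, f in zip(weights, graph.features))


def rank_query_graphs(
    candidates: List[QueryGraph], weights: Sequence[float]
) -> List[QueryGraph]:
    """Return the candidates sorted from most to least relevant."""
    return sorted(candidates, key=lambda g: score(g, weights), reverse=True)


# Hypothetical candidates generated around one topic entity.
candidates = [
    QueryGraph([("Entity_A", "birthplace", "?x")], [0.25, 0.125]),
    QueryGraph([("Entity_A", "spouse", "?y"), ("?y", "birthplace", "?x")],
               [0.5, 0.25]),
]
weights = [1.0, 2.0]  # assumed weights; a real system would learn these

best = rank_query_graphs(candidates, weights)[0]
print(best.triples)
```

In a full system, the answer would be read from the variable node of the top-ranked graph, and the scorer would be replaced by a ranking model trained on labeled question and query graph pairs.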