| As the research and commercial value of Question Answering (QA) has grown, research on Chinese QA has become increasingly popular. Text matching has long been an active research topic in natural language processing, and it is also one of the basic tasks in QA research. Compared with other languages, Chinese has two notable characteristics: multiple semantics and complex structure. Multiple semantics refers to the prevalence of polysemy in Chinese, while complex structure refers to the unclear boundaries between characters and words. Analyzing Chinese text only at the character level loses rich semantics, while analyzing it at the word level is limited by the quality of word segmentation. In addition, the question-answer sentence pairs that serve as the input to text matching in QA have two characteristics: (1) Inconsistent length: the question and the answer are texts of variable length. (2) Lack of contextual knowledge: a question and its answer form complete information within the same context, but that context cannot be determined when they appear only as a sentence pair. This paper investigates the characteristics of Chinese, the matching characteristics of question-answer sentences, and the associated technical challenges. First, with respect to the characteristics of Chinese, this paper proposes an answer matching method that combines events and sememes. An event lies between word granularity and sentence granularity; it can represent the key information in the text and weaken the adverse impact of word segmentation. A sememe can be viewed abstractly as information at a granularity finer than the word; it enriches word semantics in the form of external knowledge, and the accurate sense is determined by combining it with the context. To integrate these different levels of information effectively, this paper constructs 
the text grid graph (TGG). TGG takes sememes, characters, words, and events as nodes, and the adjacency relationships in the original text as edges. A sememe is used only as an abstract node to update the representation of words. This information is fused to obtain a sentence representation for answer matching prediction. Second, based on the characteristics of question-answer sentence pairs, this paper proposes a matching method that integrates a structured knowledge base. The structured knowledge base comes from an encyclopedia knowledge graph and serves as external knowledge that provides contextual information for the text. The knowledge base and the text are associated through entity linking; a context subgraph is then constructed and learned to obtain the context feature representation provided by the external knowledge, which enables word-level information interaction within text pairs. Moreover, in addition to expressing context information through word interaction, this paper adds character interaction information to effectively correlate the question and answer sentences. This information is fused to obtain a sentence representation for answer matching prediction. Finally, to verify the effectiveness of these methods, this paper conducts experiments on public datasets and self-processed datasets and compares them with classical and recent answer matching methods. The results show that fusing multi-source external knowledge performs better on the question-answer text matching task, and that methods with information interaction clearly outperform methods that use information fusion only. |
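
The node and edge description of the text grid graph can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract only states that sememes, characters, words, and events are nodes and that adjacency in the original text defines edges. The function name `build_text_grid_graph`, the span-based input format, the word-to-character and event-to-word links, and the sememe labels in the usage example are all assumptions made for illustration.

```python
from collections import defaultdict


def build_text_grid_graph(chars, words, events, sememes):
    """Build an undirected graph over sememe, char, word and event nodes.

    chars:   list of characters in the sentence
    words:   list of (start, end) char spans, end exclusive, in text order
    events:  list of (start, end) char spans, one per event mention
    sememes: dict mapping a word index to a list of sememe labels
    (All four input formats are hypothetical.)
    """
    edges = defaultdict(set)

    def link(a, b):
        edges[a].add(b)
        edges[b].add(a)

    # Adjacent characters in the original text are connected.
    for i in range(len(chars) - 1):
        link(("char", i), ("char", i + 1))

    for w, (start, end) in enumerate(words):
        # Assumption: a word is connected to the characters it spans.
        for i in range(start, end):
            link(("word", w), ("char", i))
        # Adjacent words in the original text are connected.
        if w + 1 < len(words):
            link(("word", w), ("word", w + 1))
        # A sememe is an abstract node attached only to its own word.
        for s in sememes.get(w, []):
            link(("word", w), ("sememe", w, s))

    for e, (start, end) in enumerate(events):
        # Assumption: an event is connected to every word inside its span.
        for w, (w_start, w_end) in enumerate(words):
            if start <= w_start and w_end <= end:
                link(("event", e), ("word", w))

    return dict(edges)


# Usage with a hand-segmented sentence; the sememe labels are made up.
chars = list("我喜欢苹果")
words = [(0, 1), (1, 3), (3, 5)]
events = [(1, 5)]
sememes = {2: ["fruit", "brand"]}
graph = build_text_grid_graph(chars, words, events, sememes)
```

Because each sememe node links only to its word, a message-passing model over this graph could use sememes solely to update word representations, which matches the role the abstract assigns to them.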