Research On Key Techniques Of Question Understanding For Open-domain Question Answering System

Posted on: 2017-06-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Y Wei
GTID: 1318330566956057
Subject: Computer software and theory
Abstract/Summary:
As a result of the rapid development of the Internet, the amount of free text from the open domain grows exponentially, leading to a serious problem of information overload. Users nevertheless expect to find the information they need on the Internet quickly and accurately. Traditional keyword-based search engines are often criticized for their low accuracy, information redundancy and the need for users to screen the search results, and they no longer live up to users' expectations. Question answering, as an important branch of information retrieval, is therefore receiving more and more attention. By drawing on theories and technologies such as natural language processing, information retrieval, information extraction and machine learning, a question answering system can understand a natural language question, analyze the user's retrieval intent and present a high-quality answer. Open-domain question answering is thus becoming a new research direction in natural language processing and information retrieval. This dissertation focuses on the key techniques of question understanding in open-domain question answering systems, including question semantic representation, question semantic information extraction, question semantic similarity computation and similar question retrieval in community question answering systems. The main contents and innovations are as follows:

(1) A model of question semantic representation (QSR) with event information for Chinese question answering systems is proposed. The semantic components of complicated questions in open-domain question answering are analyzed, and the key semantic information is identified as consisting of question focus, question topic and question event; a semantic structure of the question event is designed to represent its semantic information. On this basis, a question semantic representation model based on semantic chunks is presented. The model generates a structure of question semantic chunks comprising the three semantic components of question focus, question topic and question event. The QSR model can convert a natural language question into a semantic information structure and reduce the complexity of semantic analysis in question understanding. Experimental results show that the average accuracy of semantic chunk labeling in Chinese question answering reaches 74.97%, which demonstrates the validity of the model.

(2) A method for recognizing question semantic chunks based on active learning is proposed. Because the corpus of questions with manually labeled semantic information is limited, and in order to make use of the large number of unlabeled questions, sequence labeling is adopted to identify question semantic chunks and is combined with active learning. A query strategy for uncertainty sampling based on semantic information density is designed, which computes the semantic similarity between sequence vectors. This algorithm is more accurate than semantic information density. At the same time, the diversity of the samples selected by uncertainty sampling is increased. The method greatly improves semantic chunk recognition, raising accuracy by 5.2% compared with supervised learning and reducing the amount of manually labeled data by 14.6%.

(3) Methods of semantic similarity computation based on the QSR model are proposed. Using the semantic information of the question focus chunk, question topic chunk and question event chunk extracted through question semantic representation, methods for computing question semantic similarity based on the structure of question semantic chunks are presented. They convert the computation of question similarity into the computation of similarity between corresponding semantic chunks, and measure the weighting coefficients with which the three components of question focus, question topic and question event contribute to question semantic similarity. The experimental results show that the similarity of the question event accounts for 12.9% of the overall question similarity, further indicating that a question semantic representation that integrates question event information enriches the representation of question semantics. Experiments with similarity thresholds at different scales all confirm that these methods can be effectively applied to similar question retrieval in Chinese question answering systems.

(4) A method of question similarity computation based on coupled matrix factorization is proposed. The main research task is similar question retrieval in community question answering systems. The clustering information of question focus, question event, question type and question topic, together with question category label information, is selected as the main features of questions. The relationship among these five features and their attributes is analyzed and defined as a coupling relationship, and on this basis a coupled model of question similarity is proposed. Question similarity computation based on coupled matrix factorization allows category label information and question topic information to be integrated, which significantly improves the performance of similar question retrieval.
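As an illustration of contribution (3), the sketch below shows how a question similarity score could be assembled as a weighted sum of per-chunk similarities. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the abstract does not say how chunk similarity is measured, so Jaccard overlap between term sets is used as a stand-in, and the names QuestionChunks, jaccard, chunks and question_similarity are hypothetical. Only the 12.9% weight for the question event comes from the abstract; the even split of the remainder between focus and topic is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionChunks:
    """Semantic chunks of one question, each held as a set of terms."""
    focus: frozenset
    topic: frozenset
    event: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap in [0, 1]; a stand-in for the unspecified chunk measure."""
    if not a and not b:
        return 1.0  # assumption: two absent chunks count as matching
    return len(a & b) / len(a | b)


# Only the 12.9% event share comes from the abstract; splitting the
# remainder evenly between focus and topic is an assumption.
DEFAULT_WEIGHTS = {"focus": 0.4355, "topic": 0.4355, "event": 0.129}


def question_similarity(q1: QuestionChunks, q2: QuestionChunks,
                        weights=DEFAULT_WEIGHTS) -> float:
    """Weighted sum of the similarities of corresponding chunks."""
    return sum(
        w * jaccard(getattr(q1, name), getattr(q2, name))
        for name, w in weights.items()
    )


def chunks(focus, topic, event) -> QuestionChunks:
    """Convenience constructor that accepts any iterables of terms."""
    return QuestionChunks(frozenset(focus), frozenset(topic), frozenset(event))


# Two toy questions that share focus and topic but differ in event.
q1 = chunks({"who"}, {"telephone"}, {"invent"})
q2 = chunks({"who"}, {"telephone"}, {"patent"})
print(round(question_similarity(q1, q1), 3))
print(round(question_similarity(q1, q2), 3))
```

In this toy setup, two questions that differ only in their event chunk lose exactly the event weight from their score, which is the sense in which the event component contributes a fixed share of the overall similarity.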
Keywords/Search Tags: open-domain question answering, community question answering, question understanding, question semantic representation, semantic chunk, coupled matrix factorization, similar question retrieval