
Research And Implementation On Information Extraction From Text For Question Answering Systems

Posted on: 2010-04-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Li
GTID: 1118360275955531
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Web, people can easily store data, exchange information and share knowledge on this platform. However, the sheer volume of data on the Web makes it difficult for users to obtain the knowledge they need efficiently. Hence, web-based Information Retrieval and Information Extraction have become important research topics. As search engines become inadequate to meet people's growing needs, how to make appropriate use of these abundant resources and enable machines to understand the information they contain has become a popular research area in machine intelligence in the Internet age. Against this background, Question Answering, which builds on Information Retrieval and Natural Language Understanding, has flourished. A Question Answering system takes questions expressed in natural language, rather than keywords, as input, so that users can state their needs conveniently and clearly. It returns short answers rather than relevant documents, which makes information easier to obtain.

Question Answering (QA) systems can be categorized into automatic QA systems and user-interactive QA systems, according to whether the user interacts with the system. They can also be categorized into open-domain QA systems and specific-domain QA systems, according to the questions they can handle. The former impose no restrictions on the scope of questions; they attempt to answer any question on any topic. The latter accept only questions in a certain domain, and domain knowledge guides the QA process.

This thesis focuses on applying Information Extraction to QA and studies both kinds of QA systems. In open-domain QA, we investigate how to improve the semantic analysis of questions and how to make efficient use of historical databases, in order to enhance machine intelligence. In specific-domain QA, we study how to use past experience to solve new problems, in order to increase the precision of the returned answers. The main research work and contributions are as follows.

First, correct semantic analysis of a question is the key to capturing the user's need. We study the detection of semantic constraints in text, aiming to correctly detect the constraint parts indicated by signal words and to correctly disambiguate multiple semantic constraints when they are indicated by the same signal word. We propose a method for multiple constraint relation detection based on dependency tree matching. For every kind of semantic constraint, we collect signal words and relevant example sentences to build our case base. We define a partial dependency tree (PDT) kernel to compute the similarity between two objects, and apply the Apriori algorithm to reduce the complexity of computing this kernel function.

Second, large amounts of historical data accumulate in both open-domain and specific-domain QA systems. To reuse these data efficiently, this thesis studies knowledge extraction solely from the historical database. The goal is to translate short-answer QA pairs into structured expressions and to return answers automatically by retrieving from the resulting knowledge base. This has two benefits: it avoids the time-consuming manual work of building a knowledge base, and our user-interactive system can automatically give the user a reference answer, which is convenient for users. We describe the workflow for transforming question-answer pairs into a knowledge base. We combine semantic pattern matching with the multiple constraint relation detection described above to extract information from the question sentences. A structure based on a semantic network is used to represent the interconnected pieces of knowledge. A user-interactive prototype is implemented to demonstrate the whole process of knowledge base construction, management and use.

Finally, domain knowledge plays an important role in specific-domain QA. In some domains, experience is the best guide to solving new problems. Taking plant growth environment recommendation as the application background, this thesis studies how to reuse experience in a specific domain with the Case-Based Reasoning (CBR) method. We propose a method for learning adaptation rules for CBR. The Resource Space Model and the Semantic Link Network are applied to case-base construction for efficient resource management and reuse. Adaptation rules are generated solely from the case base by comparing cases. Relations between cases and general domain knowledge guide the similarity computation. The adaptation rules are refined before they are applied in the revision process, and distance measurement and confidence values are used to improve their accuracy. After each new problem is solved, an evolution module updates the adaptation rule set in the retention process.
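The abstract does not define the partial dependency tree (PDT) kernel, so the sketch below is not the thesis' method. It only illustrates, under stated assumptions, the two ideas the first contribution combines: scoring the similarity of two dependency trees by counting shared substructures, and using the Apriori property (a structure can only be shared if its sub-structures are shared) to prune that computation. As a simplified, swapped-in stand-in for the PDT kernel, it counts shared downward paths of (relation, word) labels; `DepNode`, `path_kernel`, `normalized_kernel` and the `max_len` cutoff are hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt
from typing import Iterator, List, Tuple

Label = Tuple[str, str]  # (dependency relation, word); hypothetical node label


@dataclass
class DepNode:
    rel: str
    word: str
    children: List["DepNode"] = field(default_factory=list)

    @property
    def label(self) -> Label:
        return (self.rel, self.word)


def all_nodes(root: DepNode) -> Iterator[DepNode]:
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def path_kernel(a: DepNode, b: DepNode, max_len: int = 3) -> float:
    """Sum over shared downward label paths p (length <= max_len) of count_a(p) * count_b(p)."""
    # Each occurrence is (label path so far, node where that path ends).
    occ_a = [((n.label,), n) for n in all_nodes(a)]
    occ_b = [((n.label,), n) for n in all_nodes(b)]
    total = 0.0
    for _level in range(max_len):
        cnt_a = Counter(p for p, _ in occ_a)
        cnt_b = Counter(p for p, _ in occ_b)
        common = cnt_a.keys() & cnt_b.keys()
        if not common:
            break
        total += sum(cnt_a[p] * cnt_b[p] for p in common)
        # Apriori-style pruning: a longer path can only be shared if its prefix is
        # shared, so only occurrences of shared paths are extended to the next level.
        occ_a = [(p + (c.label,), c) for p, n in occ_a if p in common for c in n.children]
        occ_b = [(p + (c.label,), c) for p, n in occ_b if p in common for c in n.children]
    return total


def normalized_kernel(a: DepNode, b: DepNode, max_len: int = 3) -> float:
    """Cosine-style normalization so that identical trees score 1.0."""
    return path_kernel(a, b, max_len) / sqrt(path_kernel(a, a, max_len) * path_kernel(b, b, max_len))


# Two toy question parses (invented for illustration).
q1 = DepNode("root", "is", [DepNode("nsubj", "capital", [DepNode("prep", "of", [DepNode("pobj", "France")])])])
q2 = DepNode("root", "is", [DepNode("nsubj", "capital", [DepNode("prep", "of", [DepNode("pobj", "Spain")])])])
print(path_kernel(q1, q2))        # 6.0
print(normalized_kernel(q1, q2))  # 0.6666666666666666
```

Because the score is an inner product of path-count vectors, it is a valid kernel, and the pruning only skips paths whose contribution would be zero anyway.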
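The following sketch shows, under simplifying assumptions, the overall shape of the second contribution's pipeline: match a short-answer QA pair against a semantic pattern, turn it into a structured (subject, relation, answer) expression, store it in a graph-shaped knowledge base, and answer a new question by retrieval. The two regular-expression patterns, the triple representation, the sample history and the names `pair_to_triple` and `SemanticNetwork` are all hypothetical; the thesis combines semantic pattern matching with multiple constraint relation detection, which this toy matcher does not attempt.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, answer)

# Hypothetical semantic patterns: each captures a relation and a subject.
PATTERNS = [
    re.compile(r"^what is the (?P<rel>[\w ]+?) of (?P<subj>[\w ]+?)\??$", re.IGNORECASE),
    re.compile(r"^who (?P<rel>wrote|invented|founded) (?P<subj>[\w ]+?)\??$", re.IGNORECASE),
]


def pair_to_triple(question: str, answer: str) -> Optional[Triple]:
    """Translate one short-answer QA pair into a structured expression."""
    text = question.strip()
    for pattern in PATTERNS:
        m = pattern.match(text)
        if m:
            return (m.group("subj").lower(), m.group("rel").lower(), answer.strip())
    return None  # no pattern matched


class SemanticNetwork:
    """Labeled directed graph: subject --relation--> answer nodes."""

    def __init__(self) -> None:
        self.edges: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, triple: Triple) -> None:
        subj, rel, obj = triple
        self.edges[subj][rel].add(obj)

    def ask(self, question: str) -> Set[str]:
        """Parse a new question with the same patterns and look up the stored answers."""
        parsed = pair_to_triple(question, "")
        if parsed is None:
            return set()
        subj, rel, _ = parsed
        return set(self.edges.get(subj, {}).get(rel, set()))


# Invented historical QA pairs standing in for the historical database.
history: List[Tuple[str, str]] = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "William Shakespeare"),
]
kb = SemanticNetwork()
for q, a in history:
    triple = pair_to_triple(q, a)
    if triple is not None:
        kb.add(triple)

print(kb.ask("what is the capital of france"))  # {'Paris'}
```

Keeping answers as graph nodes rather than flat records is what lets pieces of knowledge that share a subject or an answer become connected as more QA pairs are added.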
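A minimal sketch of learning adaptation rules solely from the case base by comparing cases, written in the style of the well-known case-difference heuristic rather than the thesis' own algorithm. Assumptions: problems and solutions are flat dictionaries of categorical attributes; a rule is learned from each ordered pair of cases whose problems differ in exactly one feature; a rule's confidence is the fraction of such pairs that agree on the same solution change; the 0.6 threshold, the distance function and the plant-environment attributes are invented for illustration. The hypothetical `retain` function mimics the retention step by re-learning the rule set after each solved problem.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

Problem = Dict[str, str]
Solution = Dict[str, str]
Case = Tuple[Problem, Solution]
Delta = FrozenSet[Tuple[str, str, str]]  # {(solution attribute, old value, new value)}
RuleKey = Tuple[str, str, str]           # (problem feature, old value, new value)
Rules = Dict[RuleKey, Tuple[Delta, float]]


def distance(p: Problem, q: Problem) -> int:
    """Hypothetical distance: number of problem features whose values differ."""
    return sum(p[k] != q[k] for k in p)


def single_diff(p: Problem, q: Problem) -> Optional[str]:
    """Return the only differing feature, or None if zero or several features differ."""
    diffs = [k for k in p if p[k] != q[k]]
    return diffs[0] if len(diffs) == 1 else None


def solution_delta(s1: Solution, s2: Solution) -> Delta:
    return frozenset((k, s1[k], s2[k]) for k in s1 if s1[k] != s2[k])


def learn_rules(cases: List[Case]) -> Rules:
    """Compare every ordered pair of cases whose problems differ in exactly one feature."""
    observed: Dict[RuleKey, Counter] = defaultdict(Counter)
    for p1, s1 in cases:
        for p2, s2 in cases:
            f = single_diff(p1, p2)
            if f is not None:
                observed[(f, p1[f], p2[f])][solution_delta(s1, s2)] += 1
    rules: Rules = {}
    for key, deltas in observed.items():
        delta, n = deltas.most_common(1)[0]
        rules[key] = (delta, n / sum(deltas.values()))  # confidence = agreement ratio
    return rules


def solve(query: Problem, cases: List[Case], rules: Rules, min_conf: float = 0.6) -> Solution:
    """Retrieve the nearest case, then revise its solution with a confident rule."""
    p, s = min(cases, key=lambda c: distance(query, c[0]))
    adapted = dict(s)
    f = single_diff(p, query)
    rule = rules.get((f, p[f], query[f])) if f is not None else None
    if rule is not None and rule[1] >= min_conf:
        for attr, old, new in rule[0]:
            if adapted.get(attr) == old:
                adapted[attr] = new
    return adapted


def retain(query: Problem, solution: Solution, cases: List[Case]) -> Rules:
    """Retention step: store the solved case and re-learn (evolve) the rule set."""
    cases.append((query, solution))
    return learn_rules(cases)


# Toy case base (invented): plant needs -> recommended growth environment.
cases: List[Case] = [
    ({"light": "high", "water": "low"}, {"site": "sunny", "irrigation": "dry"}),
    ({"light": "low", "water": "low"}, {"site": "shaded", "irrigation": "dry"}),
    ({"light": "high", "water": "high"}, {"site": "sunny", "irrigation": "wet"}),
]
rules = learn_rules(cases)
query = {"light": "low", "water": "high"}
answer = solve(query, cases, rules)
print(answer)  # {'site': 'shaded', 'irrigation': 'wet'}
rules = retain(query, answer, cases)
```

No case in the toy base matches the query exactly; the answer comes from adapting the nearest case with a rule learned by comparing two other cases, which is the point of generating rules from the case base alone.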
Keywords/Search Tags: Web, Question Answering System, Information Extraction, Dependency Tree, Knowledge Base, Case-Based Reasoning