
Study On Open-Domain Question Answering

Posted on: 2007-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2178360212968357
Subject: Computer software and theory
Abstract/Summary:
The development of information technology and the Internet has dramatically increased the quantity of information available in digital form. This has resulted in a proliferation of uses of personal information. Nowadays, one can use search engines such as Google (http://www.google.com) and Baidu (http://www.baidu.com) to find useful information on the web easily and rapidly. These search engines, also called Information Retrieval (IR) systems, take one or more keywords as input, search a large text corpus (such as the World Wide Web), and return a list of snippets and links to relevant documents. Most users are satisfied with these keyword-based search engines. However, they have two shortcomings. First, the quality of the returned documents depends on the input keywords, which is a challenge for new users, who often find it difficult to describe their information needs in one or more keywords. Second, search engines return a list of relevant documents rather than exact answers, so one often still has to read a large amount of text to find the answer.

To overcome these shortcomings, more and more research organizations and companies are working on a new generation of information retrieval systems. One of the most important directions is the Question Answering (QA) system. A QA system takes a natural language question (e.g., "Which is the longest river in the world?") as input instead of keywords, searches for the answer in a large text collection (such as the World Wide Web or a local collection), and returns one exact answer to the user instead of a list of relevant documents. QA research attempts to deal with a wide range of question types, including factoid, list, definition, how, why, hypothetical, semantically-constrained, and cross-lingual questions.
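The contrast between keyword search and question answering described above can be illustrated with a minimal sketch. It is not the method of this thesis: the three-sentence corpus, the stopword list, the function names `keyword_search` and `answer_factoid`, and the "vote for capitalized words" heuristic are all assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical toy data and stopword list, for illustration only.
STOPWORDS = {"which", "is", "the", "in", "what", "who", "of", "a", "an"}
CORPUS = [
    "The Nile is often cited as the longest river in the world.",
    "The Amazon carries more water than any other river.",
    "Many sources list the Nile as the longest river on Earth.",
]

def keywords(text):
    """Lowercase the text and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def keyword_search(query, corpus=CORPUS):
    """IR style: return documents ranked by keyword overlap with the query."""
    terms = set(keywords(query))
    scored = [(len(terms & set(keywords(doc))), doc) for doc in corpus]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc for score, doc in ranked if score > 0]

def answer_factoid(question, corpus=CORPUS):
    """QA style: return one short answer string instead of a document list."""
    question_words = set(re.findall(r"[a-z]+", question.lower()))
    votes = Counter()
    for doc in keyword_search(question, corpus):
        # Candidate answers: capitalized words that do not occur in the question.
        for token in re.findall(r"\b[A-Z][a-z]+\b", doc):
            if token.lower() not in question_words:
                votes[token] += 1
    return votes.most_common(1)[0][0] if votes else None

question = "Which is the longest river in the world?"
print(keyword_search(question))   # a ranked list of whole documents
print(answer_factoid(question))   # a single candidate answer
```

The first call leaves the user to read the returned sentences, while the second returns a single string. A real system would replace the capitalization heuristic with named-entity recognition and a proper answer-ranking model.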
QA is regarded as requiring more complex natural language processing (NLP) techniques than other types of information retrieval, such as document retrieval.

This paper focuses on open-domain QA, specifically on answering factoid and definition questions. Answers to factoid questions are typically named entities, e.g., a number, a person name, or an organization name. Questions like "Who is Colin Powell?" or "What is mold?" are definitional questions [1-2]. For the factoid QA subtask, this paper proposes a novel answer reranking...
Keywords/Search Tags: factoid QA, definition QA, question reformulation, pseudo relevance feedback, language model