Semantic Document Retrieval For English To Chinese Cross-Language Question Answering System

Posted on: 2012-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: T Yang
GTID: 2218330368487862
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technologies, more and more information is becoming available to a growing number of users, and the number of languages in which that information is expressed is growing as well. In such a dynamic and complex environment, finding useful information has become an important research subject both in China and internationally, and cross-language information retrieval (CLIR) is gradually becoming a research hotspot among domestic scholars. As an advanced form of CLIR, cross-language question answering (CLQA) has also become a major research area in natural language processing (NLP).

Compared with a traditional information retrieval system, a CLQA system takes complete, colloquial questions as queries and returns highly precise web pages or definite answers. Internally, a CLQA system uses many NLP techniques and methods, such as syntactic analysis, question analysis, named entity recognition (NER), and machine translation. A CLQA system can be divided into cross-language document retrieval and answer extraction. The main function of cross-language document retrieval is to find more relevant documents by analyzing the original query. This thesis studies cross-language document retrieval.

After analyzing the current state and the deficiencies of existing methods, a new approach to cross-language document retrieval based on semantic information is proposed. The aim is to retrieve more relevant documents from the target-language (Chinese) collection. First, keywords are extracted from the source-language (English) query by question analysis, and the extracted keywords are translated and combined into a query. Next, the query is expanded by local context analysis, and relevant documents are retrieved with the expanded query. Finally, the initial results are re-ranked with the semantic topic cluster method.

The contributions of this work mainly include the following three aspects:
(1) Semantic information between keywords and expansion words is used effectively in the expansion process. The snippets returned by web search serve as the underlying collection, and local context analysis is used to obtain the expansion words. This addresses the problem that the original query carries insufficient information.
(2) A result re-ranking method based on semantic topic clusters is designed to re-rank the initial results. It addresses the problem that relevant documents are ranked low in the results.
(3) The method can effectively avoid the topic drift problem.
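The query expansion step can be illustrated with a minimal sketch. The abstract does not give the thesis's actual scoring formula, so the code below is only a simplified co-occurrence score loosely modelled on local context analysis: a candidate term scores highly when it co-occurs with every query keyword in the retrieved snippets. The function name `expand_query`, the parameters `num_expansion` and `delta`, the smoothed idf weight, and the sample data are all assumptions made for illustration. Snippets are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def expand_query(query_terms, snippets, num_expansion=2, delta=0.1):
    """Append the best co-occurring snippet terms to the query.

    query_terms: translated keywords (hypothetical input).
    snippets: list of token lists, one per retrieved snippet.
    """
    n = len(snippets)
    term_counts = [Counter(tokens) for tokens in snippets]

    # Number of snippets containing each term, for a simple idf weight.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    def idf(term):
        # Smoothed idf (an assumption); always positive.
        return math.log((n + 1) / (doc_freq[term] + 1)) + 1.0

    candidates = set(doc_freq) - set(query_terms)

    scores = {}
    for cand in candidates:
        score = 1.0
        for q in query_terms:
            # Co-occurrence: sum over snippets of tf(cand) * tf(q).
            co = sum(c[cand] * c[q] for c in term_counts)
            co_degree = math.log(co + 1) * idf(cand) / math.log(n + 1)
            # delta keeps one missing keyword from zeroing the product.
            score *= (delta + co_degree) ** idf(q)
        scores[cand] = score

    # Highest score first; ties are broken by the term itself.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:num_expansion]


# Hypothetical, pre-segmented Chinese snippets for illustration only.
snippets = [
    ["北京", "奥运会", "2008", "举办"],
    ["北京", "奥运会", "开幕式", "2008"],
    ["上海", "世博会", "2010"],
]
expanded = expand_query(["北京", "奥运会"], snippets, num_expansion=1)
print(expanded)
```

Multiplying the per-keyword factors, rather than summing them, favours terms related to the whole query over terms related to a single keyword. This is one plausible way to limit the topic drift mentioned in contribution (3), though the thesis may handle it differently.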
Keywords/Search Tags: Cross Language Question Answering, Cross Language Information Retrieval, Query Expansion, Semantic Topic Cluster