Research On Information Retrieval Based On Chinese Wikipedia

Posted on: 2014-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: X L Dai
GTID: 2268330398981651
Subject: Computer application technology
Abstract/Summary:
With the spread of computers and the Internet, the volume of information resources available online is growing rapidly. While users benefit from this massive amount of data, they also face the challenge of extracting the information they need from it. Information retrieval (IR) technology helps to address this problem. However, most traditional IR systems rank results by keyword matching and ignore semantic information, so they cannot accurately understand the intent behind a user's query, which seriously degrades retrieval performance. How to make an IR system fully understand the intent of user queries has therefore become an active research topic in the field of IR. In this thesis, we introduce concepts into the IR system so that it can understand user queries at the semantic level. The main work consists of three parts:

First, we use the Chinese Wikipedia as a large external concept library and propose a concept-based method for representing the semantics of a text. The method treats each article in the Chinese Wikipedia as an independent concept. By computing the relevance between these articles and a given text, we select the set of concepts that best represents that text.

Second, we incorporate these concepts into a traditional IR system. By representing the semantics of both the query and the documents to be retrieved with concepts, we obtain a concept-based IR method. On this basis, we propose a novel pseudo-relevance feedback (PRF) retrieval method that combines the bag-of-words (BOW) retrieval model with the concept-based retrieval model.

Third, we implement the proposed IR method and evaluate it on the NTCIR-5 test set.
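The concept-based representation and the combined BOW/concept PRF described above can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the thesis's implementation: the abstract does not give weighting or combination details, so TF-IDF weights, cosine similarity, a top-k concept cutoff (`k`), Rocchio-style query expansion (`beta`, `n_fb`) and a linear interpolation weight (`alpha`) are all assumptions, as are every function name and the toy data. Texts are assumed to be already segmented into tokens (Chinese text would first need word segmentation).

```python
import math
from collections import Counter

# Hypothetical sketch: TF-IDF weighting, cosine similarity, Rocchio expansion
# and linear interpolation are assumptions, not details taken from the thesis.

def tfidf_vectors(texts):
    """Return sparse TF-IDF vectors (term -> weight) and the IDF table."""
    n = len(texts)
    df = Counter(t for toks in texts for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in texts]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def concept_vector(tokens, article_vecs, idf, k=3):
    """Represent a text by its k most similar Wikipedia articles (concepts)."""
    vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}
    scores = {title: cosine(vec, av) for title, av in article_vecs.items()}
    top = sorted(scores.items(), key=lambda kv: -kv[1])[:k]
    return {title: s for title, s in top if s > 0}

def rocchio(query_vec, feedback_vecs, beta=0.5):
    """Add beta times the centroid of the feedback vectors to the query."""
    out = dict(query_vec)
    for v in feedback_vecs:
        for t, w in v.items():
            out[t] = out.get(t, 0.0) + beta * w / len(feedback_vecs)
    return out

def prf_search(query, docs, articles, alpha=0.5, n_fb=1, k=3):
    """Rank doc indices by an interpolated BOW + concept score after one PRF round."""
    doc_vecs, doc_idf = tfidf_vectors(docs)
    titles = list(articles)
    art_list, art_idf = tfidf_vectors([articles[t] for t in titles])
    art_vecs = dict(zip(titles, art_list))
    doc_cons = [concept_vector(d, art_vecs, art_idf, k) for d in docs]

    q_bow = {t: c * doc_idf.get(t, 0.0) for t, c in Counter(query).items()}
    q_con = concept_vector(query, art_vecs, art_idf, k)

    # First pass: the top n_fb BOW results are treated as pseudo-relevant.
    first = sorted(range(len(docs)), key=lambda i: -cosine(q_bow, doc_vecs[i]))
    fb = first[:n_fb]
    q_bow = rocchio(q_bow, [doc_vecs[i] for i in fb])
    q_con = rocchio(q_con, [doc_cons[i] for i in fb])

    # Second pass: interpolate the expanded BOW and concept similarities.
    def score(i):
        return (alpha * cosine(q_bow, doc_vecs[i])
                + (1 - alpha) * cosine(q_con, doc_cons[i]))
    return sorted(range(len(docs)), key=lambda i: -score(i))

if __name__ == "__main__":
    # Toy, hypothetical "Wikipedia" articles and documents.
    articles = {
        "Python (programming language)": ["python", "programming", "language", "code"],
        "Python (snake)": ["python", "snake", "reptile", "species"],
        "Java (island)": ["java", "island", "indonesia"],
    }
    docs = [["python", "code", "tutorial"],
            ["snake", "reptile", "zoo"],
            ["java", "island", "travel"]]
    print(prf_search(["python", "programming"], docs, articles))  # prints [0, 1, 2]
```

Treating each article as one dimension mirrors the abstract's description, and keeping only the top-k concepts keeps the concept vectors sparse. With the full Chinese Wikipedia as the concept library, the linear scan over all articles in `concept_vector` would in practice be replaced by an inverted index.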
Finally, the experimental results show that our method outperforms traditional PRF in both mean average precision (MAP) and P@10, demonstrating its effectiveness and practicality.
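The two metrics reported above can be computed as in the minimal Python sketch below. It follows the standard definitions: average precision (AP) averages the precision at each rank that holds a relevant document and divides by the total number of relevant documents for the topic, MAP is the mean of AP over topics, and P@10 is the fraction of the first ten results that are relevant. The function names and toy rankings are hypothetical; the abstract does not describe the thesis's own evaluation code.

```python
# Hypothetical helpers implementing the standard MAP and P@k definitions.

def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top k positions holding relevant docs (divides by k)."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def average_precision(ranking, relevant):
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(topics):
    """topics: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)

if __name__ == "__main__":
    topics = [(["d1", "d2", "d3", "d4"], {"d1", "d3"}),
              (["d5", "d6"], {"d6"})]
    print(precision_at_k(*topics[0]))                # 2 relevant in top 10 -> 0.2
    print(round(mean_average_precision(topics), 4))  # (5/6 + 1/2) / 2 -> 0.6667
```

Following the usual convention, P@k divides by k even when fewer than k documents are retrieved, so short rankings are penalized rather than rewarded.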
Keywords/Search Tags: Chinese Wikipedia, information retrieval, concepts, text representation