
Research on Search Result Set Selection Algorithms

Posted on: 2011-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: C S Liu
Full Text: PDF
GTID: 2208360305473806
Subject: Computer software and theory
Abstract/Summary:
With the growth of the Web, the volume of information available online keeps increasing. Search engines emerged, and continue to evolve, so that people can find this information easily. However, search engines present their results as individual, isolated documents that lack integrity: users must still assemble the information manually to obtain a complete picture. The results also contain considerable redundancy, especially among the top-ranked ones. A method is needed to address this problem.

This thesis analyzes the ranking algorithms of existing search engines, novelty methods applied across results, and minimal document set generation algorithms. The analysis shows that the relevant results returned by existing search engines are highly redundant, and that each individual result is not complete enough to satisfy the user's query. We therefore use a document set retrieval method based on subtopic extraction to assemble the relevant retrieval results, and we propose the concept of "Inclusion" together with its formula. Drawing on the novelty approach, we also propose the concept of "Novelty" together with its formula. "Relevance", "Inclusion" and "Novelty" are then used to re-rank the initial document sets, so that the top-ranked document sets do not lose relevance while containing more complete information and as little redundancy as possible.

We use experimental comparisons to determine the three factors in the algorithm. We then compare the document sets ranked with these factors against the original document sets. The experimental results show that overall relevance increases with the improved algorithm, and that the top-ranked document sets also have higher inclusion and novelty. The approach therefore not only meets the user's query request but also gives users greater satisfaction with the results.
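
The re-ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything below is an assumption made for illustration only: the `DocSet` type, the definitions of `inclusion` (share of the query's subtopics that a document set covers) and `novelty` (share of a set's subtopics not yet covered by higher-ranked sets), the linear weighted combination, and the weight values are all hypothetical and are not the author's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocSet:
    """Hypothetical document set: a name, a relevance score, and covered subtopics."""
    name: str
    relevance: float       # assumed to be pre-computed and normalized to [0, 1]
    subtopics: frozenset   # subtopics covered by the documents in this set


def inclusion(ds, all_subtopics):
    """Assumed definition: fraction of the query's subtopics covered by the set."""
    if not all_subtopics:
        return 0.0
    return len(ds.subtopics & all_subtopics) / len(all_subtopics)


def novelty(ds, seen):
    """Assumed definition: fraction of the set's subtopics not already seen."""
    if not ds.subtopics:
        return 0.0
    return len(ds.subtopics - seen) / len(ds.subtopics)


def rerank(doc_sets, all_subtopics, w_rel=0.5, w_inc=0.3, w_nov=0.2):
    """Greedily re-rank document sets by a weighted sum of the three factors.

    The weights are placeholders; the thesis determines its factors experimentally.
    """
    remaining = list(doc_sets)
    ranked = []
    seen = set()
    while remaining:
        best = max(
            remaining,
            key=lambda ds: w_rel * ds.relevance
            + w_inc * inclusion(ds, all_subtopics)
            + w_nov * novelty(ds, seen),
        )
        ranked.append(best)
        remaining.remove(best)
        seen |= best.subtopics  # later sets are penalized for repeating these
    return ranked


# Toy data: A and B cover the same subtopics; C covers a different one.
all_subtopics = frozenset({"a", "b", "c"})
example_sets = [
    DocSet("A", 0.90, frozenset({"a", "b"})),
    DocSet("B", 0.88, frozenset({"a", "b"})),
    DocSet("C", 0.80, frozenset({"c"})),
]
ranked_names = [ds.name for ds in rerank(example_sets, all_subtopics)]
print(ranked_names)  # ['A', 'C', 'B']
```

In this toy example, ordering by relevance alone would give A, B, C. Once A is ranked first, B adds no new subtopics under the assumed novelty measure, so the less relevant but non-redundant set C moves ahead of it.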
Keywords/Search Tags: document cluster, retrieval result set, inclusion, novelty