
Research And Implementation Of Distributed Search Diversity Based On Vertical

Posted on: 2017-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Xie
Full Text: PDF
GTID: 2348330503968523
Subject: Computer Science and Technology
Abstract/Summary:
Information technology and computer networks have developed rapidly since the start of the 21st century. In the internet age, massive data volumes and information overload make it increasingly difficult for users to retrieve the content they are interested in. As the pressure on information storage grew, distributed systems emerged, and they bring a series of new challenges to traditional retrieval systems and search engines. One of these challenges comes from the diversification of user search requirements: a retrieval system must not only determine quickly and accurately which verticals a query relates to, so that the diversity of the query is served, but must also ensure that the returned information correctly covers the user's needs. Combining distributed search with diversification is a way to address these challenges. Building on the classical distributed retrieval system, this paper proposes several feasible algorithms that incorporate result diversity into three stages (vertical selection, resource selection, and results merging) in order to provide users with more targeted services.

The major contributions of this paper are as follows.

First, for vertical selection, this paper proposes a vertical selection algorithm based on a word2vec judgment method and a vocabulary expansion ranking method. The algorithm extracts the keywords of each vertical and, at the same time, expands the query terms. The vertical is then selected according to the similarity between the two.
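The similarity step described above can be illustrated with a minimal sketch. It is not the method from the thesis itself: it assumes that the expanded query and each vertical's keywords are both represented by the average of their word vectors and compared by cosine similarity. The `EMBEDDINGS` table, the vertical names, and the function names are hypothetical, and the toy vectors stand in for trained word2vec embeddings.

```python
import math

# Toy 3-dimensional vectors standing in for trained word2vec embeddings
# (hypothetical values, for illustration only).
EMBEDDINGS = {
    "goal":   [0.9, 0.1, 0.0],
    "match":  [0.8, 0.2, 0.1],
    "league": [0.7, 0.3, 0.0],
    "stock":  [0.0, 0.9, 0.2],
    "market": [0.1, 0.8, 0.3],
    "price":  [0.2, 0.7, 0.4],
}


def mean_vector(words):
    """Average the vectors of the known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_verticals(expanded_query, vertical_keywords):
    """Rank verticals by similarity between the expanded query and their keywords."""
    query_vec = mean_vector(expanded_query)
    scores = {}
    for name, keywords in vertical_keywords.items():
        vertical_vec = mean_vector(keywords)
        if query_vec is None or vertical_vec is None:
            scores[name] = 0.0
        else:
            scores[name] = cosine(query_vec, vertical_vec)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical verticals, each described by its extracted keywords.
VERTICALS = {
    "sports": ["goal", "match", "league"],
    "finance": ["stock", "market", "price"],
}

# The query "match" is assumed to have been expanded with the related term "goal".
ranking = rank_verticals(["match", "goal"], VERTICALS)
print(ranking)
```

In this sketch the top-ranked vertical would be selected. A real system would load embeddings trained on the verticals' own text and might keep several verticals above a threshold rather than only the first.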
Experimental results show that the proposed vertical selection algorithm yields some improvement in precision and recall compared with existing vertical selection methods.

Second, for resource selection, this paper puts forward two resource description methods, an LDA topic description method and a TF-IDF description method, and proposes a resource selection framework built on them. The framework uses the vertical selection results to select the resources relevant to the user's requirements. The final results show that the proposed distributed resource selection algorithm can be applied effectively to web search engines in real, complex environments.

Third, for results merging, this paper proposes a results merging framework based on three kinds of features: document features, resource features, and vertical features. It also takes vertical characteristics and query term diversity into account. The framework uses an improved CORI algorithm and a linear fusion algorithm to compute the final merging score. Compared with existing results merging methods, the algorithm achieves good performance, with improvements in precision and nDCG.

On the basis of this research, this paper shows that the three proposed algorithms can effectively improve the accuracy of the system and the diversification of the returned results.
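As a rough illustration of the merging stage, the sketch below combines the standard CORI merging heuristic with a simple linear fusion. The abstract does not specify the improved CORI variant or the fusion weights, so everything here is an assumption: the constants 0.4 and 1.4 come from the classical CORI formula, while `alpha`, the per-resource vertical scores, and all names and sample values are hypothetical.

```python
def min_max(value, low, high):
    """Min-max normalise a score into [0, 1]; return 0.0 for a degenerate range."""
    return (value - low) / (high - low) if high > low else 0.0


def cori_merge_score(doc_norm, resource_norm):
    """Classical CORI merging heuristic on normalised document and resource scores."""
    return (doc_norm + 0.4 * doc_norm * resource_norm) / 1.4


def fused_score(cori_score, vertical_score, alpha=0.7):
    """Linear fusion of the CORI score with a vertical relevance score.

    alpha is a hypothetical weight, not a value taken from the thesis.
    """
    return alpha * cori_score + (1 - alpha) * vertical_score


def merge(results, resource_scores, vertical_scores, alpha=0.7):
    """Merge per-resource result lists into one ranking, best first."""
    res_low = min(resource_scores.values())
    res_high = max(resource_scores.values())
    merged = []
    for resource, docs in results.items():
        if not docs:
            continue  # nothing to merge from this resource
        doc_scores = [score for _, score in docs]
        doc_low, doc_high = min(doc_scores), max(doc_scores)
        resource_norm = min_max(resource_scores[resource], res_low, res_high)
        for doc_id, score in docs:
            doc_norm = min_max(score, doc_low, doc_high)
            cori = cori_merge_score(doc_norm, resource_norm)
            merged.append((doc_id, fused_score(cori, vertical_scores[resource], alpha)))
    return sorted(merged, key=lambda item: item[1], reverse=True)


# Hypothetical input: two resources whose raw document scores are on different scales.
results = {
    "news_a": [("a1", 12.0), ("a2", 8.0)],
    "shop_b": [("b1", 0.9), ("b2", 0.3)],
}
resource_scores = {"news_a": 0.8, "shop_b": 0.4}  # from resource selection
vertical_scores = {"news_a": 0.9, "shop_b": 0.2}  # from vertical selection

merged = merge(results, resource_scores, vertical_scores)
for doc_id, score in merged:
    print(doc_id, round(score, 3))
```

Normalising within each resource before fusion keeps a resource with large raw scores from dominating the merged list. A diversity-aware variant would need an additional term or re-ranking step, which this sketch leaves out.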
Keywords/Search Tags:federated search systems, diversification, vertical selection, resource selection, results merging