Data Fusion In Information Retrieval By Opposition-based Learning And Clustering

Posted on: 2021-03-09  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zhang
GTID: 2428330629987243  Subject: Computer technology
Abstract/Summary:
With the development of information technology, the Internet has become the main channel through which people obtain information. The primary objective of information retrieval is to help users quickly and accurately find the information they need in the massive collection of information on the Internet. Data fusion combines the component results submitted by different information retrieval systems to obtain a new search result. Previous research shows that data fusion can effectively improve retrieval performance. The linear combination method is one of the commonly used fusion methods, and weight assignment is the key to its performance. Data fusion performance is affected not only by the number of component systems but also by the relationships among them. When there are a relatively large number of candidate component systems, it is necessary to select a suitable group of them for fusion in order to achieve better effectiveness, but research in this area is still lacking. This thesis explores strategies for improving data fusion from two directions: improving the fusion method and improving fusion efficiency. The main work of the thesis is as follows:

(1) The linear combination method can assign different weights to the component systems participating in the fusion according to the specific situation, and appropriate weight assignment is the key to its success. Based on an analysis of the characteristics of opposition-based learning, this thesis introduces the opposition-based learning algorithm into data fusion for the first time, taking advantage of its simplicity, high solution accuracy, and small number of parameters to optimize the weight assignment strategy and improve fusion performance. Experiments on the TREC data set show that, compared with traditional data fusion algorithms, this weight assignment strategy can effectively improve the performance of the fusion results.

(2) When a large number of component systems participate in data fusion, the fusion consumes many resources and is inefficient. To address this problem, this thesis draws on results from a geometric framework of data fusion and proposes a CHAMELEON-based component system selection algorithm. First, starting from the similarity of the result lists, a distance matrix of the initial set of component results is computed to capture their mutual similarity, and CHAMELEON is used to divide the component results into clusters according to this distance matrix. Second, a greedy strategy traverses the clusters and selects component results one by one to form a new set of component results with relatively low mutual similarity. Finally, experiments demonstrate that, compared with other methods, this approach greatly reduces the number of component results participating in fusion while maintaining fusion performance, thereby improving the efficiency of data fusion.
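
The two ideas summarized above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the use of average precision as the fitness measure, the weight range [0, 1], the random search loop, and the "best system per cluster" rule are all assumptions made for illustration, and the clusters are taken as given rather than computed with CHAMELEON. The only opposition-based element shown is the standard opposite point, low + high - x, evaluated alongside each random candidate.

```python
import random

def fuse(runs, weights):
    """Linear combination: a document's fused score is the weighted sum of its scores."""
    fused = {}
    for run, w in zip(runs, weights):
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + w * score
    # Rank by fused score (descending); break ties by document id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))

def average_precision(ranking, relevant):
    """Average precision of a ranked list against a set of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def opposite(weights, low=0.0, high=1.0):
    """Opposite point of a weight vector within [low, high]."""
    return [low + high - w for w in weights]

def obl_search(runs, relevant, iterations=50, seed=0):
    """Hypothetical search: test each random candidate and its opposite, keep the best."""
    rng = random.Random(seed)
    best_w = [1.0] * len(runs)  # equal weights as the baseline
    best_fit = average_precision(fuse(runs, best_w), relevant)
    for _ in range(iterations):
        candidate = [rng.random() for _ in runs]
        for w in (candidate, opposite(candidate)):
            fit = average_precision(fuse(runs, w), relevant)
            if fit > best_fit:
                best_w, best_fit = w, fit
    return best_w, best_fit

def select_by_cluster(clusters, quality):
    """Simplified selection (assumption): keep the highest-quality system per cluster."""
    return [max(cluster, key=lambda s: quality[s]) for cluster in clusters]
```

Because the search starts from equal weights and only accepts improvements, the returned fitness is never worse than the equal-weight baseline on the data it was tuned on; whether it generalizes to unseen queries is a separate question that this sketch does not address.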
Keywords/Search Tags: data fusion, fusion performance, opposition-based learning, weight assignment, geometric framework