Font Size: a A A

Study On Pivot Selection Of Metric-Space Indexing

Posted on:2018-11-17Degree:MasterType:Thesis
Country:ChinaCandidate:X L LiFull Text:PDF
GTID:2348330515496490Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Among the characteristics of big data,'variety' is relatively less studied.Metric-space data management and analysis abstract data into metric space,and is an effective approach for the variety challenge of big data.Since coordinates do not exist in metric space,it is common to consider the distances from a data object to pre-selected reference points(or pivots)as its coordinates.Pivots are selected from metric space,used to partition data and build index tree ultimately.The quality of pivots is critical to the data management and analysis in metric space.The research of pivots selection consists of at least objective function and selection algorithm.Currently,the distance-mean objective function,Farthest First Traversal(FFT)and Incremental selection algorithm are used frequently.The distance-mean objective function roughly regards the mean of distance as object function,and loses the influence of pruning about the query radius.FFT algorithm is difficult to find the best pivots,while Increment algorithm has the problem of local optimum.The main contribution of this thesis includes:· For the disadvantages of the distance-mean objective function,we propose the radius-sensitive objective function.Its objective is the number of data pairs whose distances are greater than the query radius,and it adequately considers the effect of pruning with respect to the query radius.Experiments show that the radius-sensitive objective function is able to select better pivots,and is more consistent with the performance of index.· For the disadvantages of FFT,we propose the Recent Farthest Traversal(RFT)al-gorithm.Experiments show that RFT is better than FFT on the running time and hit-rate.Meanwhile,FFT has the characterisitic of sampling data by space evenly,and is more suitable for sampling.We also propose Pivot Set Selection(PSS)al-gorithm which avoids the local optimum problem of Incremental algorithm.With RFT to select a candidate pivot set and exploiting FFT to sample an evaluation set,PSS gains better query performance and the construction cost of index is greatly reduced.· We explore the theoretical upper bound of the impact of pivot on index perfor-mance.Currently,the difference of performance between many pivot selection algorithms is not significant.The improvement of performance,using complex selection algorithm with high construction cost is usually not remarkable.This thesis studies the possible space of performance improvement,and gives possible direction of future work.
Keywords/Search Tags:variety, metric space, pivot selection, radius-sensitive objective function, RFT, PSS, theoretical upper bound
PDF Full Text Request
Related items