Font Size: a A A

Research On Sequence Keeping Hash Algorithm Based On Dynamic Clustering

Posted on:2021-05-04Degree:MasterType:Thesis
Country:ChinaCandidate:X L HanFull Text:PDF
GTID:2428330605467915Subject:Software engineering
Abstract/Summary:
The rapid development of the Internet technology has caused the explosive growth of multimedia data,which makes the retrieval more complex.In order to solve this problem,hashing algorithms are employed in the image retrieval task,which maps the high-dimensional data into a compact binary code,and uses the Hamming distance with faster retrieval speed to approximate the original similarity.Although hash method has excellent performance in reducing storage occupancy and improving retrieval speed,its retrieval accuracy is relatively low.In order to improve the nearest neighbor retrieval accuracy of hash algorithm,this paper mainly focuses on two aspects: how to get lower quantization loss and how to improve the similarity of the data points at the top of the ranking.The main research contents and innovation achievements are concluded as follows.1.An iterative self-organizing hash algorithm(ISOH)is proposed.The product quantization method is utilized to divide the feature space,and the iterative self-organizing data analysis method is applied to cluster the sub data sets in each sub space.In the paper,the Hamming distance is used to approximate the Euclidean distance.In order to solve the problem of local optimal,which are problem caused by the random initialization of clustering centers.The farthest average distance method is proposed.We generate the initial clustering centers by dividing the clusters with large number of samples and large variance in turn.Because the split threshold and merge threshold need to be obtained by cross validation,the minimum spanning tree is introduced to obtain the merge threshold,and the maximum value of standard deviation in each component is calculated to the split threshold.Finally,in order to solve the problem of limited representation range of fixed encoding length,a multiple encoding mechanism is constructed to assign multiple binary codes to each data.2.A ordinal constrained hash algorithm with top optimization(OCTH)is proposed to keep the tuples sequence relationship between data in Hamming space.If directly computing the ordered tuples,the complexity is too high.So tensor product is introduced to build the ordinal relationship graph between data.Due to the large size of the training data set,it is difficult to use all the data points to construct the above sequence diagram.In order to reduce the spatial complexity,the iterative self-organizing data analysis method is used to reduce the size of the data set,and the ordinal relation between the clustering center points is used to approximate the ordinal relation of the whole data set.In terms of the accuracy of the ranking data points in the optimization return result,a penalty mechanism is established to punish the top ranked wrong points more than the bottom ranked ones,and different weight values are given to the data points according to the ranking difference between Euclidean space and hamming space.In the case of the wrong sort with a large sorting difference,a large weight value is given.Otherwise,a small weight value is given.In SIFT,GIST and CIFAR10 data sets set up neighbor retrieval contrast experiment,the experiment results show that ISOH uses the iterative self-organizing data analysis method to divide the feature space,which can effectively reduce the quantization error.OCTH can significantly improve the similarity of the data at the top of the sorted list by setting a penalty mechanism.A large number of experiments have proved that ISOH and OCTH can effectively improve the performance of neighbor retrieval.
Keywords/Search Tags:iterative self-organizing data analysis, minimum spanning tree, multiple coding, tensor product
Related items