Nearest neighbor search plays a fundamental role in the machine learning community and has been applied in various fields, including data mining, computer vision, and information retrieval. However, with the explosive growth of data in real-world applications, enormous volumes of videos, images, and text are being produced across many industries. In the era of big data, nearest neighbor search therefore faces large storage requirements, high retrieval cost, and the curse of dimensionality caused by high-dimensional data. Hashing, in essence, learns hash functions that map the data into a low-dimensional binary space while preserving the neighborhood structure of the original space. The neighbor search problem can then be stored and computed in the binary space, which significantly reduces storage cost and improves search efficiency. This paper focuses on learning to hash and its applications, and makes the following contributions:

(1) Locality-constrained discrete graph hashing: The goal of hashing is to map high-dimensional data into a low-dimensional binary space while preserving the similarity among neighbors, which is essentially a constrained discrete optimization problem. However, existing discrete graph hashing methods do not incorporate this goal when relaxing the constraints. We therefore propose a locality-constrained discrete graph hashing (LCH) method, which introduces a slack variable to relax the validity constraints (bit balance and bit uncorrelation) on the hash codes and minimizes the constraint loss with similarity preservation as the goal. So that the algorithm converges smoothly, the slack variable itself is also required to preserve similarity, which is consistent with the similarity-preserving nature of hashing (an illustrative formulation is sketched after this abstract).

(2) Improved hashing for efficient recommendation: Hashing techniques can effectively address the storage and retrieval efficiency problems faced by recommender systems (RSs). However, one issue in applying hashing to RSs is that RSs model users' preferences over items rather than the similarities that hashing is concerned with. We therefore propose an improved hashing method for efficient recommendation. First, the mean rating of each user and of each item, relative to the rating system, is treated as a bias. Then, each rating is mapped into a similarity interval by subtracting the bias. Finally, two similarity-preserving methods are proposed to decompose the similarity matrix into binary codes for users and items. This alleviates the gap between preference and similarity (a toy sketch of the pipeline is given below).

(3) Semantic hashing with Bayesian clustering for text retrieval: A popular way to accelerate text search is semantic hashing, which designs compact binary codes for a large collection of documents so that similar documents are mapped to similar codes. However, the time complexity of existing text semantic hashing methods is quadratic, which makes them difficult to apply to large-scale data. We therefore propose a novel semantic hashing method based on Bayesian clustering (BCSH). It adopts a naive Bayes model to extract semantic information from the documents for two-class clustering, and generalizes the two-class clustering to r dimensions to obtain an r-bit binary code for each document. To ensure the validity of each cluster, bit-balance and bit-uncorrelation constraints are imposed on the binary codes. The time complexity of this method is linear, and its semantic extraction yields high-quality hash codes (a minimal sketch of the procedure is given below).
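To make contribution (1) concrete, below is a minimal, hedged formulation of a slack-relaxed discrete graph hashing objective. The notation is introduced here purely for illustration and is not taken from the paper: $L$ denotes the graph Laplacian of the neighborhood graph, $B$ the discrete codes, $Y$ the continuous slack variable, and $\mu,\rho$ trade-off parameters.

\[
\min_{B \in \{-1,1\}^{n \times r},\; Y \in \mathbb{R}^{n \times r}}
  \operatorname{tr}\!\left(B^{\top} L B\right)
  + \mu\,\operatorname{tr}\!\left(Y^{\top} L Y\right)
  + \frac{\rho}{2}\,\lVert B - Y \rVert_F^2
\qquad \text{s.t.}\qquad Y^{\top}\mathbf{1} = 0,\;\; Y^{\top} Y = n I_r .
\]

In this reading, the bit-balance ($Y^{\top}\mathbf{1}=0$) and bit-uncorrelation ($Y^{\top}Y=nI_r$) constraints are carried by the slack variable $Y$ rather than imposed directly on the discrete codes $B$; $\lVert B - Y\rVert_F^2$ is the constraint (relaxation) loss, and the $\operatorname{tr}(Y^{\top}LY)$ term makes the slack variable itself similarity-preserving, as described above.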
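Contribution (2) hinges on two concrete steps: remove the user and item rating biases so that ratings land in a similarity-like interval, then decompose the resulting matrix into binary user and item codes. The Python sketch below illustrates those steps on a toy rating matrix; the baseline-predictor form of the bias and the SVD-plus-sign factorization at the end are assumptions made here for illustration, since the two similarity-preserving decompositions proposed in the paper are not spelled out in this abstract.

```python
import numpy as np

# Hypothetical toy rating matrix (users x items); 0 denotes "not rated".
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
observed = R > 0

# Step 1: treat the mean rating of each user and of each item as a bias.
user_bias = R.sum(axis=1) / np.maximum(observed.sum(axis=1), 1)
item_bias = R.sum(axis=0) / np.maximum(observed.sum(axis=0), 1)
global_mean = R[observed].mean()

# Step 2: subtract the biases so observed ratings land in a similarity-like
# interval centred at zero (this baseline-predictor form is an assumption).
S = np.zeros_like(R)
S[observed] = (R - user_bias[:, None] - item_bias[None, :] + global_mean)[observed]

# Step 3: one simple way (an assumption, not the paper's exact solver) to get
# binary codes: factorise S with a truncated SVD and take the sign of the factors.
r = 2  # code length in bits
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
user_codes = np.sign(U[:, :r] * np.sqrt(sigma[:r]))
item_codes = np.sign(Vt[:r, :].T * np.sqrt(sigma[:r]))
user_codes[user_codes == 0] = 1
item_codes[item_codes == 0] = 1

# Recommend by Hamming-style similarity between user and item codes.
print(user_codes @ item_codes.T)  # agreements minus disagreements over r bits
```

Recommendation then reduces to ranking items by the similarity between a user's binary code and the item codes, which is where the storage and retrieval savings come from.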
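For contribution (3), the sketch below shows one plausible shape of the procedure: an r-bit code obtained by repeating a two-class naive Bayes clustering r times, so that the encoding cost stays linear in the number of documents. The function names and EM-style updates are illustrative assumptions; in particular, bit balance is only encouraged here through a balanced initialization, whereas BCSH imposes explicit bit-balance and bit-uncorrelation constraints on the codes.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_class_bayes_bit(X, n_iter=20):
    """One code bit: soft two-class clustering of documents with a
    multinomial naive Bayes model, initialised with a balanced split."""
    n, v = X.shape
    z = np.zeros(n)
    z[rng.permutation(n)[: n // 2]] = 1.0   # balanced random initialisation
    resp = np.column_stack([1.0 - z, z])    # responsibilities, shape (n, 2)
    for _ in range(n_iter):
        # M-step: class priors and per-class word distributions (with smoothing)
        prior = (resp.sum(axis=0) + 1.0) / (n + 2.0)
        word = resp.T @ X + 1.0             # shape (2, vocabulary)
        word /= word.sum(axis=1, keepdims=True)
        # E-step: posterior class membership of each document
        logp = X @ np.log(word).T + np.log(prior)   # shape (n, 2)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    return np.where(resp[:, 1] > 0.5, 1, -1)

def bcsh_codes(X, r=8):
    """r-bit codes: repeat the two-class clustering r times; each pass is
    O(n * vocabulary), so encoding stays linear in the number of documents."""
    return np.column_stack([two_class_bayes_bit(X) for _ in range(r)])

# Toy document-term count matrix (documents x vocabulary terms).
X = rng.poisson(1.0, size=(100, 50)).astype(float)
codes = bcsh_codes(X, r=8)
print(codes.shape, codes.mean(axis=0))  # column means near 0 indicate bit balance
```

Because each bit is produced by a single pass of the two-class clustering over the document-term counts, the overall encoding cost grows linearly with the number of documents, in line with the complexity claim above.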