Research On Similarity Search Based On Hidden Semantic Hashing Algorithm

Posted on: 2016-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2348330479953429
Subject: Computer application technology
Abstract/Summary:
With the rapid development and wide application of information technology, the volume of multimedia data is growing exponentially. Although such diverse and massive data provides a wealth of raw material for intelligent services based on data analysis, it also poses unprecedented challenges for organizing, analyzing and retrieving data. Traditional nearest neighbor search approaches work well in low-dimensional feature spaces, but they cannot efficiently overcome the "curse of dimensionality" that comes with high-dimensional multimedia data. Binary hash codes have a clear advantage in storage and computation, so hashing algorithms have the potential to help people cope with information overload.

The data-aware hashing algorithm based on latent semantics analyzes the correlation between data and hidden semantics, bridges the "gap" between low-level features and the semantic layer, and provides similarity search based on hidden semantics. The algorithm takes the local spatial geometric structure of the data in semantic space into account and preserves, as far as possible, the neighborhood relationships of the data in that space. To make the representation of data consistent with the cognitive processes of the human brain, we map data into the latent semantic space with a parts-based representation. In addition, a sparseness constraint is imposed on the latent semantic representation so as to highlight the major latent semantics and suppress interference from the weak ones. To generate compact and efficient hash codes, space division is used to project data from the latent semantic space into Hamming space so that each bit has maximal entropy. Finally, the hash function is regarded as a combination of multiple classifiers, which turns the task of learning the hash function into training multiple binary classifiers.
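As a rough illustration of the maximal-entropy requirement, the sketch below thresholds each latent dimension at its median, so that every bit is 1 for half of the items, and then ranks items by Hamming distance. Median thresholding is a common stand-in and is not necessarily the space division used in the thesis; the random `latent` vectors and all function names are hypothetical.

```python
import random
from statistics import median


def fit_thresholds(latent):
    """Per-dimension medians of the latent vectors (a list of equal-length lists)."""
    return [median(column) for column in zip(*latent)]


def to_hash_code(vector, thresholds):
    """Bit k is 1 where the k-th latent coordinate exceeds its median threshold."""
    return [1 if value > t else 0 for value, t in zip(vector, thresholds)]


def hamming_distance(code_a, code_b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


# Hypothetical stand-in for latent semantic representations: 1000 items, 16 dimensions.
random.seed(0)
latent = [[random.random() for _ in range(16)] for _ in range(1000)]

thresholds = fit_thresholds(latent)
codes = [to_hash_code(vector, thresholds) for vector in latent]

# Each bit splits the data in half, which maximizes the entropy of that bit.
ones_per_bit = [sum(column) for column in zip(*codes)]

# Rank all items by Hamming distance to the first item (stable sort keeps ties in id order).
ranking = sorted(range(len(codes)), key=lambda i: hamming_distance(codes[0], codes[i]))
```

A balanced bit carries one full bit of information about the item, which is why a half-and-half split is the usual target when codes must stay short.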
Experimental results on several public datasets show that the hidden semantic hashing algorithm outperforms state-of-the-art hashing algorithms in terms of accuracy, recall and mean average precision (MAP). In addition, the reverse index tree structure built on segmented hash codes greatly improves retrieval speed, and the rank fusion technique, which merges the ordered result lists of multiple hashing algorithms, greatly improves ranking accuracy.
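The idea of indexing hash codes by segment can be sketched as follows. This is a simplified illustration that uses one plain dictionary per segment, not the tree structure of the thesis, whose details the abstract does not give. It relies on the pigeonhole principle: if two codes split into m segments differ in fewer than m bits, at least one segment must match exactly. All names and the toy codes are hypothetical.

```python
from collections import defaultdict


def split_segments(code, num_segments):
    """Split a bit string into `num_segments` contiguous substrings of equal length."""
    if len(code) % num_segments:
        raise ValueError("code length must be divisible by num_segments")
    size = len(code) // num_segments
    return [code[i * size:(i + 1) * size] for i in range(num_segments)]


def build_index(codes, num_segments):
    """Build one table per segment position, mapping a segment value to item ids."""
    tables = [defaultdict(list) for _ in range(num_segments)]
    for item_id, code in enumerate(codes):
        for table, segment in zip(tables, split_segments(code, num_segments)):
            table[segment].append(item_id)
    return tables


def hamming(code_a, code_b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def search(query, codes, tables, radius):
    """Return sorted (distance, id) pairs within `radius` bits of `query`."""
    num_segments = len(tables)
    if radius >= num_segments:
        raise ValueError("radius must be smaller than the number of segments")
    # Candidates share at least one exact segment with the query.
    candidates = set()
    for table, segment in zip(tables, split_segments(query, num_segments)):
        candidates.update(table.get(segment, ()))
    # Verify candidates with the full Hamming distance.
    hits = [(hamming(query, codes[i]), i) for i in candidates]
    return sorted(hit for hit in hits if hit[0] <= radius)


codes = ["00001111", "00001110", "11110000", "00111100"]
tables = build_index(codes, num_segments=4)
results = search("00001111", codes, tables, radius=1)
```

Only items that share a segment with the query are compared in full, so most of the collection is never touched; the speed-up grows with the number of distinct segment values.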
Keywords/Search Tags:data-aware hashing, hidden semantics, local spatial geometric structure, reverse index trees, rank fusion