Research On Similarity Search Based On Hidden Semantic Hashing Algorithm

Posted on:2016-09-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Wang

Full Text:PDF

GTID:2348330479953429

Subject:Computer application technology

Abstract/Summary:

With the rapid development and wide application of information technology, the volume of multimedia data is growing exponentially. Although diverse, massive data provides a wealth of raw material for intelligent services based on data analysis, it also poses unprecedented challenges for organizing, analyzing, and retrieving data. Traditional nearest neighbor search approaches work well in low-dimensional feature spaces, but they cannot efficiently overcome the "curse of dimensionality" that arises with high-dimensional multimedia data. Binary hash codes have a clear advantage in storage and computation, so hashing algorithms have the potential to relieve the burden of information overload.
The data-aware hashing algorithm based on latent semantics analyzes the correlation between data and hidden semantics, bridges the "gap" between low-level features and the semantic layer, and supports similarity search based on hidden semantics. The algorithm takes into account the local spatial geometric structure of the data in semantic space and seeks to preserve the neighborhood relationships among data points in that space. To make the representation of data consistent with the cognitive processes of the human brain, we map data into the latent semantic space using a parts-based representation. In addition, a sparseness constraint is imposed on the latent semantic representation to highlight the major latent semantics and suppress interference from the weak ones. To generate compact and efficient hash codes, a space division method projects the data from the latent semantic space into Hamming space so that each bit has maximal entropy. Finally, the hash function is treated as a combination of multiple classifiers, so the task of learning the hash function becomes one of training multiple binary classifiers.
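
The abstract does not say how the space division is carried out. As a minimal sketch, one common way to obtain maximal-entropy bits is to split each latent dimension at its median, so that every bit is 1 for half of the items. The function names, the median rule, and the toy latent vectors below are all assumptions made for illustration, not the thesis's method. The resulting bits could then serve as labels for one binary classifier per bit.

```python
from statistics import median


def median_thresholds(latent):
    """Return one threshold per latent dimension: the median over all items."""
    dims = len(latent[0])
    return [median(row[d] for row in latent) for d in range(dims)]


def to_hash_code(vector, thresholds):
    """Bit d is 1 when the vector lies above the median split on dimension d."""
    return tuple(1 if v > t else 0 for v, t in zip(vector, thresholds))


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical latent representations: 4 items, 3 latent dimensions.
latent = [
    [0.9, 0.1, 0.4],
    [0.8, 0.2, 0.7],
    [0.1, 0.9, 0.3],
    [0.2, 0.7, 0.8],
]

thresholds = median_thresholds(latent)
codes = [to_hash_code(v, thresholds) for v in latent]

# Each bit is set for exactly half of the items, i.e. it is balanced.
for d in range(len(thresholds)):
    print(d, sum(code[d] for code in codes))
```

Items that are close in the toy latent space (the first two rows) end up one bit apart, while distant items differ in every bit.
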
Experimental results on several open datasets show that the hidden semantic hashing algorithm outperforms state-of-the-art hashing algorithms in terms of accuracy, recall, and mean average precision (MAP). In addition, the reverse index tree structure based on segmented hash codes greatly improves retrieval speed, and a rank fusion technique that merges the ordered result lists of multiple hashing algorithms greatly improves ranking accuracy.
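
The abstract gives no detail on the index structure or the fusion rule, so the following is only a rough sketch under stated assumptions. It replaces the tree structure with plain per-segment inverted tables (an item becomes a candidate if it shares at least one segment with the query) and uses reciprocal rank fusion as a stand-in for the unspecified rank fusion technique. All names, the constant `k=60`, and the toy codes are hypothetical.

```python
from collections import defaultdict


def split_segments(code, num_segments):
    """Split a bit tuple into equal-length segments."""
    size = len(code) // num_segments
    return [tuple(code[i * size:(i + 1) * size]) for i in range(num_segments)]


def build_index(codes, num_segments):
    """Build one inverted table per segment position: segment bits -> item ids."""
    tables = [defaultdict(list) for _ in range(num_segments)]
    for item_id, code in enumerate(codes):
        for pos, seg in enumerate(split_segments(code, num_segments)):
            tables[pos][seg].append(item_id)
    return tables


def search(query, codes, tables, num_segments):
    """Gather items sharing a segment with the query, then sort by Hamming distance."""
    candidates = set()
    for pos, seg in enumerate(split_segments(query, num_segments)):
        candidates.update(tables[pos].get(seg, []))

    def distance(item_id):
        return sum(a != b for a, b in zip(query, codes[item_id]))

    return sorted(candidates, key=lambda i: (distance(i), i))


def fuse_rankings(rankings, k=60):
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to an item's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical 4-bit codes for four items, indexed as two 2-bit segments.
codes = [
    (0, 0, 0, 0),
    (0, 0, 1, 1),
    (1, 1, 1, 1),
    (1, 1, 0, 1),
]
tables = build_index(codes, num_segments=2)
hits = search((0, 0, 0, 1), codes, tables, num_segments=2)

# Merge two hypothetical ranked lists, e.g. from two hashing algorithms.
fused = fuse_rankings([[0, 1, 3], [1, 3, 2]])
print(hits, fused)
```

Only the candidates found through the segment tables are compared against the query, which is where the speed-up of a segment-based index comes from; item 2 is never examined in this toy query because it shares no segment with it.
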

Keywords/Search Tags:

data-aware hashing, hidden semantics, local spatial geometric structure, reverse index trees, rank fusion

Related items

1	Based On Spatial Data Storage And Retrieval Of Geospatial Research
2	Research On The Generic Location-aware Rank Query Based On Temporal
3	Research On Spatial Data Index Based On Nearest Neighbor Distance And Query Algorithms
4	A Str R - Tree Spatial Index Based On The Research
5	Research On Technology Of Reverse Nearest Neighbor Query For Moving Objects In Spatial Database
6	Research On High-dimensional Index In Large-scale Image Retrieval
7	Locality Sensitive Hashing Index Based On Neighborhood Collision Counting
8	Research On The Location-aware Rank Query Based On User Preferences Constraint
9	Study On Spatial Index Structure And Spatial Query Algorithm In Supporting System Of Three Resistances
10	Research On Spatial Index Based On QAAR-Tree