
Research On K Nearest Neighbor In High Dimension Data

Posted on: 2018-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: S J Du
Full Text: PDF
GTID: 2348330533969826
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid spread of the Internet, the volume of data generated by everyday human activity has kept growing. This explosion shows up not only in the amount of data but also in its dimensionality, so operations on high-dimensional data, such as approximate nearest neighbor (ANN) search, have become increasingly important research topics. This thesis analyzes and studies the ANN query problem in high-dimensional data spaces and proposes a series of optimizations and improvements to query-aware retrieval methods for ANN search. Query-aware LSH is a novel retrieval method: unlike traditional LSH, which assigns points to buckets directly, it first determines the hash projection and then determines the search scope, which improves query performance significantly. We first improve the Query-aware LSH method by correcting some defects in the algorithm and combining it with a spectral hashing method, which improves the algorithm's running time and I/O performance. Deep hashing is a kind of hashing algorithm based on deep neural networks, and we aim to build an effective hash structure with neural networks. We first propose a new activation function and then build a Convolutional Auto Encoder network. Using the strong data-representation ability of deep neural networks, we reduce the data to a lower dimension and obtain very concise codes. Finally, we combine our deep hash structure with query-aware retrieval to build a deep-hash-based nearest neighbor retrieval model. Although offline training of the model takes a certain amount of time, this method shortens online processing time and yields more accurate retrieval results. Our experiments on public datasets show that the proposed activation function and training strategy are effective, and they demonstrate the efficiency and practicality of the Convolutional Auto Encoder network.
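The query-aware idea mentioned in the abstract (fix the hash projections first, then choose the search scope around the query) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the function names `build_projections` and `query_aware_search` and the parameters `width` and `min_collisions` are assumptions made for illustration, and the linear scan stands in for the sorted index structures a real system would use.

```python
import math
import random


def build_projections(data, num_hashes, seed=0):
    """Offline step: project every point onto random Gaussian directions.

    No bucket boundaries are fixed here; only the projected values are stored.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
    projections = [[sum(a * x for a, x in zip(d, p)) for p in data]
                   for d in directions]
    return directions, projections


def query_aware_search(data, directions, projections, query,
                       width, min_collisions, k=1):
    """Online step: centre an interval of the given width on the query's
    projection, count how often each point falls inside it, and rank the
    frequently colliding points by their true Euclidean distance."""
    q_proj = [sum(a * x for a, x in zip(d, query)) for d in directions]
    half = width / 2.0
    counts = [0] * len(data)
    for qp, proj in zip(q_proj, projections):
        for i, p in enumerate(proj):
            if abs(p - qp) <= half:  # point i collides with the query here
                counts[i] += 1
    candidates = [i for i, c in enumerate(counts) if c >= min_collisions]
    candidates.sort(key=lambda i: math.dist(data[i], query))
    return candidates[:k]


# Hypothetical toy data, for illustration only.
points = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [1.0, 0.0, 0.0], [-5.0, 4.0, 2.0]]
dirs, projs = build_projections(points, num_hashes=8, seed=1)
nearest = query_aware_search(points, dirs, projs, [10.0, 10.0, 10.0],
                             width=1.0, min_collisions=4, k=1)
```

Because the interval is centred on the query rather than on a fixed grid, a point close to the query is not cut off by a bucket boundary, which is the contrast with direct bucketing drawn in the abstract.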
Keywords/Search Tags: high-dimensional data, locality-sensitive hashing, approximate nearest neighbor search, spectral hashing, neural network