Image similarity retrieval is a fundamental problem in computer science. As the dimensionality of data features increases, the search efficiency of tree-structured index algorithms drops sharply; this is the "curse of dimensionality" encountered by many nearest neighbor search methods. One way to address this problem is the Locality Sensitive Hashing (LSH) algorithm. However, the performance of LSH is highly sensitive to several parameters that the implementer must choose. In addition, traditional centralized image retrieval systems become a performance bottleneck when faced with massive data. In view of these characteristics and the shortcomings of existing solutions, this thesis studies an LSH-based image indexing system on the Hadoop platform.

First, the thesis studies the key technologies of image retrieval. It analyzes the structure and basic characteristics of the Hadoop platform and adopts the master-slave architecture of the Hadoop cloud platform to store massive image data based on the LSH algorithm. This storage scheme serves as the basis for processing images in a divide-and-conquer manner and provides an effective retrieval method for massive image collections. It lays the foundation for a comprehensive analysis of image retrieval and provides the theoretical and technical basis for the design and implementation of the prototype system.

Second, because the parameters of the LSH algorithm depend on the data set, the thesis extracts samples from the data set, observes the data distribution, establishes a model relating the data distribution to the parameters, and proposes an adaptive parameter optimization method that helps improve recall and precision. Specifically, to address the difficulty of selecting LSH parameters, a parameter optimization method for LSH image retrieval is proposed. A performance optimization model of LSH for image retrieval is first established: the general form of the nonlinear optimization problem for LSH parameters is given, and a novel objective function is defined. The distance distribution between image data is then analyzed, which yields a fast method for solving this optimization problem. Finally, a parameter optimization method for LSH based on numerical differentiation and binary search is proposed. Experiments show that the method improves operating efficiency while maintaining a high F1 score, the harmonic mean of precision and recall.

Finally, the thesis designs and implements an image retrieval prototype system based on the LSH algorithm on the Hadoop platform. The system adopts the adaptive parameter optimization algorithm proposed in this thesis, implements a MapReduce-based parallel LSH algorithm, and retrieves massive image data adaptively and in parallel. Test results show that the system performs image retrieval with adaptively optimized LSH parameters. The results of this thesis can serve as a basis for further research on distributed image similarity retrieval and parameter adaptation. Because the approach maintains a high F1 score while improving system efficiency, it has both theoretical and practical value.
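To make the role of the LSH parameters concrete, here is a minimal sketch of a p-stable (Gaussian) LSH index for Euclidean distance in the style of E2LSH. It is an illustration under assumptions, not the thesis's implementation: the class name `E2LSHIndex` and the parameter names `k` (hash functions concatenated per table), `L` (number of hash tables) and `w` (bucket width) are hypothetical labels for the kind of parameters the abstract says the implementer must choose.

```python
import math
import random
from collections import defaultdict


class E2LSHIndex:
    """Hypothetical p-stable LSH index: h(v) = floor((a . v + b) / w), a ~ N(0, I), b ~ U[0, w)."""

    def __init__(self, dim, k, L, w, seed=0):
        rng = random.Random(seed)
        self.k, self.L, self.w = k, L, w
        # For each of the L tables, draw k (a, b) pairs; a table key concatenates k hashes.
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w)) for _ in range(k)]
            for _ in range(L)
        ]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.data = {}

    def _key(self, t, v):
        # Bucket key of vector v in table t: a tuple of k quantized random projections.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[t]
        )

    def insert(self, item_id, v):
        self.data[item_id] = v
        for t in range(self.L):
            self.tables[t][self._key(t, v)].append(item_id)

    def query(self, q, radius):
        # Union of the query's buckets over all L tables gives the candidate set.
        candidates = set()
        for t in range(self.L):
            candidates.update(self.tables[t].get(self._key(t, q), []))
        # Verify candidates with the exact Euclidean distance to drop false positives.
        return sorted(i for i in candidates if math.dist(self.data[i], q) <= radius)
```

Raising `k` makes each bucket more selective and reduces false positives, while raising `L` adds tables and recovers recall at the cost of memory and query time; balancing these effects together with `w` is exactly the parameter selection problem the thesis addresses.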
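The abstract describes sampling the data set and analyzing the distance distribution between images before choosing parameters. The exact procedure is not given here, so the following is only a minimal sketch, assuming feature vectors are compared with Euclidean distance: it draws random pairs of images and summarizes the sampled distances by quantiles. The helper names `sample_pair_distances` and `distance_quantiles` are hypothetical.

```python
import math
import random
import statistics


def sample_pair_distances(vectors, n_pairs, seed=0):
    """Estimate the pairwise distance distribution from n_pairs random pairs of distinct items."""
    rng = random.Random(seed)
    n = len(vectors)
    dists = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        dists.append(math.dist(vectors[i], vectors[j]))
    return dists


def distance_quantiles(dists, n=10):
    """Return the n - 1 cut points that divide the sampled distances into n groups."""
    return statistics.quantiles(dists, n=n)
```

Sampling pairs avoids computing all O(N^2) distances on a large image collection; quantiles of the sample are one possible input for choosing a query radius or bucket width, though how the thesis maps the distribution to parameters is not reproduced here.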
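The parameter optimization step combines numerical differentiation with binary search. The thesis's objective function is not stated in the abstract, so the sketch below shows only the generic technique under an assumption: the objective `f` is unimodal on `[lo, hi]`, and its optimum can be located by binary-searching for the point where the central-difference derivative changes sign. The objective used in the usage example, `log(w) - w / 4`, is a hypothetical stand-in whose maximum is known to be at `w = 4`.

```python
import math


def central_diff(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def argmax_unimodal(f, lo, hi, tol=1e-6, h=1e-5):
    """Binary-search the sign change of the numerical derivative of a unimodal f on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if central_diff(f, mid, h) > 0:
            lo = mid  # f is still increasing: the maximum lies to the right of mid
        else:
            hi = mid  # f is decreasing or flat: the maximum lies at or to the left of mid
    return 0.5 * (lo + hi)


# Usage with a hypothetical stand-in objective (not the thesis's objective function).
w_best = argmax_unimodal(lambda w: math.log(w) - w / 4.0, 0.5, 20.0)
```

Each iteration costs two objective evaluations and halves the interval, so roughly 25 iterations shrink `[0.5, 20]` below `1e-6`; this is why a derivative-sign search can be much cheaper than a grid search when the objective is expensive to evaluate.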
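The evaluation criterion in the abstract is F1, the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch, assuming retrieval results and ground truth are given as collections of image IDs (the function name `precision_recall_f1` is hypothetical):

```python
def precision_recall_f1(retrieved, relevant):
    """Precision, recall and F1 of a retrieved ID set against the relevant ID set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, retrieving images {1, 2, 3, 4} when the relevant set is {2, 3, 5} gives precision 0.5, recall 2/3 and F1 = 4/7 ≈ 0.571. Because the harmonic mean is dominated by the smaller value, a parameter setting cannot score a high F1 by maximizing only recall or only precision.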
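The MapReduce-based parallel LSH mentioned in the abstract can be pictured as a map phase that emits one (table, bucket) key per hash table for each image, and a reduce phase that collects the images sharing a bucket. The sketch below simulates that data flow in plain Python with an explicit shuffle step; it does not use Hadoop, and the function names (`make_hashes`, `lsh_map`, `shuffle`, `lsh_reduce`, `run_job`) are hypothetical. On a real cluster the map and reduce functions would run as distributed tasks, with the framework performing the shuffle.

```python
import math
import random
from collections import defaultdict
from itertools import chain


def make_hashes(dim, k, L, w, seed=0):
    """L tables of k p-stable hash functions (a, b), a ~ N(0, I), b ~ U[0, w)."""
    rng = random.Random(seed)
    return [[([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w)) for _ in range(k)]
            for _ in range(L)]


def lsh_map(record, hashes, w):
    """Map phase: emit ((table index, bucket key), image_id) once per hash table."""
    image_id, vec = record
    for t, funcs in enumerate(hashes):
        key = tuple(math.floor((sum(ai * vi for ai, vi in zip(a, vec)) + b) / w) for a, b in funcs)
        yield (t, key), image_id


def shuffle(pairs):
    """Group mapped values by key, as the MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def lsh_reduce(key, image_ids):
    """Reduce phase: emit each bucket with its sorted list of member images."""
    return key, sorted(image_ids)


def run_job(records, hashes, w):
    mapped = chain.from_iterable(lsh_map(r, hashes, w) for r in records)
    return dict(lsh_reduce(key, ids) for key, ids in shuffle(mapped).items())
```

Because each record is hashed independently in the map phase, the input can be split across nodes freely; only images that share a (table, bucket) key meet in the same reducer, which is what makes the divide-and-conquer processing described in the abstract possible.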