The Design And Implementation Of Distributed Storage And Query System For Similar Images Based On LSH

Posted on: 2015-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: Y Chen
GTID: 2298330452461123  Subject: Software engineering
Abstract/Summary:
The rapid development of the Internet has produced multimedia data on a very large scale, which has greatly advanced work on multimedia content processing and analysis. Storing and querying such massive multimedia data, however, remains difficult. Traditional relational database management systems (RDBMS) offer only basic storage for multimedia data and little support for analyzing, processing, or querying it. When faced with massive data volumes, they also suffer from complex configuration, weak scalability, difficult maintenance, and high hardware and software costs.

This work originates from a project for the technical testing and functional verification of an Internet multimedia data analysis system. The project targets real-time and non-real-time analysis of massive volumes of text, image, audio, video, and other types of data, in order to meet users' requirements for querying and statistics over massive data and other related business needs. The work designed and implemented here is the subsystem of that project responsible for non-real-time analysis, storage, and querying of massive image data.

Based on the users' requirements and the functions a massive-data system should support, the subsystem is built on a distributed file system and a distributed database system. All modules are multi-threaded, and multiple instances of the same functional module can be deployed. Functional modules exchange data through socket messages over TCP/IP. For a system that processes massive amounts of data, easy scalability, the absence of a single point of failure, and data security are very important. The Hadoop HDFS distributed file system and the HBase distributed database system support these characteristics. Both are open-source software, so the cost of building on them is relatively low, and both are mature technologies. The storage system in this work therefore combines HBase with Hadoop. Similar-image search is implemented by combining the LSH (Locality Sensitive Hashing) algorithm with SURF features.

The subsystem comprises seven major functional modules: the image receiving and deduplication module, the image feature extraction module, the image feature indexing module, the image loading module, the image query receiving module, the image query execution module, and the MapReduce query module. These modules are independent background applications written in C or C++. Each module can be deployed repeatedly on one or more machines, which improves the utilization of hardware resources.

The work implements storage of massive deduplicated image data, similar-image retrieval based on image content features, and the construction and querying of a high-dimensional vector index. It meets the requirements for multiple data backups and parallel data access, both software and hardware are easy to expand, and neither has a single point of failure.
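The abstract does not say which LSH family the thesis uses or how the index is laid out in HBase. As a rough illustration only, the sketch below shows one common choice for Euclidean descriptors such as SURF: p-stable LSH, where each hash is floor((a·v + b) / w) with a Gaussian projection a. The class name `EuclideanLSH`, its parameters, and the in-memory dictionaries standing in for the distributed store are all assumptions, and Python is used for brevity even though the thesis modules are written in C or C++.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Toy p-stable LSH index for fixed-length float vectors (hypothetical helper)."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.bucket_width = bucket_width
        self.vectors = {}  # item id -> original vector, used for exact re-ranking
        self.tables = []   # one (hash functions, buckets) pair per table
        for _ in range(num_tables):
            # Each hash function is a Gaussian projection plus a random offset.
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, bucket_width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, vec):
        # Concatenate the quantized projections into one bucket key.
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, vec)) + offset) / self.bucket_width)
            for proj, offset in funcs
        )

    def add(self, item_id, vec):
        self.vectors[item_id] = vec
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, vec)].append(item_id)

    def query(self, vec, top_k=5):
        # Collect every item sharing a bucket with the query in any table,
        # then rank those candidates by exact Euclidean distance.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, vec), ()))
        ranked = sorted(candidates, key=lambda i: math.dist(vec, self.vectors[i]))
        return ranked[:top_k]


# Usage with made-up 64-dimensional vectors (64 is the standard SURF descriptor length).
data_rng = random.Random(42)
index = EuclideanLSH(dim=64, seed=1)
descriptors = {f"img-{i}": [data_rng.random() for _ in range(64)] for i in range(20)}
for name, vec in descriptors.items():
    index.add(name, vec)
nearest = index.query(descriptors["img-3"], top_k=3)
```

A real image yields many SURF descriptors, not just one, so a full system would index each descriptor and aggregate matches per image. The sketch treats each vector as a single item to keep the hashing step visible.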
Keywords/Search Tags: Distributed file system, Distributed database system, Locality Sensitive Hashing algorithm, Content-based image retrieval