Research On HBase-based Mass Image Storage And Fast Retrieval Technology

Posted on: 2021-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: D Xie
Full Text: PDF
GTID: 2428330602995922
Subject: Software engineering
Abstract/Summary:
With the advent of the Web 2.0 era, more and more pictures need to be stored in databases. Massive volumes of picture data, their unstructured nature, and frequent read and write operations all pose difficulties for data storage, so how to store massive picture data efficiently is a topic worthy of attention. The emergence of big data technology provides new ideas for massive picture storage. Based on the characteristics and storage requirements of massive face-picture collections and a comparison of distributed storage frameworks, this thesis proposes an HBase-based massive picture storage solution that adopts a Master/Slave distributed storage structure. A high-availability (HA) architecture is built to improve the reliability of the picture data and the fault tolerance of the system. For picture storage, different tables and storage methods are designed for different picture sources in order to improve the insertion efficiency of massive pictures. For the large-scale face capture data and unstructured character information, a distributed storage primary key (rowkey) is designed that resolves HBase data imbalance under high concurrency and improves load balance across Regions. Because face images are small files, and too many small files degrade the access efficiency of the cluster, this thesis optimizes the existing solution in Hadoop and proposes a new one: feature values are first extracted from the face images, and the k-means algorithm is then used to merge highly similar small files into large files, improving block utilization in Hadoop. For text retrieval, HBase lacks secondary indexes, so multi-condition queries are inefficient. To compensate, this thesis uses a coprocessor to combine Elasticsearch with HBase and build a composite index that improves HBase retrieval efficiency. For similar-picture search, the locality-sensitive hashing (LSH) algorithm is used to map highly similar picture data to the same bucket. Finally, a distributed cluster is built to test the optimization schemes in terms of cluster time and space overhead and retrieval efficiency. Experiments show that, for the same number of inserted pictures, the small-file merging scheme reduces cluster memory consumption and increases picture write speed. The rowkey optimization improves Region load balance, and the composite index greatly increases data retrieval efficiency at the cost of some additional cluster storage space.
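The abstract does not give the layout of the distributed rowkey. As a minimal sketch of one common way to spread concurrent writes across Regions, the Python below prepends a hash-derived salt to the key. The field names (`camera_id`, `capture_ts_ms`), the bucket count, and the key format are illustrative assumptions, not the thesis's design.

```python
import hashlib

NUM_BUCKETS = 16  # assumption: table is pre-split into 16 Regions


def salted_rowkey(camera_id: str, capture_ts_ms: int,
                  num_buckets: int = NUM_BUCKETS) -> bytes:
    """Build a rowkey of the form <salt>|<camera_id>|<reversed timestamp>.

    The salt is derived from a hash of the natural key, so the same record
    always maps to the same bucket, while consecutive timestamps scatter
    across buckets instead of piling onto a single Region.
    """
    natural_key = f"{camera_id}|{capture_ts_ms}"
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    salt = int.from_bytes(digest[:4], "big") % num_buckets
    # Reversed timestamp: newer captures sort first within a bucket.
    reversed_ts = (2**63 - 1) - capture_ts_ms
    return f"{salt:02d}|{camera_id}|{reversed_ts:019d}".encode("utf-8")


if __name__ == "__main__":
    for offset in range(3):
        print(salted_rowkey("cam-01", 1_600_000_000_000 + offset))
```

The trade-off of this kind of salting is that a range scan over one camera's captures has to fan out over every salt prefix, so the bucket count is usually kept close to the number of Regions.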
Keywords/Search Tags:HBase, small files, rowkey, coprocessor, composite index