Research On HBase-based Mass Image Storage And Fast Retrieval Technology

Posted on:2021-04-10

Degree:Master

Type:Thesis

Country:China

Candidate:D Xie

Full Text:PDF

GTID:2428330602995922

Subject:Software engineering

Abstract/Summary:

With the advent of the Web 2.0 era, more and more pictures need to be stored in databases. The sheer volume of picture data, its unstructured nature, and frequent read and write operations all make storage difficult, so how to store massive picture data efficiently is a topic worthy of attention. The emergence of big data technology offers new ideas for solving the problem of massive picture storage.

Based on the characteristics and storage requirements of massive face pictures and a comparison of distributed storage frameworks, this thesis proposes a massive picture storage solution built on HBase, using a master/slave distributed storage structure. A high-availability (HA) architecture is built to improve the reliability and fault tolerance of the system.

For picture storage, different tables and storage methods are designed for different picture sources in order to improve the insertion efficiency of massive pictures. For large-scale face capture data and unstructured character information, a distributed-storage primary key (rowkey) is designed, which solves the problem of HBase data imbalance under high concurrency and improves load balance across Regions. Because face images are small files, and too many small files degrade the access efficiency of the cluster, this thesis optimizes the existing Hadoop solution and proposes a new one: feature values are first extracted from the face images, and the k-means algorithm is then used to merge highly similar small files into large files, improving the utilization of blocks in Hadoop.

For text retrieval, HBase lacks secondary indexes, so multi-condition queries are inefficient. To make up for this shortcoming, this thesis uses a coprocessor to combine Elasticsearch with HBase and build a composite index that improves the retrieval efficiency of HBase. For similar-picture search, the LSH (locality-sensitive hashing) algorithm is used to map highly similar picture data to the same bucket.

Finally, a distributed cluster is built to test the optimization schemes in terms of cluster time and space overhead and retrieval efficiency. Experiments show that, for the same number of inserted pictures, the small-file merging scheme reduces cluster memory consumption and increases picture write speed. The rowkey optimization improves the load balance of Regions, and the composite index greatly increases data retrieval efficiency at the cost of some cluster space.
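
The abstract does not spell out the rowkey layout, so the following is only a minimal sketch of one common way to avoid write hotspots in HBase: prefixing the rowkey with a hash-derived salt so that consecutive captures land in different Regions. The field names (`camera_id`, `capture_ts_ms`), the bucket count, and the key format are assumptions made for illustration and are not taken from the thesis.

```python
import hashlib
from collections import Counter

# Assumed number of salt buckets, e.g. one per pre-split Region (hypothetical value).
NUM_BUCKETS = 16


def salted_rowkey(camera_id: str, capture_ts_ms: int,
                  num_buckets: int = NUM_BUCKETS) -> bytes:
    """Build a rowkey of the form <salt>|<camera_id>|<reversed timestamp>.

    The layout is illustrative only; the thesis may use a different design.
    """
    # Hash the full natural key so that consecutive captures from the same
    # camera are spread over all buckets instead of hitting one Region.
    natural_key = f"{camera_id}|{capture_ts_ms}".encode("utf-8")
    digest = hashlib.md5(natural_key).digest()
    salt = int.from_bytes(digest[:4], "big") % num_buckets

    # Reversed timestamp: inside each bucket, rows of one camera sort
    # newest-first under HBase's lexicographic byte ordering.
    reversed_ts = (2**63 - 1) - capture_ts_ms

    # Zero-padded fields keep the key length fixed and the ordering stable.
    return f"{salt:02d}|{camera_id}|{reversed_ts:019d}".encode("utf-8")


if __name__ == "__main__":
    # Simulate a burst of sequential captures from a single camera and
    # count how many rows fall into each salt bucket.
    keys = [salted_rowkey("cam-001", 1_600_000_000_000 + i)
            for i in range(10_000)]
    load = Counter(key[:2] for key in keys)
    for bucket, count in sorted(load.items()):
        print(bucket.decode(), count)
```

The trade-off of this kind of salting is on the read side: because one camera's rows are spread over every bucket, a range scan for that camera has to be issued once per salt prefix and the results merged.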

Keywords/Search Tags:

HBase, small files, rowkey, coprocessor, composite index

Related items

1	Research On Retrieval Speed Improvement Of HBase Based On Coprocessor Mechanism
2	The Design And Implementation Of Full Text Index For HBase Based On Lucene
3	Design And Implication Of Mini-files Storage System Based On Hbase
4	Hbase Non-primary Key Attribute Index Method And Implementation
5	Optimization Study On Storing Massive Small Files Based On Hadoop
6	Research On Access Optimization Of Small Files In Hadoop Cluster
7	Research And Application Of Small Files Storage Method Based On HDFS
8	The Research And Implementation Of Method Regarding To The Small Files Problem Of Hadoop
9	The Research And Implementation Of Mass Small File Storage System
10	Research And Implementation Of Fast Retrieval Technology For Massive Small Files