
Research For Retrieval Performance Optimization Based On Double B-tree Under Deduplication Environment

Posted on: 2019-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Cao
GTID: 2428330569996100
Subject: Computer technology
Abstract/Summary:
With the rapid development of information science and technology, the era of information globalization has arrived, and the total volume of global data has exploded. While data volume keeps growing rapidly, hardware storage capacity is growing far more slowly, so how data is stored has become a focus of attention. According to the relevant data, redundant data accounts for a very high share of the data currently stored. Data de-duplication emerged in response, but as a new technology it still has many deficiencies, for example in the design of de-duplication systems, the improvement and optimization of the data block index structure, and the performance of fingerprint indexing. This thesis studies the read and write performance bottleneck of the system index. The main work is as follows:

(1) Index optimization is a very important part of optimizing the performance of a data de-duplication system. Through in-depth study of index structure technologies, this thesis summarizes the advantages and disadvantages of current index structures and points out that read performance is the bottleneck in these structures.

(2) To reduce the number of disk accesses and improve the read performance of de-duplication, this thesis proposes the Double B-tree Index Structure (DBIS). DBIS is a new index structure built on the B-tree and held in memory. It consists of two B-trees with different structures: one is a B-tree optimized to improve search efficiency, and the other is a B-tree integrated with the LRU algorithm to increase the detection hit rate. The first tree, B-tree-1, is optimized based on the characteristics of the B-tree. To reduce the number of nodes traversed per lookup as far as possible, the number of fingerprints stored in each node is increased, and the number of branches per node is increased with it. This makes the B-tree shorter and thereby improves fingerprint retrieval efficiency. The second tree, B-tree-2, is a B-tree merged with the LRU algorithm. The size of B-tree-2 is set to 1/m of that of B-tree-1, where m is the optimal value obtained experimentally. The least-recently-used idea of LRU is used to raise the retrieval hit rate of B-tree-2, which improves the retrieval efficiency of the whole system.

(3) For B-tree-2 in DBIS, a time parameter T is introduced so that unvisited nodes are cleaned out once every period T. In addition, each node stores a time t recording when it was last accessed; whenever the node is accessed, t is refreshed automatically. Because index nodes in the tree have parents, if a parent has not been accessed for longer than T and the child nodes under it have also not been accessed within T, the parent and its child nodes are eliminated from the index.

In addition, the thesis analyzes DBIS theoretically and validates its effectiveness and efficiency with respect to system performance through experiments. As the amount of data being checked increases, the gain in detection efficiency becomes more pronounced, which achieves the goal of optimizing the read performance of the de-duplication system.
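
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The names `btree_height_upper_bound` and `TwoTierFingerprintIndex` are hypothetical, both trees are replaced by dictionary stand-ins, and promoting an entry into the small tier after a hit in the large tier is an assumption, since the abstract does not say how B-tree-2 is populated. The sweep is also flat: it ignores the parent and child relationship that the abstract describes for eviction.

```python
from collections import OrderedDict


def btree_height_upper_bound(num_keys: int, min_degree: int) -> int:
    """Largest height h with 2 * t**h - 1 <= n (textbook B-tree bound).

    Height counts edges from the root, so a single-node tree has height 0.
    Integer arithmetic is used to avoid floating-point log rounding.
    """
    height = 0
    while 2 * min_degree ** (height + 1) - 1 <= num_keys:
        height += 1
    return height


class TwoTierFingerprintIndex:
    """Dictionary stand-ins for B-tree-1 (full) and B-tree-2 (hot, LRU)."""

    def __init__(self, hot_capacity: int, ttl: float):
        self.full = {}              # stand-in for B-tree-1: all fingerprints
        self.hot = OrderedDict()    # stand-in for B-tree-2: fp -> (location, t)
        self.hot_capacity = hot_capacity
        self.ttl = ttl              # plays the role of the time parameter T

    def insert(self, fingerprint: str, location: int) -> None:
        self.full[fingerprint] = location

    def lookup(self, fingerprint: str, now: float):
        """Check the small hot tier first, then fall back to the full tier."""
        entry = self.hot.get(fingerprint)
        if entry is not None:
            # Hit in the hot tier: refresh last-access time t and LRU order.
            self.hot[fingerprint] = (entry[0], now)
            self.hot.move_to_end(fingerprint)
            return entry[0]
        location = self.full.get(fingerprint)
        if location is not None:
            # Assumed policy: promote on a hit in the full tier.
            self.hot[fingerprint] = (location, now)
            while len(self.hot) > self.hot_capacity:
                self.hot.popitem(last=False)  # evict least recently used
        return location

    def sweep(self, now: float) -> int:
        """Drop hot entries not accessed for longer than ttl; return count."""
        stale = [fp for fp, (_, t) in self.hot.items() if now - t > self.ttl]
        for fp in stale:
            del self.hot[fp]
        return len(stale)
```

For one million keys, the height bound falls from 18 at minimum degree 2 to 2 at minimum degree 128, which is the effect the abstract attributes to storing more fingerprints per node. Time is passed in explicitly as `now` so the behaviour is easy to reproduce; a real system would more likely read a clock and run `sweep` once per period T.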
Keywords/Search Tags:De-duplication, Double B-tree, LRU, Index structure