Research And Optimization Of Massive Small File Storage System For Vehicle And Driving Service

Posted on: 2018-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Q C Zheng
GTID: 2348330518456589
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of the Internet and the arrival of the information age, applications in astronomy, geography, meteorology, e-commerce and many other fields have accumulated large volumes of data, much of it stored as many small files. Public-facing service industries such as banks, postal services and vehicle administration offices have begun moving to an "Internet +" development model, and in serving their own needs they have gradually produced more than a hundred million small files. These files are still growing explosively, which poses a major challenge to storage efficiency, retrieval and metadata management. Against this big data background, and following the requirements of the "Internet traffic safety comprehensive service platform construction guidance" (public transport pipe [2013] 433), this thesis addresses the needs of the Nanning vehicle administration office in order to advance the big data platform for "Internet + Vehicle Administration". It builds VDSMSS (Vehicle-Driving Service Mass Storage System), an HDFS-based massive small file storage system for vehicle and driving services. The system lays the foundation for the "Internet + Vehicle Administration" big data platform and offers service industries a practical solution and optimization direction for designing HDFS-based massive small file storage systems, which gives the work practical significance and value. The main contents of this thesis are as follows:

(1) The core architecture of HDFS and its key internal data structures are briefly described. The storage optimization schemes currently used in industry for massive small files are surveyed, and the advantages and disadvantages of several representative schemes are analyzed. Several representative cache replacement algorithms are introduced, with a focus on the self-tuning cache replacement algorithm of the ZFS file system (ZFS Adaptive Replacement Cache, ZFS-ARC).

(2) The problems HDFS has when storing small files are analyzed in order to determine the direction of optimization. The characteristics of small files in the vehicle and driving business system are summarized and used to merge the small files into large files, per user, by time and service group. Reducing the number of files reduces NameNode memory consumption. An efficient index supporting both single-file lookup and batch search is also designed, so that retrieval stays fast when a user's query conditions trigger a batch file search.

(3) Because HDFS does not provide prefetching or caching for file reads and writes, this thesis proposes an adjustable cache replacement algorithm based on a file-correlation prefetching mechanism. First, a traditional association rule mining algorithm is applied to Hadoop log files containing small file access records, and the mined data is analyzed mathematically to calculate the potential correlation between small files. A file prefetching mechanism is then designed from these correlations: when a small file is read, the files associated with it are prefetched into memory. Next, ZFS-ARC, which takes both access recency and frequency into account, is combined with this prefetching mechanism to produce PRE-ZFSARC, a self-adjusting cache replacement algorithm based on file-association prefetching, which improves the small file read and write performance of VDSMSS. Finally, experimental comparison demonstrates the effectiveness of the proposed scheme. This completes the performance optimization of the massive small file storage system and makes it well suited to vehicle and driving service systems.
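
The merging and indexing idea in item (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class names, the container naming scheme and the in-memory buffers standing in for HDFS files are all assumptions made for illustration. Small files are appended to one container per (user, month, service group), and an index records each file's container, offset and length so that both single lookups and batch lookups by group remain possible.

```python
import io
from collections import defaultdict
from typing import NamedTuple


class IndexEntry(NamedTuple):
    container: str  # name of the merged large file (hypothetical naming)
    offset: int     # byte offset of the small file inside the container
    length: int     # size of the small file in bytes


class MergedStore:
    """In-memory stand-in for merged files on HDFS (illustrative only)."""

    def __init__(self):
        self.containers = defaultdict(io.BytesIO)  # container name -> bytes
        self.index = {}                            # small file name -> IndexEntry
        self.by_group = defaultdict(list)          # (user, month, service) -> names

    def put(self, user, month, service, name, data):
        """Append a small file to the container for its user/time/service group."""
        container = f"{user}/{month}/{service}.merged"
        buf = self.containers[container]
        offset = buf.seek(0, io.SEEK_END)  # append position = current end
        buf.write(data)
        self.index[name] = IndexEntry(container, offset, len(data))
        self.by_group[(user, month, service)].append(name)

    def get(self, name):
        """Single-file lookup: one index hit, then one ranged read."""
        entry = self.index[name]
        buf = self.containers[entry.container]
        buf.seek(entry.offset)
        return buf.read(entry.length)

    def get_batch(self, user, month, service):
        """Batch lookup: every small file stored for one group."""
        names = self.by_group.get((user, month, service), [])
        return {n: self.get(n) for n in names}


store = MergedStore()
store.put("u1", "2017-03", "license", "a.jpg", b"AAAA")
store.put("u1", "2017-03", "license", "b.jpg", b"BB")
store.put("u1", "2017-04", "license", "c.jpg", b"C")
```

In this sketch three small files occupy only two containers, so a NameNode-like component would track two objects instead of three; the saving grows with the number of files per group.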
Keywords/Search Tags: massive small file, HDFS, batch lookup, associated prefetching, self-adjusted