
High-Performance File Storage and Management System Based on HDFS

Posted on: 2021-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: X B Zhang
GTID: 2428330611967466
Subject: Control engineering
Abstract/Summary:
With the explosive growth of global data, emerging industries such as artificial intelligence, machine learning, big data, and the Internet of Things are booming. The geometrically increasing volume of data demands more storage space at lower cost, which gave rise to distributed cloud storage systems. This design builds on HDFS to develop a distributed cloud file storage system that is high-performing, secure, and practical. The file system uses HDFS as its underlying storage, but HDFS alone falls short in several respects. First, users cannot access the native HDFS interfaces directly. Second, for security, Hadoop offers only a single encryption algorithm and supports neither iterative encryption nor application-level encryption. Third, for file search, HDFS does provide retrieval, but it must traverse the entire file directory listing, which consumes considerable time and computing resources. Fourth, for small files, the original HDFS design did not account for the NameNode memory wasted when massive numbers of small files are stored.

Based on this analysis, the design focuses on four areas: implementing the basic functions of the file storage system, file and system security, on-site search, and small-file storage optimization. For the basic functions, the system encapsulates and extends the native HDFS access interface, exposes URLs through which users can invoke these methods directly, and implements file upload, download, move, browse, and delete operations at the application level. It also manages system status transparently and defines and handles the result data and exception information returned by each request in a unified way.

For file encryption and system security, AES encrypts sensitive user file data, RSA encrypts the AES keys, and MD5 hashes user account information. Spring Security implements user identity authentication and permission control. Data encryption and application-level authentication together give the system a two-layer protection mechanism.

For on-site search, Elasticsearch is used to build a user-oriented search module. By creating index mappings that describe file data through multiple attributes and dimensions, users can create, delete, and update indexes, and can run custom-field searches, combined-field searches, and fuzzy searches or sort the result list. The module also supports relevance-scored queries based on user preferences, providing an independent, efficient, user-oriented on-site search function.

To optimize small-file storage, a class defining small-file metadata attributes is designed, and a new small-file merge strategy is proposed: files are first grouped by type according to their suffixes and then grouped for merging according to their size, with a secondary index built during upload. Type grouping improves the retrieval speed of the file system, and size grouping ensures efficient use of storage space.

Test results show that responses from the system's basic functional interfaces consist of three parts: a status code, statistical information, and result data; compared against the file statistics on the HDFS cluster, these responses meet expectations. With the combined AES and RSA encryption strategy, file encryption and decryption work correctly and are transparent to users. The Elasticsearch-based search module can build multi-dimensional, user-oriented file index mappings, and users can define custom conditions to issue feature-rich search requests. Finally, the merge strategy, which groups files by type and size, proves feasible to a certain extent.
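The unified handling of request results and exceptions, and the three-part response (status code, statistical information, result data) reported in the tests, can be sketched as a small Java envelope class. This is an illustration under assumptions, not the thesis code: the names `ApiResult`, `ok`, `error`, and `wrap`, the field names, and the 200/400/500 code convention are all hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical uniform response envelope: status code, statistics, result data.
 * Names and the 200/400/500 convention are illustrative assumptions.
 */
final class ApiResult<T> {
    private final int code;                  // status code; 200 means success here
    private final String message;            // status or error text for the client
    private final Map<String, Object> stats; // statistical information, e.g. file counts
    private final T data;                    // result payload; null when the request failed

    private ApiResult(int code, String message, Map<String, Object> stats, T data) {
        this.code = code;
        this.message = message;
        this.stats = Collections.unmodifiableMap(new LinkedHashMap<>(stats));
        this.data = data;
    }

    static <T> ApiResult<T> ok(T data, Map<String, Object> stats) {
        return new ApiResult<>(200, "OK", stats, data);
    }

    static <T> ApiResult<T> error(int code, String message) {
        return new ApiResult<>(code, message, Collections.emptyMap(), null);
    }

    /** Runs one operation and converts exceptions into error envelopes in a single place. */
    static <T> ApiResult<T> wrap(Supplier<T> operation, Map<String, Object> stats) {
        try {
            return ok(operation.get(), stats);
        } catch (IllegalArgumentException e) {
            return error(400, e.getMessage()); // invalid client input, e.g. a malformed path
        } catch (RuntimeException e) {
            return error(500, "Internal error: " + e.getMessage());
        }
    }

    int getCode() { return code; }
    String getMessage() { return message; }
    Map<String, Object> getStats() { return stats; }
    T getData() { return data; }
}

final class ApiResultDemo {
    public static void main(String[] args) {
        List<String> files = List.of("/user/alice/report.pdf", "/user/alice/photo.jpg");
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("fileCount", files.size());
        ApiResult<List<String>> listing = ApiResult.ok(files, stats);
        System.out.println(listing.getCode() + " " + listing.getStats() + " " + listing.getData());

        ApiResult<String> failed = ApiResult.wrap(() -> {
            throw new IllegalArgumentException("path must start with /");
        }, Map.of());
        System.out.println(failed.getCode() + " " + failed.getMessage());
    }
}
```

In a Spring application, the same centralized mapping is more commonly done with a `@RestControllerAdvice` class whose `@ExceptionHandler` methods return the envelope, so controllers only need to return successful results.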
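The combined AES and RSA strategy is a form of hybrid (envelope) encryption: a fast symmetric key encrypts the file, and the slower asymmetric key encrypts only that small symmetric key. The abstract does not state modes, key sizes, or padding, so the choices below (a fresh AES-256 key per file in GCM mode, RSA-2048 with OAEP padding) and the `HybridCrypto`/`Envelope` names are assumptions for illustration; only standard JDK `javax.crypto` and `java.security` classes are used.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Envelope encryption sketch: AES-GCM for file bytes, RSA-OAEP to wrap the AES key (assumed modes). */
final class HybridCrypto {
    private static final String RSA_MODE = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES_MODE = "AES/GCM/NoPadding";
    private static final int GCM_TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Stored per file: everything needed for decryption except the RSA private key. */
    static final class Envelope {
        final byte[] wrappedKey; // per-file AES key encrypted with the RSA public key
        final byte[] iv;         // 12-byte GCM nonce, fresh for every file
        final byte[] ciphertext; // AES-GCM output with the authentication tag appended

        Envelope(byte[] wrappedKey, byte[] iv, byte[] ciphertext) {
            this.wrappedKey = wrappedKey;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    static KeyPair newRsaKeyPair() throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    static Envelope encrypt(byte[] plain, PublicKey rsaPublic) throws Exception {
        // 1. Fresh random AES-256 key for this file.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey aesKey = keyGen.generateKey();

        // 2. Encrypt the file contents with AES-GCM.
        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        Cipher aes = Cipher.getInstance(AES_MODE);
        aes.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(GCM_TAG_BITS, iv));
        byte[] ciphertext = aes.doFinal(plain);

        // 3. Encrypt (wrap) the AES key with RSA so only the private-key holder can recover it.
        Cipher rsa = Cipher.getInstance(RSA_MODE);
        rsa.init(Cipher.ENCRYPT_MODE, rsaPublic);
        byte[] wrappedKey = rsa.doFinal(aesKey.getEncoded());
        return new Envelope(wrappedKey, iv, ciphertext);
    }

    static byte[] decrypt(Envelope env, PrivateKey rsaPrivate) throws Exception {
        // Reverse order: unwrap the AES key with RSA, then decrypt and verify the file.
        Cipher rsa = Cipher.getInstance(RSA_MODE);
        rsa.init(Cipher.DECRYPT_MODE, rsaPrivate);
        SecretKey aesKey = new SecretKeySpec(rsa.doFinal(env.wrappedKey), "AES");

        Cipher aes = Cipher.getInstance(AES_MODE);
        aes.init(Cipher.DECRYPT_MODE, aesKey, new GCMParameterSpec(GCM_TAG_BITS, env.iv));
        return aes.doFinal(env.ciphertext); // throws AEADBadTagException if the data was altered
    }

    public static void main(String[] args) throws Exception {
        KeyPair keys = newRsaKeyPair();
        byte[] plain = "contract-2021.docx contents".getBytes(StandardCharsets.UTF_8);
        Envelope env = encrypt(plain, keys.getPublic());
        byte[] back = decrypt(env, keys.getPrivate());
        System.out.println("round trip ok: " + Arrays.equals(plain, back));
    }
}
```

Storing the wrapped key and nonce alongside each encrypted file is what lets decryption stay transparent to users: a server holding the private key can unwrap the AES key on download, and the GCM authentication tag additionally detects tampered ciphertext.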
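MD5 is a one-way hash rather than an encryption algorithm, so account information is protected by storing a digest and comparing digests at login instead of decrypting anything. A minimal sketch using the JDK's `MessageDigest`; the per-account random salt and the `AccountDigest` helper names are assumptions, since the abstract does not say whether a salt is used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

/** Salted MD5 digest for account credentials (salting is an assumption). */
final class AccountDigest {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Random per-account salt, hex-encoded, stored next to the digest. */
    static String newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return toHex(salt);
    }

    /** One-way MD5 digest of salt + password; the input cannot be recovered from it. */
    static String md5Hex(String salt, String password) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return toHex(md5.digest((salt + password).getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** Login check: recompute the digest and compare in constant time. */
    static boolean matches(String salt, String password, String storedHex) {
        byte[] expected = storedHex.getBytes(StandardCharsets.US_ASCII);
        byte[] actual = md5Hex(salt, password).getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(expected, actual);
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String salt = newSalt();
        String stored = md5Hex(salt, "s3cret"); // persisted together with the salt at registration
        System.out.println(matches(salt, "s3cret", stored) + " " + matches(salt, "guess", stored));
    }
}
```

MD5 is fast and no longer collision-resistant, which is why Spring Security's password support favors adaptive encoders such as `BCryptPasswordEncoder`; the sketch keeps MD5 only to mirror the scheme the abstract describes.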
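The custom-field, combined-field, fuzzy, and sorted searches described for the Elasticsearch module correspond to standard Query DSL constructs: a `bool` query with a fuzzy `match` clause, a `term` filter, an optional `should` clause that boosts a preferred file type (one simple way to realize preference-based relevance scoring), and a `sort` list. The sketch builds such a request body as JSON text so that it needs no client library; the field names `fileName`, `fileType`, and `uploadTime`, the boost value, and the `FileSearchQuery` class are hypothetical, and the thesis's actual mapping and scoring method may differ.

```java
import java.util.Objects;

/**
 * Builds an Elasticsearch Query DSL request body as JSON text.
 * Field names (fileName, fileType, uploadTime) and the boost value are hypothetical.
 */
final class FileSearchQuery {
    /** Escapes a user-supplied value for embedding in a JSON string literal. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /**
     * keyword: fuzzy match on the file name; typeFilter: exact file type or null;
     * preferredType: type to boost in scoring or null; newestFirst: sort by upload time.
     */
    static String build(String keyword, String typeFilter, String preferredType, boolean newestFirst) {
        Objects.requireNonNull(keyword, "keyword");
        StringBuilder bool = new StringBuilder("{\"bool\":{");
        // Scored clause: fuzzy full-text match tolerates small typos in the file name.
        bool.append("\"must\":[{\"match\":{\"fileName\":{\"query\":")
            .append(jsonString(keyword))
            .append(",\"fuzziness\":\"AUTO\"}}}]");
        // Filter clause: narrows results without changing relevance scores.
        if (typeFilter != null) {
            bool.append(",\"filter\":[{\"term\":{\"fileType\":")
                .append(jsonString(typeFilter))
                .append("}}]");
        }
        // Should clause: with a must clause present it is optional and only raises
        // the score of matches of the preferred type (a simple preference boost).
        if (preferredType != null) {
            bool.append(",\"should\":[{\"term\":{\"fileType\":{\"value\":")
                .append(jsonString(preferredType))
                .append(",\"boost\":2.0}}}]");
        }
        bool.append("}}");
        String sort = newestFirst
                ? "[{\"uploadTime\":{\"order\":\"desc\"}},\"_score\"]"
                : "[\"_score\"]";
        return "{\"query\":" + bool + ",\"sort\":" + sort + "}";
    }

    public static void main(String[] args) {
        // Body for a _search request; the misspelled "anual" illustrates fuzzy matching.
        System.out.println(build("anual report", null, "pdf", true));
    }
}
```

The resulting string is what would be sent as the body of a `_search` request; in production, an Elasticsearch client library would build the same structure from typed query objects instead of string concatenation.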
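The merge strategy (group by suffix, then pack by size, and record a secondary index during upload) can be sketched as an in-memory planner. Here the hypothetical `SmallFile` class stands in for the thesis's small-file metadata class, each group is packed sequentially until a size limit would be exceeded, and the index maps each original file name to its merged file, byte offset, and length. The packing rule, the merged-file naming scheme, and the use of the HDFS default block size (128 MB) as the limit are assumptions; the sketch plans the layout but writes nothing to HDFS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class SmallFileMerger {
    /** Hypothetical stand-in for the thesis's small-file metadata class. */
    static final class SmallFile {
        final String name; // full path as uploaded; assumed unique
        final long size;   // bytes
        SmallFile(String name, long size) { this.name = name; this.size = size; }
    }

    /** Secondary-index entry: where a small file's bytes live inside a merged file. */
    static final class Location {
        final String mergedFile;
        final long offset;
        final long length;
        Location(String mergedFile, long offset, long length) {
            this.mergedFile = mergedFile;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Lower-cased suffix of the last path segment, or "none" if it has no suffix. */
    static String suffixOf(String name) {
        int slash = name.lastIndexOf('/');
        int dot = name.lastIndexOf('.');
        if (dot <= slash || dot == name.length() - 1) {
            return "none";
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /**
     * Step 1: group files by suffix. Step 2: pack each group, in upload order, into
     * merged files of at most maxMergedSize bytes (an oversized file gets its own).
     * Returns the secondary index: original name -> location in a merged file.
     */
    static Map<String, Location> plan(List<SmallFile> files, long maxMergedSize) {
        Map<String, List<SmallFile>> byType = new LinkedHashMap<>();
        for (SmallFile f : files) {
            byType.computeIfAbsent(suffixOf(f.name), k -> new ArrayList<>()).add(f);
        }
        Map<String, Location> index = new LinkedHashMap<>();
        for (Map.Entry<String, List<SmallFile>> group : byType.entrySet()) {
            int part = 0;  // sequence number of the current merged file in this group
            long used = 0; // bytes already assigned to the current merged file
            for (SmallFile f : group.getValue()) {
                if (used > 0 && used + f.size > maxMergedSize) {
                    part++; // current merged file is full; start the next one
                    used = 0;
                }
                String merged = group.getKey() + "-" + part + ".merged";
                index.put(f.name, new Location(merged, used, f.size));
                used += f.size;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        long limit = 128L * 1024 * 1024; // HDFS default block size, assumed as the merge limit
        Map<String, Location> index = plan(List.of(
                new SmallFile("logs/app-01.log", 3_000_000L),
                new SmallFile("img/cat.jpg", 450_000L),
                new SmallFile("logs/app-02.log", 2_500_000L)), limit);
        index.forEach((name, loc) -> System.out.println(
                name + " -> " + loc.mergedFile + " @" + loc.offset + " len " + loc.length));
    }
}
```

With such an index, the NameNode tracks only the few merged files, and reading one small file becomes an index lookup plus a ranged read starting at `offset`. Hadoop's own SequenceFile and Hadoop Archive (HAR) formats are existing alternatives that also pack many small files into larger containers.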
Keywords/Search Tags: HDFS, file storage system, combined encryption, Elasticsearch, small file merge