
Research And Implementation Of HDFS Oriented Access Control And Small File Storage Strategy

Posted on: 2018-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: M Li
Full Text: PDF
GTID: 2348330533469802
Subject: Computer technology
Abstract/Summary:
Hadoop was originally designed to store and analyze large data sets and is well suited to massive data processing. HDFS (Hadoop Distributed File System), its underlying storage layer, offers low cost and streaming data access and is suited to large files. However, HDFS has weak access control, and although Hadoop supports Kerberos user authentication, Kerberos is expensive and inflexible. In addition, HDFS handles large files well but small files poorly. When a large number of small files are stored in HDFS, their metadata occupies a large amount of space on the master node, which limits the number of files the whole file system can hold. Small files are also read inefficiently, and reading many of them degrades the I/O performance of the master node. Finally, to improve security Hadoop introduced encryption zones, but these have drawbacks: a single encryption algorithm, limitations around recursive directory encryption, the high privileges required to use them, and application-level encryption.
This thesis therefore studies these three problems and proposes three optimizations: (1) A trust-value method controls access, using feedback from the user's access history, to strengthen HDFS access control. (2) Association rules are mined from the user's access history. Based on the resulting frequent itemsets, small files are merged before being stored in HDFS, and a two-level cache strategy is adopted to improve read efficiency. (3) Files are encrypted in a pluggable way, so that data is stored in HDFS in encrypted form, which improves data security. The encryption and decryption strategy is implemented in two ways, MapReduce-oriented and client-oriented, with a custom InputFormat so that it supports MapReduce.
The three parts (access control, encryption, and file merging) were tested on a cluster using medical images. The experimental results show that trust-value access control performs well, at the cost of additional time overhead compared to the original HDFS system. The small-file merging strategy proves necessary: it greatly reduces the space occupied by metadata and, under concentrated access patterns, achieves a good cache hit rate and improves read efficiency. Three cases (no encryption, XOR-AES, and AES) were tested in both the client-oriented and the MapReduce-oriented modes. XOR-AES incurs some time overhead but performs better than AES, and its overhead is small in the MapReduce case. The tests show that the proposed strategies achieve the desired results.
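The idea behind point (2), grouping small files by how often they are accessed together before merging them, can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes that the access history is available as per-session lists of file names, it mines only frequent pairs (2-itemsets) rather than general frequent itemsets, and the function names `frequent_pairs` and `merge_groups`, the `min_support` threshold, and the sample file names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions, min_support):
    """Count how often two files are read in the same session and keep
    the pairs seen in at least `min_support` sessions."""
    counts = Counter()
    for session in sessions:
        # Deduplicate and sort so each unordered pair counts once per session.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def merge_groups(files, pairs):
    """Union files linked by a frequent pair into merge groups (union-find)."""
    parent = {f: f for f in files}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for f in files:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical access history: each inner list is one user session.
sessions = [
    ["ct_001.dcm", "ct_002.dcm", "mri_010.dcm"],
    ["ct_001.dcm", "ct_002.dcm"],
    ["mri_010.dcm", "mri_011.dcm"],
    ["mri_010.dcm", "mri_011.dcm", "xr_100.dcm"],
]
files = sorted({f for s in sessions for f in s})
pairs = frequent_pairs(sessions, min_support=2)
groups = merge_groups(files, pairs)
print(groups)
```

Each resulting group is a candidate set of small files to pack into one merged file, so that files likely to be read together share a single metadata entry and can be cached together. A file that never appears in a frequent pair stays in a group of its own.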
Keywords/Search Tags: HDFS, frequent itemsets mining, small files storage, InputFormat, access control, encryption
Related items