
Secure Storage Technology For Big Data Under Hadoop Architecture

Posted on: 2022-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: C H Zhang
Full Text: PDF
GTID: 2518306608990069
Subject: Computer Software and Application of Computer
Abstract/Summary:
In recent years, with the rapid development of the Internet of Things, cloud computing, and sensor technologies, data of large volume, varied type, and strong timeliness has been emerging, giving rise to the concept of big data. Big data cannot be stored and processed by conventional software within an acceptable time frame, so new technical architectures are needed to extract its value. Hadoop is a distributed system infrastructure developed by the Apache Software Foundation; its core components are the distributed file system HDFS and the distributed computing framework MapReduce. Hadoop offers high reliability and strong scalability, can store and process massive data on a cluster of commodity computers, and has become a typical framework for processing big data. However, Hadoop was originally designed for distributed storage and computing of big data, with little consideration of data security. The storage stage lies at the core of the big data life cycle, so its data security is particularly important and has attracted increasing attention. Big data under the Hadoop architecture can be divided into static data and streaming data according to whether it changes over time; the two types differ in both storage methods and security problems. Therefore, this thesis studies the secure storage of static data and streaming data under the Hadoop architecture respectively. The main work is summarized as follows:

(1) The NameNode is the master node of Hadoop and records metadata. As the amount of data grows, the NameNode faces the risks of a single point of failure and a performance bottleneck. Therefore, a dual-channel multi-NameNode distributed storage scheme is proposed. Combining the HDFS high-availability model with the HDFS federation mechanism, a multi-NameNode storage model is established in which NameNodes form pairwise hot backups of metadata. In addition, the ZooKeeper distributed coordination mechanism is adopted to realize automatic recovery from failed NameNode services, which ensures the high availability of the Hadoop cluster and reduces the risk of metadata loss and service interruption caused by a NameNode single point of failure.

(2) Static data in Hadoop includes structured and unstructured data. Encrypting all of it with a single encryption algorithm makes it difficult to balance security against efficiency. To realize secure storage of static big data while keeping storage efficient, a secure storage scheme based on lightweight encryption and homomorphic encryption is proposed. For unstructured data, a parallel encryption scheme based on an elliptic-curve lightweight encryption algorithm is designed: with the proposed dual-channel storage model, the data to be stored can be encrypted and written through two storage channels simultaneously. The dual-channel mode increases storage throughput and thereby compensates for the time consumed by encryption. For structured data, to ensure that distributed computing can still be performed after encryption, an improved homomorphic encryption algorithm is introduced, which eliminates the need to decrypt and re-encrypt structured data around each computing operation.

(3) Streaming big data is generated continuously over time, and traditional encryption algorithms reduce the real-time performance of its storage. In addition, streaming data is diverse in type, and its original form needs to be stored completely for subsequent analysis. Therefore, a data lake-based secure storage scheme for streaming data is proposed. To encrypt data in a streaming fashion, a streaming encryption interceptor is designed for the data acquisition stage of the data lake: taking advantage of Flume's ability to transmit streaming data, the elliptic-curve encryption algorithm is combined with the Flume interceptor mechanism to ensure real-time storage of streaming data. Additionally, to store the ever-growing streaming data more securely and reliably, a partitioned compression storage scheme is designed for the storage phase, which partitions the streaming data according to its arrival time and thus facilitates later retrieval and use.

Experimental results show that the proposed schemes realize secure data storage under the Hadoop architecture with shorter keys. Compared with similar schemes, under the optimization of the dual-channel storage model, the data storage efficiency is higher and the Hadoop cluster load is more balanced.
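The dual-channel, paired-NameNode idea in contribution (1) can be sketched as a routing rule: files are hashed across two metadata channels so both can ingest in parallel, and each channel's active node has a hot-backup standby that takes over on failure. All names and the routing logic below are illustrative assumptions, not Hadoop or ZooKeeper APIs.

```python
import hashlib

# Active/standby hot-backup pairs, one pair per storage channel
# (hypothetical names; in the thesis's scheme, ZooKeeper detects
# failures and triggers the failover automatically).
PAIRS = [("nn1-active", "nn1-standby"), ("nn2-active", "nn2-standby")]
FAILED = set()

def route(path: str) -> str:
    """Pick the NameNode that records metadata for this file path."""
    # Hash the path to spread files evenly across the two channels.
    channel = int(hashlib.md5(path.encode()).hexdigest(), 16) % len(PAIRS)
    active, standby = PAIRS[channel]
    # Fall back to the paired standby when the active node is down.
    return standby if active in FAILED else active

print(route("/data/file-001"))   # one of the active nodes
FAILED.update({"nn1-active", "nn2-active"})
print(route("/data/file-001"))   # failover to the paired standby
```

Because both channels accept writes concurrently, the pairing also gives each metadata namespace a live replica, which is the "hot backup in pairs" property the scheme relies on.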
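Contribution (2) relies on the defining property of homomorphic encryption: computation on ciphertexts yields the encryption of the computed result, so structured data never needs to be decrypted for processing. The thesis's improved algorithm is not specified in the abstract; as a stand-in, the following is a toy version of the classic Paillier cryptosystem, which is additively homomorphic (the primes here are far too small for real security).

```python
import math
import random

# Toy Paillier keypair (demo-sized primes; real use needs >= 2048-bit moduli).
p, q = 104723, 104729
n = p * q
n2 = n * n
g = n + 1                             # standard simple generator choice
lam = math.lcm(p - 1, q - 1)          # Carmichael's lambda of n

def L(x: int) -> int:
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # precomputed decryption constant

def encrypt(m: int) -> int:
    while True:
        r = random.randrange(2, n)    # random blinding factor, coprime to n
        if math.gcd(r, n) == 1:
            return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return (L(pow(c, lam, n2)) * mu) % n

c1, c2 = encrypt(12), encrypt(30)
# Multiplying ciphertexts adds the underlying plaintexts: 12 + 30 = 42.
assert decrypt((c1 * c2) % n2) == 42
```

The point for the storage scheme is the last line: an aggregate over encrypted structured records can be computed in the cluster without the decrypt-compute-re-encrypt round trip that the abstract identifies as the bottleneck.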
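The time-partitioned storage in contribution (3) can be illustrated as mapping each record's arrival time to an HDFS-style partition directory, so later retrieval can prune whole partitions by time range. The base path and the year=/month=/day=/hour= layout below are illustrative assumptions, not the thesis's exact scheme.

```python
from datetime import datetime, timezone

def partition_path(base: str, epoch_seconds: int) -> str:
    """Derive the partition directory for a record from its arrival time."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return (f"{base}/year={t.year}/month={t.month:02d}"
            f"/day={t.day:02d}/hour={t.hour:02d}")

print(partition_path("/datalake/stream", 1669600000))
# → /datalake/stream/year=2022/month=11/day=28/hour=01
```

Records landing in the same hour share a directory and can be compressed together, which is what makes the partitioned compression step and time-ranged lookups cheap as the stream grows.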
Keywords/Search Tags:Hadoop, Big Data, Secure Storage, Data Encryption, Data Lake