Research And Design Of A Trusted Distributed File System Based On HDFS

Posted on: 2015-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: X J Zhang
Full Text: PDF
GTID: 2298330422482044
Subject: Computer software and theory
Abstract/Summary:
After Google published its paper on the Google File System (GFS), the big data era began. Doug Cutting and others developed Hadoop, a project based on GFS and MapReduce, which has become the flagship of the big data industry in recent years and has triggered a revolution in big data storage on the Internet. Hadoop implements a distributed file system called HDFS, which is not only fault tolerant but also combines groups of low-performance servers into a high-performance distributed file system. HDFS consists of the namenode, the secondary namenode, the datanodes and the DFS client.

Because Hadoop was designed to be deployed on an intranet that is protected by a firewall and accessible only to the intranet's administrators, Hadoop used no security mechanisms before version 1.0, so the communication between client and server, and between the name node and the data nodes, was easy to attack. Since Hadoop 1.0, Kerberos authentication and access control list (ACL) authorization have been added. However, because Hadoop is still assumed to sit behind a firewall, this security infrastructure is not aimed at defending against hackers from the outside world but at helping the users of the distributed system use its resources.

The Advanced Persistent Threat (APT), an attack launched from inside the intranet, has become one of the most serious threats to enterprises. APT is not a single attack method; it is a set of stealthy and continuous hacking processes, often orchestrated by humans, that target a specific entity. APT usually targets organizations or nations for business or political motives, and it requires a high degree of covertness over a long period of time. APT avoids confronting the firewall by starting the attack from within the intranet, usually through social engineering: the attacker researches the topics that interest the target users and sends them emails on those topics. If a target happens to be interested in an email's attachment, he or she may download and open it; at that point the attacker's code runs and connects to the attacker's server. A firewall is not a silver bullet against the Advanced Persistent Threat.

Trusted computing is another way to address the security threat posed by APT. Remote attestation can assure that all servers communicating in the cluster are trustworthy, the Integrity Measurement Architecture (IMA) can record the platform status of the attested machine, and data sealing can assure that sealed data can only be unsealed under the same platform status. Together, these trusted computing technologies can give the distributed file system a new layer of protection.

In this paper, we design a trusted distributed file system based on HDFS that aims to counter the Advanced Persistent Threat. The major contributions of this paper are:

1. Study the HDFS implementation and HDFS security, analyse the drawbacks of Kerberos authentication, research the Advanced Persistent Threat, and propose a trusted distributed file system design based on HDFS.
2. Research trusted computing techniques, and analyse the remote attestation, integrity measurement and data sealing techniques.
3. Propose remote attestation based on IMA as the main method for discovering APT attacks.
4. Propose adding remote attestation to Hadoop Remote Procedure Call (RPC) to stop an APT attacker from penetrating the cluster.
5. Propose performing remote attestation when a socket connects to an untrusted address, to stop an APT attacker from connecting to his own server.
6. Propose a data sealing procedure that is triggered when remote attestation fails.
7. Design and implement a trusted network disk.

The trusted distributed file system based on HDFS is a new way to protect big data against threats such as APT. The combination of trusted computing and distributed systems not only gives distributed systems a new way to protect themselves, but also promotes the progress of trusted computing.
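
The idea behind remote attestation based on integrity measurement can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and it rests on several simplifying assumptions: the function names `extend`, `replay` and `attest` are hypothetical, each log entry is just a (name, SHA-1 digest) pair rather than a full IMA template entry, the verifier holds a whitelist of known-good digests, and the TPM-signed quote that would normally authenticate the reported register value is omitted. What it does show is the hash-chain "extend" operation used by TPM 1.2 platform configuration registers (PCRs), and how a verifier can replay a measurement log to check it against the reported value.

```python
import hashlib
import hmac

PCR_SIZE = 20  # length of a SHA-1 digest, as used by TPM 1.2 PCRs


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new value = SHA1(old value || measurement)."""
    return hashlib.sha1(pcr + measurement).digest()


def replay(measurement_log) -> bytes:
    """Recompute the aggregate register value from an ordered log.

    measurement_log is a list of (name, digest) pairs (a simplification
    of the real IMA log format).
    """
    pcr = bytes(PCR_SIZE)  # registers start as all zero bytes
    for _name, digest in measurement_log:
        pcr = extend(pcr, digest)
    return pcr


def attest(measurement_log, reported_pcr: bytes, whitelist: dict) -> bool:
    """Verifier side of a simplified attestation check.

    The log must replay to the reported register value, and every
    measured item must match a known-good digest in the whitelist.
    """
    if not hmac.compare_digest(replay(measurement_log), reported_pcr):
        return False  # the log was altered or does not match the platform
    return all(whitelist.get(name) == digest
               for name, digest in measurement_log)
```

Because each extend hashes the previous value together with the new measurement, an entry cannot be removed from or reordered in the log without changing the final value, which is why a node running unexpected code would fail a check of this kind.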
Keywords/Search Tags: Trusted Computing, Remote Attestation, Integrity Measurement, Distributed System, TPM, HDFS