
Research On The Key Technologies Of Cloud Computing Platform Hadoop

Posted on: 2016-03-16 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H X Zhou
GTID: 1318330518494057 | Subject: Computer Science and Technology
Abstract/Summary:
Cloud computing is the inevitable product of IT development reaching a certain stage: it is an innovative IT infrastructure and management method, and also an innovative business model. Hadoop is a mature open-source cloud computing platform that can build a highly reliable and highly scalable cloud service environment while significantly reducing operating costs and improving operational efficiency, so it has attracted wide attention from industry and academia. Despite these advantages, Hadoop still needs improvement in small file processing, load balancing strategy, resource storage, and service metering and billing. This dissertation studies and analyzes the key technologies for each of these problems; the main results are as follows.

1) Small file processing in Hadoop: To address the inefficient reading and retrieval of small files in Hadoop, this dissertation proposes a small file solution based on data prefetching. By prefetching the index files on the NameNode and the data blocks on the DataNodes, it improves reading speed and retrieval efficiency. Compared with the existing solutions Hadoop Archive, SequenceFile and CombineFileInputFormat, the average file reading speed is increased by 42.7%, 20.1% and 10.6%, and the average retrieval time is reduced by 37.8%, 40.2% and 28.6%, respectively.

2) Load balancing in Hadoop: An analysis of the existing load balancing schemes of the Hadoop platform shows that they consider only the storage space utilization of the service nodes, not the relative load of each node. In actual Hadoop operation, however, dynamic factors such as file access, network bandwidth, CPU capability and memory utilization affect node load directly or indirectly. To address this, we propose a load balancing scheme for Hadoop based on the analytic hierarchy process (AHP).
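The abstract gives neither the factor weights nor the load formula, so the following is only a minimal sketch of how an AHP-based relative node load could be computed, not the dissertation's method. It assumes a hypothetical pairwise-comparison matrix over the four dynamic factors named above (file access, network bandwidth, CPU capability, memory utilization), derives the criterion weights as the principal eigenvector (approximated by power iteration) with Saaty's consistency check, and scores each node as a weighted sum of factor readings normalized to [0, 1]. Treating "load fluctuation" as the population variance of node loads is also an assumption.

```python
"""Hypothetical sketch: AHP criterion weights and relative node load."""
from statistics import pvariance

# Assumed factor names; the dissertation's exact factors and normalization are not given.
FACTORS = ["file_access", "bandwidth", "cpu", "memory"]

# Hypothetical Saaty-scale judgments: PAIRWISE[i][j] is how much more important
# FACTORS[i] is than FACTORS[j]; the matrix is reciprocal (a_ji = 1 / a_ij).
PAIRWISE = [
    [1.0, 2.0, 3.0, 3.0],
    [1 / 2, 1.0, 2.0, 2.0],
    [1 / 3, 1 / 2, 1.0, 1.0],
    [1 / 3, 1 / 2, 1.0, 1.0],
]

# Saaty's random consistency index (RI) for matrices of order n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration, normalized to sum to 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = mat_vec(matrix, w)
        total = sum(v)
        w = [x / total for x in v]
    return w


def consistency_ratio(matrix, w):
    """CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    aw = mat_vec(matrix, w)
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    ri = RANDOM_INDEX[n]
    return ci / ri if ri else 0.0


def relative_load(readings, w):
    """Weighted sum of a node's factor readings, each assumed normalized to [0, 1]."""
    return sum(wi * readings[f] for wi, f in zip(w, FACTORS))


if __name__ == "__main__":
    weights = ahp_weights(PAIRWISE)
    print("weights:", [round(x, 3) for x in weights])
    print("CR:", round(consistency_ratio(PAIRWISE, weights), 4))
    nodes = {  # hypothetical DataNode readings
        "dn1": {"file_access": 0.6, "bandwidth": 0.4, "cpu": 0.5, "memory": 0.7},
        "dn2": {"file_access": 0.3, "bandwidth": 0.5, "cpu": 0.4, "memory": 0.4},
        "dn3": {"file_access": 0.8, "bandwidth": 0.7, "cpu": 0.6, "memory": 0.5},
    }
    loads = {name: relative_load(r, weights) for name, r in nodes.items()}
    print("loads:", {k: round(v, 3) for k, v in loads.items()})
    # Assumed definition of "load fluctuation": population variance of node loads.
    print("fluctuation:", pvariance(list(loads.values())))
```

A consistency ratio below 0.1 is the usual AHP threshold for accepting the pairwise judgments; if it is exceeded, the judgments should be revisited before the weights are used for balancing decisions.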
The AHP-based scheme takes into account concurrent file access, dynamic network bandwidth, CPU capacity and memory utilization, and defines methods for computing the load of a file data block and the relative load of a service node. Experimental results show a load fluctuation value of 0.00007 for our scheme, against 0.00049, 0.00037, 0.00031 and 0.00029 for the existing schemes it was compared with (among them FIFO, Fair Scheduler and Capacity Scheduler). Our scheme has the smallest load fluctuation, and its file reading speed is 17.11%, 15.53%, 9.26% and 7.70% higher than that of the other schemes.

3) Resource storage in Hadoop: Hadoop relies entirely on cloud storage for resource operation and management, which can expose stored data to attacks from external networks. To remedy this defect, this dissertation proposes a hybrid cloud storage model based on Hadoop and P2P technology (the PCS model). In the PCS model, non-critical data is stored in the cloud and important data in the internal PCS P2P network, so that important data is effectively isolated from the outside world, while resource transfer speed and storage retrieval efficiency also increase. Experimental results show that, compared with ordinary cloud storage, the PCS model increases the average file transfer speed by 9.35% and reduces the average file retrieval time by 6.77%.

4) Service metering and billing in Hadoop: Analysis shows that both the traditional flat billing model and the current mainstream usage-based billing model meter only a single factor, which easily leads to ineffective service billing, among other issues. Taking into account the diversity of Hadoop services and the platform's low cost, this dissertation proposes a service charging scheme for Hadoop based on business logic.
The business-logic scheme provides a metering and billing method that measures Hadoop service resources at a finer granularity. Example verification and analysis show that, compared with the usage-based billing model, it reduces users' total cost by 7.5% and network traffic by 8%.
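The abstract does not define the business-logic billing rules. As a hedged illustration only, the sketch below assumes that each Hadoop service call is tagged with a business operation and that each operation has its own rate card over several metered factors (CPU time, storage, network traffic) plus a fixed fee, in contrast to a single-factor usage-based bill. All operation names, the `UsageRecord` type and every price are hypothetical.

```python
"""Hypothetical sketch: multi-factor, per-operation billing vs. a single-factor baseline."""
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    operation: str           # hypothetical business operation name, e.g. "batch_job"
    cpu_seconds: float
    storage_gb_hours: float
    network_gb: float


# Hypothetical rate cards: price per unit of each metered factor,
# plus a fixed fee charged once per invocation of the operation.
RATE_CARDS = {
    "file_upload": {"fixed": 0.001, "cpu": 0.00001, "storage": 0.0002, "network": 0.01},
    "batch_job": {"fixed": 0.010, "cpu": 0.00004, "storage": 0.0001, "network": 0.005},
    "query": {"fixed": 0.0005, "cpu": 0.00002, "storage": 0.0, "network": 0.008},
}


def charge(record):
    """Charge for one record using its operation's rate card."""
    rates = RATE_CARDS[record.operation]
    return (rates["fixed"]
            + rates["cpu"] * record.cpu_seconds
            + rates["storage"] * record.storage_gb_hours
            + rates["network"] * record.network_gb)


def bill(records):
    """Return the total charge and a per-operation breakdown."""
    breakdown = {}
    for r in records:
        breakdown[r.operation] = breakdown.get(r.operation, 0.0) + charge(r)
    return sum(breakdown.values()), breakdown


def usage_based_bill(records, price_per_cpu_second=0.00003):
    """Single-factor baseline for comparison: bills CPU time only (hypothetical price)."""
    return sum(r.cpu_seconds for r in records) * price_per_cpu_second


if __name__ == "__main__":
    records = [
        UsageRecord("file_upload", 5.0, 2.0, 0.5),
        UsageRecord("batch_job", 1200.0, 10.0, 3.0),
        UsageRecord("query", 30.0, 0.0, 0.2),
    ]
    total, parts = bill(records)
    print("business-logic total:", round(total, 4), parts)
    print("single-factor total:", round(usage_based_bill(records), 4))
```

Keying the rate card on the business operation is what lets such a scheme price different Hadoop services differently; the single-factor baseline cannot distinguish a storage-heavy upload from a CPU-heavy batch job.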
Keywords/Search Tags: Cloud computing, Hadoop, Small file processing, Load balancing, Metering and billing