
Research And Implementation On Cloud Software Infrastructure

Posted on: 2014-02-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Dai
Full Text: PDF
GTID: 1228330398963995
Subject: Computer system architecture
Abstract/Summary:
In the past ten years, cloud computing has developed dramatically because of its wide use in industry. As a growing variety of applications has appeared in the cloud, cloud computing has become more and more common in people's daily lives. However, some new applications, such as realtime search, online recommendation, and social network analysis, still pose challenges:
1) All of these applications need to process huge datasets, which puts great pressure on the scalability of storage systems. For example, a realtime search engine needs to process information from different sources and mash it up to generate results that users may be interested in, so all of this information needs to be stored.
2) They need faster random data access because they must produce final results in realtime. The random access pattern is unavoidable because most input datasets consist of small, fragmented pieces of raw data that are hard to assemble into large contiguous data blocks.
3) The computation is much more complex than in traditional applications. Most of these applications include machine learning or data mining algorithms, which require many iterative and incremental computations. In addition, because of their realtime requirements, they need to be more sensitive to new data.
In response to these challenges, some new storage systems and programming models have appeared recently; however, the main problems have still not been solved well.
In this dissertation, we study cloud software infrastructure from different aspects in order to build a complete framework for these applications. The main work and contributions of this dissertation include:
1. We propose a new auto-configuration tool for heterogeneous Hadoop clusters. The tool collects all the hardware parameters and historical execution information as the input to our fuzzy algorithm, and produces a set of Hadoop configuration parameters that accelerates MapReduce job execution in Hadoop.
Our solution changes the way Hadoop is configured from optimizing the parameters to optimizing the fuzzy rules. The experiments show that our tool dramatically improves Hadoop cluster performance, especially on heterogeneous clusters.
2. In our memory-based distributed key-value storage system (Sedna), we propose a new hierarchical architecture for distributed storage systems. Working with the new distributed hash algorithm proposed in Sedna, this architecture improves the scalability and the load-balancing flexibility of Sedna. In addition, we propose a new API suite for realtime applications that is much more sensitive to data modification than traditional APIs. The experiments show that Sedna achieves much better performance than current disk-based distributed storage systems and is comparable with widely used memory cache systems.
3. We extend the long-established trigger-based programming model to distributed computing with some new ideas. Domino is a general trigger-based distributed programming model for the cloud. To overcome the limitations of traditional trigger-based models, we propose an eventually synchronous model to solve the problem of how different actions synchronize their execution. In addition, by introducing different synchronization models (asynchronous, eventually synchronous, strictly synchronous), Domino gives developers flexible solutions for their needs. In Domino, we also propose a realtime recovery concept and implementation, which we believe dramatically improves the scalability and speed of distributed computation. We implement different applications in Domino and compare their performance; the experimental results show that the Domino framework scales well and performs better than traditional MapReduce-based solutions.
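As a rough illustration of the trigger-based idea behind contribution 3, the following single-process Python sketch shows a key-value store that runs registered actions whenever a value changes. It is not the Sedna or Domino API: the class TriggerStore, the methods on_write, put, and get, and the key names are all hypothetical, and the sketch leaves out distribution, the synchronization models, and recovery entirely.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical names throughout; this is NOT the Sedna or Domino API.
Trigger = Callable[[str, Any, Any], None]  # called as (key, old_value, new_value)


class TriggerStore:
    """In-memory key-value store that runs registered triggers on each changed write."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._triggers: Dict[str, List[Trigger]] = defaultdict(list)

    def on_write(self, key: str, trigger: Trigger) -> None:
        """Register an action to run whenever the value stored under `key` changes."""
        self._triggers[key].append(trigger)

    def put(self, key: str, value: Any) -> None:
        """Store the value, then fire the key's triggers only if the value changed."""
        old = self._data.get(key)
        self._data[key] = value
        if old != value:
            for trigger in list(self._triggers.get(key, [])):
                trigger(key, old, value)

    def get(self, key: str, default: Any = None) -> Any:
        """Return the stored value, or `default` if the key is absent."""
        return self._data.get(key, default)


store = TriggerStore()


def update_total(key: str, old: Any, new: Any) -> None:
    # Incremental action: adjust the running total by the difference only,
    # instead of recomputing it from all stored counts.
    store.put("views:total", store.get("views:total", 0) + new - (old or 0))


store.on_write("views:page1", update_total)
store.put("views:page1", 3)
store.put("views:page1", 5)
print(store.get("views:total"))  # 5
```

The action reacts to each modification as it arrives and updates its result incrementally, which is the kind of sensitivity to new data that the abstract asks of realtime applications. A distributed version would additionally have to decide when actions on different nodes observe each other's writes, which is where the asynchronous, eventually synchronous, and strictly synchronous models come in.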
Keywords/Search Tags: Cloud Computing, Software Infrastructure, Distributed Storage, Programming Model, Computation Framework, Realtime Applications