
Key Technology Research on Log Storage and Analysis System on Cloud Platform

Posted on: 2016-04-09    Degree: Doctor    Type: Dissertation
Country: China    Candidate: K Lv    Full Text: PDF
GTID: 1108330473961527    Subject: Computer system architecture
Abstract/Summary:
E-commerce, social-media, and video companies collect and analyze user-behavior logs for precise recommendation and advertising. As the volume of log data grows, traditional data-analysis approaches can no longer meet the requirements, and companies increasingly perform log collection, collation, and analysis on cloud platforms. This places new demands on cloud storage and computing platforms in two respects: 1) a more flexible storage system. HDFS, the most widely used distributed file system in industry, is designed for data analysis with a write-once, read-many access pattern, and is therefore ill-suited to multi-client log appends with concurrent reads and writes. 2) A more efficient data-analysis framework. Applications such as recommendation systems and search engines must process data within a short time and need a programming framework tailored to the characteristics of the workload to achieve higher efficiency.

Based on these new requirements for log storage and processing on cloud platforms, we studied storage and computing frameworks for the cloud. The main work and contributions are as follows:

1) To support concurrent writes and read-write parallelism in log collection and analysis, we designed HDFS+, a file system based on HDFS. HDFS+ adopts a sequential consistency model that only guarantees each client's data is written in order, which increases the degree of parallelism across clients.
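The per-client ordering guarantee can be illustrated with a toy in-process model. This is a minimal sketch of the consistency contract only, not HDFS+ code: the `ConcurrentAppendFile` class and its methods are hypothetical names invented for illustration. Records from one client appear in the order that client wrote them, while records from different clients may interleave freely, so clients do not serialize against each other beyond the brief append itself.

```python
import threading

class ConcurrentAppendFile:
    """Toy model of a multi-client append target (hypothetical API).

    Sequential consistency is enforced only per client: each client's
    records keep their own write order, but no global order across
    clients is promised, which is what allows their writes to proceed
    in parallel.
    """
    def __init__(self):
        self._records = []          # (client_id, payload) in arrival order
        self._lock = threading.Lock()

    def append(self, client_id, payload):
        with self._lock:            # short critical section; no global ordering promise
            self._records.append((client_id, payload))

    def records_of(self, client_id):
        """Project out one client's records; these must be in write order."""
        return [p for c, p in self._records if c == client_id]

def writer(f, client_id, n):
    for i in range(n):
        f.append(client_id, i)

f = ConcurrentAppendFile()
threads = [threading.Thread(target=writer, args=(f, c, 100)) for c in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Per-client order survives even though the four clients wrote concurrently.
assert all(f.records_of(c) == list(range(100)) for c in range(4))
```

The design point is that relaxing ordering *between* clients is what buys the reported multi-client speedup; a single global order would force every append through one serialization point.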
In addition, HDFS+ uses snapshots to support read-write parallelism. Experimental results show that concurrent writes by multiple clients to the same file achieve a speedup of 1.6x over single-client writes.

2) To address the low efficiency of the MapReduce framework for iterative computation, we proposed the Iter-Hadoop framework to accelerate the execution of iterative programs. Map and reduce functions are provided as services, two different memory-buffer strategies persist data across iterations, and execution history is used to accelerate the reduce phase. Experiments show that, compared with the MapReduce framework, Iter-Hadoop significantly improves the execution efficiency of iterative programs.

3) To address the shortcoming that the MapReduce framework reprocesses the entire data set for each batch job, we proposed the Inc-MapReduce framework. Inc-MapReduce accelerates the execution of a batch job by reusing the intermediate results of the previous run. Experimental results show that Inc-MapReduce provides more than a 7x speedup on the Grep and WordCount batch programs on our platform when the incremental data is 256 MB and the initial data is 60 GB.
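The memory-buffer idea behind Iter-Hadoop can be sketched in miniature: parse the static input once, hold it in memory, and run each map/reduce round against that buffer instead of re-reading it from the file system every iteration as vanilla MapReduce does. The `mapreduce` helper and the PageRank-style update below are illustrative stand-ins, not the dissertation's actual implementation.

```python
from collections import defaultdict

def mapreduce(records, map_fn, reduce_fn):
    """Minimal in-process map/reduce, standing in for one Hadoop job."""
    groups = defaultdict(list)
    for rec in records:
        for k, v in map_fn(rec):
            groups[k].append(v)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

# The static link graph is parsed once and kept in a memory buffer across
# iterations; only the rank vector changes from round to round.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1.0 for page in links}

def map_fn(item):
    page, outs = item
    share = ranks[page] / len(outs)     # distribute this page's rank
    return [(dst, share) for dst in outs]

def reduce_fn(page, contribs):
    return 0.15 + 0.85 * sum(contribs)  # damped rank update

for _ in range(10):                     # each pass reuses the cached `links` buffer
    ranks.update(mapreduce(links.items(), map_fn, reduce_fn))
```

In real MapReduce each of these ten rounds would be a separate job that reloads the graph from HDFS; keeping the loop-invariant data resident is the main source of the iterative speedup the abstract describes.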
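The incremental idea behind Inc-MapReduce can be shown with WordCount, one of the benchmarks the abstract cites: persist the result of the previous run, count only the newly appended data, and merge. The function names below are hypothetical; this sketches the reuse principle, not the framework's API.

```python
from collections import Counter

def word_count(lines):
    """Full batch pass over all lines (the expensive baseline)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def incremental_word_count(saved_counts, new_lines):
    """Combine the persisted result of the previous run with counts over
    only the appended delta, instead of reprocessing the entire data set."""
    return saved_counts + word_count(new_lines)

initial = ["a b a", "c a"]
delta = ["b c"]

saved = word_count(initial)                  # persisted after the first run
fast = incremental_word_count(saved, delta)  # touches only the delta
assert fast == word_count(initial + delta)   # same answer as a full recompute
```

The speedup scales with the ratio of total data to new data: with a 60 GB initial set and a 256 MB delta, almost all of the baseline's work is skipped, which is consistent with the greater-than-7x figure reported above.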
Keywords/Search Tags:Cloud Computing, Distributed Storage System, Programming Model, Iterative Computing, Incremental Computing