
Key Technology Research on Log Storage and Analysis System on Cloud Platform

Posted on: 2016-04-09    Degree: Doctor    Type: Dissertation
Country: China    Candidate: K Lv    Full Text: PDF
GTID: 1108330473961527    Subject: Computer system architecture
Abstract/Summary:
E-commerce, social-media, and video companies collect and analyze user-behavior logs for precise recommendation and advertising. As the volume of log data grows, traditional data-analysis approaches can no longer meet the requirements, and companies increasingly perform log collection, collation, and analysis on cloud platforms. This places new demands on cloud storage and computing platforms in two respects: 1) a more flexible storage system. HDFS, the most widely used distributed file system in industry, is designed for data analysis with a write-once, read-many access pattern, and is therefore ill-suited to multi-client log appends with concurrent reads and writes. 2) A more efficient data-analysis framework. Applications such as recommendation systems and search engines must process data within a short time and need a programming framework tailored to the characteristics of the workload to achieve higher efficiency.

Based on these new requirements for log storage and processing on cloud platforms, we studied storage and computing frameworks for the cloud. The main work and contributions are as follows:

1) To support concurrent writes and read-write parallelism in log collection and analysis, we designed HDFS+, a file system based on HDFS. HDFS+ adopts a sequential consistency model that only guarantees each client's data is written in order, which increases the degree of parallelism across clients.
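The per-client ordering guarantee can be illustrated with a toy in-process model. This is a minimal sketch of the consistency contract only, not HDFS+ code: the `ConcurrentAppendFile` class and its methods are hypothetical names invented for illustration. Records from one client appear in the order that client wrote them, while records from different clients may interleave freely, so clients do not serialize against each other beyond the brief append itself.

```python
import threading

class ConcurrentAppendFile:
    """Toy model of a multi-client append target (hypothetical API).

    Sequential consistency is enforced only per client: each client's
    records keep their own write order, but no global order across
    clients is promised, which is what allows their writes to proceed
    in parallel.
    """
    def __init__(self):
        self._records = []          # (client_id, payload) in arrival order
        self._lock = threading.Lock()

    def append(self, client_id, payload):
        with self._lock:            # short critical section; no global ordering promise
            self._records.append((client_id, payload))

    def records_of(self, client_id):
        """Project out one client's records; these must be in write order."""
        return [p for c, p in self._records if c == client_id]

def writer(f, client_id, n):
    for i in range(n):
        f.append(client_id, i)

f = ConcurrentAppendFile()
threads = [threading.Thread(target=writer, args=(f, c, 100)) for c in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Per-client order survives even though the four clients wrote concurrently.
assert all(f.records_of(c) == list(range(100)) for c in range(4))
```

The design point is that relaxing ordering *between* clients is what buys the reported multi-client speedup; a single global order would force every append through one serialization point.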
In addition, HDFS+ uses snapshots to support read-write parallelism. Experimental results show that concurrent writes by multiple clients to the same file achieve a speedup of 1.6x over single-client writes.

2) To address the low efficiency of the MapReduce framework for iterative computation, we proposed the Iter-Hadoop framework to accelerate the execution of iterative programs. Map and reduce functions are provided as services, two different memory-buffer strategies persist data across iterations, and execution history is used to accelerate the reduce phase. Experiments show that, compared with the MapReduce framework, Iter-Hadoop significantly improves the execution efficiency of iterative programs.

3) To address the shortcoming that the MapReduce framework reprocesses the entire data set for each batch job, we proposed the Inc-MapReduce framework. Inc-MapReduce accelerates the execution of a batch job by reusing the intermediate results of the previous run. Experimental results show that Inc-MapReduce provides more than a 7x speedup on the Grep and WordCount batch programs on our platform when the incremental data is 256 MB and the initial data is 60 GB.
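The memory-buffer idea behind Iter-Hadoop can be sketched in miniature: parse the static input once, hold it in memory, and run each map/reduce round against that buffer instead of re-reading it from the file system every iteration as vanilla MapReduce does. The `mapreduce` helper and the PageRank-style update below are illustrative stand-ins, not the dissertation's actual implementation.

```python
from collections import defaultdict

def mapreduce(records, map_fn, reduce_fn):
    """Minimal in-process map/reduce, standing in for one Hadoop job."""
    groups = defaultdict(list)
    for rec in records:
        for k, v in map_fn(rec):
            groups[k].append(v)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

# The static link graph is parsed once and kept in a memory buffer across
# iterations; only the rank vector changes from round to round.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1.0 for page in links}

def map_fn(item):
    page, outs = item
    share = ranks[page] / len(outs)     # distribute this page's rank
    return [(dst, share) for dst in outs]

def reduce_fn(page, contribs):
    return 0.15 + 0.85 * sum(contribs)  # damped rank update

for _ in range(10):                     # each pass reuses the cached `links` buffer
    ranks.update(mapreduce(links.items(), map_fn, reduce_fn))
```

In real MapReduce each of these ten rounds would be a separate job that reloads the graph from HDFS; keeping the loop-invariant data resident is the main source of the iterative speedup the abstract describes.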
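The incremental idea behind Inc-MapReduce can be shown with WordCount, one of the benchmarks the abstract cites: persist the result of the previous run, count only the newly appended data, and merge. The function names below are hypothetical; this sketches the reuse principle, not the framework's API.

```python
from collections import Counter

def word_count(lines):
    """Full batch pass over all lines (the expensive baseline)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def incremental_word_count(saved_counts, new_lines):
    """Combine the persisted result of the previous run with counts over
    only the appended delta, instead of reprocessing the entire data set."""
    return saved_counts + word_count(new_lines)

initial = ["a b a", "c a"]
delta = ["b c"]

saved = word_count(initial)                  # persisted after the first run
fast = incremental_word_count(saved, delta)  # touches only the delta
assert fast == word_count(initial + delta)   # same answer as a full recompute
```

The speedup scales with the ratio of total data to new data: with a 60 GB initial set and a 256 MB delta, almost all of the baseline's work is skipped, which is consistent with the greater-than-7x figure reported above.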
Keywords/Search Tags:Cloud Computing, Distributed Storage System, Programming Model, Iterative Computing, Incremental Computing