Cost-effective cloud data processing

Posted on: 2013-01-29
Degree: Ph.D
Type: Thesis
University: The University of Wisconsin - Madison
Candidate: Lang, Willis
Full Text: PDF
GTID: 2458390008977056
Subject: Computer Science
Abstract/Summary:
We are headed towards an increasingly data-driven society because vast amounts of data are both more readily available and more valuable than ever. Consumer and enterprise services such as social networking sites and traditional retailers are working together to gather and use data to increase the efficiency and effectiveness of their business operations. The health-care industry is also taking advantage of readily available research, drug, and treatment data to increase both the efficiency of services and the quality of care for patients. In short, the value of “big data” analytics is now widely recognized across all sectors of society.

The IT industry and academic researchers have raced to develop new systems that enable extraction of insights from vast data repositories. These systems are being built and run in “cloud” clusters and housed in large data warehouses. Unfortunately, to date, little attention has been paid to the rising costs of deploying and running such systems.

This dissertation provides a comprehensive look at the relationship between the following three factors: (i) the performance of the data processing system; (ii) the allocation of hardware resources; and (iii) the energy consumption of the system. The key challenges involve modeling the relationships between performance, hardware, and energy, as well as developing frameworks to effectively trade off one for another. We find that we can modulate the performance and hardware/energy costs of data processing in a controlled way by changing the way that the software uses its hardware resources, and/or by changing the way we build our servers/clusters for data-processing systems. The main contributions involve the identification, formulation, and evaluation of models and frameworks for desirable trade-offs between hardware/energy costs and data processing performance.

The implication of this thesis is that the methods presented here can be used to reduce the overall dollar cost of running big data analysis in the cloud, while meeting any applicable workload performance targets.
Keywords/Search Tags:Data, Performance