The Research And Practice Of Performance Optimization Based On Hive

Posted on: 2012-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: W C Ye
GTID: 2178330335963520
Subject: Engineering
Abstract/Summary:
With the continued growth of the Internet market, computing technology for large-scale data has become a major focus of engineering research. Hadoop, which is widely used across the Internet industry, has drawn attention from engineering research and development departments; domestic universities and research institutes use Hadoop and are conducting in-depth analysis of its data storage, resource management, job scheduling, performance optimization, high availability, and security characteristics. Based on applications that process and store transaction data in the Taobao e-commerce system, this thesis studies performance optimization methods for Hive, a data infrastructure built on Hadoop. It characterizes the computing model of the Hadoop system and analyzes Hadoop's Map/Reduce tasks and the corresponding file storage system, HDFS. The optimization work is presented in three parts. The first adjusts underlying parameters to achieve practical, directly applicable optimization. The second describes HQL, the SQL-based query language of Hive. The third uses typical code examples to explain Hive's logical characteristics and then describes the corresponding optimizations and parameter adjustment schemes, such as changing data types, resolving data skew, and reducing the number of jobs, either through internal optimizations or in light of Taobao's data requirements. Finally, the data show that the optimizations achieve good results.
Keywords/Search Tags: Large data processing, parallel computing, distributed computing systems, HDFS, Hadoop, Hive, optimization
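The abstract names data skew as one optimization target but does not say how the thesis addresses it. As an illustration only, the following Python sketch shows one common mitigation, two-stage aggregation with salted keys: a hot key is first spread across several partial aggregations, and the partial results are then merged. The function name `salted_count` and the parameter `num_salts` are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def salted_count(keys, num_salts=4, seed=0):
    """Count occurrences per key in two stages (illustrative sketch only).

    Stage 1 appends a random salt to each key so that a single hot key
    is split across up to `num_salts` partial counters, which stands in
    for spreading the key over several reducers.
    Stage 2 strips the salt and merges the partial counts.
    """
    rng = random.Random(seed)

    # Stage 1: partial aggregation on (key, salt) pairs.
    partial = Counter()
    for key in keys:
        salt = rng.randrange(num_salts)
        partial[(key, salt)] += 1

    # Stage 2: final aggregation on the original key.
    total = Counter()
    for (key, _salt), count in partial.items():
        total[key] += count

    return dict(total), dict(partial)


if __name__ == "__main__":
    # A skewed input: one key dominates the data.
    data = ["hot"] * 1000 + ["cold"] * 10
    totals, partials = salted_count(data)
    print(totals)
    # Largest single partial bucket, i.e. the heaviest stage-1 workload.
    print(max(partials.values()))
```

The final totals are identical to a plain count; what changes is that no single stage-1 bucket has to process every record of the hot key. Hive's `hive.groupby.skewindata` setting takes a broadly similar two-job approach for GROUP BY queries, though whether the thesis relies on it is not stated in the abstract.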