Optimization And Application Research Of MapReduce Computing Model Based On Hadoop

Posted on: 2016-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Li
Full Text: PDF
GTID: 2298330467991234
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, and in recent years of the mobile Internet and e-commerce in particular, the network has entered a new era characterized by large-scale data sets and multi-terminal platforms. To establish an IT system, an enterprise must not only purchase hardware and other infrastructure but also employ staff to maintain it. As the business grows, hardware and software must be upgraded continually to keep pace, and traditional data processing methods suffer from high data storage costs and inefficient data management. This burden falls especially on small and medium-sized enterprises (SMEs), for which computer hardware and software are merely tools for improving computing and storage efficiency.

Hadoop inherits many of the strengths of cloud computing. Together with its subprojects MapReduce and HDFS, which are open source and scalable, it has become a popular cloud computing development platform. It nevertheless has shortcomings in some scenarios. For example, in MapReduce the Mappers generate a large volume of intermediate results, but the Reducer is not invoked to merge them at that time. This leaves the Reducer idle, increases the burden of transmitting large numbers of intermediate results over the network, and lowers the efficiency of MapReduce.

Through research and analysis of how the MapReduce computation model works, including its operating mechanism and fault-tolerance mechanism, this thesis proposes an optimization. Within the YARN framework, MPI technology is used so that the Reducer runs in parallel with the Mappers to process intermediate results. Experimental analysis indicates that this improves the computational efficiency of MapReduce and reduces the coupling between computation and storage.

In addition, without changing the basic business logic of a key pollutant emissions calculation system, the thesis uses Sqoop data migration technology, combined with the advantages of the optimized MapReduce computation model and storage technologies, to design a solution in which a relational database and a distributed database coexist. The solution addresses the storage and access efficiency problems of the relational database, separates storage from computation, and reduces development costs for SMEs.
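The idea of letting the Reducer work while the Mappers are still producing output can be illustrated with a minimal single-machine sketch. It is not the thesis's implementation: the abstract describes MPI inside YARN, whereas this sketch substitutes Python threads and an in-memory queue as a stand-in for message passing. The names `mapper`, `reducer`, `run_overlapped`, and `SENTINEL` are hypothetical and chosen only for illustration, and word counting is an assumed example workload.

```python
import queue
import threading
from collections import Counter

# Hypothetical end-of-stream marker; each mapper sends it once when done.
SENTINEL = None


def mapper(lines, out_q):
    """Emit (word, 1) pairs for every word, then signal completion."""
    for line in lines:
        for word in line.split():
            out_q.put((word, 1))
    out_q.put(SENTINEL)


def reducer(in_q, n_mappers, totals):
    """Merge pairs as they arrive rather than after all mappers finish."""
    finished = 0
    while finished < n_mappers:
        item = in_q.get()
        if item is SENTINEL:
            finished += 1
            continue
        word, count = item
        totals[word] += count


def run_overlapped(splits):
    """Run one mapper thread per input split alongside a single reducer."""
    q = queue.Queue()
    totals = Counter()
    reduce_thread = threading.Thread(
        target=reducer, args=(q, len(splits), totals)
    )
    # The reducer starts first, so it consumes output while mappers run.
    reduce_thread.start()
    map_threads = [
        threading.Thread(target=mapper, args=(split, q)) for split in splits
    ]
    for t in map_threads:
        t.start()
    for t in map_threads:
        t.join()
    reduce_thread.join()
    return dict(totals)
```

Calling `run_overlapped([["a b a"], ["b c"]])` merges the two splits into a single word count. Because the reducer consumes from the queue from the start, it is never left waiting for the whole map phase to end, which is the inefficiency the abstract attributes to the unmodified model.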
Keywords/Search Tags: Cloud Computing, MapReduce, HDFS, HBase