The Research and Implementation of Key Technologies of Big Data Preprocessing Based on the Hadoop Platform

Posted on: 2015-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
Full Text: PDF
GTID: 2348330509460624
Subject: Computer Science and Technology
Abstract/Summary:
Big data computation and analysis systems are a product of computer science reaching a certain stage of development, and they have attracted attention from academia and industry worldwide. The value of big data lies on the one hand in storing huge amounts of data and on the other in analyzing and processing that data. Huge amounts of data depend on a file system for storage, accumulating that data is a relatively slow process, and uploading it to a big data computing cluster takes time. The period in which data is accumulated and uploaded, before the actual data processing program handles it, is therefore an idle phase. With the aim of making full use of system resources, this thesis studies a preprocessing system for big data. Before the data processing tasks start, the system uses the spare computing resources of the storage nodes in the distributed system: preprocessing tasks run on the local node where the data is stored in the file system. Using computing resources that would otherwise sit idle improves the resource utilization of the system. Based on a requirements analysis, the thesis proposes a preprocessing system that serves big data computation. Hadoop, an implementation of the MapReduce programming model, is a typical big data analysis system and has been widely used. The Hadoop system comprises many different components, including the Hadoop file system for data storage. Based on the characteristics of Hadoop, the thesis studies and implements a big data preprocessing system for the Hadoop platform and builds a prototype of it targeting the WordCount program. Implementing and testing the preprocessing system with the WordCount program on the Hadoop platform verifies that it reduces disk I/O, compresses the data volume, and significantly reduces computing time.
Keywords/Search Tags: Hadoop, Big Data, Preprocessing, File System
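
The idea of preprocessing WordCount input on the node where the data is stored can be illustrated with a minimal sketch. It is not the thesis's implementation, which is not shown here: the function names `preprocess_block` and `merge_counts`, the toy `blocks` data, and the choice of per-block word counts as the preprocessed form are all assumptions made for illustration. The sketch only shows why local pre-aggregation can shrink the data that a later job has to read.

```python
from collections import Counter
from typing import Iterable


def preprocess_block(lines: Iterable[str]) -> Counter:
    """Hypothetical local preprocessing step: count the words of one stored block."""
    counts = Counter()
    for line in lines:
        # Split on whitespace and add each word's occurrences to the partial count.
        counts.update(line.split())
    return counts


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Combine per-block partial counts into the final WordCount result."""
    total = Counter()
    for partial in partials:
        # Counter.update adds the counts of matching keys together.
        total.update(partial)
    return total


# Two toy "blocks" standing in for file-system blocks held on different storage nodes.
blocks = [
    ["big data big cluster", "data node"],
    ["big node", "data data"],
]

# Each block is preprocessed independently, as if on the node that stores it.
partials = [preprocess_block(block) for block in blocks]

# The later job only has to merge the much smaller partial counts.
result = merge_counts(partials)
```

In this sketch the later job reads one small table of counts per block instead of the raw text, which is one plausible way to obtain the reductions in disk I/O and data volume that the abstract reports.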