
Design And Implementation Of Data Analysis And Modeling Tool Based On Spark

Posted on: 2017-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q Niu
GTID: 2308330488485354
Subject: Computer application technology
Abstract/Summary:
The rapid development of Internet technology marks the arrival of the information age. Data from every field of social life, characterized by massive volume, wide variety and heterogeneity, is growing exponentially every day. The information contained in this data brings many opportunities. The question is how to mine effective information from such a tremendous amount of data. To extract useful information from complex, large-scale data, it is essential to break with traditional thinking in data mining and to innovate in data analysis and modeling.

This thesis designs and implements a modeling tool for data analysis based on Spark, a new distributed in-memory computing architecture. The tool can cover all kinds of business logic in data mining. Users build a data analysis process through simple operations on various kinds of computing nodes, and then receive the results of the analysis.

Building on existing processing technologies for very large-scale data, including Sqoop, HDFS, HBase and ZooKeeper, this thesis presents an overall design for the intelligent decision analysis platform. On the basis of the support this platform provides, the thesis addresses the management of data objects and proposes an appropriate scheme for data flow. Based on a study of classification and clustering methods, the thesis builds a prediction model and sets up a model library to manage all of these models. The thesis focuses on the design and implementation of the modeling tool, and also presents the system deployment scheme and an analysis of the test results.

At present, the system is in trial operation in a practical project, where it processes and analyzes data on the scale of hundreds of gigabytes. The results show that the system can handle all of these tasks and meets the functional requirements for prediction.
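The abstract describes users assembling an analysis process from computing nodes and keeping trained models in a model library, but it does not give the tool's interfaces. The minimal sketch below illustrates that idea under stated assumptions. The `Node`, `Workflow` and `ModelLibrary` names, the wiring of nodes by upstream name, and the toy threshold "model" are all hypothetical and are not taken from the thesis. The sketch runs in plain Python. In a Spark-based tool, each node would presumably wrap a Spark DataFrame transformation or an MLlib estimator instead of a list operation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """Hypothetical computing node: a named step applied to its upstream results."""
    name: str
    func: Callable[..., Any]
    inputs: List[str] = field(default_factory=list)  # names of upstream nodes


class Workflow:
    """Hypothetical analysis process assembled from computing nodes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, node: Node) -> "Workflow":
        # Reject duplicate names and unknown upstream nodes so the flow stays acyclic.
        if node.name in self.nodes:
            raise ValueError(f"duplicate node: {node.name}")
        for dep in node.inputs:
            if dep not in self.nodes:
                raise ValueError(f"unknown upstream node: {dep}")
        self.nodes[node.name] = node
        return self

    def run(self) -> Dict[str, Any]:
        # Each node depends only on nodes added before it, so insertion order
        # (preserved by dict) is already a valid execution order.
        results: Dict[str, Any] = {}
        for node in self.nodes.values():
            args = [results[dep] for dep in node.inputs]
            results[node.name] = node.func(*args)
        return results


class ModelLibrary:
    """Hypothetical model library that keeps every registered version of a model."""

    def __init__(self) -> None:
        self._models: Dict[str, List[Any]] = {}

    def register(self, name: str, model: Any) -> int:
        versions = self._models.setdefault(name, [])
        versions.append(model)
        return len(versions)  # 1-based version number

    def latest(self, name: str) -> Any:
        return self._models[name][-1]


def train_threshold(values: List[float]) -> Callable[[float], bool]:
    # Toy stand-in for a classifier: "high" means above the training mean.
    mean = sum(values) / len(values)
    return lambda x: x > mean


flow = (
    Workflow()
    .add(Node("load", lambda: [3.0, 8.0, -1.0, 6.0, 2.0, 1.0]))
    .add(Node("clean", lambda xs: [x for x in xs if x >= 0], ["load"]))
    .add(Node("train", train_threshold, ["clean"]))
)
results = flow.run()

library = ModelLibrary()
version = library.register("threshold", results["train"])
model = library.latest("threshold")
print(version, model(5.0), model(3.0))  # prints: 1 True False
```

The sketch requires every upstream node to exist before a dependent node is added. This keeps the flow acyclic, so running nodes in insertion order is enough and no separate topological sort is needed. Spark MLlib's `Pipeline` offers a related abstraction for chaining transformers and estimators.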
Keywords/Search Tags:big data, data analysis, data modeling, spark