Design And Implementation Of Data Transform Platform Based On Big Data

Posted on:2016-04-30

Degree:Master

Type:Thesis

Country:China

Candidate:B Wang

Full Text:PDF

GTID:2308330503977802

Subject:Software engineering

Abstract/Summary:

With the rapid development of computer technology, the volume of data that people encounter is growing explosively. This sustained growth in data scale brings enormous value, but it also poses severe challenges, and massive data processing has become an active research topic. Many sophisticated algorithms exist for specific data processing problems, but in both efficiency and computational complexity, traditional algorithms can no longer meet the demands of massive data. Cloud computing offers a new research direction: it distributes storage and computation across the nodes of a cluster, which makes it possible to store and process very large data sets. To meet the challenges of big data, companies increasingly build their own cloud computing platforms for data processing and analysis, and this has become the mainstream trend.
Building on research into massive data processing, this thesis proposes a customizable data transform platform that simplifies massive data processing. Improving data quality requires outlier detection on the data sets. Because traditional algorithms have high time complexity in the clustering process, the thesis also proposes a parallel outlier detection scheme based on a traditional clustering algorithm.
In the data transform platform, an "action flow" approach is designed to abstract data processing actions, which lets users customize processing methods and workflows to their actual needs. So that users do not have to write SQL statements or program code, processing is described by "input-process-output" statements in a configuration file.
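
The abstract does not give the configuration format, so the following is only a minimal sketch of how an "input-process-output" description could drive an action flow. The JSON layout, the action names (`drop_empty`, `rename`, `to_float`), and the `run_flow` helper are all hypothetical and chosen for illustration; the real platform runs on a cluster, while this sketch works on in-memory records.

```python
import json

# Hypothetical actions; none of these names come from the thesis.
def drop_empty(rows):
    """Discard records that contain an empty field."""
    return [r for r in rows if all(v != "" for v in r.values())]

def rename(rows, mapping):
    """Rename fields according to an {old_name: new_name} mapping."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]

def to_float(rows, field):
    """Convert one field of every record to a float."""
    return [{**r, field: float(r[field])} for r in rows]

ACTIONS = {"drop_empty": drop_empty, "rename": rename, "to_float": to_float}

def run_flow(config, sources):
    """Apply the configured actions, in order, to the named input."""
    rows = sources[config["input"]]
    for step in config["process"]:
        action = ACTIONS[step["action"]]
        rows = action(rows, **step.get("params", {}))
    return {config["output"]: rows}

# Assumed configuration layout: one input, an ordered action list, one output.
CONFIG = json.loads("""
{
  "input": "raw_readings",
  "process": [
    {"action": "drop_empty"},
    {"action": "rename", "params": {"mapping": {"val": "value"}}},
    {"action": "to_float", "params": {"field": "value"}}
  ],
  "output": "clean_readings"
}
""")

SOURCES = {
    "raw_readings": [
        {"id": "a", "val": "1.5"},
        {"id": "b", "val": ""},
        {"id": "c", "val": "3"},
    ]
}

RESULT = run_flow(CONFIG, SOURCES)
print(RESULT)
```

In this sketch the user changes only the configuration: reordering or adding entries under `process` changes the flow without any new SQL or program code, which is the goal the abstract describes.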
For outlier detection on massive data, the thesis designs and implements a parallel version of the traditional K-Medoids clustering algorithm. It also designs a distance-sum-based outlier detection method that requires no parameters to be set in advance. Experimental results show that both efficiency and accuracy improve considerably.
Overall, the proposed solutions are suited to massive data processing and save a great deal of coding time, and the project has good practical value.
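
The abstract names the two ingredients, K-Medoids clustering and a distance-sum score, but not the exact rule, so the following is a serial sketch under stated assumptions rather than the thesis's parallel method. The seeding (first `k` points), the pooling of scores across clusters, and the cutoff of mean plus two standard deviations are all assumptions made for illustration; in particular, the fixed multiplier is a stand-in, since the abstract does not say how its parameter-free decision is made.

```python
import math
from statistics import mean, pstdev

def k_medoids(points, k, max_iter=50):
    """Alternating k-medoids; returns {medoid_index: [member indices]}."""
    medoids = list(range(k))  # assumption: the first k points seed the clusters
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the member with the smallest distance sum to the
        # rest of its cluster becomes the new medoid.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(math.dist(points[c], points[j]) for j in members),
            ))
        if set(new_medoids) == set(medoids):
            break  # medoids are stable, so the assignment above is final
        medoids = new_medoids
    return clusters

def distance_sums(points, clusters):
    """Score each point by its summed distance to the members of its cluster."""
    return {
        i: sum(math.dist(points[i], points[j]) for j in members)
        for members in clusters.values()
        for i in members
    }

def find_outliers(points, clusters, spread=2.0):
    """Flag points whose score exceeds mean + spread * std (assumed cutoff)."""
    scores = distance_sums(points, clusters)
    cutoff = mean(scores.values()) + spread * pstdev(scores.values())
    return sorted(i for i, s in scores.items() if s > cutoff)

# Two tight groups plus one stray point, (16, 16), at index 8.
POINTS = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
          (10, 11), (11, 10), (11, 11), (16, 16)]

CLUSTERS = k_medoids(POINTS, k=2)
OUTLIERS = find_outliers(POINTS, CLUSTERS)
print(CLUSTERS, OUTLIERS)
```

The assignment step treats each point independently and the update step treats each cluster independently, which is what makes this kind of algorithm a candidate for a parallel implementation on a platform such as Hadoop; how the thesis actually divides the work is not stated in the abstract.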

Keywords/Search Tags:

data processing, cloud platform, outlier detection, Hadoop, K-Medoids

Related items

1	Research Of Outlier Detection Algorithm Based On Hadoop
2	The Research Of Intrusion Detection Technology Based On Outlier Mining Under The Hadoop Cloud Platform
3	Research And Optimization On K-medoids Clustering Algorithm Based On Hadoop Platform
4	Research Of The Wind Turbines Vibration Data Processing On Hadoop Cloud Platform
5	Research And Implementation Of Local Outlier Detection Algorithm On Hadoop
6	Outlier Detection And Model Reconstruction Of 3D Point Cloud Data
7	Research On Some Key Technologies Of 3D Point Cloud Data Processing
8	Research On The Novel Big Data Processing Platform In Cloud Computing Environment
9	Research On Key Technology Of Hadoop-based Network Security Log Audit System
10	The Research And Implementation Of K-Medoids Clustering Algorithm Based On Density And Hadoop