Research Of Clustering Algorithms In Big Data Analysis

Posted on:2017-02-27

Degree:Master

Type:Thesis

Country:China

Candidate:H S Cai

Full Text:PDF

GTID:2308330485489546

Subject:Computer Science and Technology

Abstract/Summary:


With the rapid development of information technology, especially mobile communication technology, social media, the Internet of Things, and cloud computing have become integrated into people's lives and work, and a huge amount of text data has accumulated as a result. The amount of data being collected is still increasing rapidly. Faced with such massive data, how to extract valuable information from it has become a hot topic in many fields. Cluster analysis, which is widely used in both academia and industry, is a common technique in data mining and machine learning. However, conventional clustering algorithms process data serially; when they are applied to large data sets, their performance is not high because of limited memory.

To address the challenges of dealing with large data sets and to improve the performance of clustering algorithms, parallel clustering algorithms have become an active research topic in academia. Apache Hadoop is an open-source software platform for distributed processing that provides open-source implementations of the MapReduce parallel computing framework and the Google File System. Hadoop has become a core part of big data analysis because of its ease of use and scalability. Spark is a cluster computing platform designed to be fast and general-purpose. Spark ships with programmer-friendly features such as task scheduling, memory management, fault recovery, and interaction with storage systems. Spark exposes resilient distributed datasets (RDDs) through a language-integrated API, which can be used to implement algorithms for big data analysis.

This paper compares the big data analysis platforms mentioned above, explains in detail how each platform processes data in parallel, and describes how to parallelize conventional clustering algorithms.
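
As a rough illustration of what parallelizing a conventional clustering algorithm can look like, the sketch below expresses one k-means iteration as a map phase and a reduce phase. It is not taken from the thesis: k-means is chosen here only as a familiar example, plain Python lists stand in for the data partitions a platform such as Hadoop or Spark would distribute, and every function name (`nearest`, `map_phase`, `reduce_phase`, `kmeans_step`) is hypothetical.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(partition, centroids):
    """Map: for each point in one partition, emit (cluster_index, (point, 1))."""
    return [(nearest(p, centroids), (p, 1)) for p in partition]


def reduce_phase(pairs):
    """Reduce: add up the coordinates and the counts for each cluster index."""
    sums = {}
    for idx, (point, count) in pairs:
        if idx in sums:
            acc, n = sums[idx]
            sums[idx] = ([a + b for a, b in zip(acc, point)], n + count)
        else:
            sums[idx] = (list(point), count)
    return sums


def kmeans_step(partitions, centroids):
    """Run one k-means iteration over all partitions and return the new centroids."""
    # On a real cluster each partition would be mapped on a different worker;
    # here the partitions are simply processed one after another.
    pairs = [pair for part in partitions for pair in map_phase(part, centroids)]
    sums = reduce_phase(pairs)
    # A centroid that attracted no points is kept unchanged.
    return [
        [s / sums[i][1] for s in sums[i][0]] if i in sums else list(centroids[i])
        for i in range(len(centroids))
    ]


partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
new_centroids = kmeans_step(partitions, [[1.0, 1.0], [9.0, 9.0]])
```

The map phase depends only on a single partition and the current centroids, so in principle it can run independently on each worker; only the per-cluster sums and counts need to be combined in the reduce phase.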

Keywords/Search Tags:

big data, cluster analysis, Hadoop


Related items

1	Clustering For Stock Data Analysis On Hadoop
2	Research And Improvement Of Job Scheduling Algorithm Based On Hadoop Cluster
3	The Sports Monitoring And Managing System Based On Hadoop Cluster
4	The Design And Implementation Of Deployment And Management System For Hadoop Cluster
5	Research And Implementation Of Online Learning Behavior Data Analysis And Visualization Based On Hadoop
6	Meticulous Analysis Based On Hadoop And Its Application
7	Research Of Clustering Algorithms In Big Data Analysis
8	The Design And Implementation Of The Management System Of The Hadoop Cluster
9	Design And Implementation Of Hadoop Cluster Web Log Analysis System Based On Eucalyptus
10	Study On The Robust Optimization Of HADOOP Under The Restriction Of Cluster Computing Efficiency