
Research of Clustering Algorithms in Big Data Analysis

Posted on: 2017-02-27   Degree: Master   Type: Thesis
Country: China   Candidate: H S Cai
GTID: 2308330485489546   Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, mobile communications, social media, the Internet of Things and cloud computing in particular have become part of people's daily lives and work, and as a result a huge amount of text data has accumulated. The volume of data we collect is still growing rapidly. Faced with such massive data, extracting valuable information from it has become a hot topic in many fields. Cluster analysis, which is widely used in both academia and industry, is a common technique in data mining and machine learning. However, conventional clustering algorithms process data serially, so when they are applied to large data sets their performance is poor because of limited memory.

To cope with large data sets and improve the performance of clustering algorithms, parallel clustering algorithms have become an active research topic in academia. Apache Hadoop is an open-source software platform for distributed computing; it provides open-source implementations of the MapReduce parallel computing framework and of a distributed file system modeled on the Google File System (GFS). Because of its ease of use and scalability, Hadoop has become one of the cores of big data analysis. Spark is a cluster computing platform designed to be fast and general-purpose. It ships with programmer-friendly features such as task scheduling, memory management, fault recovery and integration with storage systems. Spark exposes resilient distributed datasets (RDDs) through a language-integrated API, which can be used to implement algorithms for big data analysis.

This thesis compares the big data analysis platforms mentioned above, explains in detail how these platforms process data in parallel, and describes how conventional clustering algorithms can be parallelized.
Keywords/Search Tags: big data, cluster analysis, Hadoop
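The abstract describes making conventional clustering algorithms parallel on MapReduce-style platforms. The following is a minimal sketch of one common MapReduce formulation of k-means, not the method of this thesis. It assumes k-means as the example algorithm (the abstract names none) and simulates the data partitions as in-memory Python lists processed in a single process. The function names `map_partition`, `shuffle_and_reduce` and `kmeans_mapreduce` are hypothetical and are not part of Hadoop's or Spark's API.

```python
from collections import defaultdict
from math import dist  # Euclidean distance (Python 3.8+)


def map_partition(points, centroids):
    """Map step with in-mapper combining (a design assumption of this sketch):
    assign each point in one partition to its nearest centroid and emit one
    (cluster_id, (coordinate_sum, count)) pair per cluster seen."""
    partial = {}
    for p in points:
        k = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        s, c = partial.get(k, ([0.0] * len(p), 0))
        partial[k] = ([a + b for a, b in zip(s, p)], c + 1)
    return list(partial.items())


def shuffle_and_reduce(mapped_outputs, centroids):
    """Shuffle: group partial results by cluster_id.
    Reduce: merge partial sums and divide by the total count."""
    groups = defaultdict(list)
    for output in mapped_outputs:
        for k, value in output:
            groups[k].append(value)
    new_centroids = list(centroids)  # an empty cluster keeps its old centroid
    for k, values in groups.items():
        dim = len(centroids[k])
        total = [sum(s[d] for s, _ in values) for d in range(dim)]
        count = sum(c for _, c in values)
        new_centroids[k] = tuple(t / count for t in total)
    return new_centroids


def kmeans_mapreduce(partitions, centroids, iterations=10):
    """Driver loop: each iteration corresponds to one map/shuffle/reduce round."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        mapped = [map_partition(part, centroids) for part in partitions]
        new = shuffle_and_reduce(mapped, centroids)
        if new == centroids:  # assignments stopped changing: converged
            break
        centroids = new
    return centroids


# Hypothetical toy data split across two "partitions".
partitions = [
    [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
    [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],
]
print(kmeans_mapreduce(partitions, [(0.0, 0.0), (10.0, 10.0)]))
```

Only the small per-cluster (sum, count) pairs cross the shuffle, not the raw points, which is what lets this formulation scale with the number of partitions. On Hadoop each iteration typically runs as a separate job that rereads its input from the distributed file system; on Spark the same pattern is commonly written by broadcasting the centroids and combining partial sums with `reduceByKey`, with the point RDD cached in memory across iterations.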