Rock Image Clustering Analysis Algorithm Research Based On Spark

Posted on:2017-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:Q Q Zhao


GTID:2348330482494592

Subject:Computer technology

Abstract/Summary:

With the rapid growth of Internet technology, the amount of information people can access grows day by day, and so does the demand for tools that can process these massive data. Images, as an intuitive and content-rich kind of multimedia information, are used extensively in science, technology and daily life. How to manage and retrieve massive image data quickly and effectively, and then extract the potentially valuable information they contain, is a problem of wide concern. The widely used Hadoop platform can no longer meet this need, because its processing speed is approaching a bottleneck. The emergence of Spark addresses this problem: Spark can process data up to about one hundred times faster than Hadoop, which saves a great deal of time and gives it a clear advantage over Hadoop in iterative and interactive computing.

Data mining is one of the core tasks in big data processing, and cluster analysis, as an important research topic within data mining, has received a lot of attention in recent years. Traditional clustering algorithms can no longer meet the needs of massive-data processing, which has motivated more efficient clustering techniques. Research on clustering algorithms on the Spark platform is still very rare, both in China and abroad, and there is no prior research on rock images on the Spark platform; this thesis is therefore the first to study a rock image clustering algorithm based on Spark. The main work is as follows:

1. Spark platform. As a newly emerging big data platform, Spark has many advantages over Hadoop; for these reasons we selected Spark.
2. The K-means algorithm and an improved K-means algorithm. The traditional K-means clustering algorithm depends heavily on the choice of initial cluster centers: if the initial centers are chosen poorly, the algorithm easily falls into a local optimum, and the segmentation result is also affected by the number of clusters K. This thesis proposes an improved K-means algorithm based on selection probability. The data set produced by this selection step is far smaller than the original data set, which greatly improves the speed of K-means clustering.
3. The improved K-means algorithm is applied to rock images: K-means is used to extract features of the rock images, which makes different rock images easy to distinguish.
4. The improved K-means algorithm is implemented on the Spark platform to take advantage of Spark's efficiency.
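The idea in item 2, finding starting centers on a much smaller selected subset and then clustering the full data, can be illustrated with a minimal sketch. The abstract does not say how the selection probability is defined, so this sketch assumes the simplest version: every point is kept independently with a fixed probability `rate` (Bernoulli sampling), K-means is run on that sample, and the resulting centers seed a final K-means pass over all points. The function names `kmeans`, `sampled_init` and `improved_kmeans` and their parameters are hypothetical, and the code is plain Python rather than Spark; in a Spark job the assignment and accumulation loop would typically be written as a `map` over an RDD followed by `reduceByKey`.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           max_iter: int = 100, tol: float = 1e-9) -> List[Point]:
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    dim = len(centers[0])
    for _ in range(max_iter):
        # Assignment ("map"): key each point by its nearest center.
        # Accumulation ("reduceByKey"): per-cluster coordinate sums and counts.
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        for p in points:
            k = nearest(p, centers)
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += p[d]
        # Update: each center moves to the mean of its points; an empty
        # cluster keeps its previous center.
        new_centers = [
            tuple(s / counts[k] for s in sums[k]) if counts[k] else centers[k]
            for k in range(len(centers))
        ]
        shift = max(dist2(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


def sampled_init(points: Sequence[Point], k: int, rate: float,
                 rng: random.Random) -> List[Point]:
    """Hypothetical seeding step: run K-means on a Bernoulli sample."""
    sample = [p for p in points if rng.random() < rate]
    if len(sample) < k:  # sample too small: fall back to the full data
        sample = list(points)
    start = rng.sample(sample, k)  # k randomly chosen sample points
    return kmeans(sample, start)


def improved_kmeans(points: Sequence[Point], k: int, rate: float = 0.1,
                    seed: int = 0) -> List[Point]:
    """Seed from the small sample, then refine on the full data set."""
    rng = random.Random(seed)
    return kmeans(points, sampled_init(points, k, rate, rng))


# Usage on synthetic 2-D data: two well-separated blobs (illustrative only).
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
data += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(200)]
centers = sorted(improved_kmeans(data, k=2, rate=0.1, seed=7))
print(centers)
```

Because the search for starting centers runs only on the small sample, the pass over the full data usually starts close to a good solution and needs few iterations. For comparison, Spark MLlib's own `KMeans` uses `k-means||`, a parallel variant of k-means++ seeding, as its default initialization mode.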

Keywords/Search Tags:

Hadoop, Spark, Iterative calculations, data mining, clustering analysis

Related items

1	Research On The User Electricity Characteristics Based On Big Data
2	Research On Machine Learning Clustering Algorithms In The Hadoop Development Environment
3	Research On Parallel Clustering Algorithm For Large-Scale Data Set
4	Design And Its Implementation Of Iterative Distributed Clustering Framework Based On Model Fusion
5	Spectral Clustering Algorithm Based On Spark And The Application On QAR Data
6	Research Of Large-scale Data Mining Technology Based On Spark
7	Research And Application Of Data Mining Technology Based On Spark In ERP System
8	Agricultural Product Price Analysis And Forecast System Design Based On Hadoop+Spark Platform
9	Analysis And Research On Energy Consumption Of Public Buildings Based On Hadoop
10	English On Design And Implementation Of Network Data Parallel Processing System Based On Hadoop Platform