Rock Image Clustering Analysis Algorithm Research Based On Spark

Posted on: 2017-04-07  Degree: Master  Type: Thesis
Country: China  Candidate: Q Q Zhao
GTID: 2348330482494592  Subject: Computer technology
Abstract/Summary:
With the rapid growth of Internet technology, the amount of information that people can access is growing day by day, and so is the demand for tools that can process such massive data. Images, as an intuitive and content-rich kind of multimedia information, have extensive applications in science, technology, and daily life. How to manage and retrieve massive image data quickly and effectively, and then extract potentially valuable information from it, is a problem of wide concern. The widely used Hadoop platform can no longer meet this need because its processing speed is approaching a bottleneck. The emergence of Spark addresses this problem: Spark can process data up to about one hundred times faster than Hadoop, which saves a great deal of time and makes it far better suited than Hadoop to iterative and interactive computing.

Data mining is one of the core tasks in big data processing, and cluster analysis, as an important research topic within data mining, has received a lot of attention in recent years. Traditional clustering algorithms can no longer meet the needs of processing massive information, so efficient clustering techniques have emerged. Research on clustering algorithms on the Spark platform is still rare at home and abroad, and there is no existing research on rock images on the Spark platform, so this thesis is the first to propose research on a rock image clustering algorithm based on Spark. The main work is as follows:

1. Spark platform. As a newly emerging big data platform, Spark has many advantages over the Hadoop platform, and after weighing various considerations we selected Spark.
2. K-means algorithm and improved K-means algorithm. The traditional K-means clustering algorithm depends heavily on the selection of the initial cluster centers. If the initial cluster centers are chosen poorly, the algorithm easily falls into a local optimum, and the segmentation results are also affected by the number of clusters K. The thesis proposes an improved K-means algorithm based on selection probability. The data set produced by this algorithm is far smaller than the initial data set, which greatly improves the speed of K-means clustering.
3. Application to rock images. The improved K-means algorithm is applied to rock images: K-means is used to extract features of the rock images, which makes the images easier to distinguish.
4. Implementation on Spark. The improved K-means algorithm is applied on the Spark platform to achieve high efficiency.
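
The sensitivity to initial cluster centers described in the abstract can be illustrated with a small sketch. The abstract does not give the details of the thesis's probability-based improvement, so the code below swaps in a well-known stand-in, k-means++ style seeding, in which each new initial center is drawn with probability proportional to its squared distance from the centers already chosen. The function names (`choose_initial_centers`, `kmeans`), the parameters, and the toy data are assumptions made for illustration. This is a single-machine sketch and is neither the thesis's algorithm nor a Spark implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def choose_initial_centers(points, k, rng):
    """Probability-weighted seeding in the style of k-means++.

    Assumption: this stands in for the 'selection probability' idea in the
    abstract; the thesis's actual selection rule is not given there.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # All points coincide with existing centers; fall back to uniform choice.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd iterations started from probability-weighted seeds."""
    rng = random.Random(seed)
    centers = choose_initial_centers(points, k, rng)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New center is the coordinate-wise mean of the cluster members.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old center if a cluster ends up empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Toy data: two well-separated groups of 2-D feature vectors (hypothetical values).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2, seed=1)
print(centers, labels)
```

Drawing seeds far from the centers already chosen makes it less likely that two initial centers start in the same dense region, which is one common way to reduce the risk of a poor local optimum. For image data, each point would be a per-pixel or per-region feature vector rather than the toy tuples used here.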
Keywords/Search Tags: Hadoop, Spark, iterative calculations, data mining, clustering analysis