
The Application Of Improved DBSCAN On DBMAS

Posted on: 2018-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: J R Qin
Full Text: PDF
GTID: 2428330548480439
Subject: Engineering
Abstract/Summary:
Clustering has important applications in data mining, pattern recognition, machine learning and related fields. With the explosion of data, however, traditional clustering algorithms find it increasingly difficult to meet the demands of big data analysis. How to improve traditional clustering algorithms so that they preserve both clustering quality and efficiency on big data has become a significant research topic in big data processing and artificial intelligence. The purpose of cluster analysis is to discover knowledge that is latent in the data, so that the data can be classified and organized faster and more efficiently. Density-based clustering algorithms can cluster data sets of arbitrary shape without knowing the data distribution in advance. The classical algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that is widely used because it is simple and efficient. It still has two problems: first, its initial parameters are difficult to select; second, its clustering results are affected by the single global clustering radius Eps.

The main work of this thesis is as follows:

1. To address the difficulty of initializing the parameters of the traditional DBSCAN algorithm, we propose the SALE-DBSCAN algorithm (Self-Adaptive Local Eps DBSCAN), which quickly finds the density peak points of the data set and adaptively selects a local clustering radius. Experimental results show that SALE-DBSCAN is more accurate than the other density-based clustering algorithms compared, across the data sets tested.

2. Although SALE-DBSCAN produces better clustering results, it is difficult for it to meet the requirements of clustering on big data. This thesis therefore proposes MR-SALE-DBSCAN, a distributed parallel algorithm based on MapReduce, which processes subsets of the data with a divide-and-conquer strategy. Simulation results show that the algorithm improves clustering quality and that parallelization reduces the time cost.

3. Following software engineering specifications, we designed and implemented many functional modules of the Data Comprehensive Management and Analysis System of DongJiang Lake. In this system, we set up several database subsystems corresponding to several functional departments of the government, and we provide comprehensive data management and intelligent analysis for many application modules. We also applied the MR-SALE-DBSCAN algorithm to the hydrological analysis module. The system has been stable since delivery and can provide decision support both for residents of the large drainage basin surrounding DongJiang Lake and for the functional departments.
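To make the two DBSCAN problems named in the abstract concrete, the following is a minimal sketch of the classical DBSCAN algorithm only. It is not the thesis's SALE-DBSCAN or MR-SALE-DBSCAN, whose details are not given here. The function names `region_query` and `dbscan` and the sample points are assumptions chosen for illustration. It shows that both `eps` and `min_pts` must be chosen up front and that one global `eps` is applied to every point.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classical DBSCAN with one global eps; returns one cluster label per point."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels


# Hypothetical data: one dense group, one sparse group, one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 13), (13, 10), (13, 13),
          (50, 50)]

small_eps_labels = dbscan(points, eps=1.5, min_pts=3)
large_eps_labels = dbscan(points, eps=4.5, min_pts=3)
print(small_eps_labels)  # [0, 0, 0, 0, -1, -1, -1, -1, -1]
print(large_eps_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With the small radius the sparse group is discarded as noise, and only the larger radius recovers it. On real data with mixed densities, a radius large enough for the sparse regions can also merge nearby dense clusters, which is the sensitivity to a global Eps that motivates choosing the radius locally.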
Keywords/Search Tags:Big Data, Data Mining, MapReduce, Cluster, DBSCAN