Research On DBSCAN Clustering Algorithm Oriented Big Data

Posted on:2018-12-27

Degree:Master

Type:Thesis

Country:China

Candidate:J Wang

Full Text:PDF

GTID:2348330563451355

Subject:Systems Engineering

Abstract/Summary:

From innovation to Internet Plus, from smart cities to the national strategic cloud, from precision medicine to high-end manufacturing, technology is changing how people live and think. It also produces a great deal of data. The explosive growth, low value density, rapid update speed, and complex structure of these data make it harder to obtain useful information, even though people are eager for it. Data mining tries to extract valuable and meaningful information of interest from large volumes of random, incomplete, fuzzy, and noisy data. It draws on a range of techniques and processing methods, including regression, clustering, spectral frequency analysis, web crawlers, and many others. Clustering is an effective means of data mining: it can serve as a standalone analysis or be combined with other processing methods to tap the value of data. Among clustering methods, the density-based algorithm DBSCAN handles clusters of arbitrary distribution and copes well with noise, which makes it a powerful tool for big data.

However, traditional data processing targets static databases and gains capacity by increasing the computing power and storage of a single computer. When big data are clustered this way, memory overflows, the time complexity of the algorithm is too high, mutually enclosing rings and intertwined spirals cannot be clustered, and the clustering quality of DBSCAN is unstable because it depends on whether the input parameters are optimal. Because of these defects, it is difficult to extract and mine data effectively in the big data era. DBSCAN is an unsupervised learning algorithm. Although it needs no prior knowledge, every data update requires all the data to be clustered again, so clustering efficiency is too low and the overhead of the algorithm is too large. In short, big data mining urgently needs DBSCAN to be optimized and rewritten so as to reduce its storage cost, improve its efficiency, improve its clustering quality, and achieve effective data clustering.

1. Aiming at the memory overflow and low running efficiency of DBSCAN, the clustering idea of DBSCAN is implemented on MapReduce. The advantages of the computing cluster improve the efficiency of the algorithm and reduce its memory cost. Data that wrap around one another are difficult to cluster, so a spiral projection strategy is used to cluster such complex data. The innovation is to use the Cartesian product region to cluster mutually enclosing rings and intertwined spirals, and to port DBSCAN to a big data computing system using the big data programming model MapReduce. Proposed algorithm: PDP-DBSCAN, a density clustering algorithm based on the Cartesian product region.

2. Aiming at the clustering quality of the algorithm, a strategy for optimizing DBSCAN in combination with an intelligent algorithm is proposed. Parameters are determined for each cluster in the data set in order to improve the clustering quality. The innovation is to optimize the clustering quality of parameter-adaptive DBSCAN by using the self-perception and fast convergence of the particle swarm optimization (PSO) algorithm. Proposed algorithm: MPSO-DBSCAN, a MapReduce-based parameter-adaptive density clustering algorithm.

3. Aiming at the excessive storage cost and low running efficiency of DBSCAN, a DBSCAN clustering algorithm based on support vector machine incremental learning is used. By applying supervised learning to the data, it further optimizes the ability of DBSCAN to process data, further reduces the storage cost of the algorithm, and further improves its running efficiency. The innovation is to improve the clustering mechanism of DBSCAN by means of support vector machine incremental learning. Proposed algorithm: a DBSCAN clustering algorithm based on support vector machine incremental learning.

Sample data sets from UCI for testing classification and clustering are used to verify the proposed algorithms in terms of processing speed, clustering quality, scalability, and speedup ratio. The representative DBSCAN variants for running efficiency and clustering quality are MR-DBSCAN and SAVDBSCAN. The experimental results show that PDP-DBSCAN is less efficient than MR-DBSCAN when clustering data with a simple structure, but PDP-DBSCAN is much better than MR-DBSCAN for MPO-DBSCAN. The DBSCAN clustering algorithm based on support vector machine incremental learning saves storage cost compared with DBSCAN and improves the running efficiency of the algorithm.
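
For reference, the baseline behaviour that the abstract describes (density-based clustering, noise handling, and sensitivity to the input parameters) can be illustrated with a minimal single-machine sketch of classic DBSCAN. This is not the thesis's PDP-DBSCAN, MPSO-DBSCAN, or SVM-based variant; the names `dbscan`, `region_query`, `eps`, `min_pts`, and the toy points are assumptions chosen for illustration only.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN with a brute-force O(n^2) neighbour search.

    Returns one label per point: 0, 1, 2, ... for clusters, NOISE otherwise.
    """
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


# Toy data (hypothetical): two small square blobs and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]

for eps in (0.5, 1.5, 20.0):
    print(eps, dbscan(points, eps=eps, min_pts=3))
```

On this toy data, only the middle `eps` value separates the two blobs and marks the isolated point as noise; a value that is too small labels everything as noise, and a value that is too large merges everything into one cluster. The brute-force neighbour search also shows why the time and memory cost grows quickly with data size on a single machine.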

Keywords/Search Tags:

Data mining, Clustering analysis, Big Data, PSO

Related items

1	Data Mining Based Research On Analysis To The Data Of The College Teaching
2	The Research Of Real-time Data Analysis Based On Data Mining
3	Cluster Analysis In Applied Research, Scientific Data Mining
4	Preference Data Mining And Its Application Based On Big Data
5	Research On Classification Of Colleges In Our Country Based On Clustering Technology Of Data Mining
6	Study On Space Partitioning-based Optimized Clustering Algorithms And Related Techniques
7	Analysis Of Telecommunication Data On The Basis Of Data Mining
8	Researches On Application Of Data Mining Technology In The Individual Network Teaching Platform
9	Research And Implementation Of Clustering Algorithm For Multidimensional Data Sets
10	Analysis And Application Of The Lending Data On The Universities Library Based On Data-Mining