
Research On Fast Density Clustering Algorithm Based On Nearest Neighbor Query Technology

Posted on: 2019-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Tang
Full Text: PDF
GTID: 2428330566993633
Subject: Engineering
Abstract/Summary:
With the rapid development of computer and network communication technology, information and digital technologies have become closely tied to every aspect of social life and have had a tremendous impact on how people produce, live, work, and think. The era of big data has arrived. Faced with massive data, a central question is how to transform it into valuable information, and machine learning and data mining play an increasingly important role in answering it. Cluster analysis, one of the most important research topics in data mining, has been widely applied in fields such as data analysis, image processing, and recommender systems. Clustering is an unsupervised process: the goal of a clustering algorithm is to divide an input data set into several semantically consistent clusters according to some similarity measure.

DBSCAN is the most important density-based clustering algorithm. It does not require the number of clusters to be specified in advance, and it can identify clusters of complex shape in data that contains noise. Despite these advantages, DBSCAN cannot cope with massive, high-dimensional data. The underlying reason is that, to decide whether each point is a core point, DBSCAN performs a neighbor search for every data point. This involves a large amount of redundant computation and gives the algorithm a complexity of O(n^2), which makes DBSCAN unable to handle large-scale data. Building on previous work, this thesis proposes a density clustering algorithm based on neighbor query (NQ-DBSCAN), which effectively improves the performance of DBSCAN. The main content of this thesis can be summarized as follows:

(1) First, the advantages and disadvantages of DBSCAN are studied in depth. The neighbor search of DBSCAN is found to contain a large amount of redundant computation, which makes the complexity of the algorithm too high to handle large-scale data.

(2) Second, the ideas behind existing improved variants of DBSCAN, and the effects they achieve, are analyzed in depth. The variants covered include IDBSCAN, FDBSCAN, LSH-DBSCAN, STDBSCAN, Fast-DBSCAN, and ρ-Approximate DBSCAN. These algorithms do not perform well enough on relatively high-dimensional data.

(3) To address the slow clustering speed of DBSCAN, this thesis proposes an improved algorithm, NQ-DBSCAN. It uses the idea of neighbor query to filter out a portion of the data points and mark them as non-core points, which eliminates a large number of density computations and speeds up clustering. The upper and lower bounds of the neighbor search are determined theoretically to ensure that NQ-DBSCAN produces the same clustering results as DBSCAN.

Experiments on artificial and real data sets show that NQ-DBSCAN is considerably more efficient than DBSCAN, especially on high-dimensional data: its performance does not deteriorate as the number of dimensions increases, and it adapts well to data containing noise.
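For reference, the redundancy described in the abstract can be seen in a minimal sketch of the baseline DBSCAN with a brute-force neighbor search. This is not the NQ-DBSCAN algorithm proposed in the thesis, whose filtering rule and search bounds are not given here. The function names `region_query` and `dbscan`, the `counter` argument, and the label conventions are illustrative assumptions, and the code assumes Python 3.8 or later for `math.dist`.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps, counter):
    """Brute-force search: indices of all points within eps of points[i]."""
    neighbors = []
    for j, q in enumerate(points):
        counter[0] += 1  # one distance computation per (i, j) pair
        if math.dist(points[i], q) <= eps:
            neighbors.append(j)
    return neighbors


def dbscan(points, eps, min_pts):
    """Baseline DBSCAN; returns (labels, number of distance computations)."""
    labels = [UNVISITED] * len(points)
    counter = [0]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps, counter)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
                continue
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps, counter)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
    return labels, counter[0]


if __name__ == "__main__":
    demo = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (50, 50)]
    print(dbscan(demo, eps=1.5, min_pts=3))
```

In this sketch every point triggers exactly one full scan of the data set, so n points cost n^2 distance computations regardless of how many of them turn out to be core points. That per-point scan is the cost the abstract says NQ-DBSCAN reduces by marking some points as non-core without a density computation.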
Keywords/Search Tags:Data mining, Clustering algorithm, DBSCAN, Neighbor query