
A Distance And Density-based Clustering Algorithm Using Automatic Peak Detection

Posted on: 2018-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2348330542981379
Subject: Software engineering
Abstract/Summary:
With the rapid development of intelligent information technology, represented by computer and network communication technologies and often called the fourth technological revolution, human society has entered the information and digital age. In the era of big data, how to analyze massive datasets effectively and extract useful information from them has become both a research hotspot and a major difficulty. Data mining and machine learning algorithms are widely used to analyze and process large volumes of data and to help people obtain useful, compact information and knowledge; they have become one of the most important means of making sense of big data for decision-making. Clustering analysis, an unsupervised machine learning technique, automatically partitions data into clusters of similar samples. As a data mining tool it can discover previously unknown groups within the data, and it is therefore widely applied in marketing data analysis, business intelligence, computer vision, image pattern recognition, bioinformatics, and other fields.

With advances in information acquisition and storage technologies and the expanding scope of database applications, massive datasets with irregular shapes and distributions have become common, and more and more individuals and organizations have come to appreciate the convenience of big data analysis. Existing clustering algorithms handle such data inefficiently and perform poorly. Research has therefore focused on finding efficient and effective clustering algorithms, and factors such as scalability, noisy data, and clusters of arbitrary shape challenge the existing ones. Moreover, most clustering algorithms perform well on low-dimensional data but suffer from the "curse of dimensionality" on high-dimensional data, so clustering high-dimensional datasets is an important problem.

This thesis first introduces the research background and the current development of clustering algorithms, then reviews several traditional clustering algorithms and summarizes their advantages and disadvantages. Second, it points out that many traditional clustering algorithms have trouble when the distribution of objects in a dataset varies, and it introduces a new clustering algorithm, PB-AUTO. PB-AUTO improves on CFSFDP (clustering by fast search and find of density peaks), published in Science in June 2014. CFSFDP does not work well on high-dimensional datasets because its threshold for cluster centers is defined ambiguously and therefore has to be chosen visually and manually. This thesis introduces an alternative definition of the indicators and decides the cluster-center threshold automatically using an improved Canopy algorithm. Once the centers are fixed (each representing one cluster), each remaining data object is assigned to a cluster in a single step. Finally, the algorithm is evaluated experimentally on classic datasets and on the high-dimensional KDD CUP 99 data, and the results demonstrate the practicality and advantages of PB-AUTO for intrusion detection systems.
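The density-peak idea that PB-AUTO builds on can be illustrated with a minimal Python sketch of the published CFSFDP procedure: the local density rho_i counts the points within a cutoff distance dc, delta_i is the distance to the nearest point of higher density, and every non-center point inherits the label of that neighbor in a single pass. The automatic center selection below is only a hypothetical stand-in (a largest-gap cut on gamma_i = rho_i * delta_i); it is not the improved Canopy method or the alternative indicators of PB-AUTO, which this abstract does not specify. The function name `density_peaks`, the parameter `dc`, and the tie-breaking rule are illustrative assumptions.

```python
import math


def density_peaks(points, dc):
    """Minimal CFSFDP-style clustering sketch (not the PB-AUTO method).

    Assumes at least two points. Returns (centers, labels): centers are
    point indices, labels[i] is the cluster id of points[i].
    """
    n = len(points)
    # Pairwise Euclidean distances (O(n^2) memory; fine for a sketch).
    d = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i with a cut-off kernel: neighbours closer than dc.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]

    # Process points by decreasing density; ties broken by index (assumption).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point earlier in `order` (higher density).
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point gets its largest distance to any other point.
            delta[i] = max(d[i])
            continue
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i] = d[i][j]
        nearest_higher[i] = j

    # Hypothetical automatic threshold: cut the sorted gamma = rho * delta
    # values at their largest drop instead of picking centers by eye.
    gamma = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: -gamma[i])
    gaps = [gamma[ranked[r]] - gamma[ranked[r + 1]] for r in range(n - 1)]
    centers = ranked[: gaps.index(max(gaps)) + 1]
    if order[0] not in centers:
        # The densest point must be a center so every chain ends at one.
        centers.append(order[0])

    # Single-step assignment: in decreasing-density order, each non-center
    # point takes the label of its nearest higher-density neighbour.
    labels = [-1] * n
    for cid, c in enumerate(centers):
        labels[c] = cid
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return centers, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    centers, labels = density_peaks(pts, dc=0.9)
    print(centers, labels)  # [4, 9] [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this sketch only the lines that compute `centers` would need to change to plug in a different center-selection rule; the single-pass assignment does not depend on how the centers were chosen.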
Keywords/Search Tags:Clustering algorithm, PD-auto algorithm, High-dimensional data, Intrusion detection system