Research On Key Technologies Of Collaborative Clustering

Posted on: 2020-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: Census
Full Text: PDF
GTID: 2428330590496407
Subject: Information security
Abstract/Summary:
In this thesis, we study collaborative clustering and relate its concepts to cluster analysis in information security, where the security and privacy of data are a concern amid growing data volume and complexity and increasingly sophisticated attacks. Motivated by these application requirements, we introduce a collaborative clustering framework capable of modeling large distributed databases and networks for data mining applications in information security. Collaborative clustering fits the requirements of data mining in information security by (i) guaranteeing privacy through the use of information granules while allowing collaboration through prototypes and partition matrices, and (ii) providing scalability in the face of large, high-dimensional data sets with multiple features representing the behavior of monitored objects, which not only increases the complexity of learning normal behavior but can also lead to large errors in cluster analysis.

However, collaborative clustering methods such as Collaborative Fuzzy Clustering, Collaborative Self-Organizing Maps, and Collaborative Generative Topographic Mapping depend on a user-supplied parameter that sets the significance of the collaboration information. This parameter is introduced without guidance on how to choose its value, yet it strongly influences the quality of the results and therefore cannot be disregarded. We propose a collaborative clustering framework that uses particle swarm optimization to minimize the entropy of clusters in the search for optimal cluster centers. Furthermore, it uses the update of the particle position vectors to determine the importance of the collaboration information, removing the need for user-supplied parameters.

Our framework, called particle subswarms collaborative clustering, combines information from different types of clustering algorithms and thus partially addresses the problem of choosing the correct clustering method and the best parameters, which in most cases is difficult because of insufficient knowledge about the clustering algorithms and the lack of a known performance evaluation. The framework also addresses a shortcoming of state-of-the-art collaborative clustering, in which initial cluster generation uses only a single type of clustering algorithm. Additionally, the framework extends to particle swarm clustering, where multiple clustering algorithms can be used in parallel; this increases the number of particles in the swarm without increasing the number of clusters and helps to cope with the problem of local minima.

In general, the collaborative clustering framework is well suited to problems in information security that arise from large volumes of database and network data. Data sets with redundant feature information are handled by clustering horizontally: the data are split along the attributes, each part is clustered individually, and the individual clusterings collaborate to produce the final solution. Likewise, vertical collaboration splits the data set along the data objects, so that each individual clustering deals with a small volume of data and the final solution is again obtained through collaboration. In short, collaborative clustering brings distributed and multi-view clustering, with their individual capabilities, under one framework to solve problems that arise from large-scale data sets and data sets with redundant feature information.

This thesis therefore analyzes, both theoretically and empirically, the strengths and limitations of the proposed semi-stochastic particle subswarms collaborative clustering framework in extending the flexibility and reliability of earlier approaches to handle large-scale data sets. We validate the experiments using some of the publicly available data sets in the UCI data set repository.
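
The two ways of partitioning a data set described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `split_horizontal` and `split_vertical`, the even block sizes, and the toy data are assumptions made purely for illustration, and the sketch covers only the splitting step, not the clustering or the collaboration between sites.

```python
# Illustrative sketch only: names and splitting policy are hypothetical.

def split_horizontal(rows, n_parts):
    """Split along the attributes: each part keeps every object
    but only a contiguous subset of the feature columns."""
    n_features = len(rows[0])
    bounds = [i * n_features // n_parts for i in range(n_parts + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in rows]
            for i in range(n_parts)]


def split_vertical(rows, n_parts):
    """Split along the data objects: each part keeps every feature
    but only a contiguous subset of the rows."""
    n_objects = len(rows)
    bounds = [i * n_objects // n_parts for i in range(n_parts + 1)]
    return [rows[bounds[i]:bounds[i + 1]] for i in range(n_parts)]


# Toy data: 4 objects described by 4 features each.
data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

horizontal_parts = split_horizontal(data, 2)  # same objects, different features
vertical_parts = split_vertical(data, 2)      # same features, different objects
```

In the horizontal case every part still contains all four objects, so the individual clusterings describe the same objects from different feature views; in the vertical case every part holds only two objects but all four features.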
Keywords/Search Tags:Collaborative clustering, Distributed clustering, Multi-view clustering, Particle swarm optimization, Information security, Privacy