Study On Self-Adaptive Three-way Clustering Algorithm For Mixed-type Data

Posted on: 2019-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Chang
Full Text: PDF
GTID: 2428330590465721
Subject: Computer Science and Technology
Abstract/Summary:
Cluster analysis, one of the most effective unsupervised techniques in data mining, is widely used in practical fields such as education, commerce, and agriculture. Traditional clustering algorithms represent a cluster by a single set, which cannot effectively express the uncertain relationship between objects and clusters. Three-way clustering was introduced to address this: a cluster is represented by two sets, and the space is divided into three regions. Objects in the core region definitely belong to the cluster, objects in the trivial region definitely do not, and objects in the fringe region might or might not belong. Existing three-way approaches usually need an appropriate evaluation function and corresponding thresholds to obtain the three-way result, but there is no efficient method for setting the thresholds. Moreover, mixed-type data are common in production practice. This thesis therefore studies self-adaptive three-way clustering algorithms for mixed-type data.

To address the threshold problem in three-way clustering, the thesis proposes a self-adaptive three-way clustering algorithm based on gravitational search, inspired by the law of gravitation in physics. Based on the distribution of the local mass of objects in the two-way clusters, the proposed algorithm uses the gravitation formula as the evaluation function and assigns each undecided object to the core, fringe, or trivial region of the corresponding neighboring clusters according to the magnitude of the gravitational force. During clustering, the three-way thresholds are updated adaptively for every undecided object. Experimental analysis shows that, while maintaining clustering quality, the proposed method preserves the basic shape information of the two-way clusters and handles overlapping clusters effectively. For completeness, the thesis also devises a novel two-way clustering algorithm, based on the clustering by fast search and find of density peaks algorithm, to obtain the two-way clusters and discover the undecided objects.

To measure the distance between mixed-type data objects, the thesis also proposes a new measure based on a weighted tree structure, which reduces the loss of attribute-value information during similarity measurement. The measure considers the semantics of the attributes, the number of attribute values, and the occurrence frequency of attribute values. On this basis, an adaptive three-way clustering algorithm for mixed-type data is proposed. Several groups of experiments on real data sets show the rationality and effectiveness of the algorithm.
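The three-way assignment of an undecided object can be illustrated with a minimal Python sketch. It is not the algorithm from the thesis: the names `gravity` and `three_way_assign` are hypothetical, the cluster mass is assumed to be the point count, the distance is assumed to be Euclidean distance to the cluster centroid, and the thresholds `alpha` and `beta` are fixed here, whereas the thesis updates them adaptively for every undecided object.

```python
def gravity(obj, cluster_points, eps=1e-12):
    """Newton-style attraction F = m1 * m2 / d^2 between an object and a cluster.

    Assumptions (not from the thesis): the object has unit mass, the cluster
    mass is its number of points, and d is the distance to the centroid.
    """
    dim = len(obj)
    n = len(cluster_points)
    centroid = [sum(p[i] for p in cluster_points) / n for i in range(dim)]
    dist_sq = sum((a - b) ** 2 for a, b in zip(obj, centroid))
    return n / (dist_sq + eps)  # eps avoids division by zero


def three_way_assign(obj, clusters, alpha=0.7, beta=0.2):
    """Map each cluster name to 'core', 'fringe', or 'trivial' for one object.

    Each cluster's force is divided by the total force, giving a share in
    [0, 1]. A share >= alpha puts the object in the core region, a share
    <= beta in the trivial region, and anything in between in the fringe
    region. alpha and beta are fixed, illustrative thresholds.
    """
    forces = {name: gravity(obj, pts) for name, pts in clusters.items()}
    total = sum(forces.values())
    regions = {}
    for name, force in forces.items():
        share = force / total
        if share >= alpha:
            regions[name] = "core"
        elif share <= beta:
            regions[name] = "trivial"
        else:
            regions[name] = "fringe"
    return regions


# Two well-separated two-way clusters (toy data).
clusters = {
    "A": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "B": [(4, 4), (4, 5), (5, 4), (5, 5)],
}
print(three_way_assign((0.6, 0.6), clusters))  # object close to cluster A
print(three_way_assign((2.5, 2.5), clusters))  # object midway between A and B
```

An object close to one cluster lands in that cluster's core region and in the trivial region of the other, while an object midway between the two clusters lands in the fringe region of both, which is how three-way clustering expresses overlap.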
Keywords/Search Tags: Uncertainty, Three-way Clustering, Self-adaptive, Mixed-type Data