
Design And Implementation Of Initial Cluster Center Selection Algorithm For Categorical Matrix-object Data

Posted on: 2021-05-25 | Degree: Master | Type: Thesis
Country: China | Candidate: L Tian | Full Text: PDF
GTID: 2428330626955497 | Subject: Software engineering
Abstract/Summary:
Clustering analysis, an important technical tool in big data research and applications, provides theoretical support for applied research in the information industry, in fields such as communications, banking, insurance, and e-commerce. The performance of a partition-based clustering algorithm depends largely on the selection of the initial cluster centers. Many existing initial cluster center selection algorithms are designed for ordinary data sets, in which each object is represented by a single feature vector. In many practical applications, however, one object is described by more than one feature vector. In this thesis, we call an object described by more than one feature vector a matrix object, and a data set composed of matrix objects a matrix-object data set. At present, there is no effective algorithm for selecting initial cluster centers for matrix-object data sets. To process a matrix-object data set with existing algorithms, the data must first be compressed and transformed, which usually loses a great deal of information and fails to fully reflect users' actual behavior characteristics. We therefore study initial cluster center selection for categorical matrix-object data, propose new algorithms, and compare them experimentally with several state-of-the-art methods. The main contributions of this thesis are as follows:

(1) We propose an initial cluster center selection algorithm based on density and distance. In this algorithm, we define the density of a matrix object and the distance between two matrix objects according to the frequency of attribute values in categorical data, and extend the Max-Min algorithm to select the initial cluster centers.

(2) We propose an initial cluster center selection algorithm based on density and pairwise constraints. In this algorithm, a new definition of the density of matrix objects is given according to the frequency of attribute values and the average distance between two matrix objects. In addition, pairwise constraint information is used to guide the selection of initial cluster centers, and the principle of label consistency is further applied in the clustering process. This algorithm addresses two problems: high-density points may lie on the boundary of a cluster, and points selected as cluster centers because of their long distances may be isolated points. The algorithm is also suitable for large-scale and high-dimensional data sets.

(3) We design and implement an initial cluster center selection system based on MATLAB, which consists of data loading, parameter setting, data mining, data analysis, graphic visualization, etc. The use of GUI technology gives the system good portability and interactivity.

The research results in this thesis provide new methods and ideas for initial cluster center selection on matrix-object data and further enrich research on categorical matrix-object data. To some extent, they have both theoretical and practical value. We also expect research on matrix-object data to attract growing attention and to help solve more practical problems.
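
As a rough illustration of contribution (1), the following Python sketch selects initial centers for categorical matrix objects with a density-weighted Max-Min rule. The abstract does not give the thesis's formulas, so every definition here is an assumption made for illustration: a matrix object is a list of equal-length rows, the distance between two objects is the mean, over attributes, of the total variation distance between their attribute-value frequency distributions, and the density of an object is one minus its mean distance to the other objects. The function names `profile`, `distance`, and `select_initial_centers` are hypothetical, and the pairwise constraints of contribution (2) are not modeled.

```python
from collections import Counter


def profile(obj):
    """Per-attribute relative value frequencies of one matrix object.

    `obj` is a list of equal-length rows (tuples of categorical values).
    """
    n_rows = len(obj)
    return [
        {value: count / n_rows for value, count in Counter(column).items()}
        for column in zip(*obj)
    ]


def distance(p, q):
    """Assumed distance between two profiles (not the thesis's definition).

    Mean over attributes of the total variation distance between the two
    attribute-value frequency distributions; the result lies in [0, 1].
    """
    total = 0.0
    for freq_p, freq_q in zip(p, q):
        values = set(freq_p) | set(freq_q)
        total += 0.5 * sum(
            abs(freq_p.get(v, 0.0) - freq_q.get(v, 0.0)) for v in values
        )
    return total / len(p)


def select_initial_centers(objects, k):
    """Return the indices of k initial centers (density-weighted Max-Min)."""
    n = len(objects)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of objects")

    profiles = [profile(obj) for obj in objects]
    dist = [
        [distance(profiles[i], profiles[j]) for j in range(n)]
        for i in range(n)
    ]

    # Assumed density: one minus the mean distance to the other objects.
    density = [1.0 - sum(row) / max(n - 1, 1) for row in dist]

    # First center: the densest object.
    centers = [max(range(n), key=density.__getitem__)]

    # Remaining centers: maximize density times the distance to the
    # nearest center chosen so far (the Max-Min step, weighted by density).
    while len(centers) < k:
        candidates = [i for i in range(n) if i not in centers]
        best = max(
            candidates,
            key=lambda i: density[i] * min(dist[i][c] for c in centers),
        )
        centers.append(best)
    return centers
```

Calling `select_initial_centers(objects, k)` returns indices into `objects`. Weighting the Max-Min score by density is one common way to keep far-away but isolated objects from being chosen as centers; the thesis may combine the two quantities differently.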
Keywords/Search Tags: categorical matrix-object data, initial cluster center, density parameter, distance learning, pairwise constraint