Research Of Cluster Boundary Detection Technology On Mixed Attribute Data

Posted on:2016-06-19

Degree:Master

Type:Thesis

Country:China

Candidate:P Geng

Full Text:PDF

GTID:2308330461451493

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of information technology, clustering has become a very active research direction in data mining, and it has been widely applied in image processing, information retrieval, meteorology, finance, and other fields. Boundary points lie at the edges of clusters, and whether they are assigned to the correct cluster directly affects clustering precision. Boundary points also share characteristics of multiple clusters. In recent years, cluster boundary detection has also become an active research direction within clustering. In practice, mixed attribute data comes from a wider range of sources than purely numerical or purely categorical attribute data, yet cluster boundary detection on mixed attribute data has remained unexplored. To meet the need for extracting cluster boundaries from mixed attribute data, this thesis carries out the related research and application work.

First, to address cluster boundary detection on mixed attribute data, this thesis proposes an algorithm named BERGE (Cluster boundary detection technology for mixed attribute data set). The algorithm builds on an effective measure for handling mixed attribute data. It first calculates the distance and membership of each data object with respect to the cluster centroids. It then defines a boundary factor from these distances and memberships and uses it to obtain a candidate boundary set for the data set. Finally, following the idea of evidence accumulation, it extracts the cluster boundary points from the candidate boundary set. Experimental results on UCI data sets and real data sets show that BERGE can effectively obtain the cluster boundaries of mixed attribute data, numerical attribute data, and categorical attribute data.
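The first two BERGE steps (distances and memberships, then a boundary factor that selects candidates) can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual formulas, so everything here is an assumption: the dissimilarity is a k-prototypes-style sum of squared numeric differences and categorical mismatches, the memberships follow the usual fuzzy c-means form, and the boundary factor (ratio of the second-highest to the highest membership) and its threshold are hypothetical stand-ins. The evidence accumulation step is not covered.

```python
def mixed_distance(x, centroid, num_idx, cat_idx, gamma=1.0):
    """Assumed dissimilarity: squared Euclidean distance on numeric
    attributes plus gamma times the number of categorical mismatches."""
    numeric = sum((x[i] - centroid[i]) ** 2 for i in num_idx)
    mismatches = sum(1 for i in cat_idx if x[i] != centroid[i])
    return numeric + gamma * mismatches


def memberships(distances, m=2.0):
    """Fuzzy c-means-style memberships computed from the dissimilarities
    of one object to every centroid (m is the fuzzifier, m > 1)."""
    # An object that coincides with a centroid belongs fully to it.
    zero = [j for j, d in enumerate(distances) if d == 0]
    if zero:
        return [1.0 / len(zero) if j in zero else 0.0
                for j in range(len(distances))]
    power = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def boundary_factor(u):
    """Hypothetical boundary factor: second-highest membership divided by
    the highest. Values near 1 mean the object sits between two clusters."""
    top, second = sorted(u, reverse=True)[:2]
    return second / top


def candidate_boundary(data, centroids, num_idx, cat_idx,
                       threshold=0.6, gamma=1.0):
    """Return indices of objects whose boundary factor reaches the
    (assumed) threshold."""
    candidates = []
    for index, x in enumerate(data):
        d = [mixed_distance(x, c, num_idx, cat_idx, gamma) for c in centroids]
        if boundary_factor(memberships(d)) >= threshold:
            candidates.append(index)
    return candidates


# Two numeric attributes and one categorical attribute per object.
centroids = [(0.0, 0.0, "a"), (4.0, 4.0, "b")]
data = [(0.1, 0.0, "a"), (2.0, 2.0, "a"), (4.0, 3.9, "b")]
print(candidate_boundary(data, centroids, num_idx=[0, 1], cat_idx=[2]))  # prints [1]
```

Only the middle object is flagged: its dissimilarities to the two centroids are 8 and 9, so its memberships are nearly equal, whereas the other two objects belong almost entirely to one cluster.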
BERGE achieves high detection precision and suppresses the effect of noise to some extent.

Second, to extract the boundary of one or several specified clusters in mixed attribute data, this thesis proposes a cluster boundary detection algorithm based on shadowed sets, named CHASM (A cluster boundary detection algorithm based on shadowed set). The algorithm uses shadowed sets to measure fuzziness. Based on the structure of the clusters, a new optimization objective function is defined that divides the mixed attribute data of any cluster into three sets: core, exclusion, and shadow. Then, according to the differing degrees to which the three sets contribute to the cluster centroids, the distances and memberships of the data objects with respect to the centroids are calculated and used to update the centroid information. Once the algorithm converges, it extracts the shadow set of each cluster as the boundary set of the whole data set. The algorithm can effectively extract the cluster boundary set of mixed attribute data, numerical attribute data, and categorical attribute data, and it can also obtain the boundary set of a specified cluster.

Finally, to meet the need for extracting cluster boundaries from medical mixed attribute data, this thesis presents a medical data clustering analysis software platform named MDAP (Medical data analysis platform). The software follows an object-oriented design and is divided into 9 modules: central control, data type conversion, data format conversion, data input and output, data display, data preprocessing, clustering analysis, cluster boundary detection, and parameter setting.
The software primarily implements 5 classical clustering methods and 11 cluster boundary detection algorithms, and it provides data preprocessing, clustering analysis, and cluster boundary detection for mixed attribute data, numerical attribute data, and categorical attribute data. It adopts an incremental development model and the factory design pattern, which greatly improve its flexibility and extensibility and make it easy to add algorithms or modules in the future.
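As an illustration of why a factory pattern makes it easy to add algorithms later, here is a minimal Python sketch. It is not MDAP's code: the abstract does not state the implementation language or any class names, so `BoundaryDetector`, `register`, `create_detector`, and the two toy detectors are all hypothetical.

```python
_REGISTRY = {}  # maps an algorithm name to its detector class


def register(name):
    """Class decorator that makes a detector available to the factory."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


class BoundaryDetector:
    """Hypothetical common interface for boundary detection algorithms."""
    def detect(self, data):
        raise NotImplementedError


@register("null")
class NullDetector(BoundaryDetector):
    """Placeholder algorithm that reports no boundary points."""
    def detect(self, data):
        return []


@register("extremes")
class ExtremesDetector(BoundaryDetector):
    """Toy algorithm: flags the objects with the smallest and largest
    value of the first attribute."""
    def detect(self, data):
        order = sorted(range(len(data)), key=lambda i: data[i][0])
        return sorted({order[0], order[-1]}) if order else []


def create_detector(name, **params):
    """Factory function: callers ask for an algorithm by name and never
    reference a concrete class."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown algorithm: {name!r}") from None
    return cls(**params)


detector = create_detector("extremes")
print(detector.detect([[3.0], [1.0], [2.0]]))  # prints [0, 1]
```

With this arrangement, adding a new algorithm means writing one class and decorating it; the calling code that selects an algorithm by name stays unchanged.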

Keywords/Search Tags:

cluster boundary detection, mixed attribute data, boundary factor, evidence accumulation, shadowed set, fuzzy clustering

Related items

1	The Study Of Boundary Detecting Algorithm Based On A2-MST And Ensemble Boundary
2	Study Of Clustering Technology Based On Boundary Model
3	A Clustering Method Based On Density Estimation And Cluster Boundary Detection
4	Research And Application Of Rough Clustering Methods Of Mixed Attribute Data With Self-adaptive Cluster Adjustment
5	Research On Cluster Boundary Detecting Technology For Categorical Data
6	The Research Of Nonparametric Clustering Boundary Detection Algorithm
7	Based On A Grid Of Dbscan And Cluster Boundary Technology
8	Study On Discontinuity And Threshold In Shot Boundary Detection Problem
9	Research On Fuzzy Clustering Algorithms Based On Shadowed Sets And Rough Sets And Their Applications
10	Research On Mixed Attribute Clustering Technology Based On Cluster Center Selection Strategy