Research On Clustering Algorithm Of Mixed Data

Posted on:2016-03-11

Degree:Master

Type:Thesis

Country:China

Candidate:C K Qian

Full Text:PDF

GTID:2308330464469345

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of information technology, we have entered an information age. Over the past decades, the generation, organization and exchange of information have undergone revolutionary change, and large amounts of data have accumulated in every walk of life. Nevertheless, the value extracted from these data has not grown in step with their scale. The most urgent problem is therefore to discover knowledge in massive data, and data mining has attracted extensive attention worldwide as a result. Clustering, a popular research topic in data mining, is widely used in practice.

Most clustering algorithms are designed for data with a single attribute type. However, much research shows that a large number of data sets contain mixed attribute types, which traditional clustering algorithms cannot handle. Hence, how to cluster mixed-type data has become a major issue in data clustering. This thesis studies the clustering of data with mixed attributes, and its main contributions are as follows:

1. The thesis introduces the background and state of the art of data mining, and presents its trends, tasks and languages. It then gives an overview of mixed-type data and clustering algorithms, focusing on similarity measurements and classical clustering algorithms. A survey of mixed-type data clustering is also provided.

2. A new dissimilarity measurement is proposed, and graph connectivity is applied in a new clustering algorithm, CADFSC. CADFSC exploits the sensitivity of K-Prototypes to its initial centers to generate many pre-clusters, and then applies merging or pruning operations to these pre-clusters. Iteration stops when the termination conditions are met. Simulation experiments show that CADFSC has advantages over K-Prototypes and three other clustering algorithms. Several parameters of CADFSC are also discussed, and recommended values for them are provided.

3. The affinity propagation (AP) algorithm is extended to cluster data sets with mixed attributes. A new distance formula is proposed and applied to the AP clustering algorithm, yielding APDA. APDA uses no virtual cluster centers, which could otherwise lead to empty clusters. The new algorithm also incorporates the overall diversity of the data set into the distance, which gives a better clustering result. Measured by clustering entropy and execution time, APDA performs better than two other clustering algorithms.
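As a rough illustration of the ideas the abstract builds on, the sketch below shows a standard K-Prototypes-style dissimilarity for mixed records and one common definition of clustering entropy. It is not the new dissimilarity measurement or distance formula proposed in the thesis, which the abstract does not specify. The function names, the weight `gamma`, and the entropy definition are assumptions made for this example.

```python
import math
from collections import Counter


def mixed_dissimilarity(x_num, x_cat, y_num, y_cat, gamma=1.0):
    """K-Prototypes-style dissimilarity between two mixed records.

    Assumption: squared Euclidean distance on the numeric attributes plus
    gamma times the number of mismatched categorical attributes.
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    categorical_part = sum(1 for a, b in zip(x_cat, y_cat) if a != b)
    return numeric_part + gamma * categorical_part


def clustering_entropy(cluster_labels, true_labels):
    """Size-weighted entropy of the true classes inside each cluster.

    Assumption: this is one common definition (lower is better); the
    thesis may define clustering entropy differently.
    """
    n = len(cluster_labels)
    clusters = {}
    for c, t in zip(cluster_labels, true_labels):
        clusters.setdefault(c, []).append(t)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        counts = Counter(members)
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total
```

For example, `mixed_dissimilarity([1.0, 2.0], ["a", "b"], [2.0, 4.0], ["a", "c"], gamma=0.5)` combines a numeric distance of 5.0 with one categorical mismatch weighted by 0.5. The choice of `gamma` controls how strongly categorical mismatches count relative to numeric differences.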

Keywords/Search Tags:

mixed data, clustering, dimensional frequency, attribute distance, affinity propagation, data mining

Related items

1	Research And Application Of New Methods In Symbolic Clustering
2	Data Mining Algorithm Based On Affinity Propagation Clustering Analysis
3	Research On Algorithm Of Graph Mining Based On Attribute Fusion
4	Research And Application Of Several Clustering Algorithms For Mixed Attribute Data
5	Research And Application Of Clustering Analysis Algorithm Based On Mixed Attribute Data
6	Fast Sparse Affinity Propagation Clustering Algorithm For Large-Scale And High-Dimensional Data
7	Research On Affinity Propagation Clustering Algorithm
8	The Study Of Modified Affinity Propagation Clustering And It’s Application
9	Research On Mixed Attribute Clustering Technology Based On Cluster Center Selection Strategy
10	Beyond Affinity Propagation: Message Passing Algorithms for Clustering