Research On Clustering Ensemble Of Mixed Data And Clustering Algorithm Of Mixed Data Streams

Posted on: 2015-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Yu
GTID: 2298330467954954
Subject: Computer application technology
Abstract/Summary:
We live in an era of explosive data growth, the era of big data. Business transaction records and social networks generate huge amounts of data every day, yet little of this data is actually put to use. Extracting useful information from it matters because it can guide production and daily life and accelerate the progress of modern society.
Real-world data is usually a mix of attribute types rather than a single type, so research on mixed-attribute data has grown rapidly in recent years. Because data keeps arriving without bound, it takes the form of data streams, which has led to the study of clustering mixed-attribute data streams. Research on mixed-attribute data and mixed data streams started relatively late, so there is considerable room for work on clustering ensembles of mixed data and on clustering data streams with mixed attributes.
To address these problems, the main work and contributions of this thesis are as follows:
1. The thesis first introduces the concepts of data mining, its tasks, and related techniques, and then turns to its specific focus, clustering. It gives an overview of clustering, followed by definitions, mathematical models, and some basic clustering algorithms, and then reviews related work on clustering data with mixed attributes.
2. For clustering data with mixed attributes, the thesis proposes a new clustering ensemble algorithm, because existing algorithms have difficulty balancing numeric and categorical attributes. An improved relative density clustering algorithm handles the numeric attributes, and a proposed algorithm based on distance-entropy handles the categorical attributes, since information entropy is an objective measure.
The computational complexity of most clustering ensemble algorithms is quite high. To address this, a new clustering ensemble algorithm based on intersection is proposed. It improves the clustering fusion rules and introduces θ, the ratio of intersecting elements, to guide the merging and trimming of classes. Computing θ is simple and effective.
3. Finally, the clustering ensemble algorithm is extended to data streams, where it serves as the initialization algorithm. A clustering algorithm based on distance and entropy is presented for data streams with mixed attributes. It improves clustering accuracy and time complexity to some extent.
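The abstract does not give formulas for the entropy measure or for θ, so the following is only an illustrative sketch under stated assumptions. It assumes the entropy of a categorical attribute is the Shannon entropy of its value frequencies, and that θ for two clusters is the number of shared elements divided by the size of the smaller cluster. The function names `attribute_entropy`, `intersection_ratio`, and `should_merge`, and the threshold value, are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of the value frequencies of one categorical attribute.

    Assumption: the thesis's entropy measure may be defined differently.
    """
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def intersection_ratio(cluster_a, cluster_b):
    """Assumed form of theta: shared elements over the size of the smaller cluster."""
    a, b = set(cluster_a), set(cluster_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def should_merge(cluster_a, cluster_b, threshold=0.5):
    """Hypothetical fusion rule: merge two clusters when theta reaches the threshold."""
    return intersection_ratio(cluster_a, cluster_b) >= threshold


if __name__ == "__main__":
    # One cluster from the numeric-attribute clustering, one from the categorical one.
    numeric_cluster = {1, 2, 3, 4}
    categorical_cluster = {3, 4, 5}
    print(attribute_entropy(["red", "red", "blue", "blue"]))
    print(intersection_ratio(numeric_cluster, categorical_cluster))
    print(should_merge(numeric_cluster, categorical_cluster))
```

Under these assumptions, computing θ for one pair of clusters is a single set intersection, which is consistent with the abstract's remark that θ is cheap to compute. How the thesis actually trims classes is not stated and is not modeled here.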
Keywords/Search Tags: mixed data, clustering ensemble, data stream, relative density, information entropy, data mining