Research And Application Of Incremental Clustering Algorithm Based On Auto-Encoder

Posted on: 2017-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z N Yang
Full Text: PDF
GTID: 2348330488959961
Subject: Software engineering
Abstract/Summary:
With the development of sensing technologies and wireless communications, data are continuously generated and accumulate rapidly. Real-time processing of dynamic data and analysis of data availability have attracted widespread attention. How to cluster dynamic data sets incrementally, and how to impute incomplete data efficiently so as to improve the availability of data sets, have become hot topics in academic research. However, most existing incremental clustering algorithms do not learn the main features of the data sets and therefore cannot achieve good performance on high-dimensional data sets. Most existing incomplete data imputation algorithms do not consider the local similarity between samples, so they cannot guarantee the accuracy of imputation. To address these problems, this paper proposes an incremental clustering algorithm based on auto-encoder, which clusters dynamic data incrementally by learning the main features of the data sets. Then, building on this algorithm, the paper adopts the idea of filling by the similarity of local data: incomplete data are filled with weighted values of the other complete data in the same class. The specific work is as follows:

(1) An incremental clustering algorithm based on auto-encoder. First, the auto-encoder is used to learn the main features of the data sets and to obtain a representation of the raw data in a new feature space. The algorithm then reads the data set only once and clusters the new data incrementally on the basis of the original clustering results.

(2) An incomplete data imputation algorithm based on incremental clustering. First, the missing features of the incomplete data set are filled with special values to obtain an initial complete data set. The incremental clustering algorithm based on auto-encoder is then used to learn the main features of the data set and to cluster it quickly. In the last phase, top k% nearest-neighbor hybrid distance weighted imputation is applied to fill in the missing values within each cluster.

Experimental results show that the proposed incremental clustering algorithm based on auto-encoder achieves good performance on dynamic data sets by adjusting the structure of clusters dynamically. The proposed incomplete data imputation algorithm based on incremental clustering imputes the missing features effectively and efficiently, with good time performance. Moreover, both algorithms are suitable for distributed data processing frameworks and have good scalability.
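The last phase of item (2), filling missing values from the nearest complete samples in the same cluster, can be illustrated with a minimal sketch. This is not the thesis's implementation. The abstract does not define the "hybrid distance", so the sketch assumes plain Euclidean distance over the features the incomplete sample actually observed, and inverse-distance weights. The function name `impute_within_clusters`, the parameter `k_percent`, and the toy data are hypothetical, and the cluster labels are assumed to come from an earlier clustering step.

```python
import math


def impute_within_clusters(rows, labels, k_percent=20.0):
    """Fill None entries from complete rows in the same cluster.

    Hypothetical helper for illustration only. Assumptions: Euclidean
    distance on observed features, inverse-distance weighting.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Donors: complete rows that share the cluster label of row i.
        donors = [r for r, lab in zip(rows, labels)
                  if lab == labels[i] and None not in r]
        if not donors:
            continue  # nothing to borrow from; leave the gaps as they are
        # Distance from row i to each donor, using observed features only.
        scored = sorted(
            ((math.sqrt(sum((row[j] - d[j]) ** 2 for j in observed)), d)
             for d in donors),
            key=lambda pair: pair[0],
        )
        # Keep the top k% nearest donors, but always at least one.
        n_keep = max(1, math.ceil(len(scored) * k_percent / 100.0))
        nearest = scored[:n_keep]
        # Closer donors get larger weights; epsilon avoids division by zero.
        weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
        total = sum(weights)
        for j in missing:
            filled[i][j] = sum(
                w * d[j] for w, (_, d) in zip(weights, nearest)
            ) / total
    return filled


# Toy data: two clusters, one incomplete row in each.
rows = [[1.0, 2.0], [1.2, 2.2], [1.1, None], [9.0, 9.0], [9.2, None]]
labels = [0, 0, 0, 1, 1]
filled = impute_within_clusters(rows, labels, k_percent=100.0)
```

Restricting donors to the same cluster is what encodes the local-similarity idea described above: an incomplete sample borrows values only from complete samples that the clustering step already judged to be similar.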
Keywords/Search Tags: Incremental Clustering, Incomplete Data, Data Imputation, Auto-Encoder