Self-Organizing Incremental Clustering Algorithm Based on Local Distribution

Posted on: 2013-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q B Ou
Full Text: PDF
GTID: 2208330434975742
Subject: Computer application technology
Abstract/Summary:
In recent years, data mining has attracted great attention from the information industry and from society as a whole, mainly because it can transform masses of data into useful information and knowledge. Cluster analysis is an important data mining technique that extracts useful information and knowledge from large amounts of data in an unsupervised way. However, with the explosive growth of data in the information age, data storage and information retrieval face enormous challenges. In lifelong learning applications especially, the complex, non-stationary data environment poses serious difficulties for cluster analysis.

In this paper, a self-organizing incremental neural network based on local distribution (Local-SOINN) is proposed. Compared to other clustering algorithms, it has the following advantages:
● it automatically reports a suitable number of clusters without any prior knowledge;
● it clusters in an incremental and online way;
● it preserves the topological structure of the input space;
● it finds clusters of arbitrary shape;
● it is robust to noise-polluted inputs.

In some sense, our algorithm addresses the storage problem, because it works in a one-pass way. Local-SOINN learns new samples incrementally, which means it can adapt to new information without corrupting or forgetting previously learned information. Our model is therefore suited to clustering tasks in non-stationary environments. In addition, our model can describe the topology of the original data, a property that can be widely used in data compression and data visualization.

In general, the proposed method represents data by means of neurons arranged on a topology map. The local distribution is stored in the neurons, while the global topology information is preserved in the relationships between adjacent neurons. With a self-adapting threshold strategy and iterative learning of the local distribution information, the algorithm operates in an incremental and online way.
Furthermore, merging of similar ellipsoids and density-based denoising are added to obtain a more concise and less ambiguous topological structure.

If the neurons are regarded as PCA units, Local-SOINN is an implementation of Local-PCA. From another perspective, the adopted metric is an improved Mahalanobis distance that takes the local distribution into account and reflects the anisotropy along different basis vectors. Hence Local-SOINN can be interpreted as an incremental version of the Gaussian mixture model. These analyses establish a solid statistical foundation for Local-SOINN.
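
The abstract does not give the update rules, so the following is only a minimal sketch of the general idea it describes: each neuron keeps a running mean and covariance (an ellipsoid), samples are matched by Mahalanobis distance, and a sample that is too far from every neuron creates a new one. The class `EllipsoidNeuron`, the function `learn`, the fixed `threshold`, and the `init_var` prior are all assumptions made for illustration. The thesis uses a self-adapting threshold and also maintains topology edges, ellipsoid merging, and denoising, none of which are modeled here. The sketch is restricted to 2-D so that the covariance inverse can be written out by hand.

```python
import math


class EllipsoidNeuron:
    """Hypothetical 2-D neuron: running mean plus scatter matrix."""

    def __init__(self, x, init_var=1.0):
        self.n = 1
        self.mean = [x[0], x[1]]
        # Entries of the scatter matrix (sums of deviation products).
        self.sxx = self.sxy = self.syy = 0.0
        # Assumed isotropic prior so the covariance is never singular.
        self.init_var = init_var

    def covariance(self):
        # Regularized estimate: the prior counts as one pseudo-sample.
        a = (self.init_var + self.sxx) / self.n
        b = self.sxy / self.n
        c = (self.init_var + self.syy) / self.n
        return a, b, c

    def mahalanobis(self, x):
        a, b, c = self.covariance()
        dx = x[0] - self.mean[0]
        dy = x[1] - self.mean[1]
        det = a * c - b * b
        # Closed-form inverse of the 2x2 covariance [[a, b], [b, c]].
        d2 = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
        return math.sqrt(d2)

    def update(self, x):
        # One-pass (Welford-style) update: the sample is not stored.
        self.n += 1
        dx = x[0] - self.mean[0]
        dy = x[1] - self.mean[1]
        self.mean[0] += dx / self.n
        self.mean[1] += dy / self.n
        dx_new = x[0] - self.mean[0]
        dy_new = x[1] - self.mean[1]
        self.sxx += dx * dx_new
        self.sxy += dx * dy_new
        self.syy += dy * dy_new


def learn(neurons, x, threshold=3.0):
    """Assign x to the nearest neuron, or create a new one if too far."""
    if neurons:
        winner = min(neurons, key=lambda nrn: nrn.mahalanobis(x))
        if winner.mahalanobis(x) <= threshold:
            winner.update(x)
            return winner
    neurons.append(EllipsoidNeuron(x))
    return neurons[-1]


# Two well-separated groups of points, presented one sample at a time.
stream = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (0.2, -0.5),
          (10.0, 10.0), (10.4, 9.8), (9.7, 10.3)]
neurons = []
for sample in stream:
    learn(neurons, sample)

print(len(neurons), [nrn.n for nrn in neurons])
```

Because the distance is measured against each neuron's own covariance, a sample is judged relative to the local spread and orientation of the data rather than by a single global radius, which is the anisotropy the abstract refers to.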
Keywords/Search Tags: self-organizing, covariance matrix, Mahalanobis distance, incremental learning, on-line learning