Research On High Dimensional Data Clustering And Application

Posted on: 2017-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: M J Li
Full Text: PDF
GTID: 2322330503995787
Subject: Safety science and engineering
Abstract/Summary:
With the rapid development of the shipping industry, the number and size of ships keep growing, and fault diagnosis systems are receiving increasing attention as an important safeguard for navigation safety. Because ship equipment parameters are varied and complex, the system generates large amounts of high-dimensional data during operation, which poses a new challenge to the data processing performance of the fault diagnosis module. Efficient management and analysis of high-dimensional data has therefore become a research hotspot in ship fault diagnosis. Against the background of the ship management system of the maritime bureau, this paper focuses on clustering techniques for high-dimensional data. The main work is as follows.

First, the state of research on traditional clustering methods and on clustering methods for high-dimensional data is reviewed. A clustering framework for high-dimensional data is designed, and each part of the framework is described in detail. For the high-dimensional data that arise during fault diagnosis, two algorithms are proposed: a clustering algorithm based on orthogonal non-negative matrix factorization, and an ensemble clustering algorithm based on similarity matrix completion.

An improved K-means clustering algorithm, built on the basic theory of non-negative matrix factorization, is proposed to reduce the dimensionality of high-dimensional data. The data are first processed by orthogonal non-negative matrix factorization: an orthogonality constraint is imposed on the data prototype matrix obtained from the factorization, using improved Gram-Schmidt orthogonalization and Householder orthogonalization as two separate variants. Both variants preserve the non-negativity of the low-dimensional features and strengthen the orthogonality of the matrix. K-means clustering is then applied to verify the effectiveness of the algorithm.
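As a rough illustration of the factorize-then-cluster idea above, the following Python sketch reduces dimensionality with a non-negative factorization whose basis rows are pushed toward orthogonality, then runs K-means on the low-dimensional features. It is not the method of the thesis: the abstract does not give the improved Gram-Schmidt or Householder procedure, so a soft orthogonality penalty with multiplicative updates is swapped in here. The function names `orthogonal_nmf` and `kmeans`, the penalty weight `lam`, and all default values are assumptions made for illustration.

```python
import numpy as np


def orthogonal_nmf(X, k, lam=1.0, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative X (n_samples x n_features) as X ~ W @ H.

    W (n_samples x k) holds the low-dimensional features. The rows of
    H (k x n_features) are pushed toward orthogonality by the penalty
    lam * ||H H^T - I||_F^2. This penalty is an assumption standing in
    for the Gram-Schmidt / Householder step named in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, d)) + eps
    for _ in range(n_iter):
        # Standard multiplicative update for W; keeps W non-negative.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Multiplicative update for H: the gradient of the penalized
        # objective is split into its negative part (numerator) and
        # positive part (denominator), so H stays non-negative.
        num = W.T @ X + 2.0 * lam * H
        den = (W.T @ W) @ H + 2.0 * lam * (H @ H.T) @ H + eps
        H *= num / den
    return W, H


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means on the rows of Z; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

For a non-negative data matrix `X` with one sample per row, `W, H = orthogonal_nmf(X, k=3)` followed by `labels = kmeans(W, k=3)` yields one cluster label per sample. Clustering on `W` rather than on `X` is the point of the design: K-means only ever sees the `k`-dimensional features.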
To address the large amount of noise and the many redundant features in high-dimensional data, an ensemble clustering algorithm based on similarity matrix completion is proposed. First, the Hsim function is used to measure the similarity between sample points and to construct the similarity matrix of each base clustering; the augmented Lagrangian multiplier method is then used to complete the missing elements of the similarity matrix. The final data partition is obtained by spectral clustering, chosen for its superior performance. The results of the paper are primarily applied to the fault diagnosis module of the ship management system. Using the high-dimensional data clustering algorithms, the fault diagnosis module can process raw high-dimensional data intelligently and diagnose faults in a ship's running state accurately. Preliminary application shows that the system runs well.
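The three stages named above (Hsim similarity, completion by an augmented Lagrangian method, spectral clustering) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis's algorithm. The Hsim form used here, the mean over dimensions of 1 / (1 + |x_i - y_i|), is one definition found in the literature and may differ from the one in the thesis. The abstract does not say how base clusterings are generated or which entries count as missing, so the boolean mask `observed` is simply taken as input. The completion step is a standard inexact augmented Lagrangian scheme for nuclear-norm minimization. All function names and default values are hypothetical.

```python
import numpy as np


def hsim_matrix(X):
    """Pairwise Hsim similarity between the rows of X.

    Assumed definition: mean over dimensions of 1 / (1 + |x_i - y_i|).
    """
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return (1.0 / (1.0 + diff)).mean(axis=2)


def complete_alm(D, observed, rho=1.2, n_iter=300, tol=1e-7):
    """Low-rank completion of D by an inexact augmented Lagrangian method.

    Solves min ||A||_* subject to A + E = D, with E forced to zero on
    the observed entries. Entries of D outside `observed` are ignored.
    Assumes D has at least one non-zero observed entry.
    """
    D = np.where(observed, D, 0.0)
    E = np.zeros_like(D)
    Y = np.zeros_like(D)              # Lagrange multipliers
    mu = 1.0 / np.linalg.norm(D, 2)   # spectral norm sets the initial scale
    d_norm = np.linalg.norm(D)
    for _ in range(n_iter):
        # Singular value thresholding gives the low-rank part A.
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # E absorbs the residual on the missing entries only.
        E = np.where(observed, 0.0, D - A + Y / mu)
        R = D - A - E
        Y += mu * R
        mu = min(mu * rho, 1e8)
        if np.linalg.norm(R) <= tol * d_norm:
            break
    return A


def spectral_clustering(S, k, n_iter=100, seed=0):
    """Normalized spectral clustering on a similarity matrix S."""
    S = np.clip((S + S.T) / 2.0, 0.0, None)   # symmetric, non-negative
    deg = S.sum(axis=1) + 1e-12
    M = S / np.sqrt(np.outer(deg, deg))       # D^{-1/2} S D^{-1/2}
    _, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    Z = vecs[:, -k:]                          # top-k eigenvectors
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    # Plain Lloyd's K-means on the embedded rows.
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((Z[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```

A typical call sequence would be `S = hsim_matrix(X)`, then `A = complete_alm(S, observed)`, then `labels = spectral_clustering(A, k)`. In an ensemble setting, one plausible choice (again an assumption) is to mark as unobserved the pairs on which the base clusterings disagree, so that the completion step fills in only the uncertain similarities.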
Keywords/Search Tags: High dimensional data, Non-negative matrix factorization, Ensemble clustering, Spectral clustering, Matrix completion, Ship fault diagnosis