
Research On Unsupervised High Dimensional Data Mining Via Sparse Representation

Posted on: 2016-12-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X D Feng
Full Text: PDF
GTID: 1228330470459073
Subject: Management Science and Engineering
Abstract/Summary:
Data mining is a key tool for supporting management and decision making. As its applications broaden, the data to be processed are increasingly high-dimensional and unlabeled, referred to here as high-dimensional unlabeled data. The corresponding techniques can be called unsupervised high-dimensional data mining, whose main problems are missing value imputation, data learning and modeling, and large-scale or online data learning and modeling. This dissertation therefore focuses on high-dimensional unlabeled data and the corresponding methods of unsupervised high-dimensional data mining, with the following three contributions.

(1) An unsupervised missing value imputation method based on a novel Locality Constrained Sparse Representation (LCSR) is presented to handle missing values in high-dimensional unlabeled data. First, the LCSR optimization objective is proposed; by introducing locality ℓ1-norm and ℓ2-norm regularization, it automatically selects instances, preserves the locality structure, and avoids overfitting. Then, LCSR-based missing value estimation imputes the unobserved values as a linear combination of the automatically selected atoms, and three dictionary constructions are developed for this purpose. Finally, the proposed method is evaluated on real gene expression and image databases and compared with other instance-based missing value estimation methods.

(2) An unsupervised graph-based learning method based on a novel NEighborhood Weighted Sparse Representation (NESR) is presented for learning and modeling high-dimensional unlabeled data. First, the NESR optimization objective and its optimization method are proposed for constructing an unsupervised graph, NESR-Graph, which combines sparse learning with neighborhood distance weighting to obtain a sparse graph that preserves the locality structure of the original high-dimensional space.
NESR-Graph also substantially reduces computation time compared with other sparse representation-based graph constructions. It is then integrated into several graph-based high-dimensional data mining tasks, including spectral clustering, subspace learning, and label propagation. Finally, experiments on various benchmark datasets demonstrate the advantages of NESR-Graph in both effectiveness and efficiency.

(3) An unsupervised dictionary learning method based on a novel Multiple Hypergraph Consistent Sparse Coding (MultiHC_SC) is presented for learning and modeling large-scale or high-dimensional unlabeled data with sparse representation. The MultiHC_SC optimization objective and an alternating optimization procedure are proposed for unsupervised dictionary learning. MultiHC_SC first uses a hypergraph model and hypergraph Laplacian regularization to capture the high-order manifold structure of high-dimensional data. The hypergraph weights or incidence matrix are extended to make the sparse codes more discriminative, and multiple ensemble hypergraph regularization terms are integrated into the objective to select the optimal hypergraph automatically. Improved performance in both static and online image clustering on real image datasets validates the advantages of the proposed unsupervised dictionary learning method, MultiHC_SC.
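Item (1) does not spell out the exact LCSR objective, so the following is only a minimal sketch of how locality-weighted sparse imputation could work. It assumes an objective of the form 0.5‖x_o − D_o a‖² + λ1 Σ_i w_i |a_i| + (λ2/2)‖a‖², where o indexes the observed entries of the target x, the atoms (columns of D) are complete instances, and w_i is the normalized distance from x_o to atom i, so distant atoms are penalized more. The function names, parameter values, the choice of complete instances as atoms (only one of several possible dictionary constructions), and the ISTA solver are illustrative assumptions, not the dissertation's method.

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise proximal operator of t*|.|; t may be a vector of thresholds.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def locality_sparse_code(x, D, lam1=0.05, lam2=0.01, n_iter=500):
    """Hypothetical locality-weighted elastic-net code, solved with ISTA.

    x: (d,) target vector; D: (d, k) dictionary with atoms as columns.
    Minimizes 0.5*||x - D a||^2 + lam1 * sum_i w_i*|a_i| + 0.5*lam2*||a||^2.
    """
    w = np.linalg.norm(D - x[:, None], axis=0)   # distance from x to each atom
    w = w / (w.max() + 1e-12)                    # normalize weights into [0, 1]
    L = np.linalg.norm(D, 2) ** 2 + lam2         # Lipschitz constant of smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x) + lam2 * a      # gradient of the smooth terms
        a = soft_threshold(a - grad / L, lam1 * w / L)
    return a

def impute_row(x, complete_rows, **kw):
    """Fill the NaNs in x using a code fitted on its observed entries only."""
    obs = ~np.isnan(x)
    D = complete_rows.T                          # atoms = complete instances (assumption)
    a = locality_sparse_code(x[obs], D[obs], **kw)
    filled = x.copy()
    filled[~obs] = D[~obs] @ a                   # same code applied to missing dims
    return filled, a
```

For example, with complete rows [1, 2, 3] and [4, 0, 1] and a target [1, 2, NaN], the first atom matches the observed entries exactly and so carries no locality penalty; the code concentrates on it and the imputed value comes out at about 2.99, slightly shrunk from 3 by the ℓ2 term.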
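Item (2) states that NESR combines sparse learning with neighborhood distance weighting and is faster than other sparse representation-based graphs. One plausible reading, sketched below purely as an assumption, is to code each sample only over its k nearest neighbors with a distance-weighted ℓ1 penalty and to use the absolute coefficients as edge weights. Restricting each dictionary to k neighbors instead of all n − 1 other samples is one way such a speed-up could arise. The function names, k, and the penalty λ are hypothetical and not taken from the dissertation.

```python
import numpy as np

def weighted_lasso(x, D, weights, lam=0.05, n_iter=300):
    """ISTA for 0.5*||x - D a||^2 + lam * sum_i weights[i] * |a_i|."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12        # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        v = a - D.T @ (D @ a - x) / L            # gradient step on the squared error
        a = np.sign(v) * np.maximum(np.abs(v) - lam * weights / L, 0.0)
    return a

def neighborhood_weighted_sparse_graph(X, k=5, lam=0.05):
    """Hypothetical NESR-style affinity matrix; X has shape (n, d), rows = samples."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist[i])
        nbrs = order[order != i][:k]             # k nearest neighbors, i excluded
        D = X[nbrs].T                            # (d, k) local dictionary
        w = dist[i, nbrs] / (dist[i, nbrs].max() + 1e-12)  # farther neighbors cost more
        a = weighted_lasso(X[i], D, w, lam=lam)
        W[i, nbrs] = np.abs(a)                   # edge weights from code magnitudes
    return 0.5 * (W + W.T)                       # symmetrize for graph-based methods
```

The resulting W can be handed to any routine that accepts a symmetric affinity matrix, such as spectral clustering or label propagation. With two well-separated groups and k smaller than the group size, every edge stays inside its own group.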
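Item (3) combines sparse coding with hypergraph Laplacian regularization over several hypergraphs, optimized alternately. The sketch below assumes a common form for such objectives, 0.5‖X − BS‖²_F + (α/2) tr(S L_μ Sᵀ) + β‖S‖₁ with L_μ = Σ_j μ_j^r L_j and μ on the probability simplex, and alternates between the codes S, the dictionary B, and the hypergraph weights μ. Each L_j is the normalized hypergraph Laplacian of Zhou et al. (2006), L = I − D_v^(−1/2) H W D_e^(−1) Hᵀ D_v^(−1/2), built here from k-nearest-neighbor hyperedges with unit weights. The extended weights or incidence matrix described in the abstract are not reproduced, the dictionary step (ridge least squares followed by shrinking long atoms onto the unit ball) is a simplification, and all names and parameter values are illustrative assumptions.

```python
import numpy as np

def knn_hypergraph_laplacian(X, k):
    """Normalized hypergraph Laplacian (Zhou et al., 2006) of a kNN hypergraph.

    X: (n, d) samples as rows. Vertex e spawns hyperedge e = {e} plus its k
    nearest neighbors; every hyperedge gets weight 1 in this sketch.
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    H = np.zeros((n, n))                         # incidence: vertices x hyperedges
    for e in range(n):
        order = np.argsort(dist[e])
        H[e, e] = 1.0
        H[order[order != e][:k], e] = 1.0
    w = np.ones(n)                               # hyperedge weights
    dv = H @ w                                   # vertex degrees d(v)
    de = H.sum(axis=0)                           # hyperedge degrees delta(e)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    theta = Dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ Dv_inv_sqrt
    return np.eye(n) - theta, dv

def update_codes(X, B, Lap, S, alpha, beta, n_iter=100):
    """ISTA on 0.5||X - BS||_F^2 + 0.5*alpha*tr(S Lap S^T) + beta*||S||_1."""
    step = 1.0 / (np.linalg.norm(B, 2) ** 2 + alpha * np.linalg.norm(Lap, 2) + 1e-12)
    for _ in range(n_iter):
        grad = B.T @ (B @ S - X) + alpha * S @ Lap
        V = S - step * grad
        S = np.sign(V) * np.maximum(np.abs(V) - step * beta, 0.0)
    return S

def update_hypergraph_weights(S, laps, r):
    """Closed-form argmin of sum_j mu_j^r * tr(S L_j S^T) over the simplex (r > 1)."""
    t = np.array([np.trace(S @ Lj @ S.T) for Lj in laps])
    t = np.maximum(t, 0.0) + 1e-12               # guard against rounding below zero
    mu = t ** (-1.0 / (r - 1.0))
    return mu / mu.sum()

def multihc_sc_sketch(X, m, ks=(3, 5), alpha=0.1, beta=0.05, r=2.0,
                      n_outer=10, seed=0):
    """X: (d, n) data with samples as columns; m: number of atoms."""
    rng = np.random.default_rng(seed)
    laps = [knn_hypergraph_laplacian(X.T, k)[0] for k in ks]
    mu = np.full(len(laps), 1.0 / len(laps))
    B = rng.standard_normal((X.shape[0], m))
    B /= np.linalg.norm(B, axis=0)               # unit-norm initial atoms
    S = np.zeros((m, X.shape[1]))
    for _ in range(n_outer):
        Lap = sum(mj ** r * Lj for mj, Lj in zip(mu, laps))
        S = update_codes(X, B, Lap, S, alpha, beta)
        # Dictionary step (simplified): ridge least squares, then shrink any
        # atom whose norm exceeds 1 back onto the unit ball.
        B = np.linalg.solve(S @ S.T + 1e-6 * np.eye(m), S @ X.T).T
        B /= np.maximum(np.linalg.norm(B, axis=0), 1.0)
        mu = update_hypergraph_weights(S, laps, r)
    return B, S, mu
```

The μ update is the closed-form minimizer of Σ_j μ_j^r t_j over the simplex, μ_j ∝ t_j^(−1/(r−1)) with t_j = tr(S L_j Sᵀ), so hypergraphs under which the codes vary more smoothly receive larger weight; this is one way an objective could "select" a hypergraph automatically.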
Keywords/Search Tags: Unsupervised High Dimensional Data Mining, Sparse Representation, Missing Value Imputation, Graph-based Learning, Dictionary Learning