Research On Improved Classification And Clustering Of Matrix Data

Posted on:2023-05-06

Degree:Doctor

Type:Dissertation

Country:China

Candidate:F Sun

Full Text:PDF

GTID:1528307022953639

Subject:Statistics

Abstract/Summary:

Classification and clustering have always been core research fields of machine learning. Research on vector data (where each data point is a vector) has matured. A simple way to handle matrix data (where each data point is a matrix) is therefore to vectorize each matrix and then apply vector-based classification and clustering methods. However, vectorization destroys the natural matrix structure. Moreover, the dimension of the vectorized data is usually very high, so these methods easily fall prey to the "curse of dimensionality". In recent years, scholars have therefore proposed matrix-based methods that classify and cluster matrix data directly, without vectorization, and these methods have attracted widespread attention in the academic community. Following this line of research, this dissertation carries out the following original and extended work on matrix data. The major contributions are summarized below.

1. A new method for Multivariate Time Series (MTS) classification is proposed. Bidirectional Linear Discriminant Analysis (BLDA) was proposed for image data classification and has good recognition performance, but it has three problems: (i) it has only been applied to image data; (ii) it cannot be performed when one of the within-class scatter matrices is singular; (iii) the computational burden can be very heavy when one of the data dimensions is high. For problem (i), this dissertation proposes applying BLDA to MTS classification. MTS is a special kind of matrix data: unlike image data, where both rows and columns are variables, the rows of MTS data are variables and the columns are time points. To solve problems (ii) and (iii), this dissertation proposes a new procedure for BLDA based on the pseudo-inverse (PBLDA), together with an efficient algorithm for PBLDA.

2. A new matrix-based clustering method is proposed. (i) Mixtures of probabilistic principal component analyzers (MPPCA) is a widely used mixture model, but it is suitable only for vector data. This dissertation extends MPPCA to matrix data and studies the resulting mixtures of bilinear probabilistic principal component analyzers (MBPPCA) model. The main innovation of MBPPCA is that clustering and dimensionality reduction of matrix data in both the row and column directions are carried out simultaneously, and a flexible covariance structure is provided to avoid the overfitting problem of the full-covariance mixture model. The experimental results show that the matrix-based MBPPCA performs better than the vector-based MPPCA on multiple face datasets. (ii) In recent years, deep learning has been widely studied and applied to classification in various fields, but further research is still needed on clustering. This dissertation combines traditional machine learning methods with deep learning methods and proposes a new two-stage clustering method based on deep learning, namely PCANet+MBPPCA. In stage 1, PCANet is used to extract deep features; in stage 2, MBPPCA is used to cluster the extracted deep features. The motivation is that the dimension of the vectorized deep features extracted by PCANet is often very high (at least ten thousand), which makes clustering with a traditional vector-based mixture model impractical. The proposed method keeps the matrix structure of the features extracted by PCANet and clusters them with the matrix-based MBPPCA, which both mitigates the curse of dimensionality and makes effective use of the deep features. The experimental results show that PCANet+MBPPCA has excellent clustering performance compared with related methods.

3. A new matrix-based semi-supervised method is proposed. Semi-supervised learning is a natural extension of classification and clustering problems. This dissertation extends the proposed matrix-based clustering method MBPPCA to semi-supervised learning of matrix data, yielding Semi-MBPPCA (SMBPPCA). The main innovation is that SMBPPCA can extract reliable features by considering labeled and unlabeled samples at the same time. The experimental results on MTS datasets show that SMBPPCA outperforms related semi-supervised learning methods.
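
To make the singularity problem in contribution 1 concrete, the following is a minimal NumPy sketch, not the dissertation's PBLDA algorithm. It assumes the standard two-dimensional LDA definitions of the row-direction within-class and between-class scatter matrices (the abstract does not state them), covers only the row direction, and uses synthetic data. The function names `row_scatter_matrices` and `pinv_discriminant_directions` are hypothetical.

```python
import numpy as np


def row_scatter_matrices(X, y):
    """Row-direction within-class (Sw) and between-class (Sb) scatter.

    X has shape (n_samples, n_rows, n_cols); y holds integer class labels.
    These definitions are an assumption, not taken from the dissertation.
    """
    overall_mean = X.mean(axis=0)
    n_rows = X.shape[1]
    Sw = np.zeros((n_rows, n_rows))
    Sb = np.zeros((n_rows, n_rows))
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        centered = Xc - class_mean
        # Sum over samples of (X_i - M_c) (X_i - M_c)^T
        Sw += np.einsum("nij,nkj->ik", centered, centered)
        diff = class_mean - overall_mean
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb


def pinv_discriminant_directions(Sw, Sb, n_components):
    """Leading eigenvectors of pinv(Sw) @ Sb, usable even when Sw is singular."""
    M = np.linalg.pinv(Sw) @ Sb
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real


# Synthetic data: 20 rows but only 8 samples with 2 columns each,
# so the 20 x 20 matrix Sw cannot have full rank.
rng = np.random.default_rng(0)
n_samples, n_rows, n_cols = 8, 20, 2
X = rng.normal(size=(n_samples, n_rows, n_cols))
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X[y == 1] += 1.0  # shift class 1 so the classes differ

Sw, Sb = row_scatter_matrices(X, y)
rank_Sw = np.linalg.matrix_rank(Sw)

U = pinv_discriminant_directions(Sw, Sb, n_components=2)
projected = np.einsum("ir,nij->nrj", U, X)  # U^T X_i for every sample
```

Here `np.linalg.inv(Sw)` would be unreliable because `Sw` is rank deficient, whereas `np.linalg.pinv` still yields projection directions. A bidirectional method would repeat the same step in the column direction.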

Keywords/Search Tags:

matrix data, linear discriminant analysis, finite mixture model, principal component analysis, semi-supervised learning

Related items

1	The Study Of Linear Discriminant Analysis Based On Semi-supervised Class Label
2	Manifold Learning And Semi-supervised Learning With Applications To Feature Extraction
3	Research On Generalized Canonical Correlation Analysis Of Data Dimensionality Reduction
4	Research Of License Plate Location Algorithm Based On Principal Component Analysis And Fisher Linear Discriminant
5	Research On EEG Feature Extraction Based On Multiple Linear Algebra
6	Research On Dimensionality Reduction Of Gene Expression Data Based On Traditional Feature Extraction And Deep Learning
7	Application Of Sparse Linear Discriminant Analysis On Text Classification
8	The Study Of Speaker Recognition Based On Principal Component Analysis And Linear Discriminant Analysis
9	Research On Robust Speaker Recognition Under Noisy Conditions
10	Based On The Matrix Model And The Quantum Mode Feature Extraction And Its Classification