Research On Dimensionality Reduction In Medical Big Data

Posted on: 2021-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: A N Yu
Full Text: PDF
GTID: 2504306512987599
Subject: Computer technology
Abstract/Summary:
With the arrival of the big data era, enormous volumes of data are being generated at an unprecedented rate. Such data has not only a huge sample size but also a large number of features. The medical field is no exception: examples include microarray data, which contains thousands of genetic probes, and high-resolution medical image data such as X-rays and MRI images. Such high-dimensional data inevitably contains redundant features, which pose severe challenges to the training of classification and clustering algorithms. This dissertation therefore focuses on dimensionality reduction in medical big data and proposes the following three innovative methods for feature selection and extraction, each targeting a different classification or clustering task:

(1) A supervised feature selection method based on global mutual information. Previous methods maximize the mutual information between features and labels and search in a heuristic, greedy manner, so the result of feature selection is easily affected by the features selected earlier. We model the goal of relevance maximization as a quadratic programming problem and take the redundancy between features into account at the same time, in order to find a globally optimal feature subset.

(2) A supervised dimensionality reduction method based on the l2,1-norm. Linear discriminant analysis can effectively reveal the global intra-class and inter-class discriminant information, and the Laplacian matrix can reflect the local "smoothness" of the data samples. We combine these two concepts in a single framework and use an l2,1-norm to enforce row sparsity for feature selection. The objective is to find a low-dimensional linear transformation such that the global discriminative information is best extracted and the local geometric structure is optimally preserved.

(3) An adaptive unsupervised feature selection method. Instead of constructing a similarity matrix with K nearest neighbors and RBF kernel functions as in previous work, we assign adaptive neighbors to each data point according to local distances. Feature selection is then embedded into the clustering process to ensure that the manifold structure of the data is well preserved. Sparse learning is used to ensure the efficiency of the feature selection algorithm.
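
The row-sparsity idea behind the l2,1-norm in method (2) can be illustrated with a small sketch. This is not the dissertation's algorithm: the matrix `W`, its values, and the helper names `l21_norm` and `rank_features_by_row_norm` are assumptions made for illustration only. The sketch shows only the standard definition of the l2,1-norm (the sum of the Euclidean norms of the rows of a matrix) and how rows that are driven to zero correspond to features that can be discarded.

```python
import numpy as np


def l21_norm(W):
    """l2,1-norm: sum of the Euclidean (l2) norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def rank_features_by_row_norm(W):
    """Return feature indices sorted by decreasing row norm of W.

    Each row of W holds the weights of one input feature across all
    projected dimensions, so an all-zero row means the feature is unused.
    """
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(-row_norms, kind="stable")


# Hypothetical projection matrix: 4 features mapped to 2 dimensions.
# The values are illustrative and do not come from the dissertation.
W = np.array([
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0 (feature unused)
    [1.0, 0.0],  # row norm 1
    [0.0, 0.0],  # row norm 0 (feature unused)
])

total = l21_norm(W)                    # 5 + 0 + 1 + 0
order = rank_features_by_row_norm(W)   # features 0 and 2 come first
```

Penalizing this quantity in an objective tends to zero out entire rows rather than individual entries, which is why it is commonly used to select features and not just to shrink weights.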
Keywords/Search Tags: Feature Selection and Extraction, Information Theory, Linear Discriminant Analysis (LDA), Spectral Graph