Classification of High-dimensional Data Based on Multiple Testing Method

Posted on:2019-05-08

Degree:Ph.D

Type:Dissertation

University:University of South Carolina

Candidate:Ma, Chong


GTID:1478390017985044

Subject:Statistics

Abstract/Summary:

Supervised and unsupervised classification are common topics in machine learning, in both scientific and industrial settings, and usually involve three tasks: prediction, exploration, and explanation. False discovery rate (FDR) theory is closely connected to classical classification theory, but it must be applied carefully to achieve good performance in different contexts. This study uses FDR to explore novel supervised classifiers for functional data and novel unsupervised classification approaches for high-dimensional data in genome studies. The first work develops a novel classifier for functional data by casting the classification problem as a multiple testing task that uses statistical depth functions. The other two works apply FDR to p-values, or tail areas, in large-scale testing problems. The second work proposes a novel algorithm for reproducible differential expression analysis of microarray and RNA-Seq data. The algorithm combines cross-validation-type subsampling with the false discovery rate: p-values obtained from the training data are used to fit a mixture of baseline and signal distributions via the EM algorithm, and the fitted mixture is then used to screen the p-values obtained from the testing data for significance. The third work proposes a novel weighted p-value approach to explore the association between microRNAs (miRNAs) and COPD emphysema severity through the regulation of mRNA expression, while integrating patient phenotype information. The proposed method can be applied to study the causal relationship between miRNA and any particular disease by exploring the precise role of miRNA in regulating genes.
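The train-then-screen idea described for the differential expression algorithm can be illustrated with a minimal sketch. The abstract does not name the baseline and signal distributions, so this example assumes a Uniform(0, 1) baseline and a Beta(a, 1) signal with 0 < a < 1. It also uses a single train/test split in place of the repeated subsampling, and a local-FDR cutoff of 0.2 chosen only for illustration. The function names (`fit_mixture`, `local_fdr`, `simulate_pvalues`) and the simulated data are hypothetical and are not taken from the dissertation.

```python
import math
import random


def fit_mixture(pvals, n_iter=500):
    """EM fit of f(p) = pi0 * 1 + (1 - pi0) * a * p**(a - 1).

    Assumed model (not from the source): Uniform(0, 1) baseline,
    Beta(a, 1) signal with 0 < a < 1.
    """
    eps = 1e-12
    logs = [math.log(min(max(p, eps), 1.0)) for p in pvals]
    pi0, a = 0.8, 0.5  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each p-value is a signal.
        w = []
        for lp in logs:
            signal = (1.0 - pi0) * a * math.exp((a - 1.0) * lp)
            w.append(signal / (pi0 + signal))
        # M-step: update the null proportion and the Beta shape.
        sw = sum(w)
        swl = sum(wi * lp for wi, lp in zip(w, logs))
        pi0 = min(max(1.0 - sw / len(logs), 1e-6), 1.0 - 1e-6)
        if swl < 0.0:
            # Weighted MLE of a for Beta(a, 1), clipped to stay in (0, 1).
            a = min(max(-sw / swl, 1e-3), 0.999)
    return pi0, a


def local_fdr(p, pi0, a):
    """Posterior probability that a p-value comes from the baseline."""
    p = min(max(p, 1e-12), 1.0)
    signal = (1.0 - pi0) * a * p ** (a - 1.0)
    return pi0 / (pi0 + signal)


def simulate_pvalues(n, signal_frac, a_true, rng):
    """Draw p-values from the assumed mixture (inverse-CDF for the signal)."""
    out = []
    for _ in range(n):
        if rng.random() < signal_frac:
            out.append(rng.random() ** (1.0 / a_true))
        else:
            out.append(rng.random())
    return out


rng = random.Random(0)
train = simulate_pvalues(2000, 0.2, 0.2, rng)  # stands in for training p-values
test = simulate_pvalues(2000, 0.2, 0.2, rng)   # stands in for testing p-values

# Fit the mixture on the training p-values only.
pi0_hat, a_hat = fit_mixture(train)

# Screen the testing p-values with the fitted mixture.
selected = [i for i, p in enumerate(test) if local_fdr(p, pi0_hat, a_hat) < 0.2]
print(f"pi0_hat={pi0_hat:.3f}, a_hat={a_hat:.3f}, selected={len(selected)}")
```

Fitting on one subset and screening on another keeps the significance rule independent of the p-values it is applied to, which is one plausible reading of how the subsampling supports reproducibility. In a fuller version the split would be repeated and the selections aggregated across repetitions.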

Keywords/Search Tags:

Classification, Data, Testing

Related items

1	Research On Improved Adaptive Random Testing Based On Data Similarity Classification
2	Classification by active testing with applications to imaging and change detection
3	Automatic Testing System For Education Software
4	A systematic approach for the classification of age-related muscle loss and elderly obesity using field-based testing methods and isoperformance curves
5	Testing Design And Result Analysis For Welfare Lottery Online System
6	Research Of Test Data Generation Based On Evolutionary Testing
7	Research Of Assistantly Constructing Testing Programs For Data Structure Algorithm Design Assignment
8	Research On And Implementation Of One Software Automatically Testing Tool
9	Research On Coal Quality Index Testing Method Based On Data Analyses Of Coal Images
10	Design And Implementation Of Universal Data Testing Platform Based On LAMP Framework