Research on Statistical Methods for Detecting Differentially Expressed Proteins Based on Mass Spectrometry

Posted on: 2017-03-20 | Degree: Master | Type: Thesis
Country: China | Candidate: X Wang
GTID: 2180330482978526 | Subject: Mathematics
Abstract/Summary:
Detecting differentially expressed proteins accurately and efficiently is a prerequisite for biomarker research, which plays an increasingly important role in the early diagnosis of disease, and it is one of the major research problems in mass-spectrometry-based quantitative proteomics. However, detection still faces three major problems: the difficulty of choosing a differential analysis method, the high rate of missing values, and the low quality of detection results. Building on previous work, this thesis addresses these problems in three studies.

First, five representative statistical methods for detecting differentially expressed proteins (Welch's t-test, the permutation test, the reproducibility-optimized test statistic (ROTS), significance analysis of microarrays (SAM), and the empirical Bayesian random censoring threshold (EBRCT) method) were evaluated in depth on two proteomics datasets, D1 and D2, while the quality of the detection results was reasonably controlled. Based on the receiver operating characteristic (ROC) curve, the partial area under the curve (pAUC), the true positive rate (TPR), the false positive rate (FPR), and the false discovery rate (FDR), we found that methods combining classical (frequentist) and Bayesian statistics, as well as methods borrowed from other '-omics' fields, may be good choices for differential analysis.

Second, nine datasets with different missing-value proportions were simulated from D1 according to the missingness characteristics of proteomics data. On each of these datasets, multivariate imputation by chained equations (MuI) was run with 57 different numbers of imputations.
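
The simulation design just described can be illustrated with a small sketch: it masks a chosen proportion of a synthetic abundance matrix and scores an imputation on the masked cells by mean absolute deviation (MAD) and absolute Pearson correlation, two of the indicators this abstract reports. Everything here is an assumption for illustration, not the thesis's code: the function names, the missingness mechanism (part lowest-intensity dropout, part uniformly random dropout), and the row-mean placeholder that stands in for MuI, which is not implemented here.

```python
import math
import random


def simulate_missing(matrix, proportion, mnar_fraction=0.5, seed=0):
    """Mask about `proportion` of the cells of `matrix` (a list of equal-length rows).

    Hypothetical mechanism: `mnar_fraction` of the masked cells are the lowest
    values in the matrix (intensity-dependent dropout); the rest are drawn
    uniformly at random from the remaining cells. Returns the masked copy
    (None marks a missing cell) and the set of masked (row, column) cells.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    n_missing = round(proportion * len(cells))
    n_mnar = round(mnar_fraction * n_missing)
    # Intensity-dependent part: the n_mnar smallest values go missing.
    by_value = sorted(cells, key=lambda c: matrix[c[0]][c[1]])
    masked_cells = set(by_value[:n_mnar])
    # Random part: the rest are sampled uniformly from the untouched cells.
    remaining = [c for c in cells if c not in masked_cells]
    masked_cells.update(rng.sample(remaining, n_missing - n_mnar))
    masked = [[None if (i, j) in masked_cells else v for j, v in enumerate(row)]
              for i, row in enumerate(matrix)]
    return masked, masked_cells


def impute_row_mean(masked):
    """Placeholder imputer (not MuI): fill each missing cell with its protein's
    observed mean, or the global observed mean if the whole row is missing."""
    observed_all = [v for row in masked for v in row if v is not None]
    global_mean = sum(observed_all) / len(observed_all)
    imputed = []
    for row in masked:
        observed = [v for v in row if v is not None]
        fill = sum(observed) / len(observed) if observed else global_mean
        imputed.append([fill if v is None else v for v in row])
    return imputed


def evaluate(truth, imputed, masked_cells):
    """Mean absolute deviation and |Pearson r| between true and imputed values,
    computed over the masked cells only (r is NaN if either side is constant)."""
    cells = sorted(masked_cells)
    xs = [truth[i][j] for i, j in cells]
    ys = [imputed[i][j] for i, j in cells]
    mad = sum(abs(x - y) for x, y in zip(xs, ys)) / len(cells)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else float("nan")
    return mad, abs(r)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic log2 abundances: 200 proteins x 10 samples, protein-specific means.
    truth = []
    for _ in range(200):
        mu = rng.gauss(22.0, 2.0)
        truth.append([rng.gauss(mu, 0.5) for _ in range(10)])
    for p in (0.1, 0.2, 0.3, 0.4, 0.5):
        masked, cells = simulate_missing(truth, p, seed=2)
        mad, r = evaluate(truth, impute_row_mean(masked), cells)
        print(f"missing={p:.1f}  MAD={mad:.3f}  |r|={r:.3f}")
```

The script prints one line per missing-value proportion. Substituting a multiple-imputation routine for `impute_row_mean` and repeating the run over different numbers of imputations would produce the kind of grid the second study explores.
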
Based on the absolute Pearson correlation coefficient, the averaged mean absolute deviation (MAD), and the averaged modified standard deviation (MSD), we found that imputation performance depends on both the number of imputations and the missing-value proportion. At a fixed missing-value proportion, more imputations do not necessarily give better results, and the optimal number of imputations varies with the missing-value proportion.

Third, we examined how four representative imputation methods (mean imputation, abundance-based imputation, k-nearest-neighbor (kNN) imputation, and MuI-5) interact with the statistical methods above other than EBRCT. Based on ROC curves, pAUC, F-score, and G-score, we conclude that researchers should handle missing values carefully before differential analysis.
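
The third study's pipeline (impute, test each protein, score the calls against the known truth) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: kNN imputation over proteins, a permutation test on Welch's t statistic as the differential test, a fixed p-value cutoff with no multiple-testing correction, and a G-score defined as the geometric mean of precision and recall (the abstract does not define it). All function names are hypothetical.

```python
import math
import random


def knn_impute(masked, k=3):
    """k-nearest-neighbor imputation over proteins (rows); None marks a missing cell.

    Distance between two proteins is the root-mean-square difference over the
    samples both have observed. A missing cell is filled with the mean value, in
    that sample, of the k nearest proteins that observe it.
    """
    imputed = [list(row) for row in masked]
    for i, row in enumerate(masked):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(masked):
                if r == i or other[j] is None:
                    continue
                sq = [(a - b) ** 2 for a, b in zip(row, other)
                      if a is not None and b is not None]
                if sq:
                    candidates.append((math.sqrt(sum(sq) / len(sq)), other[j]))
            if not candidates:
                raise ValueError(f"no neighbor observes sample {j} for protein {i}")
            candidates.sort()
            nearest = [v for _, v in candidates[:k]]
            imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


def _mean_var(v):
    """Sample mean and unbiased sample variance."""
    m = sum(v) / len(v)
    return m, sum((a - m) ** 2 for a in v) / (len(v) - 1)


def welch_t(x, y):
    """Welch's t statistic (group variances may differ)."""
    mx, vx = _mean_var(x)
    my, vy = _mean_var(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for |Welch t|, obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def f_and_g_scores(called, truly_de):
    """F-score and G-score of the called proteins against the true DE set.
    Assumption: G-score = geometric mean of precision and recall."""
    tp = len(called & truly_de)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truly_de) if truly_de else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return f, math.sqrt(precision * recall)


if __name__ == "__main__":
    rng = random.Random(7)
    n_proteins, n = 60, 5                      # n samples per group
    truly_de = set(range(10))                  # proteins 0-9 get a +1.5 shift
    data = []
    for p in range(n_proteins):
        mu = rng.gauss(22.0, 2.0)
        shift = 1.5 if p in truly_de else 0.0
        data.append([rng.gauss(mu, 0.4) for _ in range(n)] +
                    [rng.gauss(mu + shift, 0.4) for _ in range(n)])
    # Drop about 10% of cells at random, impute, then test every protein.
    masked = [[None if rng.random() < 0.1 else v for v in row] for row in data]
    filled = knn_impute(masked, k=3)
    called = {p for p, row in enumerate(filled)
              if permutation_pvalue(row[:n], row[n:], seed=p) < 0.05}
    print("F = %.2f, G = %.2f" % f_and_g_scores(called, truly_de))
```

Replacing `knn_impute` with a mean or abundance-based imputer and comparing the resulting F- and G-scores on the same simulated data gives a small-scale version of the comparison described above.
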
Keywords: Mass spectrometry, Quantitative proteomics, Differential expression, Missing values, Statistical methods