Comprehensive Assessment And Determination Of Sample Size For Omics Study And Web-based Tool Development

Posted on: 2020-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhang
GTID: 2370330599953142
Subject: Pharmacy
Abstract/Summary:
Through the application of high-throughput techniques, omics studies can measure the expression levels of thousands of variables simultaneously. Although major successes have been achieved in biomedicine and other fields, several problems still severely hamper the rapid development of omics research, such as low statistical power, poor reproducibility of markers and inconsistency in the classifiers generated. These problems have received great attention and are caused by a variety of factors, among which sample size has been suggested to be critical. In the design of a specific omics study, sample size is an essential consideration that represents a balance among investigative ability, financial requirements and ethical issues. An undersized study lacks sensitivity, so an interesting feature may be missed and time and money wasted. Conversely, an oversized study, in which more subjects are enrolled than necessary, increases overall costs and raises ethical questions, especially if the study is potentially harmful or invasive. Thus, sample size assessment and determination need to be considered early in the preparation of a study.

Statistical Power, Classification Accuracy and Robustness are indexes for assessing and determining sample size from different aspects. Statistical Power is the probability of correctly detecting a significant difference when the null hypothesis is false. Classification Accuracy is the prediction accuracy of a classifier that predicts the class of a test sample based on marker lists; it is evaluated by AUC and ACC. Robustness is the reproducibility of the results, which is assessed by Overlap, Concordance and Weighted Consistency. Because these index types are complementary and together allow sample size to be assessed and determined comprehensively, we conducted the following work.

Firstly, a comparative study of the index values of different datasets at the same sample size showed that, in the assessment process, different datasets are independent of each other. This conclusion was further supported by comparing the sample sizes needed to reach the threshold value of each index for different datasets. We therefore conclude that no single figure can generalize the sample size needed for omics studies, and that specific research must be carried out for each dataset.

Then, the sample sizes required by each index were compared; the results indicated that there is no correlation among the three indexes and that they are independent of each other. At the same time, we calculated the required sample sizes for 18 datasets and sorted the results by size, and found no regular pattern in the ordering. No index can simply be treated as the loosest or the strictest one. Therefore, we suggest that sample size assessment and determination be multi-index and comprehensive.

Finally, an online tool, SSizer, was built on the R package 'shiny' for comprehensive sample size assessment and determination. To meet the needs of different kinds of research, SSizer integrates three indexes (Statistical Power, Classification Accuracy and Robustness) comprising six criteria (Power, AUC, ACC, Overlap, Concordance, Weighted Consistency), as well as a variety of data preprocessing and analysis algorithms. Meanwhile, through the application of an accurate data simulation algorithm, SSizer can determine the required sample size for a specific study based on the above criteria.

To sum up, using three popular indexes, we conducted a comprehensive assessment and determination of sample size for omics studies. Based on this study, an online tool was developed to help scientists identify biological problems in their research and to promote further development in life science and biomedicine.
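As a rough illustration of how two of the criteria named in the abstract relate to sample size, the following Python sketch may help. It is not SSizer's implementation, and the abstract does not specify how SSizer computes any criterion. The power function assumes a textbook normal approximation for a two-sided, two-group comparison of a single feature with a standardized effect size; the overlap function assumes one simple definition (shared markers divided by the shorter list length). All function names and parameters are hypothetical.

```python
from statistics import NormalDist


def two_sample_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for one feature.

    Assumption: equal group sizes and a standardized effect size
    (mean difference divided by the common standard deviation).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of rejecting the null hypothesis in either tail.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def min_sample_size(effect_size, target_power=0.8, alpha=0.05, max_n=100_000):
    """Smallest per-group size whose approximate power reaches the target.

    Returns None if the target is not reached within max_n.
    """
    for n in range(2, max_n + 1):
        if two_sample_power(n, effect_size, alpha) >= target_power:
            return n
    return None


def overlap(markers_a, markers_b):
    """Share of markers common to two lists (hypothetical definition):
    size of the intersection divided by the length of the shorter list."""
    set_a, set_b = set(markers_a), set(markers_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


if __name__ == "__main__":
    # Power rises with sample size for a fixed effect size.
    for n in (10, 20, 40, 80):
        print(n, round(two_sample_power(n, effect_size=0.5), 3))
    print("per-group n for 80% power:", min_sample_size(0.5))
    print("overlap:", overlap(["g1", "g2", "g3"], ["g2", "g3", "g4"]))
```

Because each criterion responds to sample size in its own way, a size that satisfies a power threshold need not satisfy a robustness threshold, which is consistent with the abstract's recommendation to assess several indexes together.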
Keywords/Search Tags: sample size, statistical power, classification accuracy, robustness, web-based tool