
Software Comparison and Data Mining for Copy Number Variation and Loss of Heterozygosity in Whole-Genome SNP Arrays

Posted on: 2012-05-22 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X Zhang | Full Text: PDF
GTID: 1108330464460913 | Subject: Bioinformatics
Abstract/Summary:
Copy number variations (CNVs) are deletions, duplications, or complex multi-locus variations of sub-microscopic genomic fragments, ranging from 1 kb to several Mb, that are widespread in human genomes. CNVs can cause disease through gene disruption and altered gene dosage, which lead to changes in gene expression, phenotypic differences, and phenotypic adaptation. CNVs are therefore likely to be a major cause of common diseases and rare congenital defects.

Many mature and widely used software packages already exist for CNV analysis and for retrieving raw CNV data (CNV, LOH, etc.). However, a gold standard is still lacking: no substantial statistical evaluation has been conducted to compare the algorithms across multiple platforms and establish the merits of each. To assess the effectiveness of different software for CNV calling and analysis, PennCNV, CNAG, Birdsuite and dChip were run on the sample sets to generate CNV data. All results were compared and imported into a database constructed in advance. The comparison results were categorized for data from two arrays (CGH and SNP 6.0). First, the data were extracted according to the quantity of detected CNVs and compared between groups (Match, Overlap and Non_overlap) on the following measures: (i) total number of CNVs; (ii) distribution of CNV length; (iii) distribution of gains and losses. Next, the quality of the detected CNVs was assessed from the false-positive and false-negative rates of each of the four software packages, computed from its individually detected CNVs with the CGH results as the reference. The data from the Non_overlap group were then analyzed further to evaluate the effect of SD sequences and the mutual confirmation among the four software packages, in order to identify the factors contributing to the specificity of the CGH array.
Finally, the robustness of the software, i.e., its consistency on duplicated samples, was analyzed. Across all aspects of the statistical analysis, Birdsuite and PennCNV showed the best overall performance and consistency. Birdsuite was the most conservative, with the lowest false-positive rate but the highest false-negative rate and the poorest agreement with the other software. dChip had the highest false-positive rate and was moderate in other respects, but was able to detect de novo CNVs. Therefore, CNAG is suitable for population genetics analysis; Birdsuite and PennCNV for disease-related analysis; and dChip for tumor-related and LOH analysis.

We then carried out in-depth data mining of the CNV distribution in mental retardation (MR) patients and healthy controls using a mature data analysis platform. Based on the MR-specific CNV and LOH information, we identified potential causative genes and validated them in further experiments.
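The grouping of calls into Match, Overlap and Non_overlap against the CGH reference, and the false-positive and false-negative rates derived from it, can be illustrated with a minimal Python sketch. The abstract does not define the groups or the rates, so everything below is an assumption made for illustration: Match means identical coordinates and type, Overlap means at least one shared base pair with a reference CNV of the same type, the false-positive rate is the fraction of software calls with no reference overlap, and the false-negative rate is the fraction of reference CNVs with no overlapping call. The names `CNV`, `classify` and `error_rates` are hypothetical and do not come from PennCNV, CNAG, Birdsuite or dChip.

```python
from typing import List, NamedTuple, Tuple


class CNV(NamedTuple):
    chrom: str
    start: int  # inclusive, in base pairs
    end: int    # inclusive, in base pairs
    kind: str   # "gain" or "loss"


def overlap_bp(a: CNV, b: CNV) -> int:
    """Number of base pairs shared by two CNVs (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def classify(call: CNV, reference: List[CNV]) -> str:
    """Assign a call to 'Match', 'Overlap' or 'Non_overlap' (assumed definitions)."""
    group = "Non_overlap"
    for ref in reference:
        # Assumption: a gain is never compared with a loss.
        if ref.kind != call.kind or overlap_bp(call, ref) == 0:
            continue
        if (call.start, call.end) == (ref.start, ref.end):
            return "Match"
        group = "Overlap"
    return group


def error_rates(calls: List[CNV], reference: List[CNV]) -> Tuple[float, float]:
    """Return (false-positive rate, false-negative rate) under the assumed definitions."""
    fp = sum(1 for c in calls if classify(c, reference) == "Non_overlap")
    fn = sum(1 for r in reference if classify(r, calls) == "Non_overlap")
    fp_rate = fp / len(calls) if calls else 0.0
    fn_rate = fn / len(reference) if reference else 0.0
    return fp_rate, fn_rate


# Made-up coordinates, for illustration only.
cgh_reference = [
    CNV("chr1", 1000, 5000, "loss"),
    CNV("chr2", 200, 900, "gain"),
    CNV("chr3", 100, 400, "loss"),
]
software_calls = [
    CNV("chr1", 1000, 5000, "loss"),
    CNV("chr2", 500, 1500, "gain"),
    CNV("chr4", 10, 90, "gain"),
    CNV("chr5", 1, 50, "loss"),
]

groups = [classify(c, cgh_reference) for c in software_calls]
fp_rate, fn_rate = error_rates(software_calls, cgh_reference)
print(groups, fp_rate, fn_rate)
```

A real comparison would more likely use a reciprocal-overlap threshold than exact coordinates for Match, since breakpoints reported by different arrays rarely coincide; the exact-match rule here is chosen only to keep the sketch short.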
Keywords/Search Tags: Heterozygosity